Learning Addition with GPT
Inspired by Andrej Karpathy’s nanoGPT and his excellent YouTube series, I decided to train my own transformer model on a simple dataset. I also wanted to measure precisely how well the model performs. This is not trivial when working with text, as evaluations often rely on a so-called vibe check, which is inherently subjective. A natural choice for objective evaluation is to train the model to generate text representing equations in the form $x + y = z$....
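To make the idea of an objective check concrete, here is a minimal sketch of how equation strings of the form $x + y = z$ could be generated and scored. The function names, the operand range, and the exact string layout are assumptions made for illustration; the excerpt does not show the dataset or the evaluation code actually used in the post.

```python
import random


def make_equation(rng, max_operand=99):
    """Build one training example such as '12 + 7 = 19' (hypothetical format)."""
    x = rng.randint(0, max_operand)
    y = rng.randint(0, max_operand)
    return f"{x} + {y} = {x + y}"


def is_correct(equation):
    """Return True only if the text parses as 'x + y = z' and x + y equals z."""
    left, _, right = equation.partition("=")
    x, _, y = left.partition("+")
    try:
        return int(x) + int(y) == int(right)
    except ValueError:
        # Malformed output (missing operand, stray characters) counts as wrong.
        return False


def accuracy(generated):
    """Fraction of generated strings that are arithmetically correct."""
    if not generated:
        return 0.0
    return sum(is_correct(e) for e in generated) / len(generated)


rng = random.Random(0)
samples = [make_equation(rng) for _ in range(5)]
print(samples)
print(accuracy(samples + ["2 + 2 = 5"]))
```

Because every string is either right or wrong, the score does not depend on anyone's judgment of the generated text.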
Transformers Dashboard
Since the publication of the now famous 2017 paper Attention Is All You Need, many large language models based on the transformer architecture have emerged. Fortunately, some studies have compiled extensive data on many published models, including the dimensions of their transformers. Much like my experience learning about CNNs and their increasing complexity, I wanted to analyze LLM transformers. Which models are the largest? What is the optimal size for the feed-forward layer?...
Glyph generation
Imagine a simple square grid with nine dots like the one above. What if you were challenged to draw as many unique shapes or “glyphs” as possible by connecting these dots? At first glance, it seems straightforward, but let’s dive deeper into the complexity. Each of the nine dots can be connected in pairs, forming lines that are the strokes of our glyphs. Calculating all possible pairings, we find there are $(9\times8)/2=36$ unique strokes....
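The stroke count can be checked by enumerating the pairs directly. This is a small sketch; representing the dots as (row, column) coordinates and the variable names are assumptions made for illustration.

```python
from itertools import combinations

# The nine dots of a 3x3 grid, as (row, column) coordinates.
dots = [(row, col) for row in range(3) for col in range(3)]

# A stroke is an unordered pair of distinct dots.
strokes = list(combinations(dots, 2))

print(len(dots))     # 9
print(len(strokes))  # 36, matching (9 * 8) / 2
```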
Some thoughts on training LeNet
Since my last blog post, Training LeNet on Armenian script, I have made some significant improvements to the training process.

Model simplification

The model takes as input a mean and standard deviation for normalizing pixel intensities. These values are calibrated on the training set before the gradient descent loop that adjusts the weights and biases begins. To make things simpler, I hardcoded those parameters. This way, the only dependency between the model and the training set is in the gradient descent loop....
Diffusion models and time reversal
I recently spent some time reading about the algorithms behind Stable Diffusion and similar image generation models. They have been linked with an interesting 40-year-old result on diffusion processes. In short, this result states that there exists an explicit path from an initial probability distribution $p_0$ to random noise (a normal distribution), and that this path can be reversed. One application of this concept is sampling: we can draw a sample from random noise and use the backward diffusion to obtain a sample from $p_0$....
Training LeNet-5 on Armenian script
Following Tinkering with Tesseract, I wanted to gain a better understanding of how OCR systems work. So, I decided to start with building my own character recognition engine using PyTorch. The code is available at v4nn4/hynet.

Generating a dataset

First, we visualize the alphabet in our target font, Mk_Parz_U-Italic:

    from PIL import Image, ImageDraw, ImageFont
    import matplotlib.pyplot as plt

    caps = range(0x531, 0x557)
    smalls = range(0x561, 0x588)
    letters = [f"{chr(a)}{chr(b)}" for (a, b) in zip(caps, smalls)]
    letters = [" "....
Tinkering with Tesseract
I have recently been experimenting with Tesseract, an Optical Character Recognition (OCR) engine developed by Google. My primary objective was to extract text from scans of a 1920s Armenian newspaper and run search queries on it. Terms like պատերազմ (war) or Ֆրանսիա (France), for instance, are likely to appear in the document. Some initial observations on the document: Image segmentation: there are a lot of different text blocks in the raw document, and distinguishing between them might be challenging....