Learning Addition with GPT

Inspired by Andrej Karpathy’s nanoGPT and his excellent YouTube series, I decided to train my own transformer model on a simple dataset. I also wanted to measure precisely how well the model performs. This is not trivial when working with text, where evaluation often relies on a so-called vibe check, which is inherently subjective. A natural choice for objective evaluation is to train the model to generate text representing equations in the form $x + y = z$....

January 11, 2025 · 7 min · 1348 words · v4nn4
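
The objective evaluation described above can be sketched as an exact-match check on generated equations. This is only an illustration, not the post's actual code: the `x+y=` prompt format, the operand range, and the names `make_example`, `exact_match_accuracy`, and `oracle` are all assumptions, and `oracle` is a stand-in for a trained model's generation function.

```python
import random


def make_example(rng, max_operand=99):
    # Hypothetical format: the prompt is "x+y=" and the target is the sum as text.
    x = rng.randint(0, max_operand)
    y = rng.randint(0, max_operand)
    return f"{x}+{y}=", str(x + y)


def exact_match_accuracy(predict, examples):
    # A prediction counts as correct only if it matches the target string exactly.
    correct = sum(
        1 for prompt, target in examples if predict(prompt).strip() == target
    )
    return correct / len(examples)


def oracle(prompt):
    # Stand-in for a model: parses the prompt and computes the true sum.
    x, y = prompt.rstrip("=").split("+")
    return str(int(x) + int(y))


rng = random.Random(0)  # fixed seed so the evaluation set is reproducible
examples = [make_example(rng) for _ in range(1000)]
print(exact_match_accuracy(oracle, examples))  # 1.0, since the oracle is always right
```

Replacing `oracle` with a function that samples a completion from the trained model would give a single accuracy number in place of a subjective impression.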