Named Entity Recognition using structured generation

Structured generation is a method that enforces the output format of a language model. The idea is to represent the desired format (e.g. JSON) as a finite-state machine (FSM) and, at each generation step, mask the model's token probabilities so that only tokens allowed by the format can be produced. In this post, we will use the outlines library to perform Named Entity Recognition (NER) on the book Dune by Frank Herbert. Our goal is to extract characters, locations, and organizations, and hopefully to infer clusters from their interactions in the text....

January 25, 2025 · 7 min · 1475 words · v4nn4
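
To make the FSM-masking idea from the excerpt above concrete, here is a minimal sketch in plain Python. It does not use the outlines API: the vocabulary, the `LABEL : NAME+ <eos>` format, the transition table, and the random "model" are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
LABELS = ["PER", "LOC", "ORG"]
NAMES = ["Paul", "Arrakis"]
VOCAB = LABELS + [":"] + NAMES + ["<eos>"]

# Hypothetical FSM for the format: LABEL ":" NAME+ "<eos>".
# TRANSITIONS[state][token] gives the next state; absent tokens are forbidden.
TRANSITIONS = {
    "start": {t: "label" for t in LABELS},
    "label": {":": "colon"},
    "colon": {t: "name" for t in NAMES},
    "name": {**{t: "name" for t in NAMES}, "<eos>": "done"},
}


def fake_logits(rng):
    """Stand-in for a language model: one random score per vocabulary token."""
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]


def constrained_generate(seed=0, max_steps=10):
    """Greedy decoding where the FSM masks out format-breaking tokens."""
    rng = random.Random(seed)
    state, output = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        allowed = TRANSITIONS[state]
        logits = fake_logits(rng)
        # Tokens the FSM forbids in this state get a score of -inf.
        masked = [
            score if token in allowed else -math.inf
            for token, score in zip(VOCAB, logits)
        ]
        token = VOCAB[masked.index(max(masked))]  # greedy pick among allowed
        output.append(token)
        state = allowed[token]
    return output, state


if __name__ == "__main__":
    print(constrained_generate(seed=0))
```

Whatever scores the stand-in model produces, every generated sequence starts with a label, then a colon, then at least one name, because the mask removes every other option at each step. A real implementation compiles the FSM from a regular expression or JSON schema and applies the mask to the model's actual logits.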

Learning addition with GPT

Inspired by Andrej Karpathy’s nanoGPT and his excellent YouTube series, I decided to train my own transformer model on a simple dataset. I also wanted to measure precisely how well the model performs. This is not trivial when working with text, as evaluations often rely on a so-called vibe check, which is inherently subjective. A natural choice for objective evaluation is to train the model to generate text representing equations in the form $x + y = z$....

January 11, 2025 · 7 min · 1348 words · v4nn4
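
As a rough illustration of why equations of the form $x + y = z$ allow an objective evaluation, the sketch below generates addition prompts and scores a predictor by exact match. The text format (`"12+34="`), the value range, and the function names are assumptions made for this example, not the post's actual setup.

```python
import random


def make_example(rng, max_value=99):
    """Return a (prompt, answer) pair such as ("12+34=", "46")."""
    x = rng.randint(0, max_value)
    y = rng.randint(0, max_value)
    return f"{x}+{y}=", str(x + y)


def exact_match_accuracy(predict, examples):
    """Fraction of prompts for which the predicted string equals the answer."""
    correct = sum(predict(prompt) == answer for prompt, answer in examples)
    return correct / len(examples)


def oracle(prompt):
    """Reference predictor that parses the prompt and adds the operands."""
    x, y = prompt.rstrip("=").split("+")
    return str(int(x) + int(y))


if __name__ == "__main__":
    rng = random.Random(0)
    examples = [make_example(rng) for _ in range(1000)]
    # A trained model's generate function would replace `oracle` here.
    print(exact_match_accuracy(oracle, examples))
```

Because each prompt has exactly one correct completion, accuracy is a single number with no subjective judgment involved.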