| title | status | type | priority | created_at | updated_at |
|---|---|---|---|---|---|
| edu: write chapter on creating and training a simple LLM | completed | feature | low | 2026-03-10T23:30:00Z | 2026-03-16T02:32:26Z |
Background
From edu/TODO.md: Hands-on: Creating and training a simple LLM.
A practical course on building a small language model from scratch in Rust, covering tokenisation, the Transformer architecture, and a training loop. The goal is deep understanding rather than production scale.
Content outline (suggested)
Part 1 — What is a Language Model?
- Predicting the next token: the core task
- Tokenisation: BPE, byte-level, character-level — pick character-level for simplicity
- Exercise 1: Build a character-level tokeniser in Rust
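
As a rough starting point for Exercise 1, a character-level tokeniser can be sketched as a pair of lookup tables built from the corpus. The type and method names below (`CharTokeniser`, `from_corpus`, `encode`, `decode`) are illustrative assumptions, not names taken from the chapter.

```rust
use std::collections::BTreeMap;

/// Hypothetical character-level tokeniser: every distinct `char` in the
/// corpus gets one integer ID.
pub struct CharTokeniser {
    char_to_id: BTreeMap<char, usize>,
    id_to_char: Vec<char>,
}

impl CharTokeniser {
    /// Build the vocabulary from the characters that occur in `corpus`.
    /// IDs follow sorted character order, so the mapping is deterministic.
    pub fn from_corpus(corpus: &str) -> Self {
        let mut id_to_char: Vec<char> = corpus.chars().collect();
        id_to_char.sort_unstable();
        id_to_char.dedup();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(id, &c)| (c, id))
            .collect();
        Self { char_to_id, id_to_char }
    }

    /// Number of distinct characters seen in the corpus.
    pub fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Map text to token IDs. Returns `None` if `text` contains a
    /// character that is not in the vocabulary.
    pub fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars()
            .map(|c| self.char_to_id.get(&c).copied())
            .collect()
    }

    /// Map token IDs back to text. Returns `None` if any ID is out of range.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter()
            .map(|&id| self.id_to_char.get(id).copied())
            .collect()
    }
}
```

Building the tokeniser from `"hello world"` gives a vocabulary of 8 characters, and `decode(&encode("hello")?)` round-trips to `"hello"`. Returning `Option` for unknown characters is one possible design; a reserved "unknown" token would be another.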
Part 2 — The Transformer Architecture
- Embeddings and positional encoding
- Self-attention: queries, keys, values — the attention formula
- Multi-head attention
- Feed-forward sublayers, residual connections, layer norm
- Exercise 2: Implement a single attention head in Rust (no ML framework)
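
For Exercise 2, the standard scaled dot-product formula, softmax(QKᵀ / √d_k) V, can be written in plain Rust without any ML framework. The sketch below assumes the query, key, and value matrices are already projected, non-empty, and of equal sequence length; the `Matrix` alias and the function names are illustrative, and the learned projection weights are left out.

```rust
/// Row-major matrix: one inner `Vec` per sequence position.
pub type Matrix = Vec<Vec<f32>>;

/// Numerically stable softmax over one row of scores.
pub fn softmax(scores: &[f32]) -> Vec<f32> {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Scaled dot-product attention for a single head.
/// `q` and `k` are [seq_len][d_k]; `v` is [seq_len][d_v].
/// With `causal = true`, position i attends only to positions 0..=i,
/// which is the masking a decoder-only model needs.
pub fn attention(q: &Matrix, k: &Matrix, v: &Matrix, causal: bool) -> Matrix {
    let scale = 1.0 / (k[0].len() as f32).sqrt();
    q.iter()
        .enumerate()
        .map(|(i, q_i)| {
            // Scores of query i against every key it is allowed to see.
            let visible = if causal { i + 1 } else { k.len() };
            let scores: Vec<f32> = k[..visible]
                .iter()
                .map(|k_j| q_i.iter().zip(k_j).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            let weights = softmax(&scores);
            // Output i is the attention-weighted average of the visible value rows.
            let mut out = vec![0.0; v[0].len()];
            for (w, v_j) in weights.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(v_j) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}
```

With the causal mask on, the first output row equals the first value row, because position 0 can only attend to itself. Nested `Vec`s keep the indexing readable at the cost of speed, which seems in keeping with the stated goal of understanding over scale.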
Part 3 — Assembling the Model
- Stacking Transformer blocks into a decoder-only LM
- Using `candle` (Hugging Face's Rust ML framework) for tensor ops and autodiff
- Exercise 3: Define a small GPT-like model in `candle`
Part 4 — Training
- Cross-entropy loss for next-token prediction
- The training loop: forward pass, loss, backward pass, optimizer step
- Exercise 4: Train on a small text corpus (e.g., Shakespeare or a short book)
- Exercise 5: Sample from the model and observe output quality vs. training steps
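
To make the loss and the optimizer step concrete, here is a framework-free sketch of cross-entropy for a single position, its gradient with respect to the logits (softmax minus one-hot), and one plain SGD update. It is an illustration only: the function names are assumptions, the "parameters" are just the logits themselves, and in the chapter's `candle` setup the backward pass would come from autodiff rather than a hand-written gradient.

```rust
/// Numerically stable log-softmax:
/// log p_i = z_i - max - ln(sum_j exp(z_j - max)).
pub fn log_softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum = logits.iter().map(|&z| (z - max).exp()).sum::<f32>().ln();
    logits.iter().map(|&z| z - max - log_sum).collect()
}

/// Cross-entropy for one position: the negative log-probability the
/// model assigns to the true next token `target`.
pub fn cross_entropy(logits: &[f32], target: usize) -> f32 {
    -log_softmax(logits)[target]
}

/// Gradient of the loss with respect to the logits:
/// softmax(logits) - one_hot(target).
pub fn cross_entropy_grad(logits: &[f32], target: usize) -> Vec<f32> {
    let mut grad: Vec<f32> = log_softmax(logits).iter().map(|lp| lp.exp()).collect();
    grad[target] -= 1.0;
    grad
}

/// One plain SGD step: params <- params - lr * grad.
pub fn sgd_step(params: &mut [f32], grad: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grad) {
        *p -= lr * g;
    }
}
```

With four all-zero logits the loss is ln 4 (about 1.386), the value expected from a uniform guess over four tokens. After one `sgd_step` with a learning rate of 0.5, the loss for the same target is lower. A real training loop repeats this forward, loss, backward, update cycle over batches of token sequences.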
Part 5 — Reflection
- What limits this model? Scale, data, compute
- Pointers to real LLM training (GPT-2, LLaMA, Mistral)
- Further reading
File to create
- `edu/src/llm-from-scratch.md`
- Add to `edu/src/SUMMARY.md` under the `# Machine Learning` section
Summary of Changes
All 14 sections written with full educational content covering language modeling basics, the Transformer architecture, model assembly with candle, training, and reflection. ~1900 lines of content including code examples, ASCII diagrams, and exercises.