---
# edu-u2w7
title: 'edu: write chapter on creating and training a simple LLM'
status: in-progress
type: feature
priority: low
created_at: 2026-03-10T23:30:00Z
updated_at: 2026-03-13T22:01:43Z
---

## Background

From `edu/TODO.md`:

Hands-on: Creating and training a simple LLM. A practical course on building a small language model from scratch in Rust, covering tokenisation, the Transformer architecture, and a training loop. The goal is deep understanding rather than production scale.

## Content outline (suggested)

### Part 1: What is a Language Model?

1. Predicting the next token: the core task
2. Tokenisation: BPE, byte-level, character-level (pick character-level for simplicity)
3. Exercise 1: Build a character-level tokeniser in Rust

### Part 2: The Transformer Architecture

4. Embeddings and positional encoding
5. Self-attention: queries, keys, values, and the attention formula
6. Multi-head attention
7. Feed-forward sublayers, residual connections, layer norm
8. Exercise 2: Implement a single attention head in Rust (no ML framework)

### Part 3: Assembling the Model

9. Stacking Transformer blocks into a decoder-only LM
10. Using `candle` (Hugging Face's Rust ML framework) for tensor ops and autodiff
11. Exercise 3: Define a small GPT-like model in `candle`

### Part 4: Training

12. Cross-entropy loss for next-token prediction
13. The training loop: forward pass, loss, backward pass, optimizer step
14. Exercise 4: Train on a small text corpus (e.g., Shakespeare or a short book)
15. Exercise 5: Sample from the model and observe output quality vs. training steps

### Part 5: Reflection

16. What limits this model? Scale, data, compute
17. Pointers to real LLM training (GPT-2, LLaMA, Mistral)
18. Further reading

## File to create

- `edu/src/llm-from-scratch.md`
- Add to `edu/src/SUMMARY.md` under the `# Machine Learning` section
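
## Sketch for Exercise 1 (illustrative)

To make the scope of Exercise 1 concrete, here is a minimal sketch of what a character-level tokeniser might look like using only the Rust standard library. The type name `CharTokenizer` and its methods (`from_corpus`, `vocab_size`, `encode`, `decode`) are hypothetical and not taken from `edu/TODO.md`. The sketch assumes one token id per distinct `char`, assigned in sorted order, and that unknown characters are reported as `None` rather than mapped to a special token. The chapter may well settle on a different design.

```rust
use std::collections::BTreeMap;

/// Hypothetical character-level tokeniser: one token id per distinct `char`.
pub struct CharTokenizer {
    /// Maps each known character to its token id.
    char_to_id: BTreeMap<char, usize>,
    /// Maps each token id (the index) back to its character.
    id_to_char: Vec<char>,
}

impl CharTokenizer {
    /// Builds the vocabulary from every distinct character in `corpus`.
    /// Ids follow sorted character order, so the mapping is deterministic.
    pub fn from_corpus(corpus: &str) -> Self {
        let mut id_to_char: Vec<char> = corpus.chars().collect();
        id_to_char.sort_unstable();
        id_to_char.dedup();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(id, &c)| (c, id))
            .collect();
        Self { char_to_id, id_to_char }
    }

    /// Number of distinct tokens (characters) in the vocabulary.
    pub fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Converts text to token ids.
    /// Returns `None` if any character is missing from the vocabulary.
    pub fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars()
            .map(|c| self.char_to_id.get(&c).copied())
            .collect()
    }

    /// Converts token ids back to text.
    /// Returns `None` if any id is out of range.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter()
            .map(|&id| self.id_to_char.get(id).copied())
            .collect()
    }
}
```

With a vocabulary built via `CharTokenizer::from_corpus("hello world")`, calling `encode("hello")` and then `decode` on the result should round-trip to the original string, while `encode("xyz")` yields `None` because those characters never appeared in the corpus. Returning `Option` keeps the failure case visible to readers; a real chapter might prefer an explicit unknown-token id instead.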