---
# edu-u2w7
title: 'edu: write chapter on creating and training a simple LLM'
status: todo
type: task
priority: low
created_at: 2026-03-10T23:30:00Z
updated_at: 2026-03-10T23:30:00Z
---

## Background

From `edu/TODO.md`: Hands-on: Creating and training a simple LLM.

A practical course on building a small language model from scratch in Rust, covering tokenisation, the Transformer architecture, and a training loop. The goal is deep understanding rather than production scale.

## Content outline (suggested)
### Part 1 — What is a Language Model?
1. Predicting the next token: the core task
2. Tokenisation: BPE, byte-level, and character-level (use character-level for simplicity)
3. Exercise 1: Build a character-level tokeniser in Rust
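
A possible starting point for Exercise 1, as a minimal sketch using only the standard library. The names `CharTokeniser`, `from_corpus`, `encode`, and `decode` are illustrative assumptions, not part of the planned chapter. The sketch also assumes that one token equals one Rust `char` (a Unicode scalar value) and that unknown characters are an error rather than mapped to a special token.

```rust
use std::collections::BTreeMap;

/// Character-level tokeniser: every distinct `char` in the corpus gets one id.
/// (Hypothetical type for illustration only.)
pub struct CharTokeniser {
    char_to_id: BTreeMap<char, usize>,
    id_to_char: Vec<char>,
}

impl CharTokeniser {
    /// Build the vocabulary from a corpus. Ids follow sorted character order,
    /// so the mapping is deterministic for a given corpus.
    pub fn from_corpus(corpus: &str) -> Self {
        let mut id_to_char: Vec<char> = corpus.chars().collect();
        id_to_char.sort_unstable();
        id_to_char.dedup();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(id, &c)| (c, id))
            .collect();
        Self { char_to_id, id_to_char }
    }

    /// Number of distinct characters seen in the corpus.
    pub fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Map text to token ids; `None` if a character is not in the vocabulary.
    pub fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars()
            .map(|c| self.char_to_id.get(&c).copied())
            .collect()
    }

    /// Map token ids back to text; `None` if an id is out of range.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter()
            .map(|&id| self.id_to_char.get(id).copied())
            .collect()
    }
}
```

With a vocabulary built from `"hello world"`, `encode("hello")` yields `[3, 2, 4, 4, 5]` and `decode` round-trips it back to `"hello"`.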
### Part 2 — The Transformer Architecture
4. Embeddings and positional encoding
5. Self-attention: queries, keys, values, and the scaled dot-product attention formula
6. Multi-head attention
7. Feed-forward sublayers, residual connections, layer norm
8. Exercise 2: Implement a single attention head in Rust (no ML framework)
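
One way Exercise 2 could look, as a sketch in plain Rust with no ML framework. It computes scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`, for a single head. The `Matrix` alias, the function names, and the causal mask (each position attends only to itself and earlier positions, as a decoder-only model would) are assumptions made for illustration. The code also assumes non-empty matrices with matching shapes.

```rust
/// Row-major matrix as nested vectors (illustrative, not optimised).
type Matrix = Vec<Vec<f32>>;

/// Naive matrix product; assumes `a` is n x k and `b` is k x m, both non-empty.
fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            for p in 0..k {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

/// Numerically stable softmax over one row (subtracts the row maximum first).
fn softmax_in_place(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// One causal self-attention head.
/// `x` is (sequence length) x (model dim); `w_q`, `w_k`, `w_v` project it to
/// queries, keys, and values.
fn attention_head(x: &Matrix, w_q: &Matrix, w_k: &Matrix, w_v: &Matrix) -> Matrix {
    let q = matmul(x, w_q);
    let k = matmul(x, w_k);
    let v = matmul(x, w_v);
    let scale = (k[0].len() as f32).sqrt();
    let t = x.len();

    let mut weights = vec![vec![0.0f32; t]; t];
    for i in 0..t {
        for j in 0..t {
            weights[i][j] = if j <= i {
                // Dot product of query i with key j, scaled by sqrt(d_k).
                q[i].iter().zip(&k[j]).map(|(a, b)| a * b).sum::<f32>() / scale
            } else {
                // Causal mask: future positions get zero weight after softmax.
                f32::NEG_INFINITY
            };
        }
        softmax_in_place(&mut weights[i]);
    }
    matmul(&weights, &v)
}
```

The nested-vector representation keeps the arithmetic visible, at the cost of speed. That trade-off fits the stated goal of understanding over scale.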
### Part 3 — Assembling the Model
9. Stacking Transformer blocks into a decoder-only LM
10. Using `candle` (Hugging Face's Rust ML framework) for tensor ops and autodiff
11. Exercise 3: Define a small GPT-like model in `candle`
### Part 4 — Training
12. Cross-entropy loss for next-token prediction
13. The training loop: forward pass, loss, backward pass, optimiser step
14. Exercise 4: Train on a small text corpus (e.g., Shakespeare or a short book)
15. Exercise 5: Sample from the model and observe output quality vs. training steps
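
To make the loss in this part concrete, here is a framework-free sketch of mean cross-entropy for next-token prediction. The function names `cross_entropy` and `next_token_pairs` are hypothetical. In the chapter itself the loss would presumably come from `candle`, so this is only meant to show the arithmetic. It assumes one row of logits per sequence position and targets that are the input ids shifted left by one.

```rust
/// Mean cross-entropy between per-position logits and target token ids.
/// Each row of `logits` holds one unnormalised score per vocabulary entry.
fn cross_entropy(logits: &[Vec<f32>], targets: &[usize]) -> f32 {
    assert_eq!(logits.len(), targets.len(), "one target per position");
    assert!(!logits.is_empty(), "need at least one position");
    let mut total = 0.0f32;
    for (row, &target) in logits.iter().zip(targets) {
        // log-sum-exp with the row maximum subtracted for numerical stability.
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let log_sum_exp = max + row.iter().map(|&z| (z - max).exp()).sum::<f32>().ln();
        // -log softmax(row)[target] = log_sum_exp - row[target]
        total += log_sum_exp - row[target];
    }
    total / logits.len() as f32
}

/// Split a token sequence into (inputs, targets) for next-token prediction:
/// the target at position i is the token at position i + 1.
fn next_token_pairs(ids: &[usize]) -> (&[usize], &[usize]) {
    assert!(!ids.is_empty(), "need at least one token");
    (&ids[..ids.len() - 1], &ids[1..])
}
```

A useful sanity check before training: with uniform logits over a vocabulary of size `V`, the loss equals `ln(V)`, which is roughly what an untrained model should report.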
### Part 5 — Reflection
16. What limits this model? Scale, data, compute
17. Pointers to real LLM training (GPT-2, LLaMA, Mistral)
18. Further reading
## File to create

- `edu/src/llm-from-scratch.md`
- Add to `edu/src/SUMMARY.md` under the `# Machine Learning` section