docs(edu): outline simple LLM chapter and create section tickets [edu-u2w7]

Adds llm-from-scratch.md stub with 14 sections (GPT-1 style: character tokenisation, self-attention, transformer block, candle model, training loop, sampling). Creates beans edu-32xl through edu-9sb7 for each section.
4 months ago · 05ac10f5e3
parent 818444962c
commit 05ac10f5e3
17 changed files with 237 additions and 3 deletions
--- a/edu/.beans/edu-32xl--write-1-what-is-a-language-model.md
+++ b/edu/.beans/edu-32xl--write-1-what-is-a-language-model.md
@ -0,0 +1,11 @@
+---
+# edu-32xl
+title: 'Write §1: What is a language model?'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:47Z
+updated_at: 2026-03-13T22:01:47Z
+parent: edu-u2w7
+---
+
+Next-token prediction as the core task. Intuitive framing: a model that guesses what comes next, trained on raw text. GPT-1 context. No code.
--- a/edu/.beans/edu-7do4--write-2-character-level-tokenisation.md
+++ b/edu/.beans/edu-7do4--write-2-character-level-tokenisation.md
@ -0,0 +1,11 @@
+---
+# edu-7do4
+title: 'Write §2: Character-level tokenisation'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:48Z
+updated_at: 2026-03-13T22:01:48Z
+parent: edu-u2w7
+---
+
+Explain BPE vs byte-level vs character-level. Motivate character-level as the simplest choice for a from-scratch exercise. Show vocabulary construction.
--- a/edu/.beans/edu-9cnd--write-6-the-transformer-block.md
+++ b/edu/.beans/edu-9cnd--write-6-the-transformer-block.md
@ -0,0 +1,11 @@
+---
+# edu-9cnd
+title: 'Write §6: The Transformer block'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:55Z
+updated_at: 2026-03-13T22:01:55Z
+parent: edu-u2w7
+---
+
+Attention sublayer + 2-layer feed-forward network + residual connections + layer norm. Describe the GPT-1 block layout. Diagrams encouraged.
--- a/edu/.beans/edu-9sb7--write-14-further-reading.md
+++ b/edu/.beans/edu-9sb7--write-14-further-reading.md
@ -0,0 +1,11 @@
+---
+# edu-9sb7
+title: 'Write §14: Further reading'
+status: todo
+type: task
+created_at: 2026-03-13T22:02:08Z
+updated_at: 2026-03-13T22:02:08Z
+parent: edu-u2w7
+---
+
+Curated pointers: the "Attention Is All You Need" paper, the GPT-1 paper, Karpathy's nanoGPT, the candle docs, and The Illustrated Transformer blog post.
--- a/edu/.beans/edu-abdu--write-10-cross-entropy-loss-and-the-training-loop.md
+++ b/edu/.beans/edu-abdu--write-10-cross-entropy-loss-and-the-training-loop.md
@ -0,0 +1,11 @@
+---
+# edu-abdu
+title: 'Write §10: Cross-entropy loss and the training loop'
+status: todo
+type: task
+created_at: 2026-03-13T22:02:02Z
+updated_at: 2026-03-13T22:02:02Z
+parent: edu-u2w7
+---
+
+Next-token prediction loss: cross-entropy over the vocab. Adam optimiser. Training loop structure: batch → forward → loss → backward → step. No bells and whistles.
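A minimal sketch of the loss this ticket describes, assuming plain `f32` slices rather than candle tensors. The function names `cross_entropy` and `mean_cross_entropy` are illustrative and do not come from the chapter. It covers only the loss; the backward pass and the Adam step need an autograd library and are left out.

```rust
/// Cross-entropy for one position: -log softmax(logits)[target].
/// Uses the log-sum-exp trick so large logits do not overflow.
/// (Hypothetical helper, not part of the chapter.)
pub fn cross_entropy(logits: &[f32], target: usize) -> f32 {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let sum_exp: f32 = logits.iter().map(|&x| (x - max).exp()).sum();
    let log_sum_exp = max + sum_exp.ln();
    log_sum_exp - logits[target]
}

/// Mean loss over a sequence, where `logits[t]` is the model's
/// prediction for `targets[t]` (the next token at position t).
pub fn mean_cross_entropy(logits: &[Vec<f32>], targets: &[usize]) -> f32 {
    assert_eq!(logits.len(), targets.len(), "one target per position");
    assert!(!targets.is_empty(), "need at least one position");
    let total: f32 = logits
        .iter()
        .zip(targets)
        .map(|(row, &t)| cross_entropy(row, t))
        .sum();
    total / targets.len() as f32
}
```

With uniform logits over a vocabulary of size V the loss is ln(V), which is a useful sanity check for the first training step of an untrained model.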
--- a/edu/.beans/edu-cw9v--write-4-embeddings-and-positional-encoding.md
+++ b/edu/.beans/edu-cw9v--write-4-embeddings-and-positional-encoding.md
@ -0,0 +1,11 @@
+---
+# edu-cw9v
+title: 'Write §4: Embeddings and positional encoding'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:52Z
+updated_at: 2026-03-13T22:01:52Z
+parent: edu-u2w7
+---
+
+Token embedding table (vocab_size × d_model). Learned positional embeddings (GPT-1 style). Explain why position matters for attention.
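The lookup described in this ticket can be sketched as follows, assuming both tables are plain nested vectors (in the chapter they would presumably be candle tensors). The name `embed` and the table layout are assumptions made for illustration.

```rust
/// Input to the first transformer block:
///     x[t] = token_table[ids[t]] + pos_table[t]
/// `token_table` has shape [vocab_size][d_model];
/// `pos_table` has shape [max_seq_len][d_model].
/// Returns `None` for an out-of-vocabulary id or a sequence longer
/// than the positional table. (Hypothetical helper.)
pub fn embed(
    ids: &[usize],
    token_table: &[Vec<f32>],
    pos_table: &[Vec<f32>],
) -> Option<Vec<Vec<f32>>> {
    if ids.len() > pos_table.len() {
        return None;
    }
    ids.iter()
        .enumerate()
        .map(|(t, &id)| {
            let tok = token_table.get(id)?;
            let pos = &pos_table[t];
            // Element-wise sum of the token row and the position row.
            Some(tok.iter().zip(pos).map(|(a, b)| a + b).collect::<Vec<f32>>())
        })
        .collect()
}
```

Because the same token id produces a different vector at each position, the attention layers downstream can tell "a at position 0" from "a at position 5", which they could not do from token embeddings alone.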
--- a/edu/.beans/edu-hufe--write-7-exercise-2-implement-self-attention-in-rus.md
+++ b/edu/.beans/edu-hufe--write-7-exercise-2-implement-self-attention-in-rus.md
@ -0,0 +1,11 @@
+---
+# edu-hufe
+title: 'Write §7: Exercise 2 — implement self-attention in Rust'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:56Z
+updated_at: 2026-03-13T22:01:56Z
+parent: edu-u2w7
+---
+
+Implement scaled dot-product attention using candle tensors. Single head, causal mask, softmax, output projection. Reader writes the core attention function.
--- a/edu/.beans/edu-i76z--write-12-exercise-5-sample-from-the-model.md
+++ b/edu/.beans/edu-i76z--write-12-exercise-5-sample-from-the-model.md
@ -0,0 +1,11 @@
+---
+# edu-i76z
+title: 'Write §12: Exercise 5 — sample from the model'
+status: todo
+type: task
+created_at: 2026-03-13T22:02:05Z
+updated_at: 2026-03-13T22:02:05Z
+parent: edu-u2w7
+---
+
+Temperature sampling and greedy decoding. Prompt the trained model and decode character-by-character. Compare output at different training checkpoints.
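A minimal sketch of the two decoding strategies named in this ticket, assuming the model's output for one position is a plain slice of logits. To stay free of external crates, the uniform random draw `u` is passed in by the caller; the function names are illustrative only.

```rust
/// Greedy decoding: pick the index of the largest logit.
/// (Hypothetical helper, not part of the chapter.)
pub fn greedy(logits: &[f32]) -> usize {
    assert!(!logits.is_empty(), "need at least one logit");
    let mut best = 0;
    for (i, &x) in logits.iter().enumerate() {
        if x > logits[best] {
            best = i;
        }
    }
    best
}

/// Temperature sampling: divide logits by `temperature`, apply softmax,
/// then pick the first index whose cumulative probability exceeds `u`.
/// `u` is assumed to be a uniform draw from [0, 1).
pub fn sample_with_temperature(logits: &[f32], temperature: f32, u: f32) -> usize {
    assert!(!logits.is_empty(), "need at least one logit");
    assert!(temperature > 0.0, "temperature must be positive");
    let scaled: Vec<f32> = logits.iter().map(|&x| x / temperature).collect();
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&x| (x - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    let mut cumulative = 0.0f32;
    for (i, &e) in exps.iter().enumerate() {
        cumulative += e / total;
        if u < cumulative {
            return i;
        }
    }
    // Rounding can leave the cumulative sum slightly below 1.
    exps.len() - 1
}
```

Low temperatures sharpen the distribution toward the greedy choice, while high temperatures flatten it toward uniform, so comparing a few temperature values on the same prompt shows the trade-off directly.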
--- a/edu/.beans/edu-jybf--write-11-exercise-4-train-on-a-small-text-corpus.md
+++ b/edu/.beans/edu-jybf--write-11-exercise-4-train-on-a-small-text-corpus.md
@ -0,0 +1,11 @@
+---
+# edu-jybf
+title: 'Write §11: Exercise 4 — train on a small text corpus'
+status: todo
+type: task
+created_at: 2026-03-13T22:02:04Z
+updated_at: 2026-03-13T22:02:04Z
+parent: edu-u2w7
+---
+
+Use a small public-domain text (e.g. Shakespeare's sonnets or a children's book). Show data loading, batching with random windows, training loop, loss curve. Reader runs training and watches loss fall.
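Batching with random windows can be sketched as below, assuming the corpus is already encoded as a slice of token ids. The small `XorShift64` generator is only a dependency-free stand-in for a real random number generator, and all names here are hypothetical.

```rust
/// Tiny xorshift generator, a stand-in for a proper RNG crate.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The state must never be zero, or the generator stays at zero.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// One training example: `block_size` input tokens plus the same
/// window shifted right by one token, which serves as the targets.
pub fn window_at(
    tokens: &[usize],
    start: usize,
    block_size: usize,
) -> Option<(Vec<usize>, Vec<usize>)> {
    let end = start.checked_add(block_size)?;
    if block_size == 0 || end >= tokens.len() {
        return None;
    }
    Some((tokens[start..end].to_vec(), tokens[start + 1..=end].to_vec()))
}

/// Draw `batch_size` windows at random start offsets.
pub fn random_batch(
    tokens: &[usize],
    batch_size: usize,
    block_size: usize,
    rng: &mut XorShift64,
) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(block_size > 0 && tokens.len() > block_size, "corpus too short");
    let n_starts = (tokens.len() - block_size) as u64;
    (0..batch_size)
        .map(|_| {
            // Modulo introduces a slight bias, which is harmless for a sketch.
            let start = (rng.next_u64() % n_starts) as usize;
            window_at(tokens, start, block_size).expect("start is in range")
        })
        .collect()
}
```

Each pair holds the inputs and the targets for one sequence: the target at position t is simply the input at position t + 1, so a single window yields `block_size` next-token predictions.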
--- a/edu/.beans/edu-kkjc--write-13-what-limits-this-model.md
+++ b/edu/.beans/edu-kkjc--write-13-what-limits-this-model.md
@ -0,0 +1,11 @@
+---
+# edu-kkjc
+title: 'Write §13: What limits this model?'
+status: todo
+type: task
+created_at: 2026-03-13T22:02:07Z
+updated_at: 2026-03-13T22:02:07Z
+parent: edu-u2w7
+---
+
+Honest assessment: context length, data size, model capacity, compute. Explain why GPT-1 was a big deal in 2018 and what GPT-2/3/4 changed. No code.
--- a/edu/.beans/edu-s6mr--write-5-self-attention-queries-keys-and-values.md
+++ b/edu/.beans/edu-s6mr--write-5-self-attention-queries-keys-and-values.md
@ -0,0 +1,11 @@
+---
+# edu-s6mr
+title: 'Write §5: Self-attention — queries, keys, and values'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:53Z
+updated_at: 2026-03-13T22:01:53Z
+parent: edu-u2w7
+---
+
+Derive the scaled dot-product attention formula from first principles. Single-head attention only, for simplicity (GPT-1 itself uses multi-head attention). Causal masking explained here.
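A minimal sketch of single-head causal attention, softmax(Q Kᵀ / √d_k) V with future positions masked out, written over nested vectors rather than candle tensors. The function names and the row-major layout are assumptions made for illustration; the query, key and value projections are taken as already applied.

```rust
/// Numerically stable softmax over one row, in place.
fn softmax_in_place(row: &mut [f32]) {
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// Single-head causal scaled dot-product attention.
/// `q` and `k` have shape [seq_len][d_k]; `v` has shape [seq_len][d_v].
/// Position i attends to positions 0..=i only, which is the causal mask.
/// (Hypothetical helper, not part of the chapter.)
pub fn causal_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let seq_len = q.len();
    assert_eq!(k.len(), seq_len, "q and k must have the same length");
    assert_eq!(v.len(), seq_len, "q and v must have the same length");
    if seq_len == 0 {
        return Vec::new();
    }
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    let mut out = Vec::with_capacity(seq_len);
    for i in 0..seq_len {
        // Scaled scores against the current and earlier positions only.
        let mut weights: Vec<f32> = (0..=i)
            .map(|j| q[i].iter().zip(&k[j]).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        softmax_in_place(&mut weights);
        // Weighted sum of the value rows.
        let mut row = vec![0.0f32; v[0].len()];
        for (j, w) in weights.iter().enumerate() {
            for (o, x) in row.iter_mut().zip(&v[j]) {
                *o += w * x;
            }
        }
        out.push(row);
    }
    out
}
```

Skipping positions after i is equivalent to adding negative infinity to those scores before the softmax, which is how the mask is usually expressed in tensor code.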
--- a/edu/.beans/edu-tufd--write-3-exercise-1-build-a-character-level-tokenis.md
+++ b/edu/.beans/edu-tufd--write-3-exercise-1-build-a-character-level-tokenis.md
@ -0,0 +1,11 @@
+---
+# edu-tufd
+title: 'Write §3: Exercise 1 — build a character-level tokeniser in Rust'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:50Z
+updated_at: 2026-03-13T22:01:50Z
+parent: edu-u2w7
+---
+
+Implement encode/decode over a fixed character vocabulary. Read a text file, build vocab, encode to integers, decode back. No external crates.
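One possible shape for this exercise, using only the standard library as the ticket requires. The type `CharTokeniser` and its method names are illustrative assumptions, and reading the corpus from disk (for example with `std::fs::read_to_string`) is left to the caller.

```rust
use std::collections::{BTreeSet, HashMap};

/// Character-level tokeniser: each distinct `char` in the corpus gets one id.
/// (Hypothetical type, not part of the chapter.)
pub struct CharTokeniser {
    id_to_char: Vec<char>,
    char_to_id: HashMap<char, usize>,
}

impl CharTokeniser {
    /// Build the vocabulary from the corpus. Collecting into a `BTreeSet`
    /// sorts the characters, so ids are deterministic across runs.
    pub fn from_text(text: &str) -> Self {
        let unique: BTreeSet<char> = text.chars().collect();
        let id_to_char: Vec<char> = unique.into_iter().collect();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(id, &c)| (c, id))
            .collect();
        Self { id_to_char, char_to_id }
    }

    pub fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Returns `None` if the text contains a character outside the vocabulary.
    pub fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars()
            .map(|c| self.char_to_id.get(&c).copied())
            .collect()
    }

    /// Returns `None` if any id is out of range.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter()
            .map(|&id| self.id_to_char.get(id).copied())
            .collect()
    }
}
```

Returning `Option` rather than panicking keeps unknown characters visible to the caller, which matters once the reader prompts the trained model with text that was not in the training corpus.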
--- a/edu/.beans/edu-u2w7--edu-write-chapter-on-creating-and-training-a-simpl.md
+++ b/edu/.beans/edu-u2w7--edu-write-chapter-on-creating-and-training-a-simpl.md
@ -1,11 +1,11 @@
 ---
 # edu-u2w7
 title: 'edu: write chapter on creating and training a simple LLM'
-status: todo
-type: task
+status: in-progress
+type: feature
 priority: low
 created_at: 2026-03-10T23:30:00Z
-updated_at: 2026-03-10T23:30:00Z
+updated_at: 2026-03-13T22:01:43Z
 ---

 ## Background
--- a/edu/.beans/edu-ujs5--write-9-exercise-3-define-the-gpt-1-style-model-in.md
+++ b/edu/.beans/edu-ujs5--write-9-exercise-3-define-the-gpt-1-style-model-in.md
@ -0,0 +1,11 @@
+---
+# edu-ujs5
+title: 'Write §9: Exercise 3 — define the GPT-1-style model in candle'
+status: todo
+type: task
+created_at: 2026-03-13T22:02:00Z
+updated_at: 2026-03-13T22:02:00Z
+parent: edu-u2w7
+---
+
+Full model struct in candle: embedding, N transformer blocks, layer norm, unembedding. Hyperparameters for a miniature GPT-1 (e.g. 2–4 layers, d_model=128). Reader assembles the forward pass.
--- a/edu/.beans/edu-vqxk--write-8-a-decoder-only-lm-stacking-blocks-and-the.md
+++ b/edu/.beans/edu-vqxk--write-8-a-decoder-only-lm-stacking-blocks-and-the.md
@ -0,0 +1,11 @@
+---
+# edu-vqxk
+title: 'Write §8: A decoder-only LM — stacking blocks and the causal mask'
+status: todo
+type: task
+created_at: 2026-03-13T22:01:58Z
+updated_at: 2026-03-13T22:01:58Z
+parent: edu-u2w7
+---
+
+Explain how N transformer blocks are stacked. Causal mask ensures each position attends only to itself and earlier tokens. Tie the token-embedding weights to the unembedding matrix (GPT-1 style). Final linear layer produces logits; softmax turns them into next-token probabilities.
--- a/edu/src/SUMMARY.md
+++ b/edu/src/SUMMARY.md
@ -25,3 +25,4 @@
 # Machine Learning

 - [Training a Game AI Through Self-Play](ml-self-play.md)
+- [Building a Simple LLM from Scratch](llm-from-scratch.md)
--- a/edu/src/llm-from-scratch.md
+++ b/edu/src/llm-from-scratch.md
@ -0,0 +1,79 @@
+# Building a Simple LLM from Scratch
+
+A hands-on chapter that builds a small GPT-1-style language model in Rust, from raw text to a trained transformer you can sample from.
+
+---
+
+## Part 1 — Language Modelling Basics
+
+### §1 What is a Language Model?
+
+🚧 *To be written — see [edu-32xl]*
+
+### §2 Character-Level Tokenisation
+
+🚧 *To be written — see [edu-7do4]*
+
+### §3 Exercise 1: Build a Character-Level Tokeniser in Rust
+
+🚧 *To be written — see [edu-tufd]*
+
+---
+
+## Part 2 — The Transformer Architecture
+
+### §4 Embeddings and Positional Encoding
+
+🚧 *To be written — see [edu-cw9v]*
+
+### §5 Self-Attention: Queries, Keys, and Values
+
+🚧 *To be written — see [edu-s6mr]*
+
+### §6 The Transformer Block
+
+🚧 *To be written — see [edu-9cnd]*
+
+### §7 Exercise 2: Implement Self-Attention in Rust
+
+🚧 *To be written — see [edu-hufe]*
+
+---
+
+## Part 3 — Assembling the Model
+
+### §8 A Decoder-Only LM: Stacking Blocks and the Causal Mask
+
+🚧 *To be written — see [edu-vqxk]*
+
+### §9 Exercise 3: Define the GPT-1-Style Model in `candle`
+
+🚧 *To be written — see [edu-ujs5]*
+
+---
+
+## Part 4 — Training
+
+### §10 Cross-Entropy Loss and the Training Loop
+
+🚧 *To be written — see [edu-abdu]*
+
+### §11 Exercise 4: Train on a Small Text Corpus
+
+🚧 *To be written — see [edu-jybf]*
+
+### §12 Exercise 5: Sample from the Model
+
+🚧 *To be written — see [edu-i76z]*
+
+---
+
+## Part 5 — Reflection
+
+### §13 What Limits This Model?
+
+🚧 *To be written — see [edu-kkjc]*
+
+### §14 Further Reading
+
+🚧 *To be written — see [edu-9sb7]*