docs(edu): outline simple LLM chapter and create section tickets [edu-u2w7]

Adds llm-from-scratch.md stub with 14 sections (GPT-1 style: character tokenisation, self-attention, transformer block, candle model, training loop, sampling). Creates beans edu-32xl through edu-9sb7 for each section.
4 months ago · 05ac10f5e3
parent 818444962c
commit 05ac10f5e3
17 changed files with 237 additions and 3 deletions
--- a/edu/.beans/edu-32xl--write-1-what-is-a-language-model.md
+++ b/edu/.beans/edu-32xl--write-1-what-is-a-language-model.md
@ -0,0 +1,11 @@
 ---
 # edu-32xl
 title: 'Write §1: What is a language model?'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:47Z
 updated_at: 2026-03-13T22:01:47Z
 parent: edu-u2w7
 ---
 Next-token prediction as the core task. Intuitive framing: a model that guesses what comes next, trained on raw text. GPT-1 context. No code.
--- a/edu/.beans/edu-7do4--write-2-character-level-tokenisation.md
+++ b/edu/.beans/edu-7do4--write-2-character-level-tokenisation.md
@ -0,0 +1,11 @@
 ---
 # edu-7do4
 title: 'Write §2: Character-level tokenisation'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:48Z
 updated_at: 2026-03-13T22:01:48Z
 parent: edu-u2w7
 ---
 Explain BPE vs byte-level vs character-level tokenisation. Motivate character-level as the simplest choice for a from-scratch exercise. Show vocabulary construction.
--- a/edu/.beans/edu-9cnd--write-6-the-transformer-block.md
+++ b/edu/.beans/edu-9cnd--write-6-the-transformer-block.md
@ -0,0 +1,11 @@
 ---
 # edu-9cnd
 title: 'Write §6: The Transformer block'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:55Z
 updated_at: 2026-03-13T22:01:55Z
 parent: edu-u2w7
 ---
 Attention sublayer + 2-layer feed-forward network + residual connections + layer norm. Describe the GPT-1 block layout. Diagrams encouraged.
--- a/edu/.beans/edu-9sb7--write-14-further-reading.md
+++ b/edu/.beans/edu-9sb7--write-14-further-reading.md
@ -0,0 +1,11 @@
 ---
 # edu-9sb7
 title: 'Write §14: Further reading'
 status: todo
 type: task
 created_at: 2026-03-13T22:02:08Z
 updated_at: 2026-03-13T22:02:08Z
 parent: edu-u2w7
 ---
 Curated pointers: Attention is All You Need paper, GPT-1 paper, Karpathy's nanoGPT, candle docs, The Illustrated Transformer blog post.
--- a/edu/.beans/edu-abdu--write-10-cross-entropy-loss-and-the-training-loop.md
+++ b/edu/.beans/edu-abdu--write-10-cross-entropy-loss-and-the-training-loop.md
@ -0,0 +1,11 @@
 ---
 # edu-abdu
 title: 'Write §10: Cross-entropy loss and the training loop'
 status: todo
 type: task
 created_at: 2026-03-13T22:02:02Z
 updated_at: 2026-03-13T22:02:02Z
 parent: edu-u2w7
 ---
 Next-token prediction loss: cross-entropy over the vocab. Adam optimiser. Training loop structure: batch → forward → loss → backward → step. No bells and whistles.
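
A minimal sketch of the loss step only, in plain Rust with no tensor library. It assumes the logits arrive as one `Vec<f32>` per position; the names `log_softmax` and `cross_entropy` are illustrative and not taken from the chapter. The backward pass and the Adam step are left out, since they depend on the framework.

```rust
/// Numerically stable log-softmax over one row of logits.
fn log_softmax(logits: &[f32]) -> Vec<f32> {
    // Subtracting the maximum keeps `exp` from overflowing.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = logits.iter().map(|&z| (z - max).exp()).sum::<f32>().ln() + max;
    logits.iter().map(|&z| z - log_sum_exp).collect()
}

/// Mean next-token cross-entropy.
/// `logits[t]` scores every vocabulary entry at position `t`;
/// `targets[t]` is the id of the token that actually came next.
fn cross_entropy(logits: &[Vec<f32>], targets: &[usize]) -> f32 {
    assert_eq!(logits.len(), targets.len());
    assert!(!logits.is_empty());
    let total: f32 = logits
        .iter()
        .zip(targets)
        .map(|(row, &t)| -log_softmax(row)[t])
        .sum();
    total / logits.len() as f32
}
```

With uniform logits over a vocabulary of size V this loss equals ln(V), which is a useful sanity check for the first training step of an untrained model.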
--- a/edu/.beans/edu-cw9v--write-4-embeddings-and-positional-encoding.md
+++ b/edu/.beans/edu-cw9v--write-4-embeddings-and-positional-encoding.md
@ -0,0 +1,11 @@
 ---
 # edu-cw9v
 title: 'Write §4: Embeddings and positional encoding'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:52Z
 updated_at: 2026-03-13T22:01:52Z
 parent: edu-u2w7
 ---
 Token embedding table (vocab_size × d_model). Learned positional embeddings (GPT-1 style). Explain why position matters for attention.
--- a/edu/.beans/edu-hufe--write-7-exercise-2-implement-self-attention-in-rus.md
+++ b/edu/.beans/edu-hufe--write-7-exercise-2-implement-self-attention-in-rus.md
@ -0,0 +1,11 @@
 ---
 # edu-hufe
 title: 'Write §7: Exercise 2 — implement self-attention in Rust'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:56Z
 updated_at: 2026-03-13T22:01:56Z
 parent: edu-u2w7
 ---
 Implement scaled dot-product attention using candle tensors. Single head, causal mask, softmax, output projection. Reader writes the core attention function.
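
A reference sketch of single-head causal attention in plain Rust, which could serve as a check against the candle version the exercise asks for. It is not the candle implementation: matrices are assumed to be row-major `Vec<Vec<f32>>`, the function name `causal_attention` is hypothetical, and the output projection is omitted.

```rust
/// Single-head causal scaled dot-product attention.
/// `q` and `k` are `[seq_len][d_k]`, `v` is `[seq_len][d_v]`.
/// Returns `[seq_len][d_v]`.
fn causal_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    assert!(q.len() == k.len() && k.len() == v.len());
    let mut out = Vec::with_capacity(q.len());
    for i in 0..q.len() {
        let scale = (q[i].len() as f32).sqrt();
        // Causal mask: score position i against positions 0..=i only.
        let scores: Vec<f32> = (0..=i)
            .map(|j| q[i].iter().zip(&k[j]).map(|(a, b)| a * b).sum::<f32>() / scale)
            .collect();
        // Stable softmax over the unmasked scores.
        let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|&s| (s - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        // Weighted sum of the value rows.
        let mut row = vec![0.0f32; v[i].len()];
        for (j, e) in exps.iter().enumerate() {
            for (o, x) in row.iter_mut().zip(&v[j]) {
                *o += (e / sum) * x;
            }
        }
        out.push(row);
    }
    out
}
```

Restricting the inner loop to `0..=i` has the same effect as adding negative infinity to the masked scores before the softmax, and it makes the causality property easy to test: changing a later value row must not change an earlier output row.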
--- a/edu/.beans/edu-i76z--write-12-exercise-5-sample-from-the-model.md
+++ b/edu/.beans/edu-i76z--write-12-exercise-5-sample-from-the-model.md
@ -0,0 +1,11 @@
 ---
 # edu-i76z
 title: 'Write §12: Exercise 5 — sample from the model'
 status: todo
 type: task
 created_at: 2026-03-13T22:02:05Z
 updated_at: 2026-03-13T22:02:05Z
 parent: edu-u2w7
 ---
 Temperature sampling and greedy decoding. Prompt the trained model and decode character-by-character. Compare output at different training checkpoints.
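
A sketch of the two decoding rules in plain Rust. To stay free of external crates, the sampler takes the uniform random draw `u` as a parameter instead of owning a random number generator; that choice, and the function names, are assumptions made for illustration.

```rust
/// Greedy decoding: the index of the largest logit (first one on ties).
fn greedy(logits: &[f32]) -> usize {
    let mut best = 0;
    for (i, &z) in logits.iter().enumerate() {
        if z > logits[best] {
            best = i;
        }
    }
    best
}

/// Softmax of `logits / temperature`.
/// Low temperatures sharpen the distribution, high ones flatten it.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0);
    let scaled: Vec<f32> = logits.iter().map(|&z| z / temperature).collect();
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Inverse-CDF sampling: `u` is a uniform draw from [0, 1).
fn sample(logits: &[f32], temperature: f32, u: f32) -> usize {
    assert!(!logits.is_empty());
    let probs = softmax_with_temperature(logits, temperature);
    let mut cumulative = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    probs.len() - 1 // guards against rounding when `u` is very close to 1
}
```

As the temperature approaches zero, `sample` behaves like `greedy`, which gives the exercise a natural way to relate the two.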
--- a/edu/.beans/edu-jybf--write-11-exercise-4-train-on-a-small-text-corpus.md
+++ b/edu/.beans/edu-jybf--write-11-exercise-4-train-on-a-small-text-corpus.md
@ -0,0 +1,11 @@
 ---
 # edu-jybf
 title: 'Write §11: Exercise 4 — train on a small text corpus'
 status: todo
 type: task
 created_at: 2026-03-13T22:02:04Z
 updated_at: 2026-03-13T22:02:04Z
 parent: edu-u2w7
 ---
 Use a small public-domain text (e.g. Shakespeare's sonnets or a children's book). Show data loading, batching with random windows, training loop, loss curve. Reader runs training and watches loss fall.
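
A sketch of how one training window might be cut from the encoded corpus, assuming tokens are `usize` ids. The targets are the inputs shifted by one position. The start offsets are passed in by the caller so the example needs no random number generator; `window` and `batch` are hypothetical names.

```rust
/// One training example: inputs are `tokens[start..start + block_size]`,
/// targets are the same window shifted one position to the right.
/// Returns `None` if the window would run past the end of the corpus.
fn window(tokens: &[usize], start: usize, block_size: usize) -> Option<(&[usize], &[usize])> {
    let end = start.checked_add(block_size)?;
    if block_size == 0 || end >= tokens.len() {
        return None;
    }
    Some((&tokens[start..end], &tokens[start + 1..end + 1]))
}

/// A batch is one window per start offset.
/// In training the offsets would be drawn at random.
fn batch<'a>(
    tokens: &'a [usize],
    starts: &[usize],
    block_size: usize,
) -> Option<Vec<(&'a [usize], &'a [usize])>> {
    starts.iter().map(|&s| window(tokens, s, block_size)).collect()
}
```

Valid start offsets lie in `0..tokens.len() - block_size`, so that every input position has a following token to serve as its target.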
--- a/edu/.beans/edu-kkjc--write-13-what-limits-this-model.md
+++ b/edu/.beans/edu-kkjc--write-13-what-limits-this-model.md
@ -0,0 +1,11 @@
 ---
 # edu-kkjc
 title: 'Write §13: What limits this model?'
 status: todo
 type: task
 created_at: 2026-03-13T22:02:07Z
 updated_at: 2026-03-13T22:02:07Z
 parent: edu-u2w7
 ---
 Honest assessment: context length, data size, model capacity, compute. Explain why GPT-1 was a big deal in 2018 and what GPT-2/3/4 changed. No code.
--- a/edu/.beans/edu-s6mr--write-5-self-attention-queries-keys-and-values.md
+++ b/edu/.beans/edu-s6mr--write-5-self-attention-queries-keys-and-values.md
@ -0,0 +1,11 @@
 ---
 # edu-s6mr
 title: 'Write §5: Self-attention — queries, keys, and values'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:53Z
 updated_at: 2026-03-13T22:01:53Z
 parent: edu-u2w7
 ---
 Derive the scaled dot-product attention formula from first principles. Single-head attention only, for simplicity (GPT-1 itself uses multi-head attention). Causal masking explained here.
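
For orientation, the formula this section is expected to arrive at can be written as below. The symbols are assumptions for illustration: $X$ is the matrix of input embeddings, $W_Q$, $W_K$, $W_V$ are learned projections, $d_k$ is the key dimension, and $M$ is the causal mask. The `cases` environment needs `amsmath`.

```latex
\[
Q = X W_Q, \qquad K = X W_K, \qquad V = X W_V
\]
\[
\mathrm{Attention}(Q, K, V)
  = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
0       & \text{if } j \le i, \\
-\infty & \text{if } j > i.
\end{cases}
\]
```

The softmax is taken row by row, so entries with $M_{ij} = -\infty$ receive zero weight and position $i$ draws only on positions $j \le i$.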
--- a/edu/.beans/edu-tufd--write-3-exercise-1-build-a-character-level-tokenis.md
+++ b/edu/.beans/edu-tufd--write-3-exercise-1-build-a-character-level-tokenis.md
@ -0,0 +1,11 @@
 ---
 # edu-tufd
 title: 'Write §3: Exercise 1 — build a character-level tokeniser in Rust'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:50Z
 updated_at: 2026-03-13T22:01:50Z
 parent: edu-u2w7
 ---
 Implement encode/decode over a fixed character vocabulary. Read a text file, build vocab, encode to integers, decode back. No external crates.
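
A minimal sketch of what the exercise could build, using only the standard library. It assumes one token per Rust `char` and ids assigned in sorted character order; the type name `CharTokeniser` and its methods are illustrative.

```rust
use std::collections::{BTreeSet, HashMap};

/// Character-level tokeniser: one integer id per distinct `char`.
pub struct CharTokeniser {
    id_to_char: Vec<char>,
    char_to_id: HashMap<char, usize>,
}

impl CharTokeniser {
    /// Build the vocabulary from the distinct characters of `text`,
    /// sorted so that ids are reproducible between runs.
    pub fn from_text(text: &str) -> Self {
        let chars: BTreeSet<char> = text.chars().collect();
        let id_to_char: Vec<char> = chars.into_iter().collect();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(i, &c)| (c, i))
            .collect();
        Self { id_to_char, char_to_id }
    }

    pub fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Returns `None` if `text` contains a character outside the vocabulary.
    pub fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars().map(|c| self.char_to_id.get(&c).copied()).collect()
    }

    /// Returns `None` if any id is out of range.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter().map(|&i| self.id_to_char.get(i).copied()).collect()
    }
}
```

The corpus itself can be loaded with `std::fs::read_to_string` and passed to `from_text`. Encoding followed by decoding should return the original string for any text drawn from the corpus.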
--- a/edu/.beans/edu-u2w7--edu-write-chapter-on-creating-and-training-a-simpl.md
+++ b/edu/.beans/edu-u2w7--edu-write-chapter-on-creating-and-training-a-simpl.md
@ -1,11 +1,11 @@
 ---
 # edu-u2w7
 title: 'edu: write chapter on creating and training a simple LLM'
-status: todo
+status: in-progress
-type: task
+type: feature
 priority: low
 created_at: 2026-03-10T23:30:00Z
-updated_at: 2026-03-10T23:30:00Z
+updated_at: 2026-03-13T22:01:43Z
 ---
 ## Background
--- a/edu/.beans/edu-ujs5--write-9-exercise-3-define-the-gpt-1-style-model-in.md
+++ b/edu/.beans/edu-ujs5--write-9-exercise-3-define-the-gpt-1-style-model-in.md
@ -0,0 +1,11 @@
 ---
 # edu-ujs5
 title: 'Write §9: Exercise 3 — define the GPT-1-style model in candle'
 status: todo
 type: task
 created_at: 2026-03-13T22:02:00Z
 updated_at: 2026-03-13T22:02:00Z
 parent: edu-u2w7
 ---
 Full model struct in candle: embedding, N transformer blocks, layer norm, unembedding. Hyperparameters for a miniature GPT-1 (e.g. 2–4 layers, d_model=128). Reader assembles the forward pass.
--- a/edu/.beans/edu-vqxk--write-8-a-decoder-only-lm-stacking-blocks-and-the.md
+++ b/edu/.beans/edu-vqxk--write-8-a-decoder-only-lm-stacking-blocks-and-the.md
@ -0,0 +1,11 @@
 ---
 # edu-vqxk
 title: 'Write §8: A decoder-only LM — stacking blocks and the causal mask'
 status: todo
 type: task
 created_at: 2026-03-13T22:01:58Z
 updated_at: 2026-03-13T22:01:58Z
 parent: edu-u2w7
 ---
 Explain how N transformer blocks are stacked. The causal mask ensures each position attends only to itself and earlier tokens. Tie the token-embedding weights to the unembedding matrix (GPT-1 style). A final linear layer produces the logits; softmax turns them into next-token probabilities.
--- a/edu/src/SUMMARY.md
+++ b/edu/src/SUMMARY.md
@ -25,3 +25,4 @@
 # Machine Learning
 - [Training a Game AI Through Self-Play](ml-self-play.md)
 - [Building a Simple LLM from Scratch](llm-from-scratch.md)
--- a/edu/src/llm-from-scratch.md
+++ b/edu/src/llm-from-scratch.md
@ -0,0 +1,79 @@
 # Building a Simple LLM from Scratch
 A hands-on course building a small GPT-1-style language model in Rust — from raw text to a trained, sampling transformer.
 ---
 ## Part 1 — Language Modelling Basics
 ### §1 What is a Language Model?
 🚧 *To be written — see [edu-32xl]*
 ### §2 Character-Level Tokenisation
 🚧 *To be written — see [edu-7do4]*
 ### §3 Exercise 1: Build a Character-Level Tokeniser in Rust
 🚧 *To be written — see [edu-tufd]*
 ---
 ## Part 2 — The Transformer Architecture
 ### §4 Embeddings and Positional Encoding
 🚧 *To be written — see [edu-cw9v]*
 ### §5 Self-Attention: Queries, Keys, and Values
 🚧 *To be written — see [edu-s6mr]*
 ### §6 The Transformer Block
 🚧 *To be written — see [edu-9cnd]*
 ### §7 Exercise 2: Implement Self-Attention in Rust
 🚧 *To be written — see [edu-hufe]*
 ---
 ## Part 3 — Assembling the Model
 ### §8 A Decoder-Only LM: Stacking Blocks and the Causal Mask
 🚧 *To be written — see [edu-vqxk]*
 ### §9 Exercise 3: Define the GPT-1-Style Model in `candle`
 🚧 *To be written — see [edu-ujs5]*
 ---
 ## Part 4 — Training
 ### §10 Cross-Entropy Loss and the Training Loop
 🚧 *To be written — see [edu-abdu]*
 ### §11 Exercise 4: Train on a Small Text Corpus
 🚧 *To be written — see [edu-jybf]*
 ### §12 Exercise 5: Sample from the Model
 🚧 *To be written — see [edu-i76z]*
 ---
 ## Part 5 — Reflection
 ### §13 What Limits This Model?
 🚧 *To be written — see [edu-kkjc]*
 ### §14 Further Reading
 🚧 *To be written — see [edu-9sb7]*