docs(edu): outline simple LLM chapter and create section tickets [edu-u2w7]
Adds llm-from-scratch.md stub with 14 sections (GPT-1 style: character tokenisation, self-attention, transformer block, candle model, training loop, sampling). Creates beans edu-32xl through edu-9sb7 for each section.

main
parent 818444962c
commit 05ac10f5e3
@ -0,0 +1,11 @@
---
# edu-32xl
title: 'Write §1: What is a language model?'
status: todo
type: task
created_at: 2026-03-13T22:01:47Z
updated_at: 2026-03-13T22:01:47Z
parent: edu-u2w7
---

Next-token prediction as the core task. Intuitive framing: a model that guesses what comes next, trained on raw text. GPT-1 context. No code.
@ -0,0 +1,11 @
---
# edu-7do4
title: 'Write §2: Character-level tokenisation'
status: todo
type: task
created_at: 2026-03-13T22:01:48Z
updated_at: 2026-03-13T22:01:48Z
parent: edu-u2w7
---

Explain BPE vs byte-level vs character-level. Motivate character-level as the simplest choice for a from-scratch exercise. Show vocabulary construction.
@ -0,0 +1,11 @
---
# edu-9cnd
title: 'Write §6: The Transformer block'
status: todo
type: task
created_at: 2026-03-13T22:01:55Z
updated_at: 2026-03-13T22:01:55Z
parent: edu-u2w7
---

Attention sublayer + 2-layer feed-forward network + residual connections + layer norm. Describe the GPT-1 block layout. Diagrams encouraged.
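
One way to write down the block layout, as a sketch only: it assumes the post-layer-norm arrangement of the original Transformer, which GPT-1 follows, and a GELU feed-forward network. The symbols $x$, $h$, $y$, $W_1$, $W_2$, $b_1$, $b_2$ are notation introduced here for illustration.

```latex
\begin{aligned}
h &= \mathrm{LayerNorm}\bigl(x + \mathrm{Attention}(x)\bigr) \\
y &= \mathrm{LayerNorm}\bigl(h + \mathrm{FFN}(h)\bigr) \\
\mathrm{FFN}(h) &= W_2\,\mathrm{GELU}(W_1 h + b_1) + b_2
\end{aligned}
```

Each sublayer is wrapped in a residual connection, and the layer norm is applied after the addition rather than before it.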
@ -0,0 +1,11 @
---
# edu-9sb7
title: 'Write §14: Further reading'
status: todo
type: task
created_at: 2026-03-13T22:02:08Z
updated_at: 2026-03-13T22:02:08Z
parent: edu-u2w7
---

Curated pointers: the "Attention Is All You Need" paper, the GPT-1 paper, Karpathy's nanoGPT, the candle docs, and The Illustrated Transformer blog post.
@ -0,0 +1,11 @
---
# edu-abdu
title: 'Write §10: Cross-entropy loss and the training loop'
status: todo
type: task
created_at: 2026-03-13T22:02:02Z
updated_at: 2026-03-13T22:02:02Z
parent: edu-u2w7
---

Next-token prediction loss: cross-entropy over the vocab. Adam optimiser. Training loop structure: batch → forward → loss → backward → step. No bells and whistles.
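
A possible illustration of the loss step only, written against plain `Vec<f32>` rather than candle tensors so that it needs no crates. The function names `log_softmax` and `cross_entropy` are hypothetical, and the backward pass and optimiser step are deliberately left out.

```rust
/// Numerically stable log-softmax over one row of logits.
pub fn log_softmax(logits: &[f32]) -> Vec<f32> {
    // Subtracting the maximum keeps exp() from overflowing.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = logits.iter().map(|&z| (z - max).exp()).sum::<f32>().ln();
    logits.iter().map(|&z| z - max - log_sum_exp).collect()
}

/// Mean cross-entropy over all positions.
/// `logits[t]` holds one score per vocabulary entry for position `t`;
/// `targets[t]` is the id of the character that actually came next.
pub fn cross_entropy(logits: &[Vec<f32>], targets: &[usize]) -> f32 {
    assert_eq!(logits.len(), targets.len());
    assert!(!logits.is_empty());
    let total: f32 = logits
        .iter()
        .zip(targets)
        .map(|(row, &target)| -log_softmax(row)[target])
        .sum();
    total / logits.len() as f32
}
```

A model that spreads its probability evenly over a vocabulary of size V scores ln(V) under this loss, which is a handy sanity check for the first training step.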
@ -0,0 +1,11 @
---
# edu-hufe
title: 'Write §7: Exercise 2 — implement self-attention in Rust'
status: todo
type: task
created_at: 2026-03-13T22:01:56Z
updated_at: 2026-03-13T22:01:56Z
parent: edu-u2w7
---

Implement scaled dot-product attention using candle tensors. Single head, causal mask, softmax, output projection. Reader writes the core attention function.
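
A reference sketch the exercise could be checked against. It uses plain nested `Vec<f32>` instead of candle tensors, so it makes no claims about the candle API, and it omits the output projection. The function name `causal_attention` is hypothetical.

```rust
/// Single-head causal scaled dot-product attention on row-major matrices.
/// `q` and `k` have shape [seq_len][d_k]; `v` has shape [seq_len][d_v].
/// Returns a [seq_len][d_v] matrix.
pub fn causal_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let seq_len = q.len();
    assert!(seq_len > 0 && k.len() == seq_len && v.len() == seq_len);
    let d_k = q[0].len();
    let d_v = v[0].len();
    let scale = 1.0 / (d_k as f32).sqrt();
    let mut out = vec![vec![0.0f32; d_v]; seq_len];

    for i in 0..seq_len {
        // Causal mask: position i only scores keys 0..=i, so future
        // positions never enter the softmax.
        let scores: Vec<f32> = (0..=i)
            .map(|j| q[i].iter().zip(&k[j]).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();

        // Softmax over the visible scores, shifted by the max for stability.
        let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let denom: f32 = exps.iter().sum();

        // Output row i is the attention-weighted sum of the visible value rows.
        for (j, e) in exps.iter().enumerate() {
            let weight = e / denom;
            for (o, x) in out[i].iter_mut().zip(&v[j]) {
                *o += weight * x;
            }
        }
    }
    out
}
```

Restricting the inner range to `0..=i` is equivalent to adding negative infinity to the masked scores before the softmax; a tensor implementation would normally build the mask explicitly instead.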
@ -0,0 +1,11 @
---
# edu-i76z
title: 'Write §12: Exercise 5 — sample from the model'
status: todo
type: task
created_at: 2026-03-13T22:02:05Z
updated_at: 2026-03-13T22:02:05Z
parent: edu-u2w7
---

Temperature sampling and greedy decoding. Prompt the trained model and decode character-by-character. Compare output at different training checkpoints.
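
The two decoding strategies might be sketched as follows, assuming the model hands back one row of logits per step. The names `greedy` and `sample_with_temperature` are hypothetical, and the uniform random number `u` is passed in by the caller so that the sketch needs no RNG crate.

```rust
/// Greedy decoding: return the index of the largest logit
/// (the first one if there are ties).
pub fn greedy(logits: &[f32]) -> usize {
    let mut best = 0;
    for (i, &z) in logits.iter().enumerate() {
        if z > logits[best] {
            best = i;
        }
    }
    best
}

/// Temperature sampling by inverse CDF.
/// `u` must be a uniform random number in [0, 1) supplied by the caller.
/// Lower temperatures sharpen the distribution towards the greedy choice;
/// higher temperatures flatten it.
pub fn sample_with_temperature(logits: &[f32], temperature: f32, u: f32) -> usize {
    assert!(!logits.is_empty() && temperature > 0.0);
    let scaled: Vec<f32> = logits.iter().map(|&z| z / temperature).collect();
    let max = scaled.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&z| (z - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Walk the cumulative distribution until it passes `u`.
    let mut cumulative = 0.0;
    for (i, e) in exps.iter().enumerate() {
        cumulative += e / total;
        if u < cumulative {
            return i;
        }
    }
    // Rounding can leave the cumulative sum slightly below 1.0.
    logits.len() - 1
}
```

Decoding character by character then amounts to calling one of these on the logits for the last position, appending the chosen id to the context, and repeating.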
@ -0,0 +1,11 @
---
# edu-jybf
title: 'Write §11: Exercise 4 — train on a small text corpus'
status: todo
type: task
created_at: 2026-03-13T22:02:04Z
updated_at: 2026-03-13T22:02:04Z
parent: edu-u2w7
---

Use a small public-domain text (e.g. Shakespeare's sonnets or a children's book). Show data loading, batching with random windows, training loop, loss curve. Reader runs training and watches loss fall.
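
Batching with random windows could look roughly like this, assuming the corpus has already been encoded to integer ids. The names `window`, `random_batch`, and `Lcg` are hypothetical; the small linear congruential generator is only a stand-in so the sketch needs no crates and is not meant as a quality RNG.

```rust
/// One training example: `block_size` input ids starting at `start`,
/// and the same window shifted one step right as the targets.
pub fn window(ids: &[usize], start: usize, block_size: usize) -> (Vec<usize>, Vec<usize>) {
    let input = ids[start..start + block_size].to_vec();
    let target = ids[start + 1..start + block_size + 1].to_vec();
    (input, target)
}

/// Tiny linear congruential generator, for demonstration only.
pub struct Lcg(pub u64);

impl Lcg {
    /// Pseudo-random value in 0..bound (bound must be non-zero).
    pub fn next_below(&mut self, bound: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % bound
    }
}

/// Draw `batch_size` windows at random offsets into the corpus.
pub fn random_batch(
    ids: &[usize],
    batch_size: usize,
    block_size: usize,
    rng: &mut Lcg,
) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(ids.len() > block_size);
    // Exclusive upper bound on the start offset; leaves room for the shifted target.
    let max_start = ids.len() - block_size;
    (0..batch_size)
        .map(|_| window(ids, rng.next_below(max_start), block_size))
        .collect()
}
```

Every target position is the character that follows the corresponding input position, so one window yields `block_size` next-token predictions.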
@ -0,0 +1,11 @
---
# edu-kkjc
title: 'Write §13: What limits this model?'
status: todo
type: task
created_at: 2026-03-13T22:02:07Z
updated_at: 2026-03-13T22:02:07Z
parent: edu-u2w7
---

Honest assessment: context length, data size, model capacity, compute. Explain why GPT-1 was a big deal in 2018 and what GPT-2/3/4 changed. No code.
@ -0,0 +1,11 @
---
# edu-s6mr
title: 'Write §5: Self-attention — queries, keys, and values'
status: todo
type: task
created_at: 2026-03-13T22:01:53Z
updated_at: 2026-03-13T22:01:53Z
parent: edu-u2w7
---

Derive the scaled dot-product attention formula from first principles. Single-head attention only, for simplicity (GPT-1 itself uses 12 heads per layer). Causal masking explained here.
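
For reference, the target of the derivation can be stated as below. The mask matrix $M$ and the index names are notation introduced here for illustration.

```latex
\mathrm{Attention}(Q, K, V)
  = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
0       & \text{if } j \le i \\
-\infty & \text{if } j > i
\end{cases}
```

The $\sqrt{d_k}$ factor has a short justification: if the components of a query and a key are independent with zero mean and unit variance, their dot product has variance $d_k$, so dividing by $\sqrt{d_k}$ brings the scores back to unit variance and keeps the softmax out of its saturated region.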
@ -0,0 +1,11 @
---
# edu-tufd
title: 'Write §3: Exercise 1 — build a character-level tokeniser in Rust'
status: todo
type: task
created_at: 2026-03-13T22:01:50Z
updated_at: 2026-03-13T22:01:50Z
parent: edu-u2w7
---

Implement encode/decode over a fixed character vocabulary. Read a text file, build vocab, encode to integers, decode back. No external crates.
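
A minimal sketch of what the finished exercise might look like, using only the standard library. The type name `CharTokeniser` and its method names are hypothetical, and returning `None` for unknown characters is one design choice among several.

```rust
use std::collections::{BTreeSet, HashMap};

/// Character-level tokeniser: one integer id per distinct `char` in the corpus.
pub struct CharTokeniser {
    id_to_char: Vec<char>,
    char_to_id: HashMap<char, usize>,
}

impl CharTokeniser {
    /// Build the vocabulary from every distinct character in `text`.
    /// Collecting into a BTreeSet sorts the characters, so ids are deterministic.
    pub fn from_text(text: &str) -> Self {
        let unique: BTreeSet<char> = text.chars().collect();
        let id_to_char: Vec<char> = unique.into_iter().collect();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(id, &c)| (c, id))
            .collect();
        Self { id_to_char, char_to_id }
    }

    /// Number of distinct characters in the vocabulary.
    pub fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Map each character to its id.
    /// Returns `None` if `text` contains a character outside the vocabulary.
    pub fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars()
            .map(|c| self.char_to_id.get(&c).copied())
            .collect()
    }

    /// Map each id back to its character.
    /// Returns `None` if any id is out of range.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter()
            .map(|&id| self.id_to_char.get(id).copied())
            .collect()
    }
}
```

For the file-reading step, the corpus can be loaded with `std::fs::read_to_string` and passed to `CharTokeniser::from_text`; a round trip through `encode` and `decode` should then reproduce any text drawn from the same corpus.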
@ -0,0 +1,11 @
---
# edu-vqxk
title: 'Write §8: A decoder-only LM — stacking blocks and the causal mask'
status: todo
type: task
created_at: 2026-03-13T22:01:58Z
updated_at: 2026-03-13T22:01:58Z
parent: edu-u2w7
---

Explain how N transformer blocks are stacked. The causal mask ensures each position attends only to itself and earlier tokens. Tie the token-embedding weights to the unembedding matrix (GPT-1 style). A final linear layer produces the logits; softmax turns them into next-token probabilities.
@ -0,0 +1,79 @
# Building a Simple LLM from Scratch

A hands-on course building a small GPT-1-style language model in Rust — from raw text to a trained, sampling transformer.

---

## Part 1 — Language Modeling Basics

### §1 What is a Language Model?

🚧 *To be written — see [edu-32xl]*

### §2 Character-Level Tokenisation

🚧 *To be written — see [edu-7do4]*

### §3 Exercise 1: Build a Character-Level Tokeniser in Rust

🚧 *To be written — see [edu-tufd]*

---

## Part 2 — The Transformer Architecture

### §4 Embeddings and Positional Encoding

🚧 *To be written — see [edu-cw9v]*

### §5 Self-Attention: Queries, Keys, and Values

🚧 *To be written — see [edu-s6mr]*

### §6 The Transformer Block

🚧 *To be written — see [edu-9cnd]*

### §7 Exercise 2: Implement Self-Attention in Rust

🚧 *To be written — see [edu-hufe]*

---

## Part 3 — Assembling the Model

### §8 A Decoder-Only LM: Stacking Blocks and the Causal Mask

🚧 *To be written — see [edu-vqxk]*

### §9 Exercise 3: Define the GPT-1-Style Model in `candle`

🚧 *To be written — see [edu-ujs5]*

---

## Part 4 — Training

### §10 Cross-Entropy Loss and the Training Loop

🚧 *To be written — see [edu-abdu]*

### §11 Exercise 4: Train on a Small Text Corpus

🚧 *To be written — see [edu-jybf]*

### §12 Exercise 5: Sample from the Model

🚧 *To be written — see [edu-i76z]*

---

## Part 5 — Reflection

### §13 What Limits This Model?

🚧 *To be written — see [edu-kkjc]*

### §14 Further Reading

🚧 *To be written — see [edu-9sb7]*