vibed/edu/.beans/edu-u2w7--edu-write-chapter...

title: edu: write chapter on creating and training a simple LLM
status: in-progress
type: feature
priority: low
created_at: 2026-03-10T23:30:00Z
updated_at: 2026-03-13T22:01:43Z

Background

From edu/TODO.md: Hands-on: Creating and training a simple LLM.

A practical course on building a small language model from scratch in Rust, covering tokenisation, the Transformer architecture, and a training loop. The goal is deep understanding rather than production scale.

Content outline (suggested)

Part 1 — What is a Language Model?

  1. Predicting the next token: the core task
  2. Tokenisation: BPE, byte-level, character-level — pick character-level for simplicity
  3. Exercise 1: Build a character-level tokeniser in Rust
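
For Exercise 1, a character-level tokeniser could look roughly like the sketch below. It assumes the vocabulary is simply the sorted set of distinct characters in the training corpus and that unknown characters are reported as an error rather than mapped to a special token. The name `CharTokenizer` and its methods are illustrative, not something the outline prescribes.

```rust
use std::collections::BTreeMap;

/// Character-level tokeniser: every distinct char in the corpus gets one id.
/// (Illustrative type; the name and API are assumptions.)
struct CharTokenizer {
    char_to_id: BTreeMap<char, usize>,
    id_to_char: Vec<char>,
}

impl CharTokenizer {
    /// Build the vocabulary from the sorted, de-duplicated chars of `corpus`.
    fn new(corpus: &str) -> Self {
        let mut id_to_char: Vec<char> = corpus.chars().collect();
        id_to_char.sort_unstable();
        id_to_char.dedup();
        let char_to_id = id_to_char
            .iter()
            .enumerate()
            .map(|(id, &c)| (c, id))
            .collect();
        Self { char_to_id, id_to_char }
    }

    fn vocab_size(&self) -> usize {
        self.id_to_char.len()
    }

    /// Map text to token ids; `None` if any char is outside the vocabulary.
    fn encode(&self, text: &str) -> Option<Vec<usize>> {
        text.chars()
            .map(|c| self.char_to_id.get(&c).copied())
            .collect()
    }

    /// Map token ids back to text; `None` if any id is out of range.
    fn decode(&self, ids: &[usize]) -> Option<String> {
        ids.iter()
            .map(|&id| self.id_to_char.get(id).copied())
            .collect()
    }
}

fn main() {
    let tok = CharTokenizer::new("hello world");
    let ids = tok.encode("hello").expect("all chars are in the vocabulary");
    println!("vocab size = {}, ids = {:?}", tok.vocab_size(), ids);
    println!("decoded = {:?}", tok.decode(&ids));
}
```

Sorting the characters before assigning ids keeps the mapping deterministic across runs, which matters once a trained model depends on it.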

Part 2 — The Transformer Architecture

  1. Embeddings and positional encoding
  2. Self-attention: queries, keys, values — the attention formula
  3. Multi-head attention
  4. Feed-forward sublayers, residual connections, layer norm
  5. Exercise 2: Implement a single attention head in Rust (no ML framework)
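
As a starting point for Exercise 2, the attention formula softmax(Q Kᵀ / sqrt(d_head)) V can be written with plain nested vectors and no ML framework. The sketch below assumes a causal mask (each position attends only to itself and earlier positions, as a decoder-only model needs), omits biases, and uses hypothetical names (`Matrix`, `attention_head`). It favours readability over speed.

```rust
/// Row-major matrix as nested vectors: [rows][cols].
type Matrix = Vec<Vec<f32>>;

/// Naive matrix product; assumes a.cols == b.rows and b is non-empty.
fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            for t in 0..k {
                out[i][j] += a[i][t] * b[t][j];
            }
        }
    }
    out
}

/// Numerically stable softmax over one row; -inf entries become 0.
fn softmax_in_place(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// One causal attention head.
/// x: [seq_len][d_model]; w_q, w_k, w_v: [d_model][d_head].
fn attention_head(x: &Matrix, w_q: &Matrix, w_k: &Matrix, w_v: &Matrix) -> Matrix {
    let q = matmul(x, w_q);
    let k = matmul(x, w_k);
    let v = matmul(x, w_v);
    let scale = (q[0].len() as f32).sqrt();
    let seq_len = x.len();
    // Start fully masked (-inf); fill in only the allowed positions j <= i.
    let mut weights = vec![vec![f32::NEG_INFINITY; seq_len]; seq_len];
    for i in 0..seq_len {
        for j in 0..=i {
            let dot: f32 = q[i].iter().zip(&k[j]).map(|(a, b)| a * b).sum();
            weights[i][j] = dot / scale;
        }
        softmax_in_place(&mut weights[i]);
    }
    matmul(&weights, &v)
}

fn main() {
    let identity: Matrix = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let x: Matrix = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let out = attention_head(&x, &identity, &identity, &identity);
    println!("{:?}", out);
}
```

With identity weights the first output row equals the first input row, because the causal mask leaves position 0 nothing else to attend to.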

Part 3 — Assembling the Model

  1. Stacking Transformer blocks into a decoder-only LM
  2. Using candle (Hugging Face's Rust ML framework) for tensor ops and autodiff
  3. Exercise 3: Define a small GPT-like model in candle
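
Before writing the candle model for Exercise 3, it may help to pin down the hyperparameters and see how stacking blocks determines model size. The sketch below uses only the standard library and is not candle code. It assumes a GPT-2-style layout (learned position embeddings, biases on every linear layer, a 4x feed-forward expansion, and an output head that shares weights with the token embedding). `GptConfig` and its field names are hypothetical.

```rust
/// Hypothetical hyperparameters for a small decoder-only model.
#[derive(Debug, Clone, Copy)]
struct GptConfig {
    vocab_size: usize,
    block_size: usize, // maximum context length in tokens
    n_layer: usize,    // number of stacked Transformer blocks
    n_head: usize,
    n_embd: usize,     // embedding width (d_model)
}

impl GptConfig {
    /// Width of each head; `None` if n_embd does not split evenly.
    fn head_dim(&self) -> Option<usize> {
        if self.n_head > 0 && self.n_embd % self.n_head == 0 {
            Some(self.n_embd / self.n_head)
        } else {
            None
        }
    }

    /// Parameters in one Transformer block (assumed GPT-2-style layout).
    fn block_params(&self) -> usize {
        let d = self.n_embd;
        let layer_norms = 2 * (2 * d); // two norms, each with scale and shift
        let attn = (d * 3 * d + 3 * d) + (d * d + d); // fused QKV + output projection
        let mlp = (d * 4 * d + 4 * d) + (4 * d * d + d); // expand 4x, then project back
        layer_norms + attn + mlp
    }

    /// Total parameters, with the output head tied to the token embedding.
    fn total_params(&self) -> usize {
        let d = self.n_embd;
        let embeddings = self.vocab_size * d + self.block_size * d;
        let final_norm = 2 * d;
        embeddings + self.n_layer * self.block_params() + final_norm
    }
}

fn main() {
    // Illustrative values for a character-level toy model.
    let cfg = GptConfig { vocab_size: 65, block_size: 128, n_layer: 4, n_head: 4, n_embd: 64 };
    println!("{:?}", cfg);
    println!("head_dim = {:?}", cfg.head_dim());
    println!("params per block = {}", cfg.block_params());
    println!("total params = {}", cfg.total_params());
}
```

Under the same assumptions, plugging in GPT-2 small's published configuration (vocabulary 50257, context 1024, 12 layers, width 768) gives 124,439,808 parameters, which is a useful sanity check on the formula.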

Part 4 — Training

  1. Cross-entropy loss for next-token prediction
  2. The training loop: forward pass, loss, backward pass, optimizer step
  3. Exercise 4: Train on a small text corpus (e.g., Shakespeare or a short book)
  4. Exercise 5: Sample from the model and observe output quality vs. training steps
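
The four steps of the training loop can be seen in miniature without any framework. The sketch below swaps the Transformer for a bigram model (a table of logits indexed by the previous token) so the gradient can be written by hand: for softmax followed by cross-entropy, the gradient with respect to logit j is p_j minus 1 if j is the target, else p_j. The function names, learning rate, and toy token sequence are all assumptions for illustration; in the chapter itself candle's autodiff and optimizer would take over the backward pass and update.

```rust
/// Numerically stable softmax over one row of logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Cross-entropy for a single next-token prediction: -ln p(target).
fn cross_entropy(logits: &[f32], target: usize) -> f32 {
    -softmax(logits)[target].ln()
}

/// One pass of plain SGD over all (previous, next) token pairs.
/// `table[prev]` holds the logits for the token following `prev`.
/// Returns the mean loss observed during the pass.
/// Assumes `tokens` has at least two entries.
fn train_epoch(table: &mut [Vec<f32>], tokens: &[usize], lr: f32) -> f32 {
    let mut total = 0.0;
    for pair in tokens.windows(2) {
        let (prev, next) = (pair[0], pair[1]);
        // Forward pass and loss.
        let probs = softmax(&table[prev]);
        total += -probs[next].ln();
        // Backward pass (hand-derived gradient) and optimizer step.
        for (j, logit) in table[prev].iter_mut().enumerate() {
            let grad = probs[j] - if j == next { 1.0 } else { 0.0 };
            *logit -= lr * grad;
        }
    }
    total / (tokens.len() - 1) as f32
}

fn main() {
    // Toy corpus already tokenised: the pattern 0, 1, 2 repeats.
    let tokens = vec![0, 1, 2, 0, 1, 2, 0, 1, 2];
    let vocab = 3;
    let mut table = vec![vec![0.0f32; vocab]; vocab];
    for epoch in 0..50 {
        let loss = train_epoch(&mut table, &tokens, 0.5);
        if epoch % 10 == 0 {
            println!("epoch {epoch}: mean loss {loss:.4}");
        }
    }
}
```

With all logits at zero the model is uniform, so the first prediction costs ln(vocab); on this fully predictable sequence the loss then falls steadily, which is the same trend Exercise 5 asks readers to observe in sampled text.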

Part 5 — Reflection

  1. What limits this model? Scale, data, compute
  2. Pointers to real LLM training (GPT-2, LLaMA, Mistral)
  3. Further reading

Files to create or update

  • edu/src/llm-from-scratch.md
  • Add to edu/src/SUMMARY.md under the # Machine Learning section