From 818444962c1941b38d0e32f50d5ba7119fff0e90 Mon Sep 17 00:00:00 2001
From: Elijah Voigt
Date: Fri, 13 Mar 2026 14:56:25 -0700
Subject: [PATCH] docs(edu): outline ML self-play chapter and create section tickets [edu-coqp]

Add edu/src/ml-self-play.md with 14 stubbed sections across 5 parts:
foundations (RL, MCTS, self-play), game engine (Tic-Tac-Toe), MCTS
implementation, neural network integration, and the full self-play loop.
Create one beans ticket per section (edu-wobk through edu-brtk).

Co-Authored-By: Claude Sonnet 4.6
---
 edu/src/SUMMARY.md      |   4 ++
 edu/src/ml-self-play.md | 130 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 134 insertions(+)
 create mode 100644 edu/src/ml-self-play.md

diff --git a/edu/src/SUMMARY.md b/edu/src/SUMMARY.md
index a7b121a..5137c70 100644
--- a/edu/src/SUMMARY.md
+++ b/edu/src/SUMMARY.md
@@ -21,3 +21,7 @@
 # Graphics
 
 - [Shader Programming with wgpu and WGSL](shaders.md)
+
+# Machine Learning
+
+- [Training a Game AI Through Self-Play](ml-self-play.md)
diff --git a/edu/src/ml-self-play.md b/edu/src/ml-self-play.md
new file mode 100644
index 0000000..38a2081
--- /dev/null
+++ b/edu/src/ml-self-play.md
@@ -0,0 +1,130 @@
+# Machine Learning: Training a Game AI Through Self-Play
+
+This is a self-guided course on reinforcement learning through the lens of a concrete goal: training a Rust program to play Tic-Tac-Toe by playing against itself, in the style of AlphaGo Zero. No prior ML experience is assumed. You will build everything from scratch — a game engine, a search algorithm, and eventually a neural network that guides that search. Sections marked 🚧 are stubs whose full content is tracked in a beans ticket.
+
+---
+
+## Table of Contents
+
+**Part 1 — Foundations**
+
+1. [What is reinforcement learning?](#1-what-is-reinforcement-learning)
+2. [Monte Carlo Tree Search — algorithm explained](#2-monte-carlo-tree-search--algorithm-explained)
+3. [Why self-play? The AlphaGo Zero insight](#3-why-self-play-the-alphago-zero-insight)
+
+**Part 2 — The Game**
+
+4. [Choosing a simple game: Tic-Tac-Toe](#4-choosing-a-simple-game-tic-tac-toe)
+5. [Representing game state in Rust](#5-representing-game-state-in-rust)
+6. [Exercise 1: implement the game logic](#6-exercise-1-implement-the-game-logic)
+
+**Part 3 — MCTS**
+
+7. [Implementing MCTS in Rust](#7-implementing-mcts-in-rust)
+8. [Exercise 2: play Tic-Tac-Toe with pure MCTS](#8-exercise-2-play-tic-tac-toe-with-pure-mcts)
+
+**Part 4 — Neural Network Policy/Value Head**
+
+9. [Neural network architecture overview](#9-neural-network-architecture-overview)
+10. [Integrating a neural network crate](#10-integrating-a-neural-network-crate)
+11. [Exercise 3: train the network on MCTS data](#11-exercise-3-train-the-network-on-mcts-data)
+12. [Exercise 4: replace rollout with the value network](#12-exercise-4-replace-rollout-with-the-value-network)
+
+**Part 5 — Self-Play Loop**
+
+13. [The full AlphaGo Zero training loop](#13-the-full-alphago-zero-training-loop)
+14. [Exercise 5: 1000 self-play games; observe improvement](#14-exercise-5-1000-self-play-games-observe-improvement)
+
+---
+
+## Part 1 — Foundations
+
+### 1. What is reinforcement learning?
+
+🚧 This section is a stub. Full content tracked in [edu-wobk].
+
+---
+
+### 2. Monte Carlo Tree Search — algorithm explained
+
+🚧 This section is a stub. Full content tracked in [edu-3yw9].
+
+---
+
+### 3. Why self-play? The AlphaGo Zero insight
+
+🚧 This section is a stub. Full content tracked in [edu-5go8].
+
+---
+
+## Part 2 — The Game
+
+### 4. Choosing a simple game: Tic-Tac-Toe
+
+🚧 This section is a stub. Full content tracked in [edu-k3tq].
+
+---
+
+### 5. Representing game state in Rust
+
+🚧 This section is a stub. Full content tracked in [edu-e39n].
+
+---
+
+### 6. Exercise 1: implement the game logic
+
+🚧 This section is a stub. Full content tracked in [edu-ymux].
+
+---
+
+## Part 3 — MCTS
+
+### 7. Implementing MCTS in Rust
+
+🚧 This section is a stub. Full content tracked in [edu-of9y].
+
+---
+
+### 8. Exercise 2: play Tic-Tac-Toe with pure MCTS
+
+🚧 This section is a stub. Full content tracked in [edu-4v13].
+
+---
+
+## Part 4 — Neural Network Policy/Value Head
+
+### 9. Neural network architecture overview
+
+🚧 This section is a stub. Full content tracked in [edu-iv0k].
+
+---
+
+### 10. Integrating a neural network crate
+
+🚧 This section is a stub. Full content tracked in [edu-pvou].
+
+---
+
+### 11. Exercise 3: train the network on MCTS data
+
+🚧 This section is a stub. Full content tracked in [edu-lqky].
+
+---
+
+### 12. Exercise 4: replace rollout with the value network
+
+🚧 This section is a stub. Full content tracked in [edu-7lu6].
+
+---
+
+## Part 5 — Self-Play Loop
+
+### 13. The full AlphaGo Zero training loop
+
+🚧 This section is a stub. Full content tracked in [edu-453h].
+
+---
+
+### 14. Exercise 5: 1000 self-play games; observe improvement
+
+🚧 This section is a stub. Full content tracked in [edu-brtk].