| title | status | type | priority | created_at | updated_at |
|---|---|---|---|---|---|
| edu: write Machine Learning chapter (self-play game AI, Alpha Go Zero style) | completed | feature | low | 2026-03-10T23:30:01Z | 2026-03-16T01:40:56Z |
Background
From edu/TODO.md: Hands-on: Machine Learning; training a computer to play a game by playing against itself (a-la Alpha Go Zero).
This chapter is a self-play reinforcement learning course. The practical focus is implementing a simplified version of the MCTS + neural network self-play loop in Rust, targeting a simple deterministic two-player game (e.g., Tic-Tac-Toe or Connect Four).
Content outline (suggested)
Part 1 — Foundations
- What is reinforcement learning? (state, action, reward, policy, value)
- Monte Carlo Tree Search (MCTS) — algorithm explained step by step
- Why self-play? The AlphaGo Zero insight
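
For the MCTS selection step, the chapter will probably need a scoring rule written out. The formulas below are a sketch of the two usual candidates, not a decision made in this issue. The symbols are assumptions introduced here: $w_i$ is the total reward backed up through child $i$, $n_i$ its visit count, $N$ the parent's visit count, $c$ an exploration constant, $Q(s,a)$ the mean value of action $a$ in state $s$, and $P(s,a)$ the prior from the policy head.

```latex
% UCB1 applied to trees (UCT): score of child i under a parent visited N times
\mathrm{UCT}(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}

% AlphaGo Zero style selection (PUCT), which weights exploration by the network prior
a^{*} = \arg\max_a \left[ Q(s,a) + c_{\mathrm{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \right]
```

The first form suits Exercise 2 (pure MCTS); the second only becomes relevant once a policy head exists.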
Part 2 — The Game
- Choosing a simple game: Tic-Tac-Toe as the learning vehicle
- Representing game state in Rust
- Exercise 1: Implement the game logic (move generation, win detection, terminal states)
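
One possible starting point for the state representation and Exercise 1 is sketched below. All names (`Player`, `Outcome`, `Board`, `LINES`) are hypothetical and chosen for illustration; the chapter may settle on a different layout, such as bitboards.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Player { X, O }

impl Player {
    pub fn other(self) -> Player {
        match self { Player::X => Player::O, Player::O => Player::X }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Outcome { Win(Player), Draw }

/// 3x3 board stored row by row; cell index = row * 3 + column.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Board {
    cells: [Option<Player>; 9],
    to_move: Player,
}

/// The eight winning lines: three rows, three columns, two diagonals.
const LINES: [[usize; 3]; 8] = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],
    [0, 3, 6], [1, 4, 7], [2, 5, 8],
    [0, 4, 8], [2, 4, 6],
];

impl Board {
    /// Empty board with X to move (an arbitrary convention for this sketch).
    pub fn new() -> Self {
        Board { cells: [None; 9], to_move: Player::X }
    }

    /// `Some(..)` once the game is over, `None` while it is still running.
    pub fn outcome(&self) -> Option<Outcome> {
        for line in LINES.iter() {
            if let Some(p) = self.cells[line[0]] {
                if self.cells[line[1]] == Some(p) && self.cells[line[2]] == Some(p) {
                    return Some(Outcome::Win(p));
                }
            }
        }
        if self.cells.iter().all(|c| c.is_some()) { Some(Outcome::Draw) } else { None }
    }

    /// Move generation: empty cells, or nothing if the position is terminal.
    pub fn legal_moves(&self) -> Vec<usize> {
        if self.outcome().is_some() {
            return Vec::new();
        }
        (0..9).filter(|&i| self.cells[i].is_none()).collect()
    }

    /// Returns the successor position, or `None` if the move is illegal.
    pub fn play(&self, idx: usize) -> Option<Board> {
        if idx >= 9 || self.cells[idx].is_some() || self.outcome().is_some() {
            return None;
        }
        let mut next = self.clone();
        next.cells[idx] = Some(self.to_move);
        next.to_move = self.to_move.other();
        Some(next)
    }
}
```

Returning a fresh `Board` from `play` instead of mutating in place keeps tree search simple, since each MCTS node can own its position. The copy is cheap at this board size.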
Part 3 — MCTS
- Implementing MCTS in Rust (selection, expansion, simulation, backpropagation)
- Exercise 2: Play Tic-Tac-Toe with pure MCTS (no neural network)
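
The four MCTS phases could be sketched as follows. To keep the example short and free of external crates, it uses a stand-in game rather than Tic-Tac-Toe: a single pile of stones, each player takes 1 to 3, and whoever takes the last stone wins. The game, the `Rng` helper, the exploration constant, and all names are assumptions made for illustration only.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crate.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self { Rng(seed.max(1)) }
    fn below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

/// Legal moves of the stand-in game: take 1, 2 or 3 stones.
fn moves(stones: u32) -> Vec<u32> { (1..=stones.min(3)).collect() }

struct Node {
    stones: u32,          // stones left in this position
    parent: Option<usize>,
    mv: u32,              // move that led here (0 for the root)
    children: Vec<usize>,
    untried: Vec<u32>,    // moves not yet expanded
    visits: f64,
    wins: f64,            // wins for the player who moved INTO this node
}

impl Node {
    fn new(stones: u32, parent: Option<usize>, mv: u32) -> Self {
        Node { stones, parent, mv, children: Vec::new(), untried: moves(stones), visits: 0.0, wins: 0.0 }
    }
}

/// UCT score with exploration constant sqrt(2).
fn uct(n: &Node, ln_parent: f64) -> f64 {
    n.wins / n.visits + (2.0 * ln_parent / n.visits).sqrt()
}

/// Random playout; true if the player to move at `stones` ends up winning.
fn rollout(mut stones: u32, rng: &mut Rng) -> bool {
    let mut starter_to_move = true;
    while stones > 0 {
        let options = moves(stones);
        stones -= options[rng.below(options.len())];
        starter_to_move = !starter_to_move;
    }
    // Whoever is to move with no stones left has lost.
    !starter_to_move
}

/// Runs `iters` MCTS iterations and returns the most visited root move.
pub fn best_move(stones: u32, iters: u32, rng: &mut Rng) -> Option<u32> {
    let mut tree = vec![Node::new(stones, None, 0)];
    for _ in 0..iters {
        // 1. Selection: descend while the node is fully expanded.
        let mut id = 0;
        while tree[id].untried.is_empty() && !tree[id].children.is_empty() {
            let ln_n = tree[id].visits.ln();
            id = *tree[id].children.iter()
                .max_by(|&&a, &&b| uct(&tree[a], ln_n).total_cmp(&uct(&tree[b], ln_n)))
                .unwrap();
        }
        // 2. Expansion: add one child for an untried move, if any.
        if let Some(m) = tree[id].untried.pop() {
            let child = Node::new(tree[id].stones - m, Some(id), m);
            tree.push(child);
            let cid = tree.len() - 1;
            tree[id].children.push(cid);
            id = cid;
        }
        // 3. Simulation: the player who moved into `id` wins if the mover loses.
        let mut win = !rollout(tree[id].stones, rng);
        // 4. Backpropagation: flip the perspective at every level.
        let mut cur = Some(id);
        while let Some(i) = cur {
            tree[i].visits += 1.0;
            if win { tree[i].wins += 1.0; }
            win = !win;
            cur = tree[i].parent;
        }
    }
    tree[0].children.iter()
        .max_by(|&&a, &&b| tree[a].visits.total_cmp(&tree[b].visits))
        .map(|&c| tree[c].mv)
}
```

Nodes live in a flat `Vec` and refer to each other by index, which sidesteps ownership issues with parent pointers. Swapping in Tic-Tac-Toe would mainly mean replacing `stones` with a board and `moves` with its move generator.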
Part 4 — Neural Network Policy/Value Head
- Overview of the network architecture (shared trunk + policy head + value head)
- Integrating a neural network crate (e.g., `tch-rs` or `candle`)
- Exercise 3: Train the network on MCTS-generated data
- Exercise 4: Replace MCTS simulation with the learned value function
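
To make the "shared trunk + policy head + value head" shape concrete, here is a forward-pass-only sketch in plain Rust. It deliberately does not use `tch-rs` or `candle`, so no crate API is assumed; the layer sizes, the deterministic weight initialisation, and the names `Linear` and `PolicyValueNet` are all hypothetical. Training is out of scope for this snippet.

```rust
/// Dense layer; `w` is row-major with one row of `inp` weights per output.
pub struct Linear { w: Vec<f32>, b: Vec<f32>, inp: usize }

impl Linear {
    /// Small deterministic pseudo-random weights (stand-in for a real init).
    pub fn new(inp: usize, out: usize, seed: u32) -> Self {
        let w = (0..inp * out)
            .map(|i| {
                let h = (i as u32).wrapping_mul(2654435761).wrapping_add(seed) % 1000;
                (h as f32 / 1000.0 - 0.5) * 0.2
            })
            .collect();
        Linear { w, b: vec![0.0; out], inp }
    }

    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.w.chunks(self.inp).zip(&self.b)
            .map(|(row, b)| row.iter().zip(x).map(|(w, x)| w * x).sum::<f32>() + b)
            .collect()
    }
}

pub struct PolicyValueNet { trunk: Linear, policy: Linear, value: Linear }

impl PolicyValueNet {
    pub fn new(cells: usize, hidden: usize) -> Self {
        PolicyValueNet {
            trunk: Linear::new(cells, hidden, 1),
            policy: Linear::new(hidden, cells, 2),
            value: Linear::new(hidden, 1, 3),
        }
    }

    /// Returns (move probabilities over all cells, value in [-1, 1]).
    /// `board` encodes each cell as +1 (own), -1 (opponent) or 0 (empty).
    pub fn forward(&self, board: &[f32], legal: &[bool]) -> (Vec<f32>, f32) {
        // Shared trunk: linear layer followed by ReLU.
        let h: Vec<f32> = self.trunk.forward(board).into_iter().map(|v| v.max(0.0)).collect();

        // Policy head: softmax over legal moves only; illegal moves get 0.
        let logits = self.policy.forward(&h);
        let max = logits.iter().zip(legal)
            .filter(|&(_, &ok)| ok)
            .map(|(&l, _)| l)
            .fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = logits.iter().zip(legal)
            .map(|(&l, &ok)| if ok { (l - max).exp() } else { 0.0 })
            .collect();
        let total: f32 = exps.iter().sum();
        let probs: Vec<f32> = if total > 0.0 {
            exps.iter().map(|e| e / total).collect()
        } else {
            exps // no legal move: return all zeros
        };

        // Value head: single scalar squashed with tanh.
        let value = self.value.forward(&h)[0].tanh();
        (probs, value)
    }
}
```

For Exercise 4, the scalar returned here would take the place of the random rollout result in the simulation step, and the probabilities would serve as priors during selection.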
Part 5 — Self-Play Loop
- The full AlphaGo Zero training loop: generate data → train → evaluate → repeat
- Exercise 5: Run 1000 self-play games and observe the policy improving
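
The generate, train, evaluate, repeat cycle might be structured along these lines. This is a skeleton only: the `Learner` trait, the `Example` record, the gating threshold, and `ToyLearner` (a placeholder that just counts training examples so the loop can run) are assumptions for illustration, not the chapter's actual design.

```rust
/// One training example recorded during self-play.
pub struct Example {
    pub state: Vec<f32>,   // encoded position
    pub policy: Vec<f32>,  // MCTS visit distribution at that position
    pub outcome: f32,      // final result from the mover's point of view
}

/// What the loop needs from a player (hypothetical interface).
pub trait Learner: Clone {
    fn self_play_game(&self) -> Vec<Example>;
    fn train(&mut self, data: &[Example]);
    /// Fraction of evaluation games won against `baseline`, in [0, 1].
    fn win_rate_against(&self, baseline: &Self) -> f32;
}

/// Generate data with the current best player, train a candidate on it,
/// and promote the candidate only if it clears the `gate` win rate.
pub fn training_loop<L: Learner>(mut best: L, iterations: usize, games: usize, gate: f32) -> L {
    for _ in 0..iterations {
        // Generate: self-play with the current best player.
        let data: Vec<Example> = (0..games).flat_map(|_| best.self_play_game()).collect();
        // Train: update a copy so the baseline stays available.
        let mut candidate = best.clone();
        candidate.train(&data);
        // Evaluate: keep the candidate only if it is clearly stronger.
        if candidate.win_rate_against(&best) >= gate {
            best = candidate;
        }
    }
    best
}

/// Placeholder learner: "skill" grows with the amount of data seen.
#[derive(Clone)]
pub struct ToyLearner { pub skill: u32 }

impl Learner for ToyLearner {
    fn self_play_game(&self) -> Vec<Example> {
        vec![Example { state: vec![0.0; 9], policy: vec![1.0 / 9.0; 9], outcome: 0.0 }]
    }
    fn train(&mut self, data: &[Example]) {
        self.skill += data.len() as u32;
    }
    fn win_rate_against(&self, baseline: &Self) -> f32 {
        if self.skill > baseline.skill { 1.0 } else { 0.5 }
    }
}
```

A call such as `training_loop(ToyLearner { skill: 0 }, 3, 4, 0.55)` runs three iterations of four games each. In the real chapter the learner would wrap the MCTS search and the policy/value network, and Exercise 5 would track how the win rate against earlier versions changes over the 1000 games.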
File to create
- `edu/src/ml-self-play.md`
- Add to `edu/src/SUMMARY.md` under a `# Machine Learning` section