Machine Learning: Training a Game AI Through Self-Play

This self-guided course teaches reinforcement learning through one concrete goal: training a Rust program to play Tic-Tac-Toe by playing against itself, in the style of AlphaGo Zero. No prior ML experience is assumed. You will build everything from scratch: a game engine, a search algorithm, and eventually a neural network that guides that search. Sections marked 🚧 are stubs; the full content of each is tracked in the beans ticket named in its stub notice.


Table of Contents

Part 1 — Foundations

  1. What is reinforcement learning?
  2. Monte Carlo Tree Search — algorithm explained
  3. Why self-play? The AlphaGo Zero insight

Part 2 — The Game

  4. Choosing a simple game: Tic-Tac-Toe
  5. Representing game state in Rust
  6. Exercise 1: implement the game logic

Part 3 — MCTS

  7. Implementing MCTS in Rust
  8. Exercise 2: play Tic-Tac-Toe with pure MCTS

Part 4 — Neural Network Policy/Value Head

  9. Neural network architecture overview
  10. Integrating a neural network crate
  11. Exercise 3: train the network on MCTS data
  12. Exercise 4: replace rollout with the value network

Part 5 — Self-Play Loop

  13. The full AlphaGo Zero training loop
  14. Exercise 5: 1000 self-play games; observe improvement

Part 1 — Foundations

1. What is reinforcement learning?

🚧 This section is a stub. Full content tracked in [edu-wobk].


2. Monte Carlo Tree Search — algorithm explained

🚧 This section is a stub. Full content tracked in [edu-3yw9].


3. Why self-play? The AlphaGo Zero insight

🚧 This section is a stub. Full content tracked in [edu-5go8].


Part 2 — The Game

4. Choosing a simple game: Tic-Tac-Toe

🚧 This section is a stub. Full content tracked in [edu-k3tq].


5. Representing game state in Rust

🚧 This section is a stub. Full content tracked in [edu-e39n].


6. Exercise 1: implement the game logic

🚧 This section is a stub. Full content tracked in [edu-ymux].


Part 3 — MCTS

7. Implementing MCTS in Rust

🚧 This section is a stub. Full content tracked in [edu-of9y].


8. Exercise 2: play Tic-Tac-Toe with pure MCTS

🚧 This section is a stub. Full content tracked in [edu-4v13].


Part 4 — Neural Network Policy/Value Head

9. Neural network architecture overview

🚧 This section is a stub. Full content tracked in [edu-iv0k].


10. Integrating a neural network crate

🚧 This section is a stub. Full content tracked in [edu-pvou].


11. Exercise 3: train the network on MCTS data

🚧 This section is a stub. Full content tracked in [edu-lqky].


12. Exercise 4: replace rollout with the value network

🚧 This section is a stub. Full content tracked in [edu-7lu6].


Part 5 — Self-Play Loop

13. The full AlphaGo Zero training loop

🚧 This section is a stub. Full content tracked in [edu-453h].


14. Exercise 5: 1000 self-play games; observe improvement

🚧 This section is a stub. Full content tracked in [edu-brtk].