Machine Learning: Training a Game AI Through Self-Play
This is a self-guided course on reinforcement learning through the lens of a concrete goal: training a Rust program to play Tic-Tac-Toe by playing against itself, in the style of AlphaGo Zero. No prior ML experience is assumed. You will build everything from scratch — a game engine, a search algorithm, and eventually a neural network that guides that search. Sections marked 🚧 are stubs whose full content is tracked in a beans ticket.
Table of Contents
Part 1 — Foundations
- What is reinforcement learning?
- Monte Carlo Tree Search — algorithm explained
- Why self-play? The AlphaGo Zero insight
Part 2 — The Game
- Choosing a simple game: Tic-Tac-Toe
- Representing game state in Rust
- Exercise 1: implement the game logic
Part 3 — MCTS
- Implementing MCTS in Rust
- Exercise 2: play Tic-Tac-Toe with pure MCTS
Part 4 — Neural Network Policy/Value Head
- Neural network architecture overview
- Integrating a neural network crate
- Exercise 3: train the network on MCTS data
- Exercise 4: replace rollout with the value network
Part 5 — Self-Play Loop
- The full AlphaGo Zero training loop
- Exercise 5: 1000 self-play games; observe improvement
Part 1 — Foundations
1. What is reinforcement learning?
🚧 This section is a stub. Full content tracked in [edu-wobk].
2. Monte Carlo Tree Search — algorithm explained
🚧 This section is a stub. Full content tracked in [edu-3yw9].
3. Why self-play? The AlphaGo Zero insight
🚧 This section is a stub. Full content tracked in [edu-5go8].
Part 2 — The Game
4. Choosing a simple game: Tic-Tac-Toe
🚧 This section is a stub. Full content tracked in [edu-k3tq].
5. Representing game state in Rust
🚧 This section is a stub. Full content tracked in [edu-e39n].
6. Exercise 1: implement the game logic
🚧 This section is a stub. Full content tracked in [edu-ymux].
Part 3 — MCTS
7. Implementing MCTS in Rust
🚧 This section is a stub. Full content tracked in [edu-of9y].
8. Exercise 2: play Tic-Tac-Toe with pure MCTS
🚧 This section is a stub. Full content tracked in [edu-4v13].
Part 4 — Neural Network Policy/Value Head
9. Neural network architecture overview
🚧 This section is a stub. Full content tracked in [edu-iv0k].
10. Integrating a neural network crate
🚧 This section is a stub. Full content tracked in [edu-pvou].
11. Exercise 3: train the network on MCTS data
🚧 This section is a stub. Full content tracked in [edu-lqky].
12. Exercise 4: replace rollout with the value network
🚧 This section is a stub. Full content tracked in [edu-7lu6].
Part 5 — Self-Play Loop
13. The full AlphaGo Zero training loop
🚧 This section is a stub. Full content tracked in [edu-453h].
14. Exercise 5: 1000 self-play games; observe improvement
🚧 This section is a stub. Full content tracked in [edu-brtk].