# Machine Learning: Training a Game AI Through Self-Play

This is a self-guided course on reinforcement learning through the lens of a concrete goal: training a Rust program to play Tic-Tac-Toe by playing against itself, in the style of AlphaGo Zero.

No prior ML experience is assumed. You will build everything from scratch: a game engine, a search algorithm, and eventually a neural network that guides that search.

Sections marked 🚧 are stubs whose full content is tracked in a beans ticket.

---

## Table of Contents

**Part 1 — Foundations**

1. [What is reinforcement learning?](#1-what-is-reinforcement-learning)
2. [Monte Carlo Tree Search — algorithm explained](#2-monte-carlo-tree-search--algorithm-explained)
3. [Why self-play? The AlphaGo Zero insight](#3-why-self-play-the-alphago-zero-insight)

**Part 2 — The Game**

4. [Choosing a simple game: Tic-Tac-Toe](#4-choosing-a-simple-game-tic-tac-toe)
5. [Representing game state in Rust](#5-representing-game-state-in-rust)
6. [Exercise 1: implement the game logic](#6-exercise-1-implement-the-game-logic)

**Part 3 — MCTS**

7. [Implementing MCTS in Rust](#7-implementing-mcts-in-rust)
8. [Exercise 2: play Tic-Tac-Toe with pure MCTS](#8-exercise-2-play-tic-tac-toe-with-pure-mcts)

**Part 4 — Neural Network Policy/Value Head**

9. [Neural network architecture overview](#9-neural-network-architecture-overview)
10. [Integrating a neural network crate](#10-integrating-a-neural-network-crate)
11. [Exercise 3: train the network on MCTS data](#11-exercise-3-train-the-network-on-mcts-data)
12. [Exercise 4: replace rollout with the value network](#12-exercise-4-replace-rollout-with-the-value-network)

**Part 5 — Self-Play Loop**

13. [The full AlphaGo Zero training loop](#13-the-full-alphago-zero-training-loop)
14. [Exercise 5: 1000 self-play games; observe improvement](#14-exercise-5-1000-self-play-games-observe-improvement)

---

## Part 1 — Foundations

### 1. What is reinforcement learning?

🚧 This section is a stub. Full content tracked in [edu-wobk].

---

### 2. Monte Carlo Tree Search — algorithm explained

🚧 This section is a stub. Full content tracked in [edu-3yw9].

---

### 3. Why self-play? The AlphaGo Zero insight

🚧 This section is a stub. Full content tracked in [edu-5go8].

---

## Part 2 — The Game

### 4. Choosing a simple game: Tic-Tac-Toe

🚧 This section is a stub. Full content tracked in [edu-k3tq].

---

### 5. Representing game state in Rust

🚧 This section is a stub. Full content tracked in [edu-e39n].

---

### 6. Exercise 1: implement the game logic

🚧 This section is a stub. Full content tracked in [edu-ymux].

---

## Part 3 — MCTS

### 7. Implementing MCTS in Rust

🚧 This section is a stub. Full content tracked in [edu-of9y].

---

### 8. Exercise 2: play Tic-Tac-Toe with pure MCTS

🚧 This section is a stub. Full content tracked in [edu-4v13].

---

## Part 4 — Neural Network Policy/Value Head

### 9. Neural network architecture overview

🚧 This section is a stub. Full content tracked in [edu-iv0k].

---

### 10. Integrating a neural network crate

🚧 This section is a stub. Full content tracked in [edu-pvou].

---

### 11. Exercise 3: train the network on MCTS data

🚧 This section is a stub. Full content tracked in [edu-lqky].

---

### 12. Exercise 4: replace rollout with the value network

🚧 This section is a stub. Full content tracked in [edu-7lu6].

---

## Part 5 — Self-Play Loop

### 13. The full AlphaGo Zero training loop

🚧 This section is a stub. Full content tracked in [edu-453h].

---

### 14. Exercise 5: 1000 self-play games; observe improvement

🚧 This section is a stub. Full content tracked in [edu-brtk].