# Machine Learning: Training a Game AI Through Self-Play

This is a self-guided course on reinforcement learning, organized around one concrete goal: training a Rust program to play Tic-Tac-Toe by playing against itself, in the style of AlphaGo Zero. No prior ML experience is assumed. You will build everything from scratch: a game engine, a search algorithm, and eventually a neural network that guides that search. Sections marked 🚧 are stubs; the bracketed ID in each one (for example, [edu-wobk]) is the beans ticket that tracks its full content.

---

## Table of Contents

**Part 1 — Foundations**

1. [What is reinforcement learning?](#1-what-is-reinforcement-learning)
2. [Monte Carlo Tree Search — algorithm explained](#2-monte-carlo-tree-search--algorithm-explained)
3. [Why self-play? The AlphaGo Zero insight](#3-why-self-play-the-alphago-zero-insight)

**Part 2 — The Game**

4. [Choosing a simple game: Tic-Tac-Toe](#4-choosing-a-simple-game-tic-tac-toe)
5. [Representing game state in Rust](#5-representing-game-state-in-rust)
6. [Exercise 1: implement the game logic](#6-exercise-1-implement-the-game-logic)

**Part 3 — MCTS**

7. [Implementing MCTS in Rust](#7-implementing-mcts-in-rust)
8. [Exercise 2: play Tic-Tac-Toe with pure MCTS](#8-exercise-2-play-tic-tac-toe-with-pure-mcts)

**Part 4 — Neural Network Policy/Value Head**

9. [Neural network architecture overview](#9-neural-network-architecture-overview)
10. [Integrating a neural network crate](#10-integrating-a-neural-network-crate)
11. [Exercise 3: train the network on MCTS data](#11-exercise-3-train-the-network-on-mcts-data)
12. [Exercise 4: replace rollout with the value network](#12-exercise-4-replace-rollout-with-the-value-network)

**Part 5 — Self-Play Loop**

13. [The full AlphaGo Zero training loop](#13-the-full-alphago-zero-training-loop)
14. [Exercise 5: 1000 self-play games; observe improvement](#14-exercise-5-1000-self-play-games-observe-improvement)

---

## Part 1 — Foundations

### 1. What is reinforcement learning?

🚧 This section is a stub. Full content tracked in [edu-wobk].

---

### 2. Monte Carlo Tree Search — algorithm explained

🚧 This section is a stub. Full content tracked in [edu-3yw9].

---

### 3. Why self-play? The AlphaGo Zero insight

🚧 This section is a stub. Full content tracked in [edu-5go8].

---

## Part 2 — The Game

### 4. Choosing a simple game: Tic-Tac-Toe

🚧 This section is a stub. Full content tracked in [edu-k3tq].

---

### 5. Representing game state in Rust

🚧 This section is a stub. Full content tracked in [edu-e39n].

---

### 6. Exercise 1: implement the game logic

🚧 This section is a stub. Full content tracked in [edu-ymux].

---

## Part 3 — MCTS

### 7. Implementing MCTS in Rust

🚧 This section is a stub. Full content tracked in [edu-of9y].

---

### 8. Exercise 2: play Tic-Tac-Toe with pure MCTS

🚧 This section is a stub. Full content tracked in [edu-4v13].

---

## Part 4 — Neural Network Policy/Value Head

### 9. Neural network architecture overview

🚧 This section is a stub. Full content tracked in [edu-iv0k].

---

### 10. Integrating a neural network crate

🚧 This section is a stub. Full content tracked in [edu-pvou].

---

### 11. Exercise 3: train the network on MCTS data

🚧 This section is a stub. Full content tracked in [edu-lqky].

---

### 12. Exercise 4: replace rollout with the value network

🚧 This section is a stub. Full content tracked in [edu-7lu6].

---

## Part 5 — Self-Play Loop

### 13. The full AlphaGo Zero training loop

🚧 This section is a stub. Full content tracked in [edu-453h].

---

### 14. Exercise 5: 1000 self-play games; observe improvement

🚧 This section is a stub. Full content tracked in [edu-brtk].