Machine Learning: Training a Game AI Through Self-Play

This is a self-guided course on reinforcement learning through the lens of a concrete goal: training a Rust program to play Tic-Tac-Toe by playing against itself, in the style of AlphaGo Zero. No prior ML experience is assumed. You will build everything from scratch — a game engine, a search algorithm, and eventually a neural network that guides that search. Sections marked 🚧 are stubs whose full content is tracked in a beans ticket.

Part 1 — Foundations

1. What is reinforcement learning?
2. Monte Carlo Tree Search — algorithm explained
3. Why self-play? The AlphaGo Zero insight

Part 2 — The Game

4. Choosing a simple game: Tic-Tac-Toe
5. Representing game state in Rust
6. Exercise 1: implement the game logic

Part 3 — MCTS

7. Implementing MCTS in Rust
8. Exercise 2: play Tic-Tac-Toe with pure MCTS

Part 4 — Neural Network Policy/Value Head

9. Neural network architecture overview
10. Integrating a neural network crate
11. Exercise 3: train the network on MCTS data
12. Exercise 4: replace rollout with the value network

Part 5 — Self-Play Loop

13. The full AlphaGo Zero training loop
14. Exercise 5: 1000 self-play games; observe improvement

Part 1 — Foundations

1. What is reinforcement learning?

🚧 This section is a stub. Full content tracked in [edu-wobk].

2. Monte Carlo Tree Search — algorithm explained

🚧 This section is a stub. Full content tracked in [edu-3yw9].

3. Why self-play? The AlphaGo Zero insight

🚧 This section is a stub. Full content tracked in [edu-5go8].

Part 2 — The Game

4. Choosing a simple game: Tic-Tac-Toe

🚧 This section is a stub. Full content tracked in [edu-k3tq].

5. Representing game state in Rust

🚧 This section is a stub. Full content tracked in [edu-e39n].

6. Exercise 1: implement the game logic

🚧 This section is a stub. Full content tracked in [edu-ymux].

Part 3 — MCTS

7. Implementing MCTS in Rust

🚧 This section is a stub. Full content tracked in [edu-of9y].

8. Exercise 2: play Tic-Tac-Toe with pure MCTS

🚧 This section is a stub. Full content tracked in [edu-4v13].

Part 4 — Neural Network Policy/Value Head

9. Neural network architecture overview

🚧 This section is a stub. Full content tracked in [edu-iv0k].

10. Integrating a neural network crate

🚧 This section is a stub. Full content tracked in [edu-pvou].

11. Exercise 3: train the network on MCTS data

🚧 This section is a stub. Full content tracked in [edu-lqky].

12. Exercise 4: replace rollout with the value network

🚧 This section is a stub. Full content tracked in [edu-7lu6].

Part 5 — Self-Play Loop

13. The full AlphaGo Zero training loop

🚧 This section is a stub. Full content tracked in [edu-453h].

14. Exercise 5: 1000 self-play games; observe improvement

🚧 This section is a stub. Full content tracked in [edu-brtk].
