---
# edu-coqp
title: 'edu: write Machine Learning chapter (self-play game AI, Alpha Go Zero style)'
status: todo
type: task
priority: low
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-10T23:30:01Z
---

## Background

From `edu/TODO.md`: Hands-on: Machine Learning; training a computer to play a game by playing against itself (a-la Alpha Go Zero).

A self-play reinforcement learning course. The practical focus is implementing a simplified version of the MCTS + neural network self-play loop in Rust, targeting a simple deterministic two-player game (e.g., Tic-Tac-Toe or Connect Four).

## Content outline (suggested)

### Part 1 — Foundations
1. What is reinforcement learning? (state, action, reward, policy, value)
2. Monte Carlo Tree Search (MCTS) — algorithm explained step by step
3. Why self-play? The AlphaGo Zero insight

### Part 2 — The Game
4. Choosing a simple game: Tic-Tac-Toe as the learning vehicle
5. Representing game state in Rust
6. Exercise 1: Implement the game logic (move generation, win detection, terminal states)

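
As a starting point for items 5 and 6, the game state could look roughly like the sketch below. All names (`Player`, `Outcome`, `Board`, `legal_moves`, `play`, `outcome`) are hypothetical and not taken from the existing `edu` sources; the row-major 0..9 cell indexing is likewise an assumption.

```rust
/// One of the two players; `X` moves first (assumed convention).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Player {
    X,
    O,
}

impl Player {
    pub fn other(self) -> Player {
        match self {
            Player::X => Player::O,
            Player::O => Player::X,
        }
    }
}

/// Result of a finished game.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Outcome {
    Win(Player),
    Draw,
}

/// Hypothetical board type: nine cells in row-major order plus the side to move.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Board {
    cells: [Option<Player>; 9],
    to_move: Player,
}

/// The eight winning lines: three rows, three columns, two diagonals.
const LINES: [[usize; 3]; 8] = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],
    [0, 3, 6], [1, 4, 7], [2, 5, 8],
    [0, 4, 8], [2, 4, 6],
];

impl Board {
    pub fn new() -> Board {
        Board { cells: [None; 9], to_move: Player::X }
    }

    pub fn to_move(&self) -> Player {
        self.to_move
    }

    /// Win detection and terminal states: `None` while the game is still running.
    pub fn outcome(&self) -> Option<Outcome> {
        for line in LINES.iter() {
            if let Some(p) = self.cells[line[0]] {
                if self.cells[line[1]] == Some(p) && self.cells[line[2]] == Some(p) {
                    return Some(Outcome::Win(p));
                }
            }
        }
        if self.cells.iter().all(|c| c.is_some()) {
            Some(Outcome::Draw)
        } else {
            None
        }
    }

    /// Move generation: indices of empty cells, or nothing once the game is over.
    pub fn legal_moves(&self) -> Vec<usize> {
        if self.outcome().is_some() {
            return Vec::new();
        }
        (0..9).filter(|&i| self.cells[i].is_none()).collect()
    }

    /// Returns the position after the side to move plays `cell`, or `None` if the move is illegal.
    pub fn play(&self, cell: usize) -> Option<Board> {
        if cell >= 9 || self.cells[cell].is_some() || self.outcome().is_some() {
            return None;
        }
        let mut next = self.clone();
        next.cells[cell] = Some(self.to_move);
        next.to_move = self.to_move.other();
        Some(next)
    }
}
```

Returning a new `Board` from `play` instead of mutating in place is one possible design choice; it keeps tree search simple because every node can own its position, and copying nine cells is cheap.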
### Part 3 — MCTS
7. Implementing MCTS in Rust (selection, expansion, simulation, backpropagation)
8. Exercise 2: Play Tic-Tac-Toe with pure MCTS (no neural network)

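
Two of the four phases in item 7 can be sketched without committing to a tree layout. The outline does not name a selection rule, so the sketch assumes UCB1 (the usual choice for pure MCTS); `Stats`, `ucb1`, `select_child`, `backpropagate`, and the exploration constant `c` are hypothetical names.

```rust
/// Per-node statistics kept by the search tree (hypothetical layout).
#[derive(Clone, Copy, Debug, Default)]
pub struct Stats {
    pub visits: u32,
    /// Sum of results, seen from the player who moved into this node.
    pub total_value: f64,
}

/// UCB1 score: average value plus an exploration bonus.
/// Unvisited children score infinity so each one is tried at least once.
pub fn ucb1(child: &Stats, parent_visits: u32, c: f64) -> f64 {
    if child.visits == 0 {
        return f64::INFINITY;
    }
    let n = child.visits as f64;
    // Clamp to 1 so ln() never sees zero.
    let parent = parent_visits.max(1) as f64;
    child.total_value / n + c * (parent.ln() / n).sqrt()
}

/// Selection: index of the child with the highest UCB1 score, if any.
pub fn select_child(children: &[Stats], parent_visits: u32, c: f64) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, child) in children.iter().enumerate() {
        let score = ucb1(child, parent_visits, c);
        if best.map_or(true, |(_, b)| score > b) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}

/// Backpropagation: walk the visited path from leaf to root,
/// flipping the sign at each ply because the players alternate.
pub fn backpropagate(path: &mut [Stats], leaf_value: f64) {
    let mut value = leaf_value;
    for stats in path.iter_mut().rev() {
        stats.visits += 1;
        stats.total_value += value;
        value = -value;
    }
}
```

A larger `c` favours rarely visited children, while `c = 0.0` makes selection purely greedy. Expansion and simulation depend on the game type and a random number source, so they are left out here.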
### Part 4 — Neural Network Policy/Value Head
9. Overview of the network architecture (shared trunk + policy head + value head)
10. Integrating a neural network crate (e.g., `tch-rs` or `candle`)
11. Exercise 3: Train the network on MCTS-generated data
12. Exercise 4: Replace MCTS simulation with the learned value function

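
The outline does not fix a training objective for Exercise 3. One plausible choice is the AlphaGo Zero loss without its weight-regularisation term: squared error on the value head plus cross-entropy between the network policy and the MCTS visit distribution. The sketch below computes it for a single example in plain Rust, independent of whichever crate is picked in item 10; `sample_loss` and its parameter names are hypothetical.

```rust
/// Loss for one training example (assumed objective, see the note above):
///   (outcome - value)^2  -  sum over moves of target_policy[a] * ln(policy[a])
///
/// `policy`        : move probabilities predicted by the policy head
/// `target_policy` : MCTS visit counts normalised to sum to 1
/// `value`         : value head output in [-1, 1]
/// `outcome`       : final game result in {-1, 0, 1} from the same player's view
pub fn sample_loss(policy: &[f64], target_policy: &[f64], value: f64, outcome: f64) -> f64 {
    assert_eq!(policy.len(), target_policy.len());
    let value_loss = (outcome - value).powi(2);
    let mut policy_loss = 0.0;
    for (&t, &p) in target_policy.iter().zip(policy.iter()) {
        if t > 0.0 {
            // Clamp to avoid ln(0) when the network assigns zero probability.
            policy_loss -= t * p.max(1e-12).ln();
        }
    }
    value_loss + policy_loss
}
```

In a real training run the same quantity would be expressed with the tensor operations of the chosen crate so that gradients can flow; this scalar version is only meant as a reference for checking that implementation on small inputs.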
### Part 5 — Self-Play Loop
13. The full AlphaGo Zero training loop: generate data → train → evaluate → repeat
14. Exercise 5: Run 1000 self-play games and observe the policy improving

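
The generate → train → evaluate → repeat cycle from item 13 can be written as a small generic driver. This is a sketch under assumptions: `training_loop` and its closure parameters are hypothetical, the model type `M` and example type `E` are left abstract, and the candidate network only replaces the current best one if it passes the evaluation step.

```rust
/// Generic self-play driver (hypothetical signature).
///
/// `generate`            : plays self-play games with the current best model and returns training examples
/// `train`               : produces a candidate model from the current best model and all examples so far
/// `candidate_is_better` : evaluation gate, e.g. a match between candidate and current best
pub fn training_loop<M, E>(
    initial: M,
    iterations: usize,
    mut generate: impl FnMut(&M) -> Vec<E>,
    mut train: impl FnMut(&M, &[E]) -> M,
    mut candidate_is_better: impl FnMut(&M, &M) -> bool,
) -> M {
    let mut best = initial;
    let mut examples: Vec<E> = Vec::new();
    for _ in 0..iterations {
        // 1. Generate data with the current best model.
        examples.extend(generate(&best));
        // 2. Train a candidate on everything collected so far.
        let candidate = train(&best, &examples);
        // 3. Evaluate, and keep the candidate only if it wins the comparison.
        if candidate_is_better(&candidate, &best) {
            best = candidate;
        }
        // 4. Repeat.
    }
    best
}
```

Passing the three steps as closures keeps the driver independent of the game and of the neural network crate. For Exercise 5, `generate` would run a batch of self-play games per iteration; an unbounded `examples` buffer is fine for Tic-Tac-Toe but would need a size cap for larger games.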
## File to create

- `edu/src/ml-self-play.md`
- Add to `edu/src/SUMMARY.md` under a `# Machine Learning` section