title: edu: write Machine Learning chapter (self-play game AI, Alpha Go Zero style)
status: completed
type: feature
priority: low
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-16T01:40:56Z

Background

From edu/TODO.md: Hands-on: Machine Learning; training a computer to play a game by playing against itself (a-la Alpha Go Zero).

A self-play reinforcement learning course. The practical focus is implementing a simplified version of the MCTS + neural network self-play loop in Rust, targeting a simple deterministic two-player game (e.g., Tic-Tac-Toe or Connect Four).

Content outline (suggested)

Part 1 — Foundations

  1. What is reinforcement learning? (state, action, reward, policy, value)
  2. Monte Carlo Tree Search (MCTS) — algorithm explained step by step
  3. Why self-play? The AlphaGo Zero insight

Part 2 — The Game

  1. Choosing a simple game: Tic-Tac-Toe as the learning vehicle
  2. Representing game state in Rust
  3. Exercise 1: Implement the game logic (move generation, win detection, terminal states)
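
As a starting point for Exercise 1, the game logic could be sketched roughly as follows. This is a minimal illustration, not the chapter's actual code: the names `Player`, `Outcome`, `Board`, `legal_moves`, `play`, and `outcome` are all hypothetical, and squares are assumed to be numbered 0 to 8 in row-major order.

```rust
/// One of the two players. An empty square is modelled as `None`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Player {
    X,
    O,
}

impl Player {
    pub fn other(self) -> Player {
        match self {
            Player::X => Player::O,
            Player::O => Player::X,
        }
    }
}

/// Result of a finished game.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Outcome {
    Win(Player),
    Draw,
}

/// Hypothetical board type: nine squares in row-major order plus the side to move.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Board {
    cells: [Option<Player>; 9],
    to_move: Player,
}

/// The eight winning lines: three rows, three columns, two diagonals.
const LINES: [[usize; 3]; 8] = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],
    [0, 3, 6], [1, 4, 7], [2, 5, 8],
    [0, 4, 8], [2, 4, 6],
];

impl Board {
    /// Empty board with X to move (an assumption of this sketch).
    pub fn new() -> Self {
        Board { cells: [None; 9], to_move: Player::X }
    }

    /// Move generation: indices of empty squares, or nothing once the game is over.
    pub fn legal_moves(&self) -> Vec<usize> {
        if self.outcome().is_some() {
            return Vec::new();
        }
        (0..9).filter(|&i| self.cells[i].is_none()).collect()
    }

    /// Returns the successor position, or `None` if the move is illegal.
    pub fn play(&self, square: usize) -> Option<Board> {
        if square >= 9 || self.cells[square].is_some() || self.outcome().is_some() {
            return None;
        }
        let mut next = self.clone();
        next.cells[square] = Some(self.to_move);
        next.to_move = self.to_move.other();
        Some(next)
    }

    /// Win detection and terminal states: `None` means the game is still in progress.
    pub fn outcome(&self) -> Option<Outcome> {
        for line in LINES.iter() {
            if let Some(p) = self.cells[line[0]] {
                if self.cells[line[1]] == Some(p) && self.cells[line[2]] == Some(p) {
                    return Some(Outcome::Win(p));
                }
            }
        }
        if self.cells.iter().all(|c| c.is_some()) {
            Some(Outcome::Draw)
        } else {
            None
        }
    }
}
```

Returning a new `Board` from `play` instead of mutating in place is a design choice of this sketch: it keeps tree search simple, because each search node can own its position without any undo logic.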

Part 3 — MCTS

  1. Implementing MCTS in Rust (selection, expansion, simulation, backpropagation)
  2. Exercise 2: Play Tic-Tac-Toe with pure MCTS (no neural network)
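
Three of the four MCTS steps listed above (selection, expansion, backpropagation) can be sketched independently of any particular game. The following is a hedged illustration only: the arena-based `Node` layout, the function names, and the sign convention are assumptions of this sketch, and the simulation step (a random playout) is deliberately left out because it needs a game implementation and a random number generator.

```rust
/// A search-tree node stored in an arena (`Vec<Node>`) and referenced by index.
/// The layout is hypothetical; a real implementation would also store the move
/// and the game state (or enough information to recompute it).
#[derive(Debug, Default)]
pub struct Node {
    pub parent: Option<usize>,
    pub children: Vec<usize>,
    pub visits: u32,
    /// Sum of results, from the point of view of the player who moved INTO this node.
    pub total_value: f64,
}

/// UCB1 score used during selection (the "UCT" rule):
/// mean value plus an exploration bonus that shrinks as the child is visited.
pub fn uct_score(child: &Node, parent_visits: u32, c: f64) -> f64 {
    if child.visits == 0 {
        return f64::INFINITY; // unvisited children are always tried first
    }
    let n = child.visits as f64;
    let mean = child.total_value / n;
    mean + c * ((parent_visits as f64).ln() / n).sqrt()
}

/// Selection: pick the child of `parent` with the highest UCT score.
pub fn select_child(tree: &[Node], parent: usize, c: f64) -> Option<usize> {
    let pv = tree[parent].visits;
    tree[parent].children.iter().copied().max_by(|&a, &b| {
        uct_score(&tree[a], pv, c)
            .partial_cmp(&uct_score(&tree[b], pv, c))
            .unwrap_or(std::cmp::Ordering::Equal)
    })
}

/// Expansion: append a fresh child to the arena and return its index.
pub fn expand(tree: &mut Vec<Node>, parent: usize) -> usize {
    let idx = tree.len();
    tree.push(Node { parent: Some(parent), ..Node::default() });
    tree[parent].children.push(idx);
    idx
}

/// Backpropagation: walk from the leaf to the root, flipping the sign at each
/// ply because the two players alternate.
pub fn backpropagate(tree: &mut [Node], leaf: usize, mut value: f64) {
    let mut cur = Some(leaf);
    while let Some(i) = cur {
        tree[i].visits += 1;
        tree[i].total_value += value;
        value = -value;
        cur = tree[i].parent;
    }
}
```

The exploration constant `c` is a tuning parameter; values near the square root of 2 are a common starting point for results scaled to the range -1 to 1, but the right value for a given game is an empirical question.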

Part 4 — Neural Network Policy/Value Head

  1. Overview of the network architecture (shared trunk + policy head + value head)
  2. Integrating a neural network crate (e.g., tch-rs or candle)
  3. Exercise 3: Train the network on MCTS-generated data
  4. Exercise 4: Replace MCTS simulation with the learned value function
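
To illustrate how the two network heads feed back into the search, here is a small sketch of the PUCT selection rule used in AlphaGo Zero style search, together with a masked softmax that turns policy-head logits into priors over legal moves. The `Edge` struct, the function names, and the choice to compute the parent count as the sum of edge visits are assumptions made for illustration; no neural network crate is involved here.

```rust
/// Per-move statistics kept at a search node (hypothetical layout).
pub struct Edge {
    /// Prior probability from the policy head.
    pub prior: f64,
    /// Visit count N(s, a).
    pub visits: u32,
    /// Sum of backed-up values W(s, a), so Q(s, a) = W / N.
    pub total_value: f64,
}

/// PUCT score: Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a)).
pub fn puct_score(edge: &Edge, parent_visits: u32, c_puct: f64) -> f64 {
    let q = if edge.visits == 0 {
        0.0 // assumption of this sketch: unvisited moves start at a neutral value
    } else {
        edge.total_value / edge.visits as f64
    };
    let u = c_puct * edge.prior * (parent_visits as f64).sqrt() / (1.0 + edge.visits as f64);
    q + u
}

/// Index of the edge with the highest PUCT score, or `None` if there are no edges.
pub fn select_edge(edges: &[Edge], c_puct: f64) -> Option<usize> {
    let parent_visits: u32 = edges.iter().map(|e| e.visits).sum();
    (0..edges.len()).max_by(|&a, &b| {
        puct_score(&edges[a], parent_visits, c_puct)
            .partial_cmp(&puct_score(&edges[b], parent_visits, c_puct))
            .unwrap_or(std::cmp::Ordering::Equal)
    })
}

/// Converts raw policy logits into probabilities, giving illegal moves zero mass.
/// Returns all zeros if no move is legal.
pub fn masked_softmax(logits: &[f64], legal: &[bool]) -> Vec<f64> {
    // Subtract the largest legal logit for numerical stability.
    let max = logits
        .iter()
        .zip(legal)
        .filter(|(_, ok)| **ok)
        .map(|(&l, _)| l)
        .fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits
        .iter()
        .zip(legal)
        .map(|(&l, &ok)| if ok { (l - max).exp() } else { 0.0 })
        .collect();
    let sum: f64 = exps.iter().sum();
    if sum == 0.0 {
        return exps;
    }
    exps.into_iter().map(|e| e / sum).collect()
}
```

Compared with plain UCT, the exploration term here is weighted by the prior, so moves the policy head considers promising are explored earlier, while the value head's estimate takes the place of a random playout when a leaf is evaluated.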

Part 5 — Self-Play Loop

  1. The full AlphaGo Zero training loop: generate data → train → evaluate → repeat
  2. Exercise 5: Run 1000 self-play games and observe the policy improving
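
The generate → train → evaluate → repeat cycle can be outlined as a small generic driver. This is a sketch under stated assumptions: the `Learner` trait, the `Example` struct, and the `gate` parameter are hypothetical names introduced here, and the actual self-play, training, and evaluation are left to the implementor of the trait.

```rust
/// One training example produced by self-play (hypothetical layout).
pub struct Example {
    /// Encoded board position.
    pub state: Vec<f32>,
    /// Search visit distribution over moves, used as the policy target.
    pub policy: Vec<f32>,
    /// Final game result from the point of view of the player to move.
    pub outcome: f32,
}

/// Hypothetical interface for anything that can play against itself and learn.
pub trait Learner: Clone {
    /// Plays one game against itself and returns the recorded positions.
    fn self_play_game(&self) -> Vec<Example>;
    /// Updates the model parameters on a batch of examples.
    fn train(&mut self, data: &[Example]);
    /// Fraction of evaluation games won against `other`, in the range 0.0 to 1.0.
    fn win_rate_against(&self, other: &Self) -> f64;
}

/// Runs `iterations` rounds of generate -> train -> evaluate and returns the
/// best model found. A candidate replaces the current best only if its win
/// rate reaches `gate`.
pub fn training_loop<L: Learner>(
    mut best: L,
    iterations: usize,
    games_per_iteration: usize,
    gate: f64,
) -> L {
    for _ in 0..iterations {
        // Generate: the current best model plays against itself.
        let mut data = Vec::new();
        for _ in 0..games_per_iteration {
            data.extend(best.self_play_game());
        }

        // Train: fit a copy so the current best stays available as an opponent.
        let mut candidate = best.clone();
        candidate.train(&data);

        // Evaluate: promote the candidate only if it clears the gate.
        if candidate.win_rate_against(&best) >= gate {
            best = candidate;
        }
    }
    best
}
```

For Exercise 5, the 1000 games could be split, for example, into 10 iterations of 100 games each; that split and the gate threshold are tuning choices rather than fixed requirements.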

Files to create or update

  • edu/src/ml-self-play.md
  • Add to edu/src/SUMMARY.md under a # Machine Learning section

Summary of Changes

All 14 sections written (7,622 lines total). The chapter covers:

  • Part 1 (§1-3): RL fundamentals, MCTS algorithm, AlphaGo Zero insight
  • Part 2 (§4-6): Tic-Tac-Toe as learning vehicle, Rust game state, Exercise 1
  • Part 3 (§7-8): MCTS implementation in Rust, Exercise 2 (pure MCTS play)
  • Part 4 (§9-12): NN from scratch, training on MCTS data, PUCT-guided MCTS
  • Part 5 (§13-14): Full self-play training loop, capstone exercise