chore(edu): update bean statuses and write self-play chapter content
Archive self-play beans, update shader/LLM parent beans to completed, and add self-play chapter content to ml-self-play.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

branch: main
parent: 0690327296
commit: 814d1f50cb
@@ -1,11 +1,14 @@
---
# edu-3yw9
title: 'Write §2: Monte Carlo Tree Search — algorithm explained'
status: todo
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T22:48:46Z
parent: edu-coqp
---

Step-by-step walkthrough of MCTS: selection (UCB1), expansion, simulation/rollout, backpropagation. Include a worked example on a small game tree.

## Summary of Changes\n\nWrote full content for §2 covering MCTS algorithm: the four phases (selection, expansion, simulation, backpropagation), UCB1/UCT formula, worked ASCII tree example, and strengths/limitations.
@@ -0,0 +1,14 @@
---
# edu-453h
title: 'Write §13: The full AlphaGo Zero training loop'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-16T01:35:06Z
parent: edu-coqp
---

Reading lesson: generate → train → evaluate → promote. Discuss the ELO-based model selection step and why it matters.

## Summary of Changes\n\nWrote full content for §13 covering the complete AlphaGo Zero training loop: self-play with temperature, training, evaluation gate, and the iterative improvement cycle with complete Rust implementation.
@@ -1,11 +1,16 @@
---
# edu-4v13
title: 'Write §8: Exercise 2 — play Tic-Tac-Toe with pure MCTS'
status: todo
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T23:04:14Z
parent: edu-coqp
---

Exercise: wire MCTS to the game logic from Exercise 1 and run a match. Show sample output, discuss iteration count vs strength.

## Summary of Changes

Wrote full content for §8: Exercise 2 with MCTS vs MCTS tournament (100 games, iteration count experiments), human vs MCTS CLI game, experimentation prompts, and readiness checklist.
@@ -1,11 +1,14 @@
---
# edu-5go8
title: 'Write §3: Why self-play? The AlphaGo Zero insight'
status: todo
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T22:51:31Z
parent: edu-coqp
---

Explain the key insight: a capable engine can be its own teacher. Historical context (AlphaGo vs AlphaGo Zero) and why the approach generalises.

## Summary of Changes\n\nWrote full content for §3 covering the evolution from Deep Blue to AlphaGo Zero, the self-play insight, the virtuous training cycle, and how we'll adapt the approach for Tic-Tac-Toe.
@@ -0,0 +1,16 @@
---
# edu-7lu6
title: 'Write §12: Exercise 4 — replace rollout with the value network'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-16T01:31:54Z
parent: edu-coqp
---

Exercise: substitute random rollout in MCTS with a neural-network value estimate; compare strength before and after.

## Summary of Changes

Wrote full content for §12: Exercise 4 covering PUCT formula, replacing random rollouts with value network evaluation, adding policy priors to MCTS nodes, modified MCTS code, and pure vs network-guided MCTS comparison.
@@ -0,0 +1,14 @@
---
# edu-brtk
title: 'Write §14: Exercise 5 — 1000 self-play games; observe improvement'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-16T01:40:32Z
parent: edu-coqp
---

Capstone exercise: run the full self-play loop for 1000 games; plot win-rate over iterations; discuss what worked and what didn't.

## Summary of Changes\n\nWrote full content for §14: Exercise 5 — the capstone exercise running 1000 self-play games, tracking diagnostics, validating convergence to perfect play, and providing next steps for further learning.
@@ -0,0 +1,16 @@
---
# edu-e39n
title: 'Write §5: Representing game state in Rust'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T22:56:30Z
parent: edu-coqp
---

Reading lesson: design of Board, Player, Move types. Discuss representation trade-offs (bitboard vs array). Show the full type definitions.

## Summary of Changes

Wrote full content for §5 covering Rust game state representation: Player enum, GameState struct, board indexing, Display impl, move generation, winner checking, and immutable apply_move design.
@@ -0,0 +1,16 @@
---
# edu-iv0k
title: 'Write §9: Neural network architecture overview'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T23:07:35Z
parent: edu-coqp
---

Conceptual lesson: shared convolutional trunk, policy head (move probabilities), value head (win probability). Diagrams encouraged. No code yet.

## Summary of Changes

Wrote full content for §9 covering neural network fundamentals from scratch: neurons/weights/biases, layers, forward pass, training intuition, and the dual-headed policy+value architecture for game AI with concrete TTT dimensions.
@@ -0,0 +1,16 @@
---
# edu-k3tq
title: 'Write §4: Choosing a simple game — Tic-Tac-Toe'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T22:53:54Z
parent: edu-coqp
---

Explain why Tic-Tac-Toe is ideal: small state space, deterministic, zero-sum, easily verifiable. Foreshadow how the same approach scales to Go/Chess.

## Summary of Changes

Wrote full content for §4 covering why Tic-Tac-Toe is the ideal learning vehicle: suitable game properties, game tree size, known optimal solution as validation target, and comparison with alternatives.
@@ -0,0 +1,14 @@
---
# edu-lqky
title: 'Write §11: Exercise 3 — train the network on MCTS data'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-16T00:32:55Z
parent: edu-coqp
---

Exercise: generate training examples (state, policy vector, value) from pure MCTS self-play; run one training epoch; log loss.

## Summary of Changes\n\nWrote full content for §11: Exercise 3 covering MCTS data generation pipeline, TrainingExample struct, mini-batch SGD training loop, loss tracking, network evaluation, and experimentation prompts.
@@ -1,11 +1,14 @@
---
# edu-of9y
title: 'Write §7: Implementing MCTS in Rust'
status: todo
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T23:01:59Z
parent: edu-coqp
---

Walk through selection (UCB1 formula), expansion, simulation (random rollout), backpropagation. Show Rust code for the node structure and the four phases.

## Summary of Changes\n\nWrote full content for §7 covering MCTS implementation in Rust: arena-allocated node structure, all four phases implemented and explained, UCT calculation, main loop, and move selection.
@@ -0,0 +1,14 @@
---
# edu-pvou
title: 'Write §10: Integrating a neural network crate'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T23:11:44Z
parent: edu-coqp
---

Reading lesson: evaluate tch-rs vs candle for this use case; show how to define and initialise the network; basic forward-pass usage.

## Summary of Changes\n\nWrote full content for §10 covering from-scratch neural network implementation in Rust: Layer struct, forward pass with ReLU/softmax/tanh, illegal move masking, Xavier initialization, backpropagation, SGD training, and complete compilable code.
@@ -0,0 +1,14 @@
---
# edu-wobk
title: 'Write §1: What is reinforcement learning?'
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T22:46:18Z
parent: edu-coqp
---

Cover: state, action, reward, policy, value function. Intuitive explanation with a game-playing example. No code.

## Summary of Changes\n\nWrote full content for §1 covering RL fundamentals: agent/environment loop, state/action/reward/policy/value concepts, contrast with supervised/unsupervised learning, and why RL fits games.
@@ -1,11 +1,14 @@
---
# edu-ymux
title: 'Write §6: Exercise 1 — implement Tic-Tac-Toe game logic'
status: todo
status: completed
type: task
priority: normal
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T22:58:06Z
parent: edu-coqp
---

Hands-on exercise: move generation, win detection, terminal-state check, displaying the board. Include starter code and expected test output.

## Summary of Changes\n\nWrote full content for §6: Exercise 1 with project setup instructions, 8 unit test specifications with collapsible solutions, a random-game main function, and a readiness checklist.
@@ -1,11 +0,0 @@
---
# edu-453h
title: 'Write §13: The full AlphaGo Zero training loop'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Reading lesson: generate → train → evaluate → promote. Discuss the ELO-based model selection step and why it matters.
@@ -1,11 +0,0 @@
---
# edu-7lu6
title: 'Write §12: Exercise 4 — replace rollout with the value network'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Exercise: substitute random rollout in MCTS with a neural-network value estimate; compare strength before and after.
@@ -1,11 +0,0 @@
---
# edu-brtk
title: 'Write §14: Exercise 5 — 1000 self-play games; observe improvement'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Capstone exercise: run the full self-play loop for 1000 games; plot win-rate over iterations; discuss what worked and what didn't.
@@ -1,11 +0,0 @@
---
# edu-e39n
title: 'Write §5: Representing game state in Rust'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Reading lesson: design of Board, Player, Move types. Discuss representation trade-offs (bitboard vs array). Show the full type definitions.
@@ -1,11 +0,0 @@
---
# edu-iv0k
title: 'Write §9: Neural network architecture overview'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Conceptual lesson: shared convolutional trunk, policy head (move probabilities), value head (win probability). Diagrams encouraged. No code yet.
@@ -1,11 +0,0 @@
---
# edu-k3tq
title: 'Write §4: Choosing a simple game — Tic-Tac-Toe'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Explain why Tic-Tac-Toe is ideal: small state space, deterministic, zero-sum, easily verifiable. Foreshadow how the same approach scales to Go/Chess.
@@ -1,11 +0,0 @@
---
# edu-lqky
title: 'Write §11: Exercise 3 — train the network on MCTS data'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Exercise: generate training examples (state, policy vector, value) from pure MCTS self-play; run one training epoch; log loss.
@@ -1,11 +0,0 @@
---
# edu-pvou
title: 'Write §10: Integrating a neural network crate'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Reading lesson: evaluate tch-rs vs candle for this use case; show how to define and initialise the network; basic forward-pass usage.
@@ -1,11 +0,0 @@
---
# edu-wobk
title: 'Write §1: What is reinforcement learning?'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---

Cover: state, action, reward, policy, value function. Intuitive explanation with a game-playing example. No code.
File diff suppressed because it is too large