chore(edu): archive completed beans and add self-play chapter section tickets [edu-coqp]

Archives completed/scrapped beans from previous chapters (markov, vector
db, compiler). Adds section beans for the self-play ML chapter (edu-coqp).
main
Elijah Voigt 3 months ago
parent 05ac10f5e3
commit b2e46a00c1

@ -0,0 +1,11 @@
---
# edu-3yw9
title: 'Write §2: Monte Carlo Tree Search — algorithm explained'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Step-by-step walkthrough of MCTS: selection (UCB1), expansion, simulation/rollout, backpropagation. Include a worked example on a small game tree.
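
As a starting point for the selection step, UCB1 can be sketched as below. This is a minimal illustration, not the chapter's final code: the function names, the `(value_sum, visits)` tuple layout, and the choice to give unvisited children an infinite score are all assumptions.

```rust
/// UCB1 score for one child; the child with the highest score is selected next.
/// `c` is the exploration constant (sqrt(2) is a common textbook choice).
fn ucb1(child_value_sum: f64, child_visits: u32, parent_visits: u32, c: f64) -> f64 {
    if child_visits == 0 {
        // Assumption: unvisited children are always tried before revisiting others.
        return f64::INFINITY;
    }
    let n = child_visits as f64;
    let exploit = child_value_sum / n; // average reward seen so far
    let explore = c * ((parent_visits as f64).ln() / n).sqrt(); // bonus for rarely visited children
    exploit + explore
}

/// Index of the child with the highest UCB1 score, or None if there are no children.
/// Each child is a hypothetical `(value_sum, visits)` pair.
fn select_child(children: &[(f64, u32)], parent_visits: u32, c: f64) -> Option<usize> {
    children
        .iter()
        .enumerate()
        .map(|(i, &(sum, visits))| (i, ucb1(sum, visits, parent_visits, c)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With `c = 0.0` the search is purely greedy on average reward; raising `c` shifts selection toward children with few visits, which is the trade-off the worked example should make visible.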

@ -0,0 +1,11 @@
---
# edu-453h
title: 'Write §13: The full AlphaGo Zero training loop'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Reading lesson: generate → train → evaluate → promote. Discuss the Elo-based model selection step and why it matters.
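
For the evaluate and promote steps, the standard Elo formulas could be sketched as follows. The expected-score and update formulas are the usual Elo definitions; `should_promote` and its `margin` parameter are a hypothetical gating rule invented for illustration, and the K-factor is left to the caller.

```rust
/// Expected score of player A against player B under the Elo model (between 0 and 1).
fn expected_score(rating_a: f64, rating_b: f64) -> f64 {
    1.0 / (1.0 + 10f64.powf((rating_b - rating_a) / 400.0))
}

/// A's rating after one game. `score` is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
/// `k` is the K-factor, which controls how fast ratings move.
fn update_rating(rating_a: f64, rating_b: f64, score: f64, k: f64) -> f64 {
    rating_a + k * (score - expected_score(rating_a, rating_b))
}

/// Hypothetical promotion rule: the candidate replaces the champion only if
/// its rating leads by at least `margin` points after the evaluation match.
fn should_promote(candidate_rating: f64, champion_rating: f64, margin: f64) -> bool {
    candidate_rating - champion_rating >= margin
}
```

Requiring a margin rather than any positive difference is one way to keep noisy evaluation matches from promoting a model that is not actually stronger.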

@ -0,0 +1,11 @@
---
# edu-4v13
title: 'Write §8: Exercise 2 — play Tic-Tac-Toe with pure MCTS'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Exercise: wire MCTS to the game logic from Exercise 1 and run a match. Show sample output, discuss iteration count vs strength.

@ -0,0 +1,11 @@
---
# edu-5go8
title: 'Write §3: Why self-play? The AlphaGo Zero insight'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Explain the key insight: a capable engine can be its own teacher. Historical context (AlphaGo vs AlphaGo Zero) and why the approach generalises.

@ -0,0 +1,11 @@
---
# edu-7lu6
title: 'Write §12: Exercise 4 — replace rollout with the value network'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Exercise: substitute random rollout in MCTS with a neural-network value estimate; compare strength before and after.

@ -0,0 +1,11 @@
---
# edu-brtk
title: 'Write §14: Exercise 5 — 1000 self-play games; observe improvement'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Capstone exercise: run the full self-play loop for 1000 games; plot win-rate over iterations; discuss what worked and what didn't.

@ -1,11 +1,11 @@
---
# edu-coqp
title: 'edu: write Machine Learning chapter (self-play game AI, AlphaGo Zero style)'
-status: todo
-type: task
+status: in-progress
+type: feature
priority: low
created_at: 2026-03-10T23:30:01Z
-updated_at: 2026-03-10T23:30:01Z
+updated_at: 2026-03-13T20:03:44Z
---
## Background

@ -0,0 +1,11 @@
---
# edu-e39n
title: 'Write §5: Representing game state in Rust'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Reading lesson: design of Board, Player, Move types. Discuss representation trade-offs (bitboard vs array). Show the full type definitions.
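
To make the bitboard vs array trade-off concrete, one possible shape for the types is sketched below. Everything beyond the names `Board`, `Player`, and `Move` is an assumption: the split into `ArrayBoard` and `BitBoard`, the 0..9 row-major cell numbering, and the method names are illustrative only.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Player {
    X,
    O,
}

/// A move is a cell index in 0..9, numbered row by row from the top-left.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Move(u8);

/// Array representation: one slot per cell. Easy to read and print.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct ArrayBoard {
    cells: [Option<Player>; 9],
}

/// Bitboard representation: bit i of `x` (or `o`) is set if that player owns cell i.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct BitBoard {
    x: u16,
    o: u16,
}

impl BitBoard {
    /// Owner of a cell, or None if it is empty or the index is off the board.
    fn get(&self, m: Move) -> Option<Player> {
        if m.0 >= 9 {
            return None;
        }
        let bit = 1u16 << m.0;
        if self.x & bit != 0 {
            Some(Player::X)
        } else if self.o & bit != 0 {
            Some(Player::O)
        } else {
            None
        }
    }

    /// Places a mark; returns false (and changes nothing) if the move is illegal.
    fn place(&mut self, m: Move, p: Player) -> bool {
        if m.0 >= 9 || self.get(m).is_some() {
            return false;
        }
        match p {
            Player::X => self.x |= 1 << m.0,
            Player::O => self.o |= 1 << m.0,
        }
        true
    }

    /// All empty cells, in index order.
    fn legal_moves(&self) -> Vec<Move> {
        (0..9u8).map(Move).filter(|&m| self.get(m).is_none()).collect()
    }
}

impl From<BitBoard> for ArrayBoard {
    fn from(b: BitBoard) -> Self {
        let mut cells = [None; 9];
        for i in 0..9u8 {
            cells[i as usize] = b.get(Move(i));
        }
        ArrayBoard { cells }
    }
}
```

The array form is simpler to index and display, while the bitboard fits in four bytes and turns occupancy checks into bit operations; for a 3×3 board the difference is mostly pedagogical.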

@ -0,0 +1,11 @@
---
# edu-iv0k
title: 'Write §9: Neural network architecture overview'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Conceptual lesson: shared convolutional trunk, policy head (move probabilities), value head (win probability). Diagrams encouraged. No code yet.

@ -0,0 +1,11 @@
---
# edu-k3tq
title: 'Write §4: Choosing a simple game — Tic-Tac-Toe'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Explain why Tic-Tac-Toe is ideal: small state space, deterministic, zero-sum, easily verifiable. Foreshadow how the same approach scales to Go/Chess.

@ -0,0 +1,11 @@
---
# edu-lqky
title: 'Write §11: Exercise 3 — train the network on MCTS data'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Exercise: generate training examples (state, policy vector, value) from pure MCTS self-play; run one training epoch; log loss.
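
A training example of the form (state, policy vector, value) might look like the sketch below. The field layout and the state encoding are assumptions; the policy target is built by normalising MCTS root visit counts, which is the AlphaGo Zero convention at temperature 1 and may differ from what the exercise finally uses.

```rust
/// One (state, policy, value) training example for a 3x3 board.
/// Assumed encoding: `state[i]` is +1.0 for the player to move, -1.0 for the
/// opponent, 0.0 for an empty cell.
#[derive(Debug, Clone, PartialEq)]
struct TrainingExample {
    state: [f32; 9],
    policy: [f32; 9],
    value: f32,
}

/// Turns root visit counts into a probability distribution over the nine cells.
/// Returns all zeros if no visits were recorded.
fn policy_from_visits(visits: &[u32; 9]) -> [f32; 9] {
    let total: u32 = visits.iter().sum();
    let mut policy = [0.0f32; 9];
    if total == 0 {
        return policy;
    }
    for (p, &v) in policy.iter_mut().zip(visits) {
        *p = v as f32 / total as f32;
    }
    policy
}

/// Bundles one position with its search statistics and the final game outcome
/// (+1.0 win, 0.0 draw, -1.0 loss, from the perspective of the player to move).
fn make_example(state: [f32; 9], visits: &[u32; 9], outcome: f32) -> TrainingExample {
    TrainingExample {
        state,
        policy: policy_from_visits(visits),
        value: outcome,
    }
}
```

Because the value is only known once the game ends, examples are typically buffered per game and labelled with the outcome afterwards.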

@ -0,0 +1,11 @@
---
# edu-of9y
title: 'Write §7: Implementing MCTS in Rust'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Walk through selection (UCB1 formula), expansion, simulation (random rollout), backpropagation. Show Rust code for the node structure and the four phases.
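
For the node structure, one option is an index-based arena, sketched below with expansion and backpropagation only (selection and rollout are omitted). The `Tree` and `Node` layouts, the method names, and the sign convention are assumptions, not the chapter's settled design.

```rust
/// One node in the search tree. Nodes live in a Vec and refer to each other
/// by index, which avoids reference-counted parent pointers.
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
    visits: u32,
    /// Sum of rewards, from the perspective of the player who moved into this node.
    value_sum: f64,
}

impl Node {
    fn new(parent: Option<usize>) -> Self {
        Node { parent, children: Vec::new(), visits: 0, value_sum: 0.0 }
    }
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    /// A tree containing only the root, at index 0.
    fn new() -> Self {
        Tree { nodes: vec![Node::new(None)] }
    }

    /// Expansion: adds a fresh child under `parent` and returns its index.
    fn expand(&mut self, parent: usize) -> usize {
        let child = self.nodes.len();
        self.nodes.push(Node::new(Some(parent)));
        self.nodes[parent].children.push(child);
        child
    }

    /// Backpropagation: walks from `leaf` to the root, counting a visit at each
    /// node and flipping the reward's sign at every level (zero-sum game).
    fn backpropagate(&mut self, leaf: usize, mut reward: f64) {
        let mut current = Some(leaf);
        while let Some(i) = current {
            let node = &mut self.nodes[i];
            node.visits += 1;
            node.value_sum += reward;
            reward = -reward;
            current = node.parent;
        }
    }
}
```

Flipping the sign on the way up means every node's average is read from the point of view of the player choosing that node, so selection can always maximise.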

@ -0,0 +1,11 @@
---
# edu-pvou
title: 'Write §10: Integrating a neural network crate'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Reading lesson: evaluate tch-rs vs candle for this use case; show how to define and initialise the network; basic forward-pass usage.

@ -0,0 +1,11 @@
---
# edu-wobk
title: 'Write §1: What is reinforcement learning?'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Cover: state, action, reward, policy, value function. Intuitive explanation with a game-playing example. No code.

@ -0,0 +1,11 @@
---
# edu-ymux
title: 'Write §6: Exercise 1 — implement Tic-Tac-Toe game logic'
status: todo
type: task
created_at: 2026-03-13T20:03:17Z
updated_at: 2026-03-13T20:03:17Z
parent: edu-coqp
---
Hands-on exercise: move generation, win detection, terminal-state check, displaying the board. Include starter code and expected test output.
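
A possible reference solution for the core functions is sketched below, for checking starter code against. The `Board` alias, the row-major cell numbering, and the `.`/`X`/`O` rendering are assumptions chosen for brevity.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Player {
    X,
    O,
}

/// Nine cells, numbered row by row from the top-left; None means empty.
type Board = [Option<Player>; 9];

/// The eight winning lines: three rows, three columns, two diagonals.
const LINES: [[usize; 3]; 8] = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],
    [0, 3, 6], [1, 4, 7], [2, 5, 8],
    [0, 4, 8], [2, 4, 6],
];

/// Win detection: the player who owns a complete line, if any.
fn winner(board: &Board) -> Option<Player> {
    LINES.iter().find_map(|&[a, b, c]| match board[a] {
        Some(p) if board[b] == Some(p) && board[c] == Some(p) => Some(p),
        _ => None,
    })
}

/// Move generation: indices of all empty cells.
fn legal_moves(board: &Board) -> Vec<usize> {
    (0..9).filter(|&i| board[i].is_none()).collect()
}

/// Terminal-state check: someone has won, or the board is full.
fn is_terminal(board: &Board) -> bool {
    winner(board).is_some() || legal_moves(board).is_empty()
}

/// Renders the board as three lines of text, using '.' for empty cells.
fn render(board: &Board) -> String {
    board
        .chunks(3)
        .map(|row| {
            row.iter()
                .map(|cell| match cell {
                    Some(Player::X) => 'X',
                    Some(Player::O) => 'O',
                    None => '.',
                })
                .collect::<String>()
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Listing the eight lines as a constant keeps win detection to a single pass and makes the expected test output easy to reason about by hand.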