Compare commits
4 Commits
b2e46a00c1...814d1f50cb

| Author | SHA1 | Date |
|---|---|---|
| | 814d1f50cb | 3 months ago |
| | 0690327296 | 3 months ago |
| | bc4cc23c42 | 3 months ago |
| | fb9fd518c3 | 3 months ago |

@@ -1,11 +1,14 @@
 ---
 # edu-3yw9
 title: 'Write §2: Monte Carlo Tree Search — algorithm explained'
-status: todo
+status: completed
 type: task
+priority: normal
 created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T22:48:46Z
 parent: edu-coqp
 ---
 
 Step-by-step walkthrough of MCTS: selection (UCB1), expansion, simulation/rollout, backpropagation. Include a worked example on a small game tree.
+
+## Summary of Changes\n\nWrote full content for §2 covering MCTS algorithm: the four phases (selection, expansion, simulation, backpropagation), UCB1/UCT formula, worked ASCII tree example, and strengths/limitations.
@@ -0,0 +1,14 @@
+---
+# edu-453h
+title: 'Write §13: The full AlphaGo Zero training loop'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-16T01:35:06Z
+parent: edu-coqp
+---
+
+Reading lesson: generate → train → evaluate → promote. Discuss the ELO-based model selection step and why it matters.
+
+## Summary of Changes\n\nWrote full content for §13 covering the complete AlphaGo Zero training loop: self-play with temperature, training, evaluation gate, and the iterative improvement cycle with complete Rust implementation.
@@ -1,11 +1,16 @@
 ---
 # edu-4v13
 title: 'Write §8: Exercise 2 — play Tic-Tac-Toe with pure MCTS'
-status: todo
+status: completed
 type: task
+priority: normal
 created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T23:04:14Z
 parent: edu-coqp
 ---
 
 Exercise: wire MCTS to the game logic from Exercise 1 and run a match. Show sample output, discuss iteration count vs strength.
+
+## Summary of Changes
+
+Wrote full content for §8: Exercise 2 with MCTS vs MCTS tournament (100 games, iteration count experiments), human vs MCTS CLI game, experimentation prompts, and readiness checklist.
@@ -1,11 +1,14 @@
 ---
 # edu-5go8
 title: 'Write §3: Why self-play? The AlphaGo Zero insight'
-status: todo
+status: completed
 type: task
+priority: normal
 created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T22:51:31Z
 parent: edu-coqp
 ---
 
 Explain the key insight: a capable engine can be its own teacher. Historical context (AlphaGo vs AlphaGo Zero) and why the approach generalises.
+
+## Summary of Changes\n\nWrote full content for §3 covering the evolution from Deep Blue to AlphaGo Zero, the self-play insight, the virtuous training cycle, and how we'll adapt the approach for Tic-Tac-Toe.
@@ -0,0 +1,16 @@
+---
+# edu-7lu6
+title: 'Write §12: Exercise 4 — replace rollout with the value network'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-16T01:31:54Z
+parent: edu-coqp
+---
+
+Exercise: substitute random rollout in MCTS with a neural-network value estimate; compare strength before and after.
+
+## Summary of Changes
+
+Wrote full content for §12: Exercise 4 covering PUCT formula, replacing random rollouts with value network evaluation, adding policy priors to MCTS nodes, modified MCTS code, and pure vs network-guided MCTS comparison.
@@ -0,0 +1,14 @@
+---
+# edu-brtk
+title: 'Write §14: Exercise 5 — 1000 self-play games; observe improvement'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-16T01:40:32Z
+parent: edu-coqp
+---
+
+Capstone exercise: run the full self-play loop for 1000 games; plot win-rate over iterations; discuss what worked and what didn't.
+
+## Summary of Changes\n\nWrote full content for §14: Exercise 5 — the capstone exercise running 1000 self-play games, tracking diagnostics, validating convergence to perfect play, and providing next steps for further learning.
@@ -0,0 +1,16 @@
+---
+# edu-e39n
+title: 'Write §5: Representing game state in Rust'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T22:56:30Z
+parent: edu-coqp
+---
+
+Reading lesson: design of Board, Player, Move types. Discuss representation trade-offs (bitboard vs array). Show the full type definitions.
+
+## Summary of Changes
+
+Wrote full content for §5 covering Rust game state representation: Player enum, GameState struct, board indexing, Display impl, move generation, winner checking, and immutable apply_move design.
@@ -0,0 +1,16 @@
+---
+# edu-iv0k
+title: 'Write §9: Neural network architecture overview'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T23:07:35Z
+parent: edu-coqp
+---
+
+Conceptual lesson: shared convolutional trunk, policy head (move probabilities), value head (win probability). Diagrams encouraged. No code yet.
+
+## Summary of Changes
+
+Wrote full content for §9 covering neural network fundamentals from scratch: neurons/weights/biases, layers, forward pass, training intuition, and the dual-headed policy+value architecture for game AI with concrete TTT dimensions.
@@ -0,0 +1,16 @@
+---
+# edu-k3tq
+title: 'Write §4: Choosing a simple game — Tic-Tac-Toe'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T22:53:54Z
+parent: edu-coqp
+---
+
+Explain why Tic-Tac-Toe is ideal: small state space, deterministic, zero-sum, easily verifiable. Foreshadow how the same approach scales to Go/Chess.
+
+## Summary of Changes
+
+Wrote full content for §4 covering why Tic-Tac-Toe is the ideal learning vehicle: suitable game properties, game tree size, known optimal solution as validation target, and comparison with alternatives.
@@ -0,0 +1,14 @@
+---
+# edu-lqky
+title: 'Write §11: Exercise 3 — train the network on MCTS data'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-16T00:32:55Z
+parent: edu-coqp
+---
+
+Exercise: generate training examples (state, policy vector, value) from pure MCTS self-play; run one training epoch; log loss.
+
+## Summary of Changes\n\nWrote full content for §11: Exercise 3 covering MCTS data generation pipeline, TrainingExample struct, mini-batch SGD training loop, loss tracking, network evaluation, and experimentation prompts.
@@ -1,11 +1,14 @@
 ---
 # edu-of9y
 title: 'Write §7: Implementing MCTS in Rust'
-status: todo
+status: completed
 type: task
+priority: normal
 created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T23:01:59Z
 parent: edu-coqp
 ---
 
 Walk through selection (UCB1 formula), expansion, simulation (random rollout), backpropagation. Show Rust code for the node structure and the four phases.
+
+## Summary of Changes\n\nWrote full content for §7 covering MCTS implementation in Rust: arena-allocated node structure, all four phases implemented and explained, UCT calculation, main loop, and move selection.
@@ -0,0 +1,14 @@
+---
+# edu-pvou
+title: 'Write §10: Integrating a neural network crate'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T23:11:44Z
+parent: edu-coqp
+---
+
+Reading lesson: evaluate tch-rs vs candle for this use case; show how to define and initialise the network; basic forward-pass usage.
+
+## Summary of Changes\n\nWrote full content for §10 covering from-scratch neural network implementation in Rust: Layer struct, forward pass with ReLU/softmax/tanh, illegal move masking, Xavier initialization, backpropagation, SGD training, and complete compilable code.
@@ -0,0 +1,14 @@
+---
+# edu-wobk
+title: 'Write §1: What is reinforcement learning?'
+status: completed
+type: task
+priority: normal
+created_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T22:46:18Z
+parent: edu-coqp
+---
+
+Cover: state, action, reward, policy, value function. Intuitive explanation with a game-playing example. No code.
+
+## Summary of Changes\n\nWrote full content for §1 covering RL fundamentals: agent/environment loop, state/action/reward/policy/value concepts, contrast with supervised/unsupervised learning, and why RL fits games.
@@ -1,11 +1,14 @@
 ---
 # edu-ymux
 title: 'Write §6: Exercise 1 — implement Tic-Tac-Toe game logic'
-status: todo
+status: completed
 type: task
+priority: normal
 created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
+updated_at: 2026-03-13T22:58:06Z
 parent: edu-coqp
 ---
 
 Hands-on exercise: move generation, win detection, terminal-state check, displaying the board. Include starter code and expected test output.
+
+## Summary of Changes\n\nWrote full content for §6: Exercise 1 with project setup instructions, 8 unit test specifications with collapsible solutions, a random-game main function, and a readiness checklist.
@@ -1,11 +0,0 @@
----
-# edu-453h
-title: 'Write §13: The full AlphaGo Zero training loop'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Reading lesson: generate → train → evaluate → promote. Discuss the ELO-based model selection step and why it matters.
@@ -1,11 +0,0 @@
----
-# edu-7lu6
-title: 'Write §12: Exercise 4 — replace rollout with the value network'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Exercise: substitute random rollout in MCTS with a neural-network value estimate; compare strength before and after.
@@ -1,11 +0,0 @@
----
-# edu-brtk
-title: 'Write §14: Exercise 5 — 1000 self-play games; observe improvement'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Capstone exercise: run the full self-play loop for 1000 games; plot win-rate over iterations; discuss what worked and what didn't.
@@ -1,11 +0,0 @@
----
-# edu-e39n
-title: 'Write §5: Representing game state in Rust'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Reading lesson: design of Board, Player, Move types. Discuss representation trade-offs (bitboard vs array). Show the full type definitions.
@@ -1,11 +0,0 @@
----
-# edu-iv0k
-title: 'Write §9: Neural network architecture overview'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Conceptual lesson: shared convolutional trunk, policy head (move probabilities), value head (win probability). Diagrams encouraged. No code yet.
@@ -1,11 +0,0 @@
----
-# edu-k3tq
-title: 'Write §4: Choosing a simple game — Tic-Tac-Toe'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Explain why Tic-Tac-Toe is ideal: small state space, deterministic, zero-sum, easily verifiable. Foreshadow how the same approach scales to Go/Chess.
@@ -1,11 +0,0 @@
----
-# edu-lqky
-title: 'Write §11: Exercise 3 — train the network on MCTS data'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Exercise: generate training examples (state, policy vector, value) from pure MCTS self-play; run one training epoch; log loss.
@@ -1,11 +0,0 @@
----
-# edu-pvou
-title: 'Write §10: Integrating a neural network crate'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Reading lesson: evaluate tch-rs vs candle for this use case; show how to define and initialise the network; basic forward-pass usage.
@@ -1,11 +0,0 @@
----
-# edu-wobk
-title: 'Write §1: What is reinforcement learning?'
-status: todo
-type: task
-created_at: 2026-03-13T20:03:17Z
-updated_at: 2026-03-13T20:03:17Z
-parent: edu-coqp
----
-
-Cover: state, action, reward, policy, value function. Intuitive explanation with a game-playing example. No code.
File diff suppressed because it is too large (3 files)