---
# edu-s6mr
title: 'Write §5: Self-attention — queries, keys, and values'
status: completed
type: task
priority: normal
created_at: 2026-03-13T22:01:53Z
updated_at: 2026-03-16T02:30:26Z
parent: edu-u2w7
---

Derive the scaled dot-product attention formula from first principles. Single-head attention only (GPT-1 simplicity). Causal masking explained here.
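
For reference while drafting, here is a minimal sketch of what the section is expected to arrive at: single-head scaled dot-product attention, `softmax(Q Kᵀ / sqrt(d_k)) V`, with a causal mask applied to the scores before the softmax. The function name, the NumPy implementation, the toy shapes, and the random projection matrices are all assumptions made for illustration; none of them come from the task itself.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (T, d_model) token representations.
    w_q, w_k, w_v: (d_model, d_k) projection matrices (hypothetical names).
    Returns the attended values (T, d_k) and the attention weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]

    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = (q @ k.T) / np.sqrt(d_k)

    # Causal mask: position i may attend only to positions j <= i.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights

# Toy example with assumed sizes: 4 tokens, model width 8, head width 8.
rng = np.random.default_rng(0)
T, d_model, d_k = 4, 8, 8
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and is zero above the diagonal, so the first token attends only to itself and `out[0]` equals its own value vector.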