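---
# edu-vqxk
title: 'Write §8: A decoder-only LM — stacking blocks and the causal mask'
status: completed
type: task
priority: normal
created_at: 2026-03-13T22:01:58Z
updated_at: 2026-03-16T02:30:26Z
parent: edu-u2w7
---

Explain how N transformer blocks are stacked: each block's output is the next block's input, and the (sequence length, model width) shape is preserved throughout. The causal mask ensures each position attends only to itself and earlier positions, never to future tokens. Tie the token-embedding weights to the unembedding matrix (GPT-1 style). A final linear layer produces the logits, and softmax turns them into next-token probabilities.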
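
For reference while drafting §8, here is a minimal NumPy sketch of the forward pass this task describes: a causal mask, N stacked blocks, and an unembedding that reuses the token-embedding matrix. Everything specific in it is an assumption made for illustration, not taken from the section itself: the names (`causal_mask`, `block`, `forward`, `init_params`), the single attention head, the parameter-free layer norm, the pre-norm layout, the ReLU MLP, and the toy sizes. It is a sketch of the idea, not a faithful GPT-1 reimplementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm (no learned scale/shift), to keep the sketch short.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_mask(T):
    # True where attention is allowed: query i may see keys j <= i.
    return np.tril(np.ones((T, T), dtype=bool))

def attention(x, p, mask):
    # Single-head self-attention; x has shape (T, d_model).
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # future positions get zero weight
    return softmax(scores) @ v @ p["Wo"]

def block(x, p, mask):
    # Assumed layout: pre-norm residual block, attention then a two-layer MLP.
    x = x + attention(layer_norm(x), p, mask)
    h = np.maximum(0.0, layer_norm(x) @ p["W1"])  # ReLU
    return x + h @ p["W2"]

def init_params(rng, vocab, d_model, n_blocks, max_len):
    def w(*shape):
        return rng.normal(0.0, 0.02, size=shape)
    blocks = [
        {"Wq": w(d_model, d_model), "Wk": w(d_model, d_model),
         "Wv": w(d_model, d_model), "Wo": w(d_model, d_model),
         "W1": w(d_model, 4 * d_model), "W2": w(4 * d_model, d_model)}
        for _ in range(n_blocks)
    ]
    # E is the token embedding, P the learned position embedding.
    return {"E": w(vocab, d_model), "P": w(max_len, d_model), "blocks": blocks}

def forward(tokens, params):
    T = len(tokens)
    mask = causal_mask(T)
    x = params["E"][tokens] + params["P"][:T]  # token + position embeddings
    for p in params["blocks"]:                 # the N stacked blocks
        x = block(x, p, mask)                  # shape stays (T, d_model)
    x = layer_norm(x)
    logits = x @ params["E"].T                 # tied weights: unembed with E itself
    return logits, softmax(logits)             # logits, next-token probabilities

rng = np.random.default_rng(0)
params = init_params(rng, vocab=50, d_model=16, n_blocks=3, max_len=32)
tokens = np.array([3, 14, 15, 9, 26])
logits, probs = forward(tokens, params)
print(logits.shape)  # (5, 50)
```

A useful check for the section to mention: because of the mask, changing the last token should leave the logits at all earlier positions unchanged. Weight tying shows up as the single matrix `E` being used both to look up embeddings and, transposed, to produce logits, so the model has no separate output projection to learn.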