| title | status | type | created_at | updated_at | parent |
|---|---|---|---|---|---|
| Write §8: A decoder-only LM — stacking blocks and the causal mask | todo | task | 2026-03-13T22:01:58Z | 2026-03-13T22:01:58Z | edu-u2w7 |
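Explain how N transformer blocks are stacked to form a decoder-only language model. The causal mask ensures each position attends only to itself and earlier tokens, never to future ones. Tie the input embedding weights to the unembedding matrix (GPT-1 style). A final linear projection produces the logits, and a softmax turns them into next-token probabilities.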
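
A minimal NumPy sketch of what the section could walk through, offered as a starting point rather than a reference implementation. Everything here is an assumption made for illustration: the function and parameter names (`init_params`, `forward`, `wte`, `wpe`), single-head attention, pre-norm residual blocks, ReLU in the feed-forward layer, and the absence of biases and learned layer-norm gains. GPT-1 itself differs in several of these details; only the stacking, the causal mask, and the tied unembedding are the point.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector (no learned gain/bias in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, p):
    T, d = x.shape
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.T / np.sqrt(d)
    # True strictly above the diagonal marks future positions, which are masked out.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v @ p["wo"]

def block(x, p):
    # One transformer block: attention and feed-forward, each with a residual connection.
    x = x + causal_self_attention(layer_norm(x), p)
    hidden = np.maximum(0.0, layer_norm(x) @ p["w1"])
    return x + hidden @ p["w2"]

def init_params(vocab, d_model, n_blocks, max_len, seed=0):
    # Hypothetical parameter layout: a dict of arrays plus a list of per-block dicts.
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.02, shape)
    blocks = [
        {"wq": w(d_model, d_model), "wk": w(d_model, d_model),
         "wv": w(d_model, d_model), "wo": w(d_model, d_model),
         "w1": w(d_model, 4 * d_model), "w2": w(4 * d_model, d_model)}
        for _ in range(n_blocks)
    ]
    return {"wte": w(vocab, d_model), "wpe": w(max_len, d_model), "blocks": blocks}

def forward(tokens, params):
    tokens = np.asarray(tokens)
    T = tokens.shape[0]
    # Token embedding plus learned position embedding.
    x = params["wte"][tokens] + params["wpe"][:T]
    # Stacking: the output of each block is the input of the next.
    for p in params["blocks"]:
        x = block(x, p)
    x = layer_norm(x)
    # Weight tying: the unembedding matrix is the transpose of the token embedding.
    return x @ params["wte"].T  # logits, shape (T, vocab)
```

Calling `forward([1, 2, 3, 4], params)` returns one row of logits per position, and `softmax` over the last axis gives the next-token distribution. Because of the causal mask, changing the last token should leave the logits at earlier positions unchanged, which is a handy sanity check for the section to include.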