vibed/edu/.beans/edu-vqxk--write-8-a-decoder...

title: Write §8: A decoder-only LM — stacking blocks and the causal mask
status: completed
type: task
priority: normal
created_at: 2026-03-13T22:01:58Z
updated_at: 2026-03-16T02:30:26Z
parent: edu-u2w7

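Explain how N transformer blocks are stacked, with the output of each block feeding the next. The causal mask ensures each position attends only to itself and earlier tokens, never to later ones. Tie the input token-embedding weights to the unembedding matrix (GPT-1 style). A final linear projection produces the logits, and a softmax turns them into next-token probabilities.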
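
The points above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the section's actual code: it uses single-head attention, pre-norm layer normalization without learned scale or shift, a ReLU MLP, and random weights, all simplifications relative to GPT-1. Every name here (`causal_mask`, `block`, `init_params`, `forward`, and the parameter keys) is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; masked entries (-inf) become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Simplification: no learned gain/bias.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_mask(T):
    # mask[i, j] is True when query position i may attend to key position j (j <= i).
    return np.tril(np.ones((T, T), dtype=bool))

def block(x, p):
    # One transformer block: masked self-attention, then an MLP, each with a residual.
    T, d = x.shape
    h = layer_norm(x)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(causal_mask(T), scores, -np.inf)  # hide future positions
    x = x + softmax(scores) @ v @ p["Wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["W1"], 0.0) @ p["W2"]

def init_params(vocab, d, n_blocks, max_len, seed=0):
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.02, shape)
    blocks = [
        dict(Wq=w(d, d), Wk=w(d, d), Wv=w(d, d), Wo=w(d, d),
             W1=w(d, 4 * d), W2=w(4 * d, d))
        for _ in range(n_blocks)
    ]
    # E is the token embedding; P is a learned positional embedding.
    return dict(E=w(vocab, d), P=w(max_len, d), blocks=blocks)

def forward(tokens, params):
    T = len(tokens)
    x = params["E"][tokens] + params["P"][:T]
    for p in params["blocks"]:       # stack N blocks: output of one feeds the next
        x = block(x, p)
    x = layer_norm(x)
    logits = x @ params["E"].T       # weight tying: unembedding reuses E
    return logits, softmax(logits)   # softmax turns logits into probabilities

tokens = np.array([1, 2, 3, 4])
params = init_params(vocab=10, d=8, n_blocks=2, max_len=16)
logits, probs = forward(tokens, params)
print(logits.shape)
```

Because the mask blocks attention to later positions, changing the last token leaves the logits at every earlier position unchanged, which is a quick way to check that the mask is wired correctly. Tying the weights means the model needs no separate `vocab × d` output matrix.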