docs(edu): write markov §3 transition probabilities and matrices [44ebe7]

Replaces the stub with full lesson content:
- Defines the transition matrix P[i][j] and stochastic matrix constraints
- Derives multi-step probabilities via π_k = π₀ · P^k
- Works through the 2-state weather chain by hand (1- and 2-step calculations)
- Bridges to §9 stationary distributions
5 months ago · 79c21df6a0
parent 36bf848b63
commit 79c21df6a0
1 changed files with 66 additions and 1 deletions
--- a/edu/markov.md
+++ b/edu/markov.md
@@ -52,7 +52,72 @@ Every Markov chain consists of a finite (or countably infinite) set of **states*

 The rules governing how a Markov chain moves are captured in a **transition matrix** *P*, where *P[i][j]* is the probability of moving from state *i* to state *j* in one step. This section covers how to construct *P*, the constraints it must satisfy (rows sum to 1), and how to use matrix multiplication to compute multi-step probabilities.

-> 🚧 This section is a stub — see nbd ticket `44ebe7`
+**Defining the transition matrix.** Label the states 0, 1, …, *n*−1. The transition matrix *P* is an *n* × *n* array where entry *P[i][j]* gives the probability of moving to state *j* on the very next step, given that you are currently in state *i*. Because these are probabilities of mutually exclusive, exhaustive outcomes (from state *i* you must go *somewhere*), every row must sum to exactly 1 and every entry must lie between 0 and 1 inclusive. A matrix satisfying these two constraints is called a **stochastic matrix** (or row-stochastic matrix). Each row is itself a probability distribution over the next state.
+
+**The stochastic-matrix constraints, stated precisely.** For an *n*-state chain:
+
+```
+P[i][j] >= 0        for all i, j
+sum_j P[i][j] = 1   for every row i
+```
+
+A zero entry means the transition is impossible; a one means it is certain. Columns have no such constraint — column sums need not equal 1.
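+
+Both constraints are easy to check mechanically. The following is a minimal Python sketch, assuming the matrix is stored as a list of rows; the function name `is_row_stochastic` and the tolerance `tol` are illustrative assumptions, with the tolerance there because floating-point row sums rarely equal 1 exactly.
+
+```python
+def is_row_stochastic(P, tol=1e-9):
+    """Return True if P (a list of rows) is square, has no negative
+    entries, and every row sums to 1 within `tol`.
+
+    `tol` is an assumed tolerance for floating-point rounding.
+    """
+    n = len(P)
+    for row in P:
+        if len(row) != n:                  # must be n x n
+            return False
+        if any(p < 0 for p in row):        # P[i][j] >= 0
+            return False
+        if abs(sum(row) - 1.0) > tol:      # sum_j P[i][j] = 1
+            return False
+    return True
+
+
+# Hypothetical matrices, for illustration only.
+print(is_row_stochastic([[0.5, 0.5], [0.1, 0.9]]))  # every row sums to 1
+print(is_row_stochastic([[0.5, 0.6], [0.1, 0.9]]))  # first row sums to 1.1
+```
+
+Entries above 1 need no separate check: a row of non-negative numbers that sums to 1 cannot contain one.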
+
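+**Multi-step probabilities via matrix multiplication.** Suppose you start in state *i* at time 0. After one step the probability of being in state *j* is *P[i][j]*. To reach *j* in *two* steps you must pass through some intermediate state *m*. By the Markov property, the path *i* → *m* → *j* has probability *P[i][m]* · *P[m][j]*, and because the possible intermediate states are mutually exclusive, summing over all of them gives:
+
+```
+P^2[i][j] = sum_m  P[i][m] * P[m][j]
+```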
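+
+The sum over intermediate states translates directly into three nested loops. A minimal Python sketch, assuming the matrix is stored as a list of rows; the helper name `two_step` and the example matrix are hypothetical:
+
+```python
+def two_step(P):
+    """Return P^2, where entry [i][j] sums over every intermediate state."""
+    n = len(P)
+    P2 = [[0.0] * n for _ in range(n)]
+    for i in range(n):              # start state
+        for j in range(n):          # end state
+            for mid in range(n):    # state occupied after the first step
+                P2[i][j] += P[i][mid] * P[mid][j]
+    return P2
+
+
+# Hypothetical 2-state matrix, for illustration only.
+P = [[0.5, 0.5],
+     [0.0, 1.0]]
+print(two_step(P))  # [[0.25, 0.75], [0.0, 1.0]]
+```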
+
+This is exactly the (*i*, *j*) entry of *P* × *P* = *P*². In general, the probability of going from state *i* to state *j* in exactly *k* steps is the (*i*, *j*) entry of *P*^k. If you encode your current uncertainty as a **row vector** π₀ — a probability distribution over all states — then after *k* steps your updated distribution is:
+
+```
+π_k = π₀ · P^k
+```
+
+Each right-multiplication by *P* advances the clock one tick and blends probabilities according to the transition rules.
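+
+Because each tick is a single vector-matrix product, π_k can be computed without ever forming *P*^k. A minimal Python sketch, assuming the matrix is stored as a list of rows; `step`, `distribution_after`, and the 3-state matrix are hypothetical names and data chosen for illustration:
+
+```python
+def step(pi, P):
+    """One clock tick: return the row vector pi · P."""
+    n = len(P)
+    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
+
+
+def distribution_after(pi0, P, k):
+    """Return pi_k = pi0 · P^k by applying `step` k times."""
+    pi = list(pi0)
+    for _ in range(k):
+        pi = step(pi, P)
+    return pi
+
+
+# Hypothetical 3-state chain, for illustration only.
+P = [[0.5, 0.5, 0.0],
+     [0.0, 0.5, 0.5],
+     [0.5, 0.0, 0.5]]
+print(distribution_after([1.0, 0.0, 0.0], P, 2))  # [0.25, 0.5, 0.25]
+```
+
+Stepping the vector costs on the order of *k*·*n*² multiplications, while building *P*^k through naive repeated matrix products costs on the order of *k*·*n*³, so the vector form is usually the cheaper choice when only one starting distribution matters.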
+
+**Worked example — a two-state weather chain.** Consider a model with two states: *Sunny* (state 0) and *Rainy* (state 1). From data:
+
+- If today is Sunny, tomorrow is Sunny with probability 0.8 and Rainy with probability 0.2.
+- If today is Rainy, tomorrow is Sunny with probability 0.4 and Rainy with probability 0.6.
+
+Writing this as a matrix:
+
+```
+        Sunny  Rainy
+Sunny [  0.8    0.2 ]
+Rainy [  0.4    0.6 ]
+```
+
+Row 0 sums to 1.0; row 1 sums to 1.0. All entries are non-negative. *P* is a valid stochastic matrix.
+
+*One step.* Start with certainty in Sunny: π₀ = [1, 0].
+
+```
+π₁ = π₀ · P = [1, 0] · [[0.8, 0.2], [0.4, 0.6]] = [0.8, 0.2]
+```
+
+Tomorrow: 80 % Sunny, 20 % Rainy.
+
+*Two steps.* Apply *P* again:
+
+```
+π₂ = π₁ · P = [0.8, 0.2] · [[0.8, 0.2], [0.4, 0.6]]
+             = [0.8*0.8 + 0.2*0.4,  0.8*0.2 + 0.2*0.6]
+             = [0.72, 0.28]
+```
+
+Equivalently, compute *P*² once and read off the row for state 0:
+
+```
+P^2 = [[0.8*0.8 + 0.2*0.4,  0.8*0.2 + 0.2*0.6],
+       [0.4*0.8 + 0.6*0.4,  0.4*0.2 + 0.6*0.6]]
+    = [[0.72, 0.28],
+       [0.56, 0.44]]
+```
+
+*P*²[0] = [0.72, 0.28], matching the step-by-step result. Starting from Rainy gives *P*²[1] = [0.56, 0.44]; the two rows are already closer to each other than the original [0.8, 0.2] vs [0.4, 0.6] (the gap in the Sunny column has shrunk from 0.4 to 0.16). For this chain, as *k* grows both rows converge toward the same limiting vector: the **stationary distribution** that Section 9 analyses in depth. The matrix-multiplication perspective makes this convergence precise and computable.
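+
+The shrinking gap between the rows can be observed numerically. A minimal Python sketch, with `mat_mul` and `row_gap` as hypothetical helper names, that raises the weather matrix to successive powers and reports the largest entrywise difference between its two rows:
+
+```python
+def mat_mul(A, B):
+    """Multiply two square matrices stored as lists of rows."""
+    cols = list(zip(*B))  # columns of B
+    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
+            for row in A]
+
+
+def row_gap(M):
+    """Largest entrywise difference between row 0 and row 1 of M."""
+    return max(abs(x - y) for x, y in zip(M[0], M[1]))
+
+
+P = [[0.8, 0.2],   # Sunny -> Sunny, Rainy
+     [0.4, 0.6]]   # Rainy -> Sunny, Rainy
+
+Pk = P
+for k in range(1, 11):
+    # Print k, row 0 of P^k, and the gap between the two rows.
+    print(k, [round(x, 4) for x in Pk[0]], round(row_gap(Pk), 6))
+    Pk = mat_mul(Pk, P)  # advance from P^k to P^(k+1)
+```
+
+For this matrix the gap shrinks by a factor of 0.4 at every step, so by *k* = 10 the two rows differ by only about 0.0001.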

 ---