Replaces the stub with full lesson content:
- Defines the transition matrix P[i][j] and stochastic matrix constraints
- Derives multi-step probabilities via π_k = π₀ · P^k
- Works through the 2-state weather chain by hand (1- and 2-step calculations)
- Bridges to §9 stationary distributions
@@ -52,7 +52,72 @@ Every Markov chain consists of a finite (or countably infinite) set of **states*
The rules governing how a Markov chain moves are captured in a **transition matrix** *P*, where *P[i][j]* is the probability of moving from state *i* to state *j* in one step. This section covers how to construct *P*, the constraints it must satisfy (rows sum to 1), and how to use matrix multiplication to compute multi-step probabilities.
> 🚧 This section is a stub — see nbd ticket `44ebe7`
**Defining the transition matrix.** Label the states 0, 1, …, *n*−1. The transition matrix *P* is an *n*×*n* array where entry *P[i][j]* gives the probability of moving to state *j* on the very next step, given that you are currently in state *i*. Because these are probabilities of mutually exclusive, exhaustive outcomes (from state *i* you must go *somewhere*), every row must sum to exactly 1 and every entry must lie between 0 and 1 inclusive. A matrix satisfying these two constraints is called a **stochastic matrix** (or row-stochastic matrix). Each row is itself a probability distribution over the next state.
**The stochastic-matrix constraints, stated precisely.** For an *n*-state chain:
```
P[i][j] >= 0 for all i, j
sum_j P[i][j] = 1 for every row i
```
A zero entry means the transition is impossible; a one means it is certain. Columns have no such constraint — column sums need not equal 1.
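The two constraints are easy to check mechanically. The following is a minimal sketch in plain Python; the function name `is_row_stochastic` and the tolerance `tol` are illustrative choices, not part of the lesson, and the tolerance exists only because floating-point row sums rarely equal 1 exactly.

```python
from math import isclose


def is_row_stochastic(P, tol=1e-9):
    """Return True if P is a non-empty square matrix whose entries lie in
    [0, 1] and whose rows each sum to 1 (within tol).

    P is assumed to be a list of lists of floats (hypothetical representation).
    """
    n = len(P)
    if n == 0:
        return False
    for row in P:
        if len(row) != n:
            return False  # not square
        if any(p < -tol or p > 1.0 + tol for p in row):
            return False  # an entry is not a valid probability
        if not isclose(sum(row), 1.0, abs_tol=tol):
            return False  # this row is not a probability distribution
    return True
```

Note that the check never looks at column sums: a matrix such as `[[0.5, 0.5], [1.0, 0.0]]` passes even though its columns sum to 1.5 and 0.5.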
**Multi-step probabilities via matrix multiplication.** Suppose you start in state *i* at time 0. After one step the probability of being in state *j* is *P[i][j]*. To reach *j* after *two* steps you must pass through some intermediate state *m*, and the possible choices of *m* are mutually exclusive, so you sum over all of them:
```
P^2[i][j] = sum_m P[i][m] * P[m][j]
```
This is exactly the (*i*, *j*) entry of *P*×*P* = *P*². In general, the probability of going from state *i* to state *j* in exactly *k* steps is the (*i*, *j*) entry of *P*^k. If you encode your current uncertainty as a **row vector** π₀ — a probability distribution over all states — then after *k* steps your updated distribution is:
```
π_k = π₀ · P^k
```
Each right-multiplication by *P* advances the clock one tick and blends probabilities according to the transition rules.
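The two routes to π_k (advance the row vector one tick at a time, or raise *P* to the *k*-th power first) can be compared in code. This is a sketch in plain Python under stated assumptions: the helper names `step`, `distribution_after`, `mat_mul`, and `mat_pow` are hypothetical, and the 3-state matrix uses made-up numbers purely for illustration.

```python
def step(pi, P):
    """One tick: return the row vector pi · P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


def distribution_after(pi0, P, k):
    """Apply k right-multiplications by P to the starting distribution pi0."""
    pi = list(pi0)
    for _ in range(k):
        pi = step(pi, P)
    return pi


def mat_mul(A, B):
    """Product of two n×n matrices: entry (i, j) sums over intermediate state m."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, k):
    """Return P^k by repeated multiplication (P^0 is the identity)."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


# Illustrative 3-state matrix (made-up numbers, not from the lesson).
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
pi0 = [1.0, 0.0, 0.0]  # start in state 0 with certainty

via_steps = distribution_after(pi0, P, 3)  # ((π₀ · P) · P) · P
via_power = step(pi0, mat_pow(P, 3))       # π₀ · P^3
```

Both results agree up to floating-point rounding. Stepping the vector costs on the order of *n*² operations per tick, while forming *P*^k costs on the order of *n*³ per multiplication, so the vector route is usually the cheaper choice when only one starting distribution matters.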
**Worked example — a two-state weather chain.** Consider a model with two states: *Sunny* (state 0) and *Rainy* (state 1). From data:
- If today is Sunny, tomorrow is Sunny with probability 0.8 and Rainy with probability 0.2.
- If today is Rainy, tomorrow is Sunny with probability 0.4 and Rainy with probability 0.6.
Writing this as a matrix:
```
         Sunny  Rainy
Sunny [   0.8    0.2  ]
Rainy [   0.4    0.6  ]
```
Row 0 sums to 1.0; row 1 sums to 1.0. All entries are non-negative. *P* is a valid stochastic matrix.
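*One step.* Start with certainty in Sunny: π₀ = [1, 0]. Then π₁ = π₀ · *P* = [1·0.8 + 0·0.4, 1·0.2 + 0·0.6] = [0.8, 0.2], which is simply row 0 of *P*.

*Two steps.* Multiply by *P* again: π₂ = π₁ · *P* = [0.8·0.8 + 0.2·0.4, 0.8·0.2 + 0.2·0.6] = [0.72, 0.28].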
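Starting from π₀ = [1, 0], the one- and two-step distributions for the weather chain can be checked numerically. A minimal sketch in plain Python; the helper name `step` is an illustrative choice, and values are rounded for display because floating-point sums are not exact.

```python
# Weather chain from the lesson: state 0 = Sunny, state 1 = Rainy.
P = [[0.8, 0.2],
     [0.4, 0.6]]


def step(pi, P):
    """One tick: return the row vector pi · P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]


pi0 = [1.0, 0.0]    # certain it is Sunny today
pi1 = step(pi0, P)  # distribution for tomorrow
pi2 = step(pi1, P)  # distribution for the day after tomorrow

print([round(x, 4) for x in pi1])  # [0.8, 0.2]
print([round(x, 4) for x in pi2])  # [0.72, 0.28]
```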
Equivalently, compute *P*² once and read off the row for state 0:
```
P^2 = [[0.8*0.8 + 0.2*0.4,  0.8*0.2 + 0.2*0.6],
       [0.4*0.8 + 0.6*0.4,  0.4*0.2 + 0.6*0.6]]
    = [[0.72, 0.28],
       [0.56, 0.44]]
```
*P*²[0] = [0.72, 0.28] — matching the step-by-step result. Starting from Rainy gives *P*²[1] = [0.56, 0.44]; the two rows are already noticeably closer to each other than the original [0.8, 0.2] vs [0.4, 0.6]. As *k* grows, both rows converge toward the same limiting vector — the **stationary distribution** that Section 9 analyses in depth. The matrix-multiplication perspective makes this convergence precise and computable.
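The narrowing gap between the two rows can be watched directly by computing successive powers of *P*. The sketch below is plain Python; the helper names `mat_mul` and `row_gaps` are hypothetical, and "gap" here means the largest entrywise difference between row 0 and any other row, which is one reasonable way to measure closeness rather than a definition taken from the lesson.

```python
def mat_mul(A, B):
    """Product of two n×n matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def row_gaps(P, max_k):
    """Return [gap_1, ..., gap_max_k], where gap_k is the largest entrywise
    difference between row 0 of P^k and any other row of P^k."""
    gaps = []
    Pk = P  # holds P^k, starting at k = 1
    for _ in range(max_k):
        gap = max(abs(a - b) for row in Pk[1:] for a, b in zip(Pk[0], row))
        gaps.append(gap)
        Pk = mat_mul(Pk, P)
    return gaps


# Weather chain from the lesson.
P = [[0.8, 0.2],
     [0.4, 0.6]]

for k, gap in enumerate(row_gaps(P, 10), start=1):
    print(k, f"{gap:.6f}")
```

For this chain the gap starts at 0.4 for *P* and drops to 0.16 for *P*², matching the hand-computed rows, and it keeps shrinking with each further power.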