Write Section 1 of edu/markov.md: 'What Is a Markov Chain?'

Learning objectives:
- Define the Markov property (memorylessness)
- Give 3–4 concrete real-world examples (weather, board games, web surfing, genetics)
- Explain why the Markov property is a useful modelling assumption
- Introduce the notation P(Xₙ₊₁ = s | X₀, …, Xₙ) = P(Xₙ₊₁ = s | Xₙ)

Content to produce:
- 3–5 paragraphs of prose
- At least one illustrative example worked through informally
- No code in this section

Target: replace the stub in edu/markov.md §1
@@ -36,7 +36,31 @@ This document is a self-guided course on Markov chains. It is organized into fou
A Markov chain is a mathematical model of a sequence of events in which the probability of each event depends only on the state reached in the previous event, not on the full history. This "memoryless" property is called the **Markov property**. In this section you will learn where Markov chains appear in the real world and develop intuition for why memorylessness is a useful simplification and often a reasonable approximation.
> 🚧 This section is a stub — see nbd ticket `fbf323`
**The core idea: only now matters.** Imagine you are tracking today's weather. Intuitively, you might think yesterday's weather, the week before, and the entire season all influence what tomorrow will bring. A Markov chain says: forget all of that. Given that you know *today's* weather, knowledge of every earlier day adds nothing to your prediction of tomorrow. The present state captures everything relevant from the past. This is the Markov property — colloquially "memorylessness" — and it is a surprisingly powerful modelling assumption.
Formally, let *X₀, X₁, X₂, …* be a sequence of random variables each taking values in some set of **states**. The sequence is a Markov chain if, for every time step *n* and every state *s*:
```
P(Xₙ₊₁ = s | X₀, X₁, …, Xₙ) = P(Xₙ₊₁ = s | Xₙ)
```
The left-hand side conditions on the entire history up to step *n*. The right-hand side conditions on *only* the current state *Xₙ*. The equation says these two quantities are always equal: no matter how you got to the current state, your distribution over the next state is the same.
**A worked example — the weather model.** Suppose a city has two kinds of days: Sunny and Rainy. You observe that:
- After a Sunny day, there is an 80 % chance of another Sunny day and a 20 % chance of a Rainy day.
- After a Rainy day, there is a 40 % chance of a Sunny day and a 60 % chance of another Rainy day.
Under the Markov assumption, these two rules are *all* you need. If today is Sunny, the chance of rain tomorrow is 20 % — regardless of whether the preceding week was a drought or a monsoon. The model is simple because it deliberately ignores deep history, and it is useful precisely because that history often adds little predictive power once you know the current state.
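The two rules above can also be written as a small table of transition probabilities and pushed forward one day at a time. What follows is a minimal Rust sketch for illustration only. The names `TRANSITION` and `step`, and the encoding of Sunny as index 0 and Rainy as index 1, are assumptions made here rather than anything the course prescribes.

```rust
/// Row i holds the probabilities of tomorrow's weather given today's state i.
/// Index 0 = Sunny, 1 = Rainy (an illustrative encoding, assumed for this sketch).
const TRANSITION: [[f64; 2]; 2] = [
    [0.8, 0.2], // after Sunny: 80 % Sunny, 20 % Rainy
    [0.4, 0.6], // after Rainy: 40 % Sunny, 60 % Rainy
];

/// Advance a probability distribution over {Sunny, Rainy} by one day.
/// Only the current distribution is needed; no earlier days are consulted.
fn step(dist: [f64; 2]) -> [f64; 2] {
    let mut next = [0.0; 2];
    for today in 0..2 {
        for tomorrow in 0..2 {
            next[tomorrow] += dist[today] * TRANSITION[today][tomorrow];
        }
    }
    next
}

fn main() {
    // Today is known to be Sunny.
    let today = [1.0, 0.0];
    let tomorrow = step(today);
    let day_after = step(tomorrow);
    println!("tomorrow:  sunny {:.2}, rainy {:.2}", tomorrow[0], tomorrow[1]);
    println!("day after: sunny {:.2}, rainy {:.2}", day_after[0], day_after[1]);
}
```

Starting from a Sunny day, the chance of rain two days out works out to 0.8 × 0.2 + 0.2 × 0.6 = 0.28. The function `step` takes only the current distribution as input, which is the Markov property expressed in code.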
**Where Markov chains appear in the real world.** Once you recognise the Markov property you will spot it everywhere:
- *Board games.* In Snakes and Ladders the only thing that matters is which square you are on right now. The sequence of rolls that brought you there is irrelevant — your future depends only on your current position.
- *Web surfing (PageRank).* Google's original PageRank algorithm modelled a hypothetical random web surfer who, at each page, usually clicks a link chosen uniformly at random and occasionally jumps to a random page instead. Where the surfer goes next depends only on the current page, not the path taken to reach it. The long-run fraction of time spent at each page is the page's rank.
- *Genetics.* In simple population-genetics models such as the Wright-Fisher model, the number of copies of a gene variant (allele) in the current generation determines the probability distribution of its copy number in the next generation. Earlier generations carry no additional information once the current count is known.
- *Text generation.* Given the last word (or last few words) of a sentence, the probability of the next word can be estimated from a corpus. The full sentence history is ignored — only the recent context matters. Sections 6–8 of this course build exactly this kind of model in Rust.
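To make the "long-run fraction of time" idea from the web-surfing example concrete, here is a rough Rust sketch of a random surfer on a made-up three-page web. The link structure in `LINKS` and the function names `click` and `long_run` are hypothetical. The sketch deliberately covers only the uniform link-clicking rule and leaves out the random-jump (damping) step that PageRank proper includes.

```rust
/// Hypothetical three-page web: page 0 links to pages 1 and 2,
/// page 1 links to page 2, and page 2 links back to page 0.
const LINKS: [&[usize]; 3] = [&[1, 2], &[2], &[0]];

/// Move the surfer's probability distribution forward by one click.
/// Each outgoing link of the current page is equally likely.
fn click(dist: [f64; 3]) -> [f64; 3] {
    let mut next = [0.0; 3];
    for (page, targets) in LINKS.iter().enumerate() {
        let share = dist[page] / targets.len() as f64;
        for &target in targets.iter() {
            next[target] += share;
        }
    }
    next
}

/// Apply `click` repeatedly, starting with the surfer on page 0.
fn long_run(clicks: usize) -> [f64; 3] {
    let mut dist = [1.0, 0.0, 0.0];
    for _ in 0..clicks {
        dist = click(dist);
    }
    dist
}

fn main() {
    let rank = long_run(200);
    for (page, r) in rank.iter().enumerate() {
        println!("page {}: {:.3}", page, r);
    }
}
```

For this particular graph the distribution settles at 0.4, 0.2 and 0.4 for pages 0, 1 and 2. Pages 0 and 2 rank highest because every path through the graph passes through them. A graph with dead ends or strict cycles would not settle this neatly, which is one reason the full algorithm adds the random jump.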
**Why the Markov property is a useful assumption.** Real systems are almost never perfectly memoryless — yesterday's weather genuinely does carry a whisper of information beyond today's. So why use Markov models? Because they strike a remarkable balance between tractability and expressiveness. A model that conditions on the entire history is usually intractable; one that ignores history entirely is too crude. The Markov property is the sweet spot: it allows rigorous mathematical analysis (stationary distributions, convergence theorems, efficient simulation) while still capturing the essential dynamics of many real processes. When the assumption is too crude, you can extend it by enlarging the state space — for instance, tracking the last *two* days of weather instead of one — and the Markov property holds again at that richer level of description.
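The state-enlargement trick mentioned above can be sketched in a few lines of Rust. In this illustration the state is the pair (yesterday, today), so a model that looks two days back becomes an ordinary Markov chain over pairs. The type names, function names and the four rain probabilities are all invented for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Weather {
    Sunny,
    Rainy,
}

/// Enlarged state: (yesterday, today).
type State = (Weather, Weather);

/// Probability of rain tomorrow given the last two days.
/// These four numbers are made up for illustration.
fn rain_probability(state: State) -> f64 {
    use Weather::*;
    match state {
        (Sunny, Sunny) => 0.1,
        (Rainy, Sunny) => 0.3,
        (Sunny, Rainy) => 0.5,
        (Rainy, Rainy) => 0.7,
    }
}

/// The possible next states and their probabilities.
/// Today becomes "yesterday"; tomorrow becomes "today".
fn transitions(state: State) -> [(State, f64); 2] {
    let (_, today) = state;
    let p_rain = rain_probability(state);
    [
        ((today, Weather::Sunny), 1.0 - p_rain),
        ((today, Weather::Rainy), p_rain),
    ]
}

fn main() {
    for (next, p) in transitions((Weather::Rainy, Weather::Sunny)) {
        println!("{:?} with probability {:.1}", next, p);
    }
}
```

The function `transitions` depends only on the current pair, so the chain over pairs satisfies the Markov property even though the underlying weather depends on two days. The cost is a larger state space: four pair-states instead of two.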