--- # edu-twtl title: §3 Vector Similarity status: completed type: task priority: normal created_at: 2026-03-10T23:30:01Z updated_at: 2026-03-10T23:30:01Z --- ## §3 Vector Similarity — Stub to fill File: `edu/src/vector-db.md`, section `### 3. Vector Similarity` Replace this stub line with full content: > Once you have two vectors, how do you measure how alike they are? [...] 🚧 Full content tracked in [nbd:99e1d9]. This is a **reading lesson with inline math** — no Rust code. Target 400–600 words. Bold lead phrases, inline math using Unicode (not LaTeX). Include a small worked example with concrete 3D numbers. ## Learning objectives - Know the three main similarity/distance functions: cosine similarity, dot product, Euclidean distance - Understand the formula and geometric meaning of each - Know the relationship between cosine similarity and cosine distance (what `vector_distance_cos` actually returns) - Know when each metric is appropriate - Understand why normalised vectors simplify the choice ## Content to write **Cosine similarity.** Formula: cos(θ) = (a · b) / (‖a‖ · ‖b‖). Range −1 to 1 (1 = same direction, 0 = orthogonal, −1 = opposite). Measures the angle between vectors, ignoring magnitude. Ideal for text embeddings: a short and long document on the same topic produce vectors that differ in magnitude but not direction. **Cosine distance.** 1 − cosine_similarity. Range 0 to 2. This is what sqlite-vec's `vector_distance_cos` returns (0 = identical, 2 = fully opposite). Clarify the naming: the function name says "cos" but returns a *distance*, not a similarity — smaller is more similar. **Dot product.** Formula: a · b = Σᵢ aᵢbᵢ. For unit-normalised vectors, dot product equals cosine similarity (since ‖a‖ = ‖b‖ = 1 cancels out). For unnormalised vectors, it conflates magnitude and angle. Some models are trained specifically for maximum inner product search (MIPS) — their documentation will say so. **Euclidean (L2) distance.** Formula: ‖a − b‖ = √(Σᵢ (aᵢ − bᵢ)²). Range 0 to ∞. Sensitive to vector magnitude. Appropriate for low-dimensional geometric/tabular data where absolute coordinate values carry meaning. **When to use each.** Text and sentence embeddings: cosine (or dot product if model outputs unit vectors, which many do). Follow the model card's recommendation when specified. Low-dimensional geometric features: L2. **Worked example.** Use vectors a = [1, 0, 1] and b = [1, 1, 0]. Compute all three by hand and show the arithmetic step by step. Cosine similarity = 0.5, L2 distance ≈ 1.414, dot product = 1. This concretises the formulas before the reader sees them in SQL queries.