+++
title = "§3 Vector Similarity"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++

## §3 Vector Similarity: Stub to fill
File: `edu/src/vector-db.md`, section `### 3. Vector Similarity`

Replace this stub line with full content:

> Once you have two vectors, how do you measure how alike they are? [...] 🚧 Full content tracked in [nbd:99e1d9].

This is a **reading lesson with inline math**, with no Rust code. Target 400–600 words. Use bold lead phrases and inline math in Unicode (not LaTeX). Include a small worked example with concrete 3D numbers.

## Learning objectives
- Know the three main similarity/distance functions: cosine similarity, dot product, Euclidean distance
- Understand the formula and geometric meaning of each
- Know the relationship between cosine similarity and cosine distance (what `vector_distance_cos` actually returns)
- Know when each metric is appropriate
- Understand why normalised vectors simplify the choice

## Content to write
**Cosine similarity.** Formula: cos(θ) = (a · b) / (‖a‖ · ‖b‖). Range −1 to 1 (1 = same direction, 0 = orthogonal, −1 = opposite). Measures the angle between vectors, ignoring magnitude. Ideal for text embeddings: a short and a long document on the same topic can produce vectors that differ in magnitude but point in nearly the same direction.

**Cosine distance.** Formula: 1 − cos(θ). Range 0 to 2 (0 = same direction, 1 = orthogonal, 2 = fully opposite). This is what `vector_distance_cos` returns. Note that this function name is libSQL's; sqlite-vec names its equivalent `vec_distance_cosine`. Clarify the naming: the function name says "cos" but it returns a *distance*, not a similarity, so smaller means more similar.

**Dot product.** Formula: a · b = Σᵢ aᵢbᵢ. Range −∞ to ∞. For unit-normalised vectors, the dot product equals cosine similarity, because the denominator ‖a‖ · ‖b‖ is 1. For unnormalised vectors, it conflates magnitude and angle, so a long vector can outscore a better-aligned short one. Some models are trained specifically for maximum inner product search (MIPS); their documentation will say so.

**Euclidean (L2) distance.** Formula: ‖a − b‖ = √(Σᵢ (aᵢ − bᵢ)²). Range 0 to ∞ (0 = identical vectors). Sensitive to vector magnitude. Appropriate for low-dimensional geometric or tabular data where absolute coordinate values carry meaning.

**When to use each.** Text and sentence embeddings: cosine (or dot product if the model outputs unit vectors, which many do). Low-dimensional geometric features: L2. Follow the model card's recommendation when it specifies a metric. For unit-normalised vectors the choice matters little: dot product equals cosine similarity and ‖a − b‖² = 2 − 2·cos(θ), so all three metrics rank neighbours in the same order.

**Worked example.** Use vectors a = [1, 0, 1] and b = [1, 1, 0]. Compute all three by hand and show the arithmetic step by step. Dot product: 1·1 + 0·1 + 1·0 = 1. Norms: ‖a‖ = ‖b‖ = √2, so cosine similarity = 1 / (√2 · √2) = 0.5 and cosine distance = 1 − 0.5 = 0.5. L2 distance: a − b = [0, −1, 1], so ‖a − b‖ = √2 ≈ 1.414. This concretises the formulas before the reader sees them in SQL queries.
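To double-check the hand arithmetic for a = [1, 0, 1] and b = [1, 1, 0] before it goes into the lesson, a short script can help. The following is a minimal Python sketch, meant only as a verification aid for whoever fills the stub (the lesson itself stays code-free). The helper names (`dot`, `norm`, `cosine_similarity`, `cosine_distance`, `l2_distance`, `normalise`) are hypothetical and not part of any library. It also checks the two unit-vector identities: dot product equals cosine similarity, and squared L2 distance equals 2 − 2·cos(θ).

```python
import math


def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); assumes neither vector is all zeros."""
    return dot(a, b) / (norm(a) * norm(b))


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    return 1.0 - cosine_similarity(a, b)


def l2_distance(a, b):
    """Euclidean distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalise(a):
    """Scale a vector to unit length; assumes a is not all zeros."""
    n = norm(a)
    return [x / n for x in a]


# Vectors from the worked example.
a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]

print("dot product:      ", dot(a, b))
print("cosine similarity:", round(cosine_similarity(a, b), 3))
print("cosine distance:  ", round(cosine_distance(a, b), 3))
print("L2 distance:      ", round(l2_distance(a, b), 3))

# Unit-normalised copies: the dot product now matches cosine similarity,
# and squared L2 distance matches 2 - 2*cos(theta), up to float rounding.
ua, ub = normalise(a), normalise(b)
print("unit dot product: ", round(dot(ua, ub), 3))
print("unit L2 squared:  ", round(l2_distance(ua, ub) ** 2, 3))
```

Results are rounded to three decimals because floating-point arithmetic leaves cosine similarity a hair under 0.5; the rounded figures agree with the hand calculation.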