vibed/edu/.beans/archive/edu-paqf--8-exercise-2-k-ne...

title: §8 Exercise 2: K-Nearest Neighbor Search
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:00Z
updated_at: 2026-03-10T23:30:00Z

§8 Exercise 2 — K-Nearest Neighbor Search — Stub to fill

File: edu/src/vector-db.md, section ### 8. Exercise 2 — K-Nearest Neighbor Search

Replace this stub line with the full exercise:

Goal: Use vector_top_k and vector_distance_cos [...] 🚧 Full content tracked in [nbd:5674ce].

Follow the exercise format from edu/src/markov.md.

Prerequisites (established in §7)

The reader has the vec-demo project and an items table with 6 rows (cat, dog, car, truck, python, rust), each with a 3-dimensional embedding.

Goal

Given a query vector, use vector_top_k to find the 3 most similar items, join with the items table to retrieve labels and exact cosine distances, and display the results ranked by distance.

Steps to cover

Step 1 — Introduce vector_top_k. Explain that this is a table-valued function (TVF) that returns row IDs of approximate nearest neighbours without a full table scan. Syntax:

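SELECT i.id FROM vector_top_k('items', vector(?), ?) i

The first argument is a string literal, the second is the query vector, and the third is k. Assuming the libSQL implementation, the string names the vector index rather than the table, so 'items' here stands in for whatever index name §7 created. The function returns only the matching row IDs, exposed in libSQL's documentation as the id column, so join back to items to get other columns.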

Step 2 — Full KNN query. Show the complete query combining the TVF with a JOIN and exact distance computation:

SELECT items.id, items.label, vector_distance_cos(items.embedding, vector(?)) AS dist
FROM vector_top_k('items', vector(?), ?) AS knn
JOIN items ON items.rowid = knn.id
ORDER BY dist ASC

Note: the query vector must be passed twice — once for vector_top_k (index traversal) and once for vector_distance_cos (exact distance). Both are the same JSON array string.
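The double binding can be sketched in plain Rust, independent of any database driver. The helper names to_json_array and knn_params are hypothetical and not part of the exercise; the sketch only shows that one serialized string fills both vector(?) placeholders, in the order they appear in the Step 2 query.

```rust
/// Serialize an embedding as a JSON array string, e.g. "[0.85,0.15,0.25]".
/// (Hypothetical helper, std only.)
fn to_json_array(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    format!("[{}]", parts.join(","))
}

/// Build the three positional parameters in placeholder order:
/// the vector for vector_distance_cos, the vector for vector_top_k, then k.
/// (Hypothetical helper; the tuple would be handed to whatever driver is in use.)
fn knn_params(query: &[f32], k: i64) -> (String, String, i64) {
    let json = to_json_array(query);
    (json.clone(), json, k)
}

fn main() {
    let (for_distance, for_top_k, k) = knn_params(&[0.85, 0.15, 0.25], 3);
    println!("{for_distance} {for_top_k} {k}");
}
```

Serializing once and cloning keeps the two bound values identical by construction, which avoids the exact distance being computed against a slightly different vector than the one used for the index lookup.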

Step 3 — Run three queries and print results.

Query vectors to use:

  • [0.85, 0.15, 0.25] → nearest neighbours should be cat and dog (animal cluster)
  • [0.15, 0.85, 0.15] → nearest neighbours should be car and truck (vehicle cluster)
  • [0.1, 0.05, 0.92] → nearest neighbours should be rust and python (language cluster)

Expected output format:

Query: [0.85, 0.15, 0.25]
  1. cat    dist=0.0023
  2. dog    dist=0.0089
  3. python dist=0.1834
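One way to produce this layout in Rust is a left-aligned label padded to six characters and a distance fixed at four decimals. The function names format_row and format_results are hypothetical, and the distances in main are copied from the sample above as placeholders, not computed values.

```rust
/// Format one ranked line: two-space indent, rank, label padded to 6 columns,
/// distance with 4 decimal places. (Hypothetical helper.)
fn format_row(rank: usize, label: &str, dist: f64) -> String {
    format!("  {}. {:<6} dist={:.4}", rank, label, dist)
}

/// Format the full block for one query: a header line, then one line per result.
fn format_results(query: &str, rows: &[(&str, f64)]) -> String {
    let mut out = format!("Query: {query}\n");
    for (i, (label, dist)) in rows.iter().enumerate() {
        out.push_str(&format_row(i + 1, label, *dist));
        out.push('\n');
    }
    out
}

fn main() {
    // Placeholder distances taken from the sample output, not real results.
    let rows = [("cat", 0.0023), ("dog", 0.0089), ("python", 0.1834)];
    print!("{}", format_results("[0.85, 0.15, 0.25]", &rows));
}
```

The padding width of 6 matches the longest label in the data set (python); a larger data set would need a wider column.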

Step 4 — Explain ANN vs. exact search. With only 6 rows, vector_top_k effectively behaves like an exact search: the index graph (DiskANN-based in libSQL, to the best of our knowledge) has too few nodes to offer a shortcut. At scale (millions of rows) it returns approximate results, so some true nearest neighbours may be missed. vector_distance_cos always gives the exact distance for any specific pair.
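To make the contrast concrete, an exact search can be sketched as a brute-force scan in plain Rust: compute the cosine distance to every row, sort, and keep the first k. This is a baseline for comparison, not how vector_top_k works internally. The embeddings below are hypothetical stand-ins (the real values come from §7), and cosine_distance and exact_top_k are illustrative names.

```rust
/// Cosine distance: 1 - (a·b) / (|a| |b|). Assumes equal-length, non-zero vectors.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Exact k-nearest neighbours by scanning every row (O(n) distance computations).
fn exact_top_k<'a>(items: &[(&'a str, [f32; 3])], query: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = items
        .iter()
        .map(|(label, emb)| (*label, cosine_distance(emb, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1)); // ascending distance
    scored.truncate(k);
    scored
}

// HYPOTHETICAL embeddings for illustration only; substitute the rows from §7.
const ITEMS: [(&str, [f32; 3]); 6] = [
    ("cat", [0.9, 0.1, 0.2]),
    ("dog", [0.8, 0.2, 0.3]),
    ("car", [0.1, 0.9, 0.1]),
    ("truck", [0.2, 0.8, 0.2]),
    ("python", [0.1, 0.1, 0.9]),
    ("rust", [0.2, 0.1, 0.8]),
];

fn main() {
    for (label, dist) in exact_top_k(&ITEMS, &[0.85, 0.15, 0.25], 3) {
        println!("{label} dist={dist:.4}");
    }
}
```

Comparing this scan's output with the vector_top_k results on the same data is a simple way to check recall: on 6 rows the two should agree, while on large tables the indexed query may occasionally omit a true neighbour.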

Reference solution

Full main.rs inside <details><summary>Show full solution</summary>. The solution should re-run setup from §7 (create table, insert data) then run the three KNN queries.