---
# edu-pdeo
title: '§7 Exercise 1: Storing and Retrieving Vectors'
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:29:59Z
updated_at: 2026-03-10T23:29:59Z
---

## §7 Exercise 1 — Storing and Retrieving Vectors — Stub to fill

File: `edu/src/vector-db.md`, section `### 7. Exercise 1 — Storing and Retrieving Vectors`

Replace this stub line with the full exercise:

> **Goal:** Insert a small set of labelled vectors [...] 🚧 Full content tracked in [nbd:081a55].

Follow the exercise format from `edu/src/markov.md`: Goal, Setup, Starter Code skeleton, numbered Steps, Reference Solution in `<details><summary>Show full solution</summary>`.

## Prerequisites (established in §6)

The reader has the `vec-demo` project with `libsql = "0.9"` and `tokio` as dependencies. The `main` function opens a local database via `Builder::new_local("vectors.db").build().await?`, connects to it, and has already created the `items` table (`id INTEGER PRIMARY KEY, label TEXT NOT NULL, embedding F32_BLOB(3) NOT NULL`) and the vector index on `embedding`.

## Goal

Insert six labelled 3-dimensional vectors, then `SELECT` all rows and print each row's id and label alongside its deserialised `Vec<f32>`.

## Vectors to use

| id | label | embedding |
|---|---|---|
| 1 | "cat" | [0.9, 0.1, 0.2] |
| 2 | "dog" | [0.8, 0.2, 0.3] |
| 3 | "car" | [0.1, 0.9, 0.1] |
| 4 | "truck" | [0.2, 0.8, 0.2] |
| 5 | "python" | [0.15, 0.1, 0.95] |
| 6 | "rust" | [0.1, 0.05, 0.9] |

The vectors are hand-crafted so that animals cluster near [high, low, low], vehicles near [low, high, low], and programming languages near [low, low, high]. The §8 KNN exercise uses these clusters to verify correct nearest-neighbour results.

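
The clustering claim can be checked without a database. The sketch below is a std-only sanity check that assumes cosine similarity as the metric (§8 may use a different one); `exercise_rows`, `cosine_similarity`, and `nearest_label` are illustrative names, not part of the exercise skeleton. For every label in the table, the nearest other row should come from the same cluster.

```rust
/// The six exercise rows as (label, embedding), copied from the table above.
fn exercise_rows() -> Vec<(&'static str, [f32; 3])> {
    vec![
        ("cat", [0.9, 0.1, 0.2]),
        ("dog", [0.8, 0.2, 0.3]),
        ("car", [0.1, 0.9, 0.1]),
        ("truck", [0.2, 0.8, 0.2]),
        ("python", [0.15, 0.1, 0.95]),
        ("rust", [0.1, 0.05, 0.9]),
    ]
}

/// Cosine similarity between two equal-length, non-zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Label of the row most similar to the row called `query`, excluding itself.
/// Returns `None` if `query` is not present or there is no other row.
fn nearest_label<'a>(query: &str, rows: &[(&'a str, [f32; 3])]) -> Option<&'a str> {
    let (_, q) = rows.iter().find(|(label, _)| *label == query)?;
    rows.iter()
        .filter(|(label, _)| *label != query)
        .max_by(|(_, a), (_, b)| {
            cosine_similarity(q, a).total_cmp(&cosine_similarity(q, b))
        })
        .map(|(label, _)| *label)
}
```
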
## Steps to cover

**Step 1 — Formatting a vector for INSERT.** Explain that `vector(?)` in SQL accepts a JSON array string. Show how to format a `Vec<f32>` (passed as a slice) in Rust:

```rust
/// Formats a slice of floats as a JSON array string, e.g. `[0.9,0.1,0.2]`.
fn vec_to_json(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    format!("[{}]", parts.join(","))
}
```

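
As a sketch of how the helper above might be applied to the exercise data before the INSERT loop, the block below builds the six rows and pairs each with the JSON text that would be bound to `vector(?)`. The names `seed_rows` and `insert_params` are hypothetical and only illustrate one possible shape for the starter code; the actual `conn.execute` call is left out so the sketch stays std-only.

```rust
// Same helper as in Step 1, repeated so this sketch compiles on its own.
fn vec_to_json(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    format!("[{}]", parts.join(","))
}

/// The six exercise rows as (id, label, embedding).
fn seed_rows() -> Vec<(i64, &'static str, Vec<f32>)> {
    vec![
        (1, "cat", vec![0.9, 0.1, 0.2]),
        (2, "dog", vec![0.8, 0.2, 0.3]),
        (3, "car", vec![0.1, 0.9, 0.1]),
        (4, "truck", vec![0.2, 0.8, 0.2]),
        (5, "python", vec![0.15, 0.1, 0.95]),
        (6, "rust", vec![0.1, 0.05, 0.9]),
    ]
}

/// Converts each row into the three values for the `?` placeholders:
/// id, label, and the embedding rendered as a JSON array string.
fn insert_params() -> Vec<(i64, &'static str, String)> {
    seed_rows()
        .into_iter()
        .map(|(id, label, embedding)| (id, label, vec_to_json(&embedding)))
        .collect()
}
```
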
**Step 2 — Inserting rows.** Use `INSERT OR IGNORE` so that re-running the program is idempotent: a row whose `id` already exists is skipped instead of raising a primary-key error.

```sql
INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))
```

Loop over a `Vec<(i64, &str, Vec<f32>)>` and call `conn.execute` for each row, passing the id, the label, and the JSON string as parameters.

**Step 3 — Selecting and deserialising.** Query all rows:

```sql
SELECT id, label, vector_extract(embedding) FROM items ORDER BY id
```

`vector_extract` returns the embedding as a JSON array string (e.g. `"[0.9,0.1,0.2]"`). Add `serde_json = "1"` to `Cargo.toml` and parse it with `serde_json::from_str::<Vec<f32>>(&json_str)?`.

**Step 4 — Print results.** Format the output as:

```
1 cat [0.9, 0.1, 0.2]
2 dog [0.8, 0.2, 0.3]
...
```

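
One way to produce that output shape, as a minimal sketch: Rust's `{:?}` formatting of an `f32` slice already prints brackets with comma-and-space separators. The helper name `format_row` is hypothetical; the reference solution may simply inline the `println!`.

```rust
/// Formats one row in the `id label [x, y, z]` shape shown in Step 4.
fn format_row(id: i64, label: &str, embedding: &[f32]) -> String {
    // Debug formatting of a slice yields e.g. `[0.9, 0.1, 0.2]`.
    format!("{} {} {:?}", id, label, embedding)
}
```
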
## Cargo.toml additions

Add `serde_json = "1"` to `[dependencies]` for JSON array parsing.