docs(edu): write §7 exercise 1 storing vectors for vector-db course [081a55]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
main
Elijah Voigt 3 months ago
parent 297f2d6d2f
commit 0462586d88

@ -349,7 +349,229 @@ You now have a working local vector database. Exercises 1 through 5 build on thi
### 7. Exercise 1 — Storing and Retrieving Vectors
**Goal:** Insert 6 labelled 3-dimensional vectors into the `items` table created in §6, then `SELECT` all rows and print each label alongside its deserialized `Vec<f32>`.
#### The Dataset
We use a tiny hand-crafted set of 3D vectors so the results are easy to verify by inspection. The vectors are designed so that items in the same category cluster together — animals near `[high, low, low]`, vehicles near `[low, high, low]`, and programming languages near `[low, low, high]`:
| id | label | embedding |
|---|---|---|
| 1 | "cat" | [0.9, 0.1, 0.2] |
| 2 | "dog" | [0.8, 0.2, 0.3] |
| 3 | "car" | [0.1, 0.9, 0.1] |
| 4 | "truck" | [0.2, 0.8, 0.2] |
| 5 | "python" | [0.15, 0.1, 0.95] |
| 6 | "rust" | [0.1, 0.05, 0.9] |
In later exercises you will query these vectors to see how cosine distance naturally separates the three clusters.
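To preview that separation before touching the database, here is a plain-Rust sketch of cosine distance, `1 - (a·b) / (|a| |b|)`, measured from `cat` to every item. It is an illustration only: `cosine_distance` is a helper written for this example, not the function libSQL runs internally.

```rust
/// Cosine distance between two equal-length vectors: 1 - (a·b) / (|a| * |b|).
/// 0.0 means "same direction"; larger values mean less similar.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    // The same six vectors as the table above.
    let data: [(&str, [f32; 3]); 6] = [
        ("cat", [0.9, 0.1, 0.2]),
        ("dog", [0.8, 0.2, 0.3]),
        ("car", [0.1, 0.9, 0.1]),
        ("truck", [0.2, 0.8, 0.2]),
        ("python", [0.15, 0.1, 0.95]),
        ("rust", [0.1, 0.05, 0.9]),
    ];
    // Distance from "cat" to every item, including itself.
    let (_, cat) = data[0];
    for (label, v) in &data {
        println!("cat -> {label:<7}{:.3}", cosine_distance(&cat, v));
    }
}
```

Items in the same category land close to 0 and items in other categories much further away: `cat` to `dog` is roughly 0.02, while `cat` to `car` is roughly 0.76.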
#### Step 1 — Formatting a Vector for INSERT
libSQL's `vector(?)` SQL function accepts a **JSON array string** such as `"[0.9,0.1,0.2]"`. You pass this string as a text parameter, and `vector()` converts it into the internal `F32_BLOB` format for storage.
A small helper keeps the conversion in one place:
```rust
fn vec_to_json(v: &[f32]) -> String {
    format!("[{}]", v.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(","))
}
```
Calling `vec_to_json(&[0.9, 0.1, 0.2])` returns the string `"[0.9,0.1,0.2]"`, ready to bind as a SQL parameter.
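One caveat: `f32::to_string` happily produces `NaN` or `inf`, and neither is valid JSON. If your embeddings come from an external model, a checked variant can catch this before the value reaches SQL. `try_vec_to_json` is a hypothetical helper for illustration; nothing here assumes how libSQL itself reacts to such input.

```rust
/// Checked variant of `vec_to_json`: JSON has no NaN or infinity, so reject them
/// instead of emitting text such as "[NaN]" that is not valid JSON.
fn try_vec_to_json(v: &[f32]) -> Result<String, String> {
    if let Some(bad) = v.iter().find(|x| !x.is_finite()) {
        return Err(format!("non-finite value {bad} cannot be encoded as JSON"));
    }
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    Ok(format!("[{}]", parts.join(",")))
}

fn main() {
    println!("{:?}", try_vec_to_json(&[0.9, 0.1, 0.2])); // Ok("[0.9,0.1,0.2]")
    println!("{:?}", try_vec_to_json(&[1.0, f32::NAN])); // an Err naming the NaN
}
```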
#### Step 2 — Inserting Rows
Use `INSERT OR IGNORE` so the program is **idempotent** — running it twice does not produce duplicate-key errors or duplicate data:
```sql
INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))
```
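To see why a second run leaves the table unchanged, here is a toy in-memory model of the conflict rule using `BTreeMap::entry`: a row is written only when its primary key is new, and the function returns the number of rows affected (1 or 0). It models primary-key conflicts only. In SQLite, `OR IGNORE` also silently skips rows that violate `UNIQUE`, `NOT NULL`, or `CHECK` constraints, so a malformed row can disappear without an error.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

/// Toy model of `INSERT OR IGNORE` keyed on the primary key `id`.
/// Returns the number of rows affected: 1 if written, 0 if skipped.
fn insert_or_ignore(table: &mut BTreeMap<i64, String>, id: i64, label: &str) -> u64 {
    match table.entry(id) {
        // A row with this id already exists: skip it silently, no error.
        Entry::Occupied(_) => 0,
        // New id: write the row.
        Entry::Vacant(slot) => {
            slot.insert(label.to_string());
            1
        }
    }
}

fn main() {
    let rows = [(1, "cat"), (2, "dog"), (3, "car"), (4, "truck"), (5, "python"), (6, "rust")];
    let mut table = BTreeMap::new();
    // Run the same "program" twice, as if `cargo run` were invoked twice.
    for run in 1..=2 {
        let mut inserted = 0;
        for &(id, label) in &rows {
            inserted += insert_or_ignore(&mut table, id, label);
        }
        println!("run {run}: inserted {inserted}, table has {} rows", table.len());
    }
}
```

The first run reports 6 inserted rows and the second reports 0, with the table holding 6 rows both times.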
Define the dataset as a `Vec<(i64, &str, Vec<f32>)>` and loop over it. `execute` returns the number of rows changed, so summing those counts makes the message report what was actually written rather than the size of the dataset:
```rust
let data: Vec<(i64, &str, Vec<f32>)> = vec![
    (1, "cat", vec![0.9, 0.1, 0.2]),
    (2, "dog", vec![0.8, 0.2, 0.3]),
    (3, "car", vec![0.1, 0.9, 0.1]),
    (4, "truck", vec![0.2, 0.8, 0.2]),
    (5, "python", vec![0.15, 0.1, 0.95]),
    (6, "rust", vec![0.1, 0.05, 0.9]),
];
// Rows skipped by OR IGNORE (id already present) contribute 0.
let mut inserted: u64 = 0;
for (id, label, embedding) in &data {
    inserted += conn
        .execute(
            "INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))",
            libsql::params![*id, *label, vec_to_json(embedding)],
        )
        .await?;
}
println!("Inserted {inserted} rows.");
```
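The reference solution declares the column as `F32_BLOB(3)`, so every embedding should have exactly three components. Rather than depending on how the database reports a mismatch, you can check lengths in Rust before the insert loop. `check_dimensions` is a hypothetical helper for illustration, not part of the `libsql` crate.

```rust
/// Hypothetical pre-insert check: every embedding must have exactly `expected`
/// components, matching the column's declared F32_BLOB dimension.
fn check_dimensions(data: &[(i64, &str, Vec<f32>)], expected: usize) -> Result<(), String> {
    for (id, label, embedding) in data {
        if embedding.len() != expected {
            return Err(format!(
                "row {id} ({label}) has {} components, expected {expected}",
                embedding.len()
            ));
        }
    }
    Ok(())
}

fn main() {
    let data: Vec<(i64, &str, Vec<f32>)> = vec![
        (1, "cat", vec![0.9, 0.1, 0.2]),
        (7, "typo", vec![0.5, 0.5]), // deliberately one component short
    ];
    match check_dimensions(&data, 3) {
        Ok(()) => println!("all embeddings have 3 components"),
        Err(e) => println!("refusing to insert: {e}"),
    }
}
```

Calling `check_dimensions(&data, 3)?` (after mapping the error) at the top of the insert step fails fast with a message that names the offending row.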
#### Step 3 — Selecting and Deserializing
Query all rows back out. The `vector_extract` function converts the stored `F32_BLOB` back into a JSON array string that you can parse in Rust:
```sql
SELECT id, label, vector_extract(embedding) FROM items ORDER BY id
```
Add `serde_json` to your `Cargo.toml` dependencies for JSON parsing:
```toml
serde_json = "1"
```
Then fetch and deserialize:
```rust
let mut rows = conn
    .query("SELECT id, label, vector_extract(embedding) FROM items ORDER BY id", ())
    .await?;
while let Some(row) = rows.next().await? {
    let id: i64 = row.get(0)?;
    let label: String = row.get(1)?;
    let json_str: String = row.get(2)?;
    let embedding: Vec<f32> = serde_json::from_str(&json_str)?;
    println!("{id:<3}{label:<10}{embedding:?}");
}
```
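If you would rather not add `serde_json` for this one call, a small std-only parser is enough for a flat array of numbers. This sketch assumes `vector_extract` returns a single bracketed, comma-separated list like the one shown in the expected output; `parse_vector_text` is a hypothetical name, and it is more lenient than a real JSON parser (for example, it accepts `NaN`).

```rust
/// Hypothetical std-only stand-in for `serde_json::from_str::<Vec<f32>>`,
/// assuming the text is a flat array of numbers such as "[0.9,0.1,0.2]".
fn parse_vector_text(s: &str) -> Result<Vec<f32>, String> {
    // Strip surrounding whitespace and the outer brackets.
    let inner = s
        .trim()
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .ok_or_else(|| format!("not a bracketed array: {s:?}"))?;
    if inner.trim().is_empty() {
        return Ok(Vec::new());
    }
    // Parse each comma-separated element; the first failure aborts the whole parse.
    inner
        .split(',')
        .map(|part| {
            part.trim()
                .parse::<f32>()
                .map_err(|e| format!("bad number {part:?}: {e}"))
        })
        .collect()
}

fn main() {
    let parsed = parse_vector_text("[0.9,0.1,0.2]").unwrap();
    println!("{parsed:?}"); // [0.9, 0.1, 0.2]
}
```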
#### Step 4 — Expected Output
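On a fresh `vectors.db`, `cargo run` should print something like the following (the SQLite version depends on the SQLite bundled with your `libsql` build):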
```
SQLite version: 3.46.0
Database ready.
Inserted 6 rows.
1  cat       [0.9, 0.1, 0.2]
2  dog       [0.8, 0.2, 0.3]
3  car       [0.1, 0.9, 0.1]
4  truck     [0.2, 0.8, 0.2]
5  python    [0.15, 0.1, 0.95]
6  rust      [0.1, 0.05, 0.9]
```
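The columns line up because of the width specifiers in the `println!` format string: `{id:<3}` left-aligns the id in three characters, `{label:<10}` pads the label to ten, and `{embedding:?}` appends the vector's `Debug` form. A small std-only function (`format_row`, named here for illustration) reproduces a single output line:

```rust
/// Build one output line exactly as the exercise's `println!` does.
fn format_row(id: i64, label: &str, embedding: &[f32]) -> String {
    // <3 and <10: left-align and pad with spaces to 3 and 10 characters.
    format!("{id:<3}{label:<10}{embedding:?}")
}

fn main() {
    println!("{}", format_row(1, "cat", &[0.9, 0.1, 0.2]));
    println!("{}", format_row(5, "python", &[0.15, 0.1, 0.95]));
}
```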
Every vector round-trips through the database intact: Rust `Vec<f32>` → JSON string → `vector()` → `F32_BLOB` storage → `vector_extract()` → JSON string → `serde_json` → Rust `Vec<f32>`.
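The two Rust ends of that chain can be checked without a database. Rust's `Display` for `f32` prints the shortest decimal that parses back to the same value, so `to_string` followed by `parse::<f32>()` is exact for finite values. The sketch below verifies this bit for bit on the exercise data. It does not test the `vector()`/`vector_extract()` steps; it assumes, as the expected output suggests, that their text formatting preserves these values.

```rust
/// Returns true if every value survives f32 -> String -> f32 with identical bits.
fn text_round_trip_is_exact(v: &[f32]) -> bool {
    v.iter().all(|&x| {
        let text = x.to_string(); // shortest decimal form, e.g. "0.15"
        match text.parse::<f32>() {
            Ok(y) => y.to_bits() == x.to_bits(),
            Err(_) => false,
        }
    })
}

fn main() {
    let embeddings: [[f32; 3]; 6] = [
        [0.9, 0.1, 0.2],
        [0.8, 0.2, 0.3],
        [0.1, 0.9, 0.1],
        [0.2, 0.8, 0.2],
        [0.15, 0.1, 0.95],
        [0.1, 0.05, 0.9],
    ];
    for v in &embeddings {
        assert!(text_round_trip_is_exact(v), "{v:?} did not round-trip");
    }
    println!("all {} embeddings round-trip exactly through text", embeddings.len());
}
```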
#### Cargo.toml Additions
Your full `[dependencies]` section should now be:
```toml
[dependencies]
libsql = "0.9"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
```
#### Reference Solution
<details><summary>Show full solution</summary>
**`Cargo.toml`** (dependencies only):
```toml
[dependencies]
libsql = "0.9"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
```
**`src/main.rs`**:
```rust
use libsql::{Builder, Database};

/// Convert a float slice to a JSON array string for libSQL's `vector()` function.
fn vec_to_json(v: &[f32]) -> String {
    format!(
        "[{}]",
        v.iter()
            .map(|x| x.to_string())
            .collect::<Vec<_>>()
            .join(",")
    )
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // --- Open database ---
    let db: Database = Builder::new_local("vectors.db").build().await?;
    let conn = db.connect()?;

    // Verify connection
    let mut rows = conn.query("SELECT sqlite_version()", ()).await?;
    if let Some(row) = rows.next().await? {
        let version: String = row.get(0)?;
        println!("SQLite version: {version}");
    }

    // --- Create table (from §6) ---
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (
            id INTEGER PRIMARY KEY,
            label TEXT NOT NULL,
            embedding F32_BLOB(3) NOT NULL
        )",
        (),
    )
    .await?;

    // --- Create vector index (from §6) ---
    // libSQL declares a vector index by wrapping the column in libsql_vector_idx().
    conn.execute(
        "CREATE INDEX IF NOT EXISTS items_vec_idx
            ON items (libsql_vector_idx(embedding))",
        (),
    )
    .await?;
    println!("Database ready.");

    // --- Insert 6 labelled vectors ---
    let data: Vec<(i64, &str, Vec<f32>)> = vec![
        (1, "cat", vec![0.9, 0.1, 0.2]),
        (2, "dog", vec![0.8, 0.2, 0.3]),
        (3, "car", vec![0.1, 0.9, 0.1]),
        (4, "truck", vec![0.2, 0.8, 0.2]),
        (5, "python", vec![0.15, 0.1, 0.95]),
        (6, "rust", vec![0.1, 0.05, 0.9]),
    ];
    // Sum the rows actually written; OR IGNORE skips ids that already exist.
    let mut inserted: u64 = 0;
    for (id, label, embedding) in &data {
        inserted += conn
            .execute(
                "INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))",
                libsql::params![*id, *label, vec_to_json(embedding)],
            )
            .await?;
    }
    println!("Inserted {inserted} rows.");

    // --- Select and deserialize ---
    let mut rows = conn
        .query(
            "SELECT id, label, vector_extract(embedding) FROM items ORDER BY id",
            (),
        )
        .await?;
    while let Some(row) = rows.next().await? {
        let id: i64 = row.get(0)?;
        let label: String = row.get(1)?;
        let json_str: String = row.get(2)?;
        let embedding: Vec<f32> = serde_json::from_str(&json_str)?;
        println!("{id:<3}{label:<10}{embedding:?}");
    }
    Ok(())
}
```
</details>
---
