+++
title = "§12 Exercise 5: Retrieval-Augmented Generation"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
## §12 Exercise 5 — Retrieval-Augmented Generation — Stub to fill

File: `edu/src/vector-db.md`, section `### 12. Exercise 5 — Retrieval-Augmented Generation`

Replace this stub line with the full exercise:
> **Goal:** Combine vector search with a language model to build a retrieval-augmented generation (RAG) pipeline [...] 🚧 Full content tracked in [nbd:5ed295].

Follow the exercise format from `edu/src/markov.md`. This is the capstone exercise — it combines Turso vector search (§7–§8), fastembed (§9), and semantic search (§10), adding an LLM API call to ground answers in retrieved context.

## Goal

1. Store the 15-passage corpus from §10 in Turso
2. Accept a natural-language question
3. Retrieve the top-3 most relevant passages using vector KNN
4. Inject the passages into a prompt as context
5. Send the prompt to an OpenAI-compatible LLM API
6. Print the grounded answer

## Setup

```toml
[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
```

Read the API key from the environment with `std::env::var("OPENAI_API_KEY")`. Tell the reader that any OpenAI-compatible provider works: OpenAI, Groq, Together AI, or a local Ollama instance with base URL `http://localhost:11434/v1` and model `llama3.2`.
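One way to keep the provider swappable is to resolve the endpoint settings once at startup. The sketch below is a minimal illustration, not part of the required solution. The `LlmConfig` struct, the helper names, and the `OPENAI_BASE_URL` and `OPENAI_MODEL` variables are all hypothetical; only `OPENAI_API_KEY` comes from this ticket. The defaults mirror the OpenAI endpoint and model named in Step 3.

```rust
use std::env;

/// Connection settings for an OpenAI-compatible chat endpoint.
/// (Hypothetical helper type, not taken from any library.)
#[derive(Debug, Clone, PartialEq)]
struct LlmConfig {
    base_url: String,
    model: String,
    api_key: String,
}

/// Build a config from any key lookup, so the logic can be exercised
/// without touching the real process environment.
fn config_from<F>(lookup: F) -> Result<LlmConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    // The key is mandatory; this is the variable named in the ticket.
    let api_key = lookup("OPENAI_API_KEY")
        .ok_or_else(|| "OPENAI_API_KEY is not set".to_string())?;
    // Assumed variable names; fall back to the OpenAI defaults.
    let base_url = lookup("OPENAI_BASE_URL")
        .unwrap_or_else(|| "https://api.openai.com/v1".to_string());
    let model = lookup("OPENAI_MODEL").unwrap_or_else(|| "gpt-4o-mini".to_string());

    Ok(LlmConfig {
        // Drop trailing slashes so joining the path never yields "//".
        base_url: base_url.trim_end_matches('/').to_string(),
        model,
        api_key,
    })
}

/// Convenience wrapper that reads the real environment.
fn config_from_env() -> Result<LlmConfig, String> {
    config_from(|name| env::var(name).ok())
}

/// Full URL of the chat completions endpoint for this provider.
fn chat_completions_url(cfg: &LlmConfig) -> String {
    format!("{}/chat/completions", cfg.base_url)
}
```

With this shape, pointing the exercise at Ollama would only require setting the two assumed variables to `http://localhost:11434/v1` and `llama3.2`.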

## Steps to cover

**Step 1 — Retrieval function.** Reuse the semantic search logic from §10. Signature:

```rust
async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>>
```

Returns the top-k passage texts ordered by ascending cosine distance, so the closest match comes first.
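To make the expected ordering concrete, here is a brute-force, in-memory stand-in for the ranking step. It is not the Turso query that `retrieve` should run. It only illustrates what "top-k by cosine distance" means, using hypothetical helpers `cosine_distance` and `top_k` over precomputed embeddings.

```rust
/// Cosine distance: 1 - cos(a, b). 0.0 means the vectors point the same way.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        // A zero vector has no direction; score it like an orthogonal one.
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Return the texts of the k passages closest to `query`, nearest first.
/// `corpus` pairs each passage text with its embedding.
fn top_k<'a>(query: &[f32], corpus: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(text, emb)| (cosine_distance(query, emb), text.as_str()))
        .collect();
    // Ascending distance: the smallest distance is the best match.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

The database-backed version should return passages in the same order; only the place where the distance is computed differs.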

**Step 2 — Prompt construction.** Build a prompt string:

```text
You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain enough information, say so.

Context:
[passage 1]

[passage 2]

[passage 3]

Question: {question}

Answer:
```
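A minimal sketch of how `build_prompt` could assemble this template, using only the standard library. The signature is an assumption; the ticket names the function but does not fix its parameters. Passages are trimmed and separated by blank lines, matching the layout above.

```rust
/// Assemble the RAG prompt from retrieved passages and the user question.
/// (Assumed signature; the ticket only names the function.)
fn build_prompt(passages: &[String], question: &str) -> String {
    // One blank line between passages, as in the template.
    let context = passages
        .iter()
        .map(|p| p.trim())
        .collect::<Vec<_>>()
        .join("\n\n");

    // A trailing backslash joins the source lines without adding whitespace.
    format!(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\n\
         Context:\n{}\n\n\
         Question: {}\n\n\
         Answer:",
        context, question
    )
}
```

Keeping this function free of I/O makes it easy to unit test and to print for debugging before any API call is made.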

**Step 3 — LLM API call.** POST to `https://api.openai.com/v1/chat/completions` with model `gpt-4o-mini` by default, and keep the base URL and model configurable so the other providers listed in Setup also work. Show the request and response structs with serde derives, and return the `content` of the first choice's message. Use `reqwest::Client` with an `Authorization: Bearer <key>` header.

**Step 4 — Wire it together and run.** Run three example questions against the §10 corpus:
- `"How does Rust ensure memory safety?"` → should answer using Rust passages
- `"What is a black hole?"` → should answer using astronomy passages
- `"What is the Maillard reaction?"` → should answer using cooking passages

Print the retrieved passages first (so the reader can see what context was used), then the LLM's answer.

**Step 5 — Discussion: RAG patterns.** After the reference solution, add a prose section (not numbered steps) covering:
- Chunk size and overlap: why long documents are split into overlapping passages before embedding
- Re-ranking: a cross-encoder can re-rank the top-k ANN results for better precision
- Hybrid search: combining BM25 (keyword) and ANN (semantic) often outperforms either alone
- Context window limits: the number of passages to inject depends on the model's context length and on the length of each passage
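The chunking and context-budget points can be illustrated with a small std-only sketch. Everything here is an assumption made for illustration: `chunk_words` splits on whitespace rather than on tokens or sentences, and `estimate_tokens` uses a rough four-characters-per-token heuristic that a real tokenizer would replace.

```rust
/// Split `text` into chunks of at most `chunk_size` words, where each chunk
/// repeats the last `overlap` words of the previous one.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Very rough token estimate: about four characters per token, rounded up.
/// (Heuristic assumption, not a real tokenizer.)
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keep passages in ranked order until the estimated token budget is used up.
fn fit_to_budget<'a>(passages: &'a [String], max_tokens: usize) -> Vec<&'a str> {
    let mut used = 0;
    let mut kept = Vec::new();
    for passage in passages {
        let cost = estimate_tokens(passage);
        if used + cost > max_tokens {
            break; // stop at the first passage that does not fit
        }
        used += cost;
        kept.push(passage.as_str());
    }
    kept
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.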

## Reference solution

Provide the full `main.rs` inside a `<details>` block. Keep `retrieve`, `build_prompt`, and `call_llm` as separate, clearly named functions, and keep `main` a thin orchestrator that only wires them together.