+++
title = "§12 Exercise 5: Retrieval-Augmented Generation"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++

## §12 Exercise 5 — Retrieval-Augmented Generation — Stub to fill

File: `edu/src/vector-db.md`, section `### 12. Exercise 5 — Retrieval-Augmented Generation`

Replace this stub line with the full exercise:

> **Goal:** Combine vector search with a language model to build a retrieval-augmented generation (RAG) pipeline [...] 🚧 Full content tracked in [nbd:5ed295].

Follow the exercise format from `edu/src/markov.md`. This is the capstone exercise. It combines Turso vector search (§7–§8), fastembed (§9), and semantic search (§10), and adds an LLM API call that grounds answers in retrieved context.

## Goal

1. Store the 15-passage corpus from §10 in Turso.
2. Accept a natural-language question.
3. Retrieve the top-3 most relevant passages using vector KNN.
4. Inject the passages into a prompt as context.
5. Send the prompt to an OpenAI-compatible LLM API.
6. Print the grounded answer.

## Setup

```toml
[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
```

Read the API key from the environment with `std::env::var("OPENAI_API_KEY")`. Tell the reader they can use any OpenAI-compatible provider: OpenAI, Groq, Together AI, or a local Ollama server (base URL `http://localhost:11434/v1`, model `llama3.2`).

## Steps to cover

**Step 1 — Retrieval function.** Reuse the semantic search logic from §10. Signature:

```rust
async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>>
```

The function returns the top-k passage texts ordered by cosine distance, nearest (most relevant) first.

**Step 2 — Prompt construction.** Build a prompt string:

```
You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain enough information, say so.

Context:
[passage 1]
[passage 2]
[passage 3]

Question: {question}

Answer:
```

**Step 3 — LLM API call.** Send a POST request to `https://api.openai.com/v1/chat/completions` with model `gpt-4o-mini`. Show the request and response structs with serde derives, and return the `content` of the first choice's message. Use a `reqwest::Client` and send the API key as a bearer token in the `Authorization` header. Keep the base URL and model configurable so that the providers listed under Setup work without code changes.

**Step 4 — Wire it together and run.** Use three example questions against the §10 corpus:

- `"How does Rust ensure memory safety?"` → should answer using Rust passages
- `"What is a black hole?"` → should answer using astronomy passages
- `"What is the Maillard reaction?"` → should answer using cooking passages

Print the retrieved passages first, so the reader can see what context was used. Then print the LLM's answer.

**Step 5 — Discussion: RAG patterns.** After the reference solution, add a prose section (not numbered steps) covering:

- Chunk size and overlap: why long documents are split into overlapping passages before embedding.
- Re-ranking: a cross-encoder can re-rank the top-k ANN results for better precision.
- Hybrid search: combining BM25 (keyword) and ANN (semantic) retrieval often outperforms either alone.
- Context window limits: how many passages to inject depends on the model's context length and the passage length.

## Reference solution

Put the full `main.rs` inside a `<details>` block. Keep `retrieve`, `build_prompt`, and `call_llm` as clearly named separate functions. `main` should be a thin orchestrator.
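Implementation note for Step 2: here is a minimal sketch of `build_prompt` that reproduces the template above. The ticket fixes only the function's name and the template text. The argument list (the question plus the passages returned by `retrieve`) is an assumption, and the passages in `main` are placeholders, not the §10 corpus.

```rust
/// Assumed signature: the ticket names `build_prompt` but does not fix its arguments.
/// Fills the Step 2 template with the retrieved passages and the user's question.
fn build_prompt(question: &str, passages: &[String]) -> String {
    // A trailing `\` joins the next source line and skips its leading whitespace.
    let mut prompt = String::from(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\n\
         Context:\n",
    );
    // One passage per line, kept in retrieval order (most relevant first).
    for passage in passages {
        prompt.push_str(passage);
        prompt.push('\n');
    }
    prompt.push_str("\nQuestion: ");
    prompt.push_str(question);
    prompt.push_str("\n\nAnswer:");
    prompt
}

fn main() {
    // Placeholder passages standing in for the output of `retrieve`.
    let passages = vec![
        "Placeholder passage one.".to_string(),
        "Placeholder passage two.".to_string(),
    ];
    println!("{}", build_prompt("How does Rust ensure memory safety?", &passages));
}
```

Keeping prompt assembly in a pure function means it can be unit-tested without a database or network access.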
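For the chunk size and overlap point in Step 5, here is a minimal sketch of overlapping chunking. It counts whitespace-separated words as a stand-in for tokens; a real pipeline would more likely count tokens with the embedding model's tokenizer. The name `chunk_words` and its parameters are hypothetical.

```rust
/// Hypothetical helper: split `text` into windows of `size` words, where
/// consecutive windows share `overlap` words. Words stand in for tokens here.
fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far each window advances
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            // This window reached the end; stop so no chunk is pure overlap.
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = "one two three four five six seven eight nine ten";
    for (i, chunk) in chunk_words(text, 4, 1).iter().enumerate() {
        println!("chunk {i}: {chunk}");
    }
}
```

Each window advances by `size - overlap` words. Text at a boundary is therefore embedded twice: once at the end of one chunk and once at the start of the next.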
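For the hybrid search point, one common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The ticket does not prescribe a fusion method, so treat this sketch as one option among several. The function name and passage ids are hypothetical, and the BM25 ranking itself is assumed to come from elsewhere.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
/// with ranks starting at 1. k = 60 is a commonly used default; treat it as tunable.
fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical passage ids from a keyword (e.g. BM25) and a vector (ANN) ranking.
    let keyword = vec!["p3", "p1", "p7"];
    let vector = vec!["p1", "p5", "p3"];
    for (id, score) in reciprocal_rank_fusion(&[keyword, vector], 60.0) {
        println!("{id}: {score:.4}");
    }
}
```

RRF uses only ranks, not raw scores. This sidesteps the fact that BM25 scores and cosine distances are on different scales.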
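For the context window point, here is a sketch that keeps retrieved passages, in order, until an estimated token budget is spent. The estimate of about four characters per token is a rough rule of thumb for English text and is an assumption here. `estimate_tokens` and `fit_to_budget` are hypothetical names, and exact counts would come from the provider's tokenizer.

```rust
/// Rough estimate (assumption): about 4 characters per token for English text.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Keep passages in retrieval order until the next one would exceed `budget_tokens`.
fn fit_to_budget(passages: &[String], budget_tokens: usize) -> Vec<String> {
    let mut used = 0;
    let mut kept = Vec::new();
    for passage in passages {
        let cost = estimate_tokens(passage);
        if used + cost > budget_tokens {
            break; // stop at the first passage that does not fit
        }
        used += cost;
        kept.push(passage.clone());
    }
    kept
}

fn main() {
    // Placeholder passages of 40 characters each (about 10 estimated tokens).
    let passages = vec!["a".repeat(40), "b".repeat(40), "c".repeat(40)];
    let kept = fit_to_budget(&passages, 25);
    println!("kept {} of {} passages", kept.len(), passages.len()); // prints "kept 2 of 3 passages"
}
```

In the exercise, the budget would be the model's context length minus the space reserved for the instructions, the question and the answer.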