+++
title = "§12 Exercise 5: Retrieval-Augmented Generation"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
§12 Exercise 5 — Retrieval-Augmented Generation — Stub to fill
File: edu/src/vector-db.md, section ### 12. Exercise 5 — Retrieval-Augmented Generation
Replace this stub line with the full exercise:
Goal: Combine vector search with a language model to build a retrieval-augmented generation (RAG) pipeline [...] 🚧 Full content tracked in [nbd:5ed295].
Follow the exercise format from edu/src/markov.md. This is the capstone exercise — it combines Turso vector search (§7–§8), fastembed (§9), and semantic search (§10), adding an LLM API call to ground answers in retrieved context.
Goal
- Store the 15-passage corpus from §10 in Turso
- Accept a natural-language question
- Retrieve the top-3 most relevant passages using vector KNN
- Inject the passages into a prompt as context
- Send the prompt to an OpenAI-compatible LLM API
- Print the grounded answer
Setup
[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
API key from environment: std::env::var("OPENAI_API_KEY"). Tell the reader they can use any OpenAI-compatible provider (OpenAI, Groq, Together AI, or local Ollama with base URL http://localhost:11434/v1 and model llama3.2).
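A minimal sketch of how the provider settings could be resolved so the same code works against OpenAI or a local Ollama server. The `LLM_BASE_URL` and `LLM_MODEL` variable names, the `LlmConfig` struct, and the helper names are hypothetical, introduced only for illustration; only `OPENAI_API_KEY` comes from this ticket.

```rust
use std::env;

/// Connection settings for an OpenAI-compatible chat endpoint (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
struct LlmConfig {
    base_url: String,
    model: String,
    api_key: String,
}

const DEFAULT_BASE_URL: &str = "https://api.openai.com/v1";
const DEFAULT_MODEL: &str = "gpt-4o-mini";

/// Apply optional overrides on top of the OpenAI defaults.
/// A trailing slash on the base URL is dropped so URL joining stays predictable.
fn resolve_config(api_key: String, base_url: Option<String>, model: Option<String>) -> LlmConfig {
    let base = base_url.unwrap_or_else(|| DEFAULT_BASE_URL.to_string());
    LlmConfig {
        base_url: base.trim_end_matches('/').to_string(),
        model: model.unwrap_or_else(|| DEFAULT_MODEL.to_string()),
        api_key,
    }
}

/// Read the settings from the environment.
/// LLM_BASE_URL and LLM_MODEL are assumed names, not part of any provider's API.
fn config_from_env() -> Result<LlmConfig, env::VarError> {
    let api_key = env::var("OPENAI_API_KEY")?;
    Ok(resolve_config(
        api_key,
        env::var("LLM_BASE_URL").ok(),
        env::var("LLM_MODEL").ok(),
    ))
}

/// Full URL of the chat completions endpoint for the configured provider.
fn chat_completions_url(cfg: &LlmConfig) -> String {
    format!("{}/chat/completions", cfg.base_url)
}
```

With this shape, pointing the exercise at Ollama would mean setting the base URL override to http://localhost:11434/v1 and the model override to llama3.2, with no code changes.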
Steps to cover
Step 1 — Retrieval function. Reuse the semantic search logic from §10. Signature:
async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>>
Returns the top-k passage texts ordered by ascending cosine distance, so the most similar passage comes first.
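To make the ordering concrete, here is a dependency-free sketch of what "top-k by cosine distance" means, computed by brute force over an in-memory corpus. It is not the exercise's `retrieve` function: in the exercise the embeddings come from fastembed and the ranking happens inside Turso. The `cosine_distance` and `top_k` helpers are hypothetical names used only for this illustration.

```rust
/// Cosine distance: 1 - cos(angle between a and b).
/// 0.0 means same direction, 1.0 means orthogonal, 2.0 means opposite.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        // Assumption for this sketch: treat a zero vector as unrelated.
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Return the texts of the k passages closest to the query, nearest first.
fn top_k<'a>(query: &[f32], corpus: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(text, embedding)| (cosine_distance(query, embedding), text.as_str()))
        .collect();
    // Ascending distance: smaller distance means more similar.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

The database-backed version should return passages in the same order as this brute-force ranking on a corpus as small as 15 passages, which makes the sketch a handy cross-check while debugging.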
Step 2 — Prompt construction. Build a prompt string:
You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain enough information, say so.
Context:
[passage 1]
[passage 2]
[passage 3]
Question: {question}
Answer:
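One possible shape for `build_prompt`, following the template above line for line. The one-passage-per-line layout and the trimming of each passage are assumptions of this sketch; the final exercise may prefer blank lines between sections.

```rust
/// Assemble the grounded prompt: instructions, retrieved context, then the question.
fn build_prompt(passages: &[String], question: &str) -> String {
    let mut prompt = String::from(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\
         Context:\n",
    );
    // One passage per line, in retrieval order (most relevant first).
    for passage in passages {
        prompt.push_str(passage.trim());
        prompt.push('\n');
    }
    prompt.push_str(&format!("Question: {}\nAnswer:", question));
    prompt
}
```

Ending the prompt on "Answer:" with no trailing newline leaves the model positioned to continue with the answer itself.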
Step 3 — LLM API call. POST to https://api.openai.com/v1/chat/completions with model gpt-4o-mini. Show the request and response structs with serde derives, and return the content of the first choice's message (choices[0].message.content in the response JSON). Use reqwest::Client with a bearer token in the Authorization header.
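For reference while writing the serde structs, the sketch below builds the request body by hand so the wire format is visible without any crates. This is only an illustration of the JSON shape; the exercise itself should serialize with serde and send with reqwest as described above. The helper names `json_escape`, `chat_request_body`, and `authorization_header` are hypothetical.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Minimal chat completions request: a model name and a single user message.
/// The serde request struct needs `model` and `messages`, where each message
/// has `role` and `content`.
fn chat_request_body(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]}}",
        json_escape(model),
        json_escape(prompt)
    )
}

/// Value of the Authorization header for bearer-token authentication.
fn authorization_header(api_key: &str) -> String {
    format!("Bearer {}", api_key)
}
```

On the response side, the struct only needs to model the path to the answer: a `choices` array whose elements contain a `message` object with a `content` string. Other response fields can be left out of the struct, since serde ignores unknown fields by default.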
Step 4 — Wire it together and run. Three example questions using the §10 corpus:
- "How does Rust ensure memory safety?" → should answer using Rust passages
- "What is a black hole?" → should answer using astronomy passages
- "What is the Maillard reaction?" → should answer using cooking passages
Print the retrieved passages first (so the reader can see what context was used), then the LLM's answer.
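A small sketch of the output layout described above, with the retrieved context printed before the answer. The `render_result` name and the exact labels are assumptions; returning a `String` rather than printing directly is a choice made here to keep `main` thin and the formatting easy to check.

```rust
/// Format one question/answer round: the question, the numbered passages
/// that were used as context, and finally the model's answer.
fn render_result(question: &str, passages: &[String], answer: &str) -> String {
    let mut out = format!("Question: {}\n\nRetrieved passages:\n", question);
    for (i, passage) in passages.iter().enumerate() {
        out.push_str(&format!("  [{}] {}\n", i + 1, passage));
    }
    out.push_str(&format!("\nAnswer: {}\n", answer.trim()));
    out
}
```

In `main`, each of the three example questions would go through retrieve, build_prompt, and call_llm in turn, followed by a single `println!` of the rendered result.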
Step 5 — Discussion: RAG patterns. After the reference solution, add a prose section (not numbered steps) covering:
- Chunk size and overlap: why long documents are split into overlapping passages before embedding
- Re-ranking: a cross-encoder can re-rank the top-k ANN results for better precision
- Hybrid search: combining BM25 (keyword) and ANN (semantic) often outperforms either alone
- Context window limits: the number of passages to inject depends on the model's context length and on passage length
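Two of the points above lend themselves to short illustrations for the discussion section. The sketch below splits text into overlapping word-based chunks and picks how many passages fit a token budget. Both helpers are hypothetical, and the four-characters-per-token estimate is a rough heuristic for English text, not a real tokenizer.

```rust
/// Split text into chunks of `chunk_size` words, where each chunk repeats
/// the last `overlap` words of the previous one. The overlap keeps a sentence
/// that straddles a boundary fully visible in at least one chunk.
fn chunk_words(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + chunk_size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final chunk already reaches the end of the text
        }
        start += step;
    }
    chunks
}

/// Count how many passages, taken in ranked order, fit within `max_tokens`.
/// Assumption: roughly 4 characters per token, rounded up.
fn passages_within_budget(passages: &[String], max_tokens: usize) -> usize {
    let mut used = 0;
    let mut count = 0;
    for passage in passages {
        let estimate = (passage.chars().count() + 3) / 4;
        if used + estimate > max_tokens {
            break;
        }
        used += estimate;
        count += 1;
    }
    count
}
```

For the 15-passage corpus neither helper is needed, since the passages are already short; the point of the sketch is to show what changes once the source documents are long.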
Reference solution
Full main.rs inside <details>. Keep retrieve, build_prompt, and call_llm as clearly named separate functions. The main function should be a thin orchestrator.