+++
title = "§12 Exercise 5: Retrieval-Augmented Generation"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
## §12 Exercise 5 — Retrieval-Augmented Generation — Stub to fill

File: `edu/src/vector-db.md`, section `### 12. Exercise 5 — Retrieval-Augmented Generation`

Replace this stub line with the full exercise:

> **Goal:** Combine vector search with a language model to build a retrieval-augmented generation (RAG) pipeline [...] 🚧 Full content tracked in [nbd:5ed295].

Follow the exercise format from `edu/src/markov.md`. This is the capstone exercise: it combines Turso vector search (§7–§8), fastembed (§9), and semantic search (§10), adding an LLM API call to ground answers in retrieved context.

## Goal

1. Store the 15-passage corpus from §10 in Turso
2. Accept a natural-language question
3. Retrieve the top-3 most relevant passages using vector KNN
4. Inject the passages into a prompt as context
5. Send the prompt to an OpenAI-compatible LLM API
6. Print the grounded answer

## Setup

```toml
[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
```

Read the API key from the environment with `std::env::var("OPENAI_API_KEY")`. Tell the reader they can use any OpenAI-compatible provider (OpenAI, Groq, Together AI, or a local Ollama server with base URL `http://localhost:11434/v1` and model `llama3.2`).
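
One way to keep the provider swappable is to read the endpoint and model from the environment alongside the key. The sketch below assumes two extra variables, `LLM_BASE_URL` and `LLM_MODEL`; those names, the `LlmConfig` type, and its methods are hypothetical and not part of this ticket. The defaults are the OpenAI endpoint and the `gpt-4o-mini` model named in Step 3.

```rust
use std::env;

/// Connection settings for an OpenAI-compatible chat API (hypothetical helper type).
#[derive(Debug, Clone, PartialEq)]
struct LlmConfig {
    base_url: String,
    model: String,
    api_key: String,
}

impl LlmConfig {
    /// Builds the config from a lookup function, so the logic can be
    /// exercised without touching the real process environment.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let api_key = lookup("OPENAI_API_KEY")
            .ok_or_else(|| "OPENAI_API_KEY is not set".to_string())?;
        // LLM_BASE_URL and LLM_MODEL are assumed names chosen for this sketch.
        let base_url = lookup("LLM_BASE_URL")
            .unwrap_or_else(|| "https://api.openai.com/v1".to_string());
        let model = lookup("LLM_MODEL").unwrap_or_else(|| "gpt-4o-mini".to_string());
        Ok(Self {
            // Drop a trailing slash so the URL join below never yields "//".
            base_url: base_url.trim_end_matches('/').to_string(),
            model,
            api_key,
        })
    }

    /// Reads the same settings from the process environment.
    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| env::var(name).ok())
    }

    /// Full URL of the chat completions endpoint for the configured provider.
    fn chat_completions_url(&self) -> String {
        format!("{}/chat/completions", self.base_url)
    }
}
```

With `LLM_BASE_URL=http://localhost:11434/v1` and `LLM_MODEL=llama3.2`, the same code would target a local Ollama server. `OPENAI_API_KEY` still has to be set in this sketch, though a placeholder value should be enough for a local server.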

## Steps to cover

**Step 1 — Retrieval function.** Reuse the semantic search logic from §10. Signature:

```rust
use fastembed::TextEmbedding;

async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>>
```

Returns the top-k passage texts ordered by ascending cosine distance, so the closest match comes first.
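
To make the expected ordering concrete, here is a dependency-free sketch of what the KNN query computes. In the exercise the ranking happens inside Turso, so this in-memory version is only an illustration; `cosine_distance`, `top_k`, and the `(text, embedding)` tuple layout are assumptions of the sketch, not the §10 schema.

```rust
/// Cosine distance: 1 - cos(angle between a and b).
/// 0.0 means the vectors point the same way; 2.0 means opposite directions.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        // A zero vector has no direction; treat it as unrelated.
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Brute-force top-k: score every passage, sort by ascending distance,
/// and keep the texts of the k closest ones.
fn top_k<'a>(query: &[f32], corpus: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(text, embedding)| (cosine_distance(query, embedding), text.as_str()))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

A 15-passage corpus is small enough that this exhaustive scan would be instant; the database query matters once the corpus no longer fits comfortably in memory.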

**Step 2 — Prompt construction.** Build a prompt string:

```
You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain enough information, say so.

Context:
[passage 1]

[passage 2]

[passage 3]

Question: {question}

Answer:
```
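
A minimal sketch of how that template could be assembled. The function name `build_prompt` comes from the reference-solution notes in this ticket; the parameter types are an assumption.

```rust
/// Fills the prompt template with the retrieved passages and the question.
/// Passages are separated by a blank line, as in the template above.
fn build_prompt(passages: &[String], question: &str) -> String {
    let context = passages.join("\n\n");
    // A trailing `\` in a string literal skips the newline and the
    // indentation that follows, so only the explicit `\n`s end up in the text.
    format!(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\n\
         Context:\n{context}\n\n\
         Question: {question}\n\n\
         Answer:"
    )
}
```

Because the function takes a slice, the same code works for any `k`, not only the top-3 used in this exercise.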

**Step 3 — LLM API call.** POST to `https://api.openai.com/v1/chat/completions` with model `gpt-4o-mini` (or to the equivalent `/chat/completions` endpoint and model of whichever provider the reader chose in Setup). Show the request and response structs with serde derives, and return the `content` of the first choice's message. Use `reqwest::Client` with a bearer-token `Authorization` header.
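
The wire shapes involved can be sketched with plain structs. The serde derives are deliberately left out so the snippet compiles with the standard library alone; in the exercise the request types would derive `Serialize` and the response types `Deserialize`. The type and function names (`ChatMessage`, `ChatRequest`, `Choice`, `ChatResponse`, `user_request`, `first_content`, `bearer`) are illustrative choices, not prescribed by this ticket.

```rust
/// One chat message: `{"role": "...", "content": "..."}`.
#[derive(Debug, Clone)]
struct ChatMessage {
    role: String,
    content: String,
}

/// Request body: `{"model": "...", "messages": [...]}`.
#[derive(Debug)]
struct ChatRequest {
    model: String,
    messages: Vec<ChatMessage>,
}

/// One entry of the response's `choices` array.
#[derive(Debug)]
struct Choice {
    message: ChatMessage,
}

/// The part of the response body this exercise needs: `{"choices": [...]}`.
#[derive(Debug)]
struct ChatResponse {
    choices: Vec<Choice>,
}

/// Wraps the whole prompt in a single `user` message.
fn user_request(model: &str, prompt: &str) -> ChatRequest {
    ChatRequest {
        model: model.to_string(),
        messages: vec![ChatMessage {
            role: "user".to_string(),
            content: prompt.to_string(),
        }],
    }
}

/// Returns the content of the first choice, or `None` if `choices` is empty.
fn first_content(response: ChatResponse) -> Option<String> {
    response
        .choices
        .into_iter()
        .next()
        .map(|choice| choice.message.content)
}

/// Value of the `Authorization` header for a given API key.
fn bearer(api_key: &str) -> String {
    format!("Bearer {api_key}")
}
```

Returning `Option<String>` makes the empty-`choices` case explicit, so `call_llm` can turn it into an error instead of panicking on an index. With reqwest, the header can also be set through `RequestBuilder::bearer_auth` rather than by formatting the value by hand.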

**Step 4 — Wire it together and run.** Run three example questions against the §10 corpus:

- `"How does Rust ensure memory safety?"` → should answer using Rust passages
- `"What is a black hole?"` → should answer using astronomy passages
- `"What is the Maillard reaction?"` → should answer using cooking passages

Print the retrieved passages first (so the reader can see what context was used), then the LLM's answer.
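
A possible shape for the output step, as a sketch. The `QUESTIONS` constant, the `format_report` helper, and the exact layout are assumptions; the ticket only fixes the order (passages first, then the answer).

```rust
/// The three example questions from this step.
const QUESTIONS: [&str; 3] = [
    "How does Rust ensure memory safety?",
    "What is a black hole?",
    "What is the Maillard reaction?",
];

/// Renders one question/answer round: the question, the numbered passages
/// that were used as context, then the model's answer.
fn format_report(question: &str, passages: &[String], answer: &str) -> String {
    let mut out = format!("Question: {question}\n\nRetrieved passages:\n");
    for (i, passage) in passages.iter().enumerate() {
        out.push_str(&format!("  [{}] {}\n", i + 1, passage));
    }
    out.push_str(&format!("\nAnswer: {answer}\n"));
    out
}
```

Building the report as a `String` rather than printing inside the loop keeps `main` thin: it can iterate over `QUESTIONS`, call the retrieval and LLM functions, and `print!` the result.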

**Step 5 — Discussion: RAG patterns.** After the reference solution, add a prose section (not numbered steps) covering:

- Chunk size and overlap: why long documents are split into overlapping passages before embedding
- Re-ranking: a cross-encoder can re-rank the top-k ANN results for better precision
- Hybrid search: combining BM25 (keyword) and ANN (semantic) often outperforms either alone
- Context window limits: the number of passages to inject depends on the model's context length and the passage length
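
To illustrate the chunk-size-and-overlap point, here is a minimal word-based splitter. It is a sketch under simple assumptions: chunks are measured in whitespace-separated words rather than model tokens, and `chunk_words` is a hypothetical helper that the exercise itself does not need, since the §10 corpus is already split into passages.

```rust
/// Splits `text` into chunks of at most `size` words, where each chunk
/// repeats the last `overlap` words of the previous one.
fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than chunk size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window already reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_words("a b c d e f g", 4, 2)` yields `"a b c d"`, `"c d e f"`, and `"e f g"`. The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some words twice.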

## Reference solution

Full `main.rs` inside `<details>`. Keep `retrieve`, `build_prompt`, and `call_llm` as clearly named separate functions. The `main` function should be a thin orchestrator.