---
# edu-1oh8
title: '§12 Exercise 5: Retrieval-Augmented Generation'
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-10T23:30:01Z
---
## §12 Exercise 5 — Retrieval-Augmented Generation — Stub to fill
File: `edu/src/vector-db.md`, section `### 12. Exercise 5 — Retrieval-Augmented Generation`
Replace this stub line with the full exercise:
> **Goal:** Combine vector search with a language model to build a retrieval-augmented generation (RAG) pipeline [...] 🚧 Full content tracked in [nbd:5ed295].
Follow the exercise format from `edu/src/markov.md`. This is the capstone exercise: it combines Turso vector search (§7, §8), fastembed (§9), and semantic search (§10), then adds an LLM API call to ground answers in retrieved context.
## Goal
1. Store the 15-passage corpus from §10 in Turso
2. Accept a natural-language question
3. Retrieve the top-3 most relevant passages using vector KNN
4. Inject the passages into a prompt as context
5. Send the prompt to an OpenAI-compatible LLM API
6. Print the grounded answer
## Setup
```toml
[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
```
API key from environment: `std::env::var("OPENAI_API_KEY")`. Tell the reader they can use any OpenAI-compatible provider (OpenAI, Groq, Together AI, or local Ollama with base URL `http://localhost:11434/v1` and model `llama3.2`).
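One way to keep the provider swappable is to collect the base URL, model, and key in a small config struct. The sketch below uses only the standard library. `OPENAI_API_KEY` comes from this task; the `LlmConfig` type and the `LLM_BASE_URL` and `LLM_MODEL` variable names are hypothetical and only illustrate the idea.

```rust
use std::env;

/// Connection settings for an OpenAI-compatible chat endpoint.
/// (Hypothetical helper type, not part of any library.)
#[derive(Debug, Clone, PartialEq)]
struct LlmConfig {
    base_url: String,
    model: String,
    api_key: String,
}

impl LlmConfig {
    /// Build a config from optional overrides, falling back to the
    /// OpenAI defaults named in this exercise.
    fn from_parts(base_url: Option<String>, model: Option<String>, api_key: String) -> Self {
        LlmConfig {
            base_url: base_url.unwrap_or_else(|| "https://api.openai.com/v1".to_string()),
            model: model.unwrap_or_else(|| "gpt-4o-mini".to_string()),
            api_key,
        }
    }

    /// Read settings from the environment. LLM_BASE_URL and LLM_MODEL are
    /// assumed names for illustration; only OPENAI_API_KEY is required.
    fn from_env() -> Result<Self, env::VarError> {
        let api_key = env::var("OPENAI_API_KEY")?;
        Ok(Self::from_parts(
            env::var("LLM_BASE_URL").ok(),
            env::var("LLM_MODEL").ok(),
            api_key,
        ))
    }

    /// Full URL of the chat completions endpoint, tolerating a trailing slash.
    fn chat_url(&self) -> String {
        format!("{}/chat/completions", self.base_url.trim_end_matches('/'))
    }
}
```

With this shape, pointing the exercise at a local Ollama instance would only mean setting the base URL to `http://localhost:11434/v1` and the model to `llama3.2`; the request code itself stays unchanged.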
## Steps to cover
**Step 1 — Retrieval function.** Reuse the semantic search logic from §10. Signature:
```rust
async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>>
```
Returns the top-k passage texts ordered by ascending cosine distance (closest passage first).
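To make the ordering concrete, here is an in-memory sketch of what "top-k by cosine distance" means. It is not the Turso query the exercise should use: the `cosine_distance` and `top_k` helpers are hypothetical, the vectors are toy values, and in the real `retrieve` the database would do this ranking.

```rust
/// Cosine distance between two vectors: 1 - cos(angle). Smaller is closer.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0; // treat zero vectors as unrelated
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Return the texts of the k passages closest to `query`, nearest first.
/// `corpus` pairs each passage text with its (precomputed) embedding.
fn top_k<'a>(query: &[f32], corpus: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(text, emb)| (cosine_distance(query, emb), text.as_str()))
        .collect();
    // Ascending distance: the most similar passage ends up at index 0.
    scored.sort_by(|x, y| x.0.total_cmp(&y.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

A brute-force scan like this is fine for a 15-passage corpus; the point of doing it in the database is that the same query shape keeps working when the corpus grows.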
**Step 2 — Prompt construction.** Build a prompt string:
```
You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain enough information, say so.
Context:
[passage 1]
[passage 2]
[passage 3]
Question: {question}
Answer:
```
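A possible shape for `build_prompt`, following the template above. The blank lines around the context block and the trimming of each passage are assumptions made for readability; the template itself does not prescribe them.

```rust
/// Assemble the grounded prompt from retrieved passages and the user question.
fn build_prompt(passages: &[String], question: &str) -> String {
    let mut prompt = String::from(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\nContext:\n",
    );
    // One passage per line, in the order returned by retrieval.
    for passage in passages {
        prompt.push_str(passage.trim());
        prompt.push('\n');
    }
    // End on "Answer:" so the model continues with the answer itself.
    prompt.push_str(&format!("\nQuestion: {question}\nAnswer:"));
    prompt
}
```

Keeping this a pure function of its inputs makes it easy to print the exact prompt while debugging retrieval quality.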
**Step 3 — LLM API call.** POST to `https://api.openai.com/v1/chat/completions` with model `gpt-4o-mini`. Show the request/response structs with serde derives and return the `content` of the first choice message. Use `reqwest::Client` with a bearer token Authorization header.
**Step 4 — Wire it together and run.** Three example questions using the §10 corpus:
- `"How does Rust ensure memory safety?"` → should answer using Rust passages
- `"What is a black hole?"` → should answer using astronomy passages
- `"What is the Maillard reaction?"` → should answer using cooking passages
Print the retrieved passages first (so the reader can see what context was used), then the LLM's answer.
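The output order described above could be captured in a small formatting helper. This is only a sketch: `format_result` is a hypothetical name and the exact layout (numbering, labels) is an assumption, not something the exercise format mandates.

```rust
/// Render one question's result: numbered passages first, then the answer.
fn format_result(question: &str, passages: &[String], answer: &str) -> String {
    let mut out = format!("Q: {question}\n\nRetrieved passages:\n");
    for (i, passage) in passages.iter().enumerate() {
        out.push_str(&format!("  {}. {}\n", i + 1, passage));
    }
    // Models often pad their reply with whitespace, so trim before printing.
    out.push_str(&format!("\nAnswer: {}\n", answer.trim()));
    out
}
```

Returning a `String` instead of printing directly keeps `main` thin and lets the same helper be checked without network access.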
**Step 5 — Discussion: RAG patterns.** After the reference solution, add a prose section (not numbered steps) covering:
- Chunk size and overlap: why long documents are split into overlapping passages before embedding
- Re-ranking: a cross-encoder can re-rank the top-k ANN results for better precision
- Hybrid search: combining BM25 (keyword) and ANN (semantic) often outperforms either alone
- Context window limits: number of passages to inject depends on the model's context length and passage length
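The chunking point can be illustrated with a short word-based splitter. This is a minimal sketch under simplifying assumptions: `chunk_words` is a hypothetical helper, it counts whitespace-separated words instead of model tokens, and real pipelines often split on sentence or paragraph boundaries instead.

```rust
/// Split `text` into chunks of at most `size` words, where each chunk
/// repeats the last `overlap` words of the previous one.
fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window already reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_words("a b c d e f g", 4, 2)` yields `"a b c d"`, `"c d e f"`, and `"e f g"`. The repeated words mean a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing and embedding some text twice.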
## Reference solution
Full `main.rs` inside `<details>`. Keep `retrieve`, `build_prompt`, and `call_llm` as clearly named separate functions. The `main` function should be a thin orchestrator.