+++
title = "§12 Exercise 5: Retrieval-Augmented Generation"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++

§12 Exercise 5 — Retrieval-Augmented Generation — Stub to fill

File: edu/src/vector-db.md, section ### 12. Exercise 5 — Retrieval-Augmented Generation

Replace this stub line with the full exercise:

Goal: Combine vector search with a language model to build a retrieval-augmented generation (RAG) pipeline [...] 🚧 Full content tracked in [nbd:5ed295].

Follow the exercise format from edu/src/markov.md. This is the capstone exercise: it combines Turso vector search (§7 and §8), fastembed (§9), and semantic search (§10), and adds an LLM API call to ground answers in retrieved context.

Goal

  1. Store the 15-passage corpus from §10 in Turso
  2. Accept a natural-language question
  3. Retrieve the top-3 most relevant passages using vector KNN
  4. Inject the passages into a prompt as context
  5. Send the prompt to an OpenAI-compatible LLM API
  6. Print the grounded answer

Setup

[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }

API key from environment: std::env::var("OPENAI_API_KEY"). Tell the reader they can use any OpenAI-compatible provider (OpenAI, Groq, Together AI, or local Ollama with base URL http://localhost:11434/v1 and model llama3.2).
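A minimal sketch of how the exercise might load its provider settings, using only the standard library. The `LlmConfig` type and the `OPENAI_BASE_URL` and `OPENAI_MODEL` override variables are assumptions made for illustration; only `OPENAI_API_KEY`, the default endpoint, and the default model come from this ticket.

```rust
use std::env;

/// Connection settings for an OpenAI-compatible chat API (hypothetical helper type).
#[derive(Debug, Clone, PartialEq)]
struct LlmConfig {
    api_key: String,
    base_url: String,
    model: String,
}

impl LlmConfig {
    /// Builds a config from optional overrides, falling back to the
    /// defaults named in this ticket (OpenAI endpoint, gpt-4o-mini).
    fn from_parts(api_key: String, base_url: Option<String>, model: Option<String>) -> Self {
        let base_url = base_url.unwrap_or_else(|| "https://api.openai.com/v1".to_string());
        LlmConfig {
            api_key,
            // Drop any trailing slash so URL joining stays predictable.
            base_url: base_url.trim_end_matches('/').to_string(),
            model: model.unwrap_or_else(|| "gpt-4o-mini".to_string()),
        }
    }

    /// Reads OPENAI_API_KEY (required) plus the assumed, optional
    /// OPENAI_BASE_URL and OPENAI_MODEL overrides.
    fn from_env() -> Result<Self, env::VarError> {
        let api_key = env::var("OPENAI_API_KEY")?;
        Ok(Self::from_parts(
            api_key,
            env::var("OPENAI_BASE_URL").ok(),
            env::var("OPENAI_MODEL").ok(),
        ))
    }

    /// Full URL of the chat completions endpoint for this provider.
    fn chat_completions_url(&self) -> String {
        format!("{}/chat/completions", self.base_url)
    }
}
```

With this shape, a reader using local Ollama would set the base URL to http://localhost:11434/v1 and the model to llama3.2, and the rest of the pipeline would stay unchanged.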

Steps to cover

Step 1 — Retrieval function. Reuse the semantic search logic from §10. Signature:

async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>>

Returns the top-k passage texts ordered by ascending cosine distance (closest match first).
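To make the expected ordering concrete, here is an in-memory stand-in for the KNN step. It is not the Turso query from §10: `cosine_distance` and `top_k` are hypothetical helpers that only illustrate what "top-k by cosine distance" should return.

```rust
/// Cosine distance = 1 - cosine similarity; smaller means more similar.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0; // assumption: treat zero vectors as unrelated
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Returns the texts of the k passages closest to `query`, closest first.
/// `corpus` pairs each passage text with its embedding.
fn top_k<'a>(query: &[f32], corpus: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(text, emb)| (cosine_distance(query, emb), *text))
        .collect();
    // Ascending distance: the best match ends up at index 0.
    scored.sort_by(|x, y| x.0.total_cmp(&y.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

The real `retrieve` would delegate the distance computation and ordering to the database. A brute-force version like this can still be useful as a cross-check while developing the exercise.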

Step 2 — Prompt construction. Build a prompt string:

You are a helpful assistant. Answer the question using only the provided context.
If the context does not contain enough information, say so.

Context:
[passage 1]

[passage 2]

[passage 3]

Question: {question}

Answer:
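The template above could be produced by a function along these lines. The name `build_prompt` comes from the reference-solution note in this ticket; the parameter order and the choice to separate passages with a blank line are assumptions.

```rust
/// Builds the grounded prompt: instructions, the retrieved passages
/// separated by blank lines, then the question and an "Answer:" cue.
fn build_prompt(passages: &[String], question: &str) -> String {
    let context = passages.join("\n\n");
    format!(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\n\
         Context:\n{context}\n\n\
         Question: {question}\n\n\
         Answer:"
    )
}
```

Keeping this as a pure function with no I/O makes the prompt easy to inspect and unit-test before any API call is made.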

Step 3 — LLM API call. POST to https://api.openai.com/v1/chat/completions with model gpt-4o-mini. Show the request/response structs with serde derives and return the content of the first choice message (choices[0].message.content in the response JSON). Use reqwest::Client with a bearer token Authorization header.
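A sketch of the data shapes involved, written against the standard library only so it compiles without the crates listed under Setup. In the exercise itself the request structs would presumably carry `#[derive(Serialize)]` and the response structs `#[derive(Deserialize)]`. The struct and helper names here are illustrative; the field names follow the chat completions JSON.

```rust
// serde derives are intentionally omitted so this compiles standalone;
// the field names match the JSON keys the derives would map to.

/// One chat message; `role` is "system", "user", or "assistant".
#[derive(Debug, Clone)]
struct ChatMessage {
    role: String,
    content: String,
}

/// Request body for POST {base_url}/chat/completions.
#[derive(Debug)]
struct ChatRequest {
    model: String,
    messages: Vec<ChatMessage>,
}

/// One entry in the response's `choices` array.
#[derive(Debug)]
struct Choice {
    message: ChatMessage,
}

/// The subset of the response body this exercise reads.
#[derive(Debug)]
struct ChatResponse {
    choices: Vec<Choice>,
}

/// Wraps the whole prompt in a single user message (a design assumption;
/// the instructions could also be sent as a separate system message).
fn build_request(model: &str, prompt: &str) -> ChatRequest {
    ChatRequest {
        model: model.to_string(),
        messages: vec![ChatMessage {
            role: "user".to_string(),
            content: prompt.to_string(),
        }],
    }
}

/// Value for the Authorization header.
fn bearer(api_key: &str) -> String {
    format!("Bearer {api_key}")
}

/// Content of the first choice, or None if the API returned no choices.
fn first_choice_content(resp: ChatResponse) -> Option<String> {
    resp.choices.into_iter().next().map(|c| c.message.content)
}
```

Returning `Option` from `first_choice_content` lets `call_llm` turn an empty `choices` array into an error instead of panicking on an out-of-bounds index.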

Step 4 — Wire it together and run. Three example questions using the §10 corpus:

  • "How does Rust ensure memory safety?" → should answer using Rust passages
  • "What is a black hole?" → should answer using astronomy passages
  • "What is the Maillard reaction?" → should answer using cooking passages

Print the retrieved passages first (so the reader can see what context was used), then the LLM's answer.
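One possible way to lay out that output, as a pure formatting helper. `format_report` and the exact labels are hypothetical; the only requirement taken from this ticket is that the passages appear before the answer.

```rust
/// Renders the question, the numbered retrieved passages, and then the answer.
fn format_report(question: &str, passages: &[String], answer: &str) -> String {
    let mut out = format!("Question: {question}\n\nRetrieved passages:\n");
    for (i, passage) in passages.iter().enumerate() {
        // Number passages from 1 so they are easy to refer to in discussion.
        out.push_str(&format!("  [{}] {}\n", i + 1, passage));
    }
    // Trim the answer: chat models often return leading or trailing whitespace.
    out.push_str(&format!("\nAnswer: {}\n", answer.trim()));
    out
}
```

The thin `main` could then print the result of `format_report(question, &passages, &answer)` once per example question.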

Step 5 — Discussion: RAG patterns. After the reference solution, add a prose section (not numbered steps) covering:

  • Chunk size and overlap: why long documents are split into overlapping passages before embedding
  • Re-ranking: a cross-encoder can re-rank the top-k ANN results for better precision
  • Hybrid search: combining BM25 (keyword) and ANN (semantic) often outperforms either alone
  • Context window limits: the number of passages that can be injected is bounded by the model's context length and by the length of each passage
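For the chunking point, the discussion could include a small illustration such as the following. It is a simplified word-based sketch with a hypothetical `chunk_words` helper; real pipelines often chunk by tokens or sentences instead.

```rust
/// Splits `text` into chunks of at most `size` words, where each chunk
/// repeats the last `overlap` words of the previous one.
fn chunk_words(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap. The cost is that some text is embedded twice.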

Reference solution

Full main.rs inside <details>. Keep retrieve, build_prompt, and call_llm as clearly named separate functions. The main function should be a thin orchestrator.