+++
title = "§3 Compiler Architecture: The Pipeline"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
§3 Compiler Architecture: The Pipeline — Stub to fill
File: edu/src/lisp-compiler.md, section ### 3. Compiler Architecture: The Pipeline
Replace the stub line with full content. Target 500–700 words. Design overview: ASCII diagrams, brief stage descriptions, Rust module layout, error philosophy. No implementation code yet; the only code shown is the top-level compile function.
Learning objectives
- Understand the three compilation stages (parser, semantic analyser, code generator) and what each produces
- Know the Rust type that flows between each stage
- Understand where errors originate and how they are reported
- See the module structure before writing any code
Content to write
Pipeline Diagram
Source text (&str)
│
▼
┌──────────┐
│ Parser │ src/parser.rs
└──────────┘
│ Vec<Expr>
▼
┌───────────────────┐
│ Semantic Analyser │ src/analyser.rs
└───────────────────┘
│ Vec<Expr> (validated)
▼
┌──────────────────┐
│ Code Generator │ src/codegen.rs
└──────────────────┘
│ String (C source)
▼
stdout / output file
Stage Descriptions
Parser (src/parser.rs). Accepts &str and produces Vec<Expr>. Uses nom combinators. Fails on syntax errors: unmatched parentheses, invalid tokens, unexpected EOF.
Semantic Analyser (src/analyser.rs). Walks Vec<Expr> and checks: every symbol reference resolves to a definition, every special form has the correct shape and arity, lambda bodies are non-empty. Returns the same Vec<Expr> on success; returns CompileError on failure. Does not do type inference — type errors surface as C compiler errors.
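The checks described above can be sketched roughly as follows. This is a minimal illustration, not the analyser the tutorial will build: the Expr variants, the CompileError variants, the `check` helper, and the `(define name value)` shape are all assumptions made for the example, and built-in symbols are not pre-populated in the scope.

```rust
use std::collections::HashSet;

// Hypothetical AST: the real Expr enum is defined in src/ast.rs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(i64),
    Symbol(String),
    List(Vec<Expr>),
}

// Hypothetical error variants: the real enum lives in src/error.rs.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    UndefinedSymbol(String),
    MalformedForm(String),
}

/// Validates the forms in order and hands the same Vec<Expr> back on success.
pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut scope: HashSet<String> = HashSet::new();
    for expr in &exprs {
        check(expr, &mut scope)?; // stop at the first error
    }
    Ok(exprs)
}

// Hypothetical helper: checks one expression against the names defined so far.
fn check(expr: &Expr, scope: &mut HashSet<String>) -> Result<(), CompileError> {
    match expr {
        Expr::Number(_) => Ok(()),
        Expr::Symbol(name) => {
            if scope.contains(name) {
                Ok(())
            } else {
                Err(CompileError::UndefinedSymbol(name.clone()))
            }
        }
        Expr::List(items) => match items.as_slice() {
            // Assumed shape: (define name value), exactly two arguments.
            [Expr::Symbol(head), rest @ ..] if head.as_str() == "define" => match rest {
                [Expr::Symbol(name), value] => {
                    check(value, scope)?;
                    scope.insert(name.clone());
                    Ok(())
                }
                _ => Err(CompileError::MalformedForm("define".to_string())),
            },
            // Any other list: every element must itself be valid.
            _ => {
                for item in items {
                    check(item, scope)?;
                }
                Ok(())
            }
        },
    }
}
```

Taking the Vec<Expr> by value and returning it unchanged keeps the stage's signature aligned with the pipeline diagram: the analyser validates, it does not transform.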
Code Generator (src/codegen.rs). Walks validated Vec<Expr> and produces a String of C source. This stage is pure — it cannot fail for valid input. Emits the preamble, forward declarations, and top-level forms in order.
Error type (src/error.rs). A CompileError enum with variants for each fallible stage gives the pipeline uniform error handling. Each variant carries enough context for a useful message (e.g., the undefined symbol name).
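One possible shape for such an enum is sketched below. The variant names, their fields, and the message wording are assumptions for illustration; the ticket only specifies that variants exist per stage and carry context such as the undefined symbol name.

```rust
use std::fmt;

// Hypothetical variants: one for the parser, two for the analyser.
#[derive(Debug, Clone, PartialEq)]
pub enum CompileError {
    /// Parser failure; `offset` is an approximate byte position in the source.
    Syntax { message: String, offset: usize },
    /// Analyser failure: a symbol reference with no definition in scope.
    UndefinedSymbol(String),
    /// Analyser failure: a special form with the wrong shape or arity.
    MalformedForm { form: String, reason: String },
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompileError::Syntax { message, offset } => {
                write!(f, "syntax error near byte {}: {}", offset, message)
            }
            CompileError::UndefinedSymbol(name) => {
                write!(f, "undefined symbol: {}", name)
            }
            CompileError::MalformedForm { form, reason } => {
                write!(f, "malformed {} form: {}", form, reason)
            }
        }
    }
}

// Implementing std::error::Error lets callers treat this like any other error.
impl std::error::Error for CompileError {}
```

There is deliberately no code-generation variant in this sketch, which matches the claim that the code generator cannot fail for valid input.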
Module Layout
src/
├── main.rs # CLI: read input, call compile(), write output
├── ast.rs # Expr enum and Display impl
├── parser.rs # nom parsers → Vec<Expr>
├── analyser.rs # scope checking and form validation
├── codegen.rs # AST → C string
└── error.rs # CompileError enum
The compile Function
Show the top-level function the reader will implement in §16:
pub fn compile(source: &str) -> Result<String, CompileError> {
    let exprs = parser::parse(source)?;
    let exprs = analyser::analyse(exprs)?;
    let c_source = codegen::generate(exprs);
    Ok(c_source)
}
This makes explicit that parsing and analysis are fallible but code generation is not.
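To see the early-return behaviour in action, the function can be exercised against stub stages. Everything other than `compile` itself is a placeholder invented for this sketch: the two-variant Expr, the two-variant CompileError, a whitespace-splitting "parser" (the real one uses nom), an analyser that treats only `x` as defined, and a code generator that emits one statement per form.

```rust
// Hypothetical, heavily simplified stand-ins for ast.rs and error.rs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(i64),
    Symbol(String),
}

#[derive(Debug, PartialEq)]
pub enum CompileError {
    Syntax(String),
    UndefinedSymbol(String),
}

mod parser {
    use super::{CompileError, Expr};

    // Stub: whitespace-separated atoms only; parentheses are rejected.
    pub fn parse(source: &str) -> Result<Vec<Expr>, CompileError> {
        source
            .split_whitespace()
            .map(|tok| {
                if tok.contains('(') || tok.contains(')') {
                    Err(CompileError::Syntax(format!("unexpected token {:?}", tok)))
                } else if let Ok(n) = tok.parse::<i64>() {
                    Ok(Expr::Number(n))
                } else {
                    Ok(Expr::Symbol(tok.to_string()))
                }
            })
            .collect() // collecting into Result stops at the first Err
    }
}

mod analyser {
    use super::{CompileError, Expr};

    // Stub: pretends that `x` is the only symbol ever defined.
    pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
        for expr in &exprs {
            if let Expr::Symbol(name) = expr {
                if name.as_str() != "x" {
                    return Err(CompileError::UndefinedSymbol(name.clone()));
                }
            }
        }
        Ok(exprs)
    }
}

mod codegen {
    use super::Expr;

    // Stub: infallible by construction, so it returns String, not Result.
    pub fn generate(exprs: Vec<Expr>) -> String {
        exprs
            .iter()
            .map(|expr| match expr {
                Expr::Number(n) => format!("{};\n", n),
                Expr::Symbol(s) => format!("{};\n", s),
            })
            .collect()
    }
}

pub fn compile(source: &str) -> Result<String, CompileError> {
    let exprs = parser::parse(source)?; // a syntax error returns here
    let exprs = analyser::analyse(exprs)?; // a semantic error returns here
    let c_source = codegen::generate(exprs); // no `?`: cannot fail
    Ok(c_source)
}
```

With these stubs, `compile("1 x")` yields `Ok` with two emitted statements, `compile("1 y")` stops in the analyser with an undefined-symbol error, and `compile("(1")` never reaches the analyser at all.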
Error Reporting Philosophy
The compiler reports the first error it encounters and stops. It does not attempt to recover and continue after a syntax error. nom's cut combinator is used at commit points to produce better error messages. A production compiler would collect multiple errors — this is a deliberate simplification.
Errors include enough context to be actionable:
- Syntax errors: what character was unexpected and approximately where
- Semantic errors: the name of the undefined symbol or the malformed form
How Sections Map to the Diagram
Tell the reader: Sections 4–9 fill in the parser box. Sections 10–11 fill in the analyser box. Sections 12–15 fill in the code generator box. Section 16 wires them together.
Style notes
- Open with the pipeline diagram; it's the most information-dense single element in the section
- Keep prose tight — the diagram does the heavy lifting
- The compile function signature is the key insight: two fallible stages, one infallible stage