+++
title = "§3 Compiler Architecture: The Pipeline"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++
## §3 Compiler Architecture: The Pipeline — Stub to fill
File: `edu/src/lisp-compiler.md`, section `### 3. Compiler Architecture: The Pipeline`
Replace the stub line with full content. Target 500-700 words. Design overview: ASCII diagrams, brief stage descriptions, Rust module layout, error philosophy. No implementation code yet.
## Learning objectives
- Understand the three compilation stages (parser, semantic analyser, code generator) and what each produces
- Know the Rust type that flows between each stage
- Understand where errors originate and how they are reported
- See the module structure before writing any code
## Content to write
### Pipeline Diagram
```
Source text (&str)
     │
     ▼
┌──────────┐
│  Parser  │             src/parser.rs
└──────────┘
     │ Vec<Expr>
     ▼
┌───────────────────┐
│ Semantic Analyser │    src/analyser.rs
└───────────────────┘
     │ Vec<Expr> (validated)
     ▼
┌──────────────────┐
│  Code Generator  │     src/codegen.rs
└──────────────────┘
     │ String (C source)
     ▼
stdout / output file
```
### Stage Descriptions
**Parser** (`src/parser.rs`). Accepts `&str` and produces `Vec<Expr>`. Uses nom combinators. Fails on syntax errors: unmatched parentheses, invalid tokens, unexpected EOF.
**Semantic Analyser** (`src/analyser.rs`). Walks `Vec<Expr>` and checks: every symbol reference resolves to a definition, every special form has the correct shape and arity, lambda bodies are non-empty. Returns the same `Vec<Expr>` on success; returns `CompileError` on failure. Does not do type inference — type errors surface as C compiler errors.
**Code Generator** (`src/codegen.rs`). Walks validated `Vec<Expr>` and produces a `String` of C source. This stage is pure — it cannot fail for valid input. Emits the preamble, forward declarations, and top-level forms in order.
**Error type** (`src/error.rs`). A `CompileError` enum with variants for each fallible stage (parser and analyser), giving uniform error handling across the pipeline. Each variant carries enough context for a useful message (e.g., the undefined symbol name).
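One possible shape for the error type, as a minimal sketch. The variant names (`Syntax`, `UndefinedSymbol`, `MalformedForm`), their fields, and the message wording are assumptions made for illustration; the section itself only fixes that `CompileError` is an enum whose variants carry context.

```rust
use std::fmt;

/// Hypothetical pipeline-wide error type; variant names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub enum CompileError {
    /// Raised by the parser. `offset` is a byte offset into the source text.
    Syntax { message: String, offset: usize },
    /// Raised by the analyser when a symbol has no definition in scope.
    UndefinedSymbol(String),
    /// Raised by the analyser when a special form has the wrong shape or arity.
    MalformedForm { form: String, reason: String },
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompileError::Syntax { message, offset } => {
                write!(f, "syntax error at byte {}: {}", offset, message)
            }
            CompileError::UndefinedSymbol(name) => {
                write!(f, "undefined symbol `{}`", name)
            }
            CompileError::MalformedForm { form, reason } => {
                write!(f, "malformed `{}` form: {}", form, reason)
            }
        }
    }
}

// Lets CompileError be used with `Box<dyn Error>` and `?` in main.rs.
impl std::error::Error for CompileError {}
```

There is no code-generation variant because that stage is described as infallible for validated input.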
### Module Layout
```
src/
├── main.rs       # CLI: read input, call compile(), write output
├── ast.rs        # Expr enum and Display impl
├── parser.rs     # nom parsers → Vec<Expr>
├── analyser.rs   # scope checking and form validation
├── codegen.rs    # AST → C string
└── error.rs      # CompileError enum
```
### The `compile` Function
Show the top-level function signature the reader will implement in §16:
```rust
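pub fn compile(source: &str) -> Result<String, CompileError> {
    // Fallible: `?` returns early on the first syntax error.
    let exprs = parser::parse(source)?;
    // Fallible: `?` returns early on the first semantic error.
    let exprs = analyser::analyse(exprs)?;
    // Infallible: returns a plain String, so no `?` is needed.
    let c_source = codegen::generate(exprs);
    Ok(c_source)
}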
```
This makes explicit that parsing and analysis are fallible but code generation is not.
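To see the short-circuit behaviour without the real stages, the pipeline can be exercised with stand-ins. Everything in this sketch is a placeholder: the one-field `Expr`, the two-variant `CompileError`, the parenthesis-counting "parser", and the "empty program" check are assumptions chosen only so the example runs on its own.

```rust
// Placeholder types; the real ones live in ast.rs and error.rs.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    Syntax(String),
    Semantic(String),
}

#[derive(Debug, PartialEq)]
pub struct Expr(String);

// Stub parser: only checks that parentheses balance, then wraps each
// non-empty line in an Expr.
fn parse(source: &str) -> Result<Vec<Expr>, CompileError> {
    let mut depth: i32 = 0;
    for c in source.chars() {
        match c {
            '(' => depth += 1,
            ')' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return Err(CompileError::Syntax("unexpected `)`".to_string()));
        }
    }
    if depth != 0 {
        return Err(CompileError::Syntax("unexpected end of input".to_string()));
    }
    Ok(source
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(|l| Expr(l.to_string()))
        .collect())
}

// Stub analyser: rejects an empty program, otherwise passes the AST through.
fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    if exprs.is_empty() {
        return Err(CompileError::Semantic("empty program".to_string()));
    }
    Ok(exprs)
}

// Stub generator: returns String, not Result, mirroring the infallible stage.
fn generate(exprs: Vec<Expr>) -> String {
    exprs.iter().map(|e| format!("// {}\n", e.0)).collect()
}

pub fn compile(source: &str) -> Result<String, CompileError> {
    let exprs = parse(source)?; // stops here on a syntax error
    let exprs = analyse(exprs)?; // stops here on a semantic error
    Ok(generate(exprs))
}
```

With these stubs, `compile("(+ 1 2")` returns the syntax error and never reaches `analyse`, while `compile("")` parses successfully and fails in the analyser.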
### Error Reporting Philosophy
The compiler reports the first error it encounters and stops. It does not attempt to recover and continue after a syntax error. nom's `cut` combinator is used at commit points: it turns a recoverable error into a failure, so `alt` stops backtracking and the message points at the branch that was actually being parsed. A production compiler would collect multiple errors; reporting only the first is a deliberate simplification.
Errors include enough context to be actionable:
- Syntax errors: what character was unexpected and approximately where
- Semantic errors: the name of the undefined symbol or the malformed form
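For the "approximately where" part of a syntax error, one option is to turn a byte offset into a line and column before printing. The helper below is a hypothetical sketch: the name `line_col` is not part of the planned modules, and it assumes the parser can supply a byte offset into the source.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes. Offsets past the end of the
/// source are clamped to the end.
pub fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let offset = offset.min(source.len());
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in source.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            // A newline moves to the start of the next line.
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}
```

A message could then read something like `syntax error at 2:1`, which is usually easier to act on than a raw byte offset.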
### How Sections Map to the Diagram
Tell the reader: Sections 4-9 fill in the parser box. Sections 10-11 fill in the analyser box. Sections 12-15 fill in the code generator box. Section 16 wires them together.
## Style notes
- Open with the pipeline diagram; it's the most information-dense single element in the section
- Keep prose tight — the diagram does the heavy lifting
- The `compile` function signature is the key insight: two fallible stages, one infallible stage