+++
title = "§3 Compiler Architecture: The Pipeline"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++

§3 Compiler Architecture: The Pipeline — Stub to fill

File: edu/src/lisp-compiler.md, section ### 3. Compiler Architecture: The Pipeline

Replace the stub line with full content. Target 500–700 words. Design overview: ASCII diagrams, brief stage descriptions, Rust module layout, error philosophy. No code yet.

Learning objectives

  • Understand the three compilation stages and what each produces
  • Know the Rust type that flows between each stage
  • Understand where errors originate and how they are reported
  • See the module structure before writing any code

Content to write

Pipeline Diagram

Source text (&str)
      │
      ▼
  ┌──────────┐
  │  Parser  │   src/parser.rs
  └──────────┘
      │  Vec<Expr>
      ▼
  ┌───────────────────┐
  │ Semantic Analyser │  src/analyser.rs
  └───────────────────┘
      │  Vec<Expr>  (validated)
      ▼
  ┌──────────────────┐
  │  Code Generator  │  src/codegen.rs
  └──────────────────┘
      │  String  (C source)
      ▼
  stdout / output file

Stage Descriptions

Parser (src/parser.rs). Accepts &str and produces Vec<Expr>. Uses nom combinators. Fails on syntax errors: unmatched parentheses, invalid tokens, unexpected EOF.
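
The parser's contract (&str in, Vec<Expr> out, failure on unmatched parentheses or unexpected EOF) can be sketched without nom. The block below is a hand-rolled recursive-descent stand-in, not the nom-based parser this ticket calls for; the Expr variants and the CompileError variants are assumptions made only for illustration.

```rust
/// Assumed AST shape for illustration; the real `Expr` lives in src/ast.rs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(i64),
    Symbol(String),
    List(Vec<Expr>),
}

/// Hypothetical syntax-error variants; the real enum lives in src/error.rs.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    UnexpectedEof,
    UnmatchedClose { offset: usize },
}

/// Parses every top-level form in `source`, stopping at the first error.
pub fn parse(source: &str) -> Result<Vec<Expr>, CompileError> {
    let bytes = source.as_bytes();
    let mut pos = 0;
    let mut exprs = Vec::new();
    loop {
        skip_ws(bytes, &mut pos);
        if pos == bytes.len() {
            return Ok(exprs);
        }
        exprs.push(parse_expr(bytes, &mut pos)?);
    }
}

fn skip_ws(bytes: &[u8], pos: &mut usize) {
    while *pos < bytes.len() && bytes[*pos].is_ascii_whitespace() {
        *pos += 1;
    }
}

/// Expects `*pos` to point at a non-whitespace byte inside `bytes`.
fn parse_expr(bytes: &[u8], pos: &mut usize) -> Result<Expr, CompileError> {
    match bytes[*pos] {
        b'(' => {
            *pos += 1;
            let mut items = Vec::new();
            loop {
                skip_ws(bytes, pos);
                match bytes.get(*pos) {
                    // Input ended before the list was closed.
                    None => return Err(CompileError::UnexpectedEof),
                    Some(b')') => {
                        *pos += 1;
                        return Ok(Expr::List(items));
                    }
                    Some(_) => items.push(parse_expr(bytes, pos)?),
                }
            }
        }
        // A `)` with no list open.
        b')' => Err(CompileError::UnmatchedClose { offset: *pos }),
        _ => {
            // An atom runs until whitespace or a parenthesis.
            let start = *pos;
            while *pos < bytes.len()
                && !bytes[*pos].is_ascii_whitespace()
                && bytes[*pos] != b'('
                && bytes[*pos] != b')'
            {
                *pos += 1;
            }
            let text = String::from_utf8_lossy(&bytes[start..*pos]).into_owned();
            Ok(match text.parse::<i64>() {
                Ok(n) => Expr::Number(n),
                Err(_) => Expr::Symbol(text),
            })
        }
    }
}
```

Under these assumptions, parse("(+ 1 x)") yields one List expression, while parse("(+ 1") fails with UnexpectedEof.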

Semantic Analyser (src/analyser.rs). Walks Vec<Expr> and checks: every symbol reference resolves to a definition, every special form has the correct shape and arity, lambda bodies are non-empty. Returns the same Vec<Expr> on success; returns CompileError on failure. Does not do type inference — type errors surface as C compiler errors.
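
A minimal sketch of the analyser's shape, assuming a three-variant Expr and a hypothetical set of built-in names. It covers only symbol resolution and the shape of a (define name value) form; lambda checks and the other special forms are left out, and none of the names below are confirmed by this ticket.

```rust
use std::collections::HashSet;

/// Assumed AST shape for illustration; the real `Expr` lives in src/ast.rs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(i64),
    Symbol(String),
    List(Vec<Expr>),
}

/// Stand-in for the analyser-related variants of `CompileError`.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    UndefinedSymbol(String),
    MalformedForm(String),
}

/// Checks top-level forms in order and hands the same vector back on success.
pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    // Names assumed to be built in; the real set is up to the compiler.
    let mut scope: HashSet<String> = ["+", "-", "*"].iter().map(|s| s.to_string()).collect();
    for expr in &exprs {
        check(expr, &mut scope)?;
    }
    Ok(exprs)
}

fn check(expr: &Expr, scope: &mut HashSet<String>) -> Result<(), CompileError> {
    match expr {
        Expr::Number(_) => Ok(()),
        Expr::Symbol(name) => {
            if scope.contains(name) {
                Ok(())
            } else {
                Err(CompileError::UndefinedSymbol(name.clone()))
            }
        }
        Expr::List(items) => match items.as_slice() {
            // (define name value): exactly two arguments, the first a symbol.
            [Expr::Symbol(head), rest @ ..] if head == "define" => match rest {
                [Expr::Symbol(name), value] => {
                    check(value, scope)?;
                    scope.insert(name.clone());
                    Ok(())
                }
                _ => Err(CompileError::MalformedForm("define".to_string())),
            },
            // Any other list: every element must resolve.
            _ => items.iter().try_for_each(|item| check(item, scope)),
        },
    }
}
```

Returning the input vector unchanged keeps the type between stages at Vec<Expr>, as the pipeline diagram shows; "validated" is a property of the value, not of its type.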

Code Generator (src/codegen.rs). Walks validated Vec<Expr> and produces a String of C source. This stage is pure — it cannot fail for valid input. Emits the preamble, forward declarations, and top-level forms in order.
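
To illustrate what "cannot fail" means at the type level, here is a rough sketch of a generator that returns String rather than Result. The Expr shape, the preamble, and the lowering of (f a b) to f(a, b) are all assumptions; forward declarations are omitted for brevity.

```rust
/// Assumed AST shape for illustration; the real `Expr` lives in src/ast.rs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(i64),
    Symbol(String),
    List(Vec<Expr>),
}

/// Returns `String`, not `Result`: the analyser has already rejected bad input.
pub fn generate(exprs: Vec<Expr>) -> String {
    // Hypothetical preamble; the real one is defined in later sections.
    let mut out = String::from("#include <stdio.h>\n\nint main(void) {\n");
    for expr in &exprs {
        out.push_str("    ");
        out.push_str(&emit(expr));
        out.push_str(";\n");
    }
    out.push_str("    return 0;\n}\n");
    out
}

/// Hypothetical lowering: atoms print as-is, `(f a b)` becomes `f(a, b)`.
fn emit(expr: &Expr) -> String {
    match expr {
        Expr::Number(n) => n.to_string(),
        Expr::Symbol(name) => name.clone(),
        Expr::List(items) => match items.split_first() {
            Some((head, args)) => {
                let args: Vec<String> = args.iter().map(emit).collect();
                format!("{}({})", emit(head), args.join(", "))
            }
            // The analyser is assumed to reject `()`; emit a no-op to stay total.
            None => String::from("0"),
        },
    }
}
```

Every match arm produces a value, so there is no error path for the caller to handle.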

Error type (src/error.rs). A single CompileError enum with variants for each fallible stage (parsing and analysis), so errors are handled uniformly across the pipeline. Each variant carries enough context for a useful message (e.g., the undefined symbol name).
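
One possible shape for this enum, as a sketch only: the variant names, their fields, and the message wording are assumptions, not something this ticket specifies.

```rust
use std::fmt;

/// Hypothetical error type; the variant names and fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum CompileError {
    /// Raised by the parser: what went wrong and roughly where.
    Syntax { message: String, offset: usize },
    /// Raised by the analyser: a symbol with no visible definition.
    UndefinedSymbol(String),
    /// Raised by the analyser: a special form with the wrong shape or arity.
    MalformedForm { form: String, reason: String },
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompileError::Syntax { message, offset } => {
                write!(f, "syntax error at byte {offset}: {message}")
            }
            CompileError::UndefinedSymbol(name) => {
                write!(f, "undefined symbol `{name}`")
            }
            CompileError::MalformedForm { form, reason } => {
                write!(f, "malformed `{form}` form: {reason}")
            }
        }
    }
}

impl std::error::Error for CompileError {}
```

Implementing Display and std::error::Error lets main.rs print any variant the same way, which is what keeps error handling uniform.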

Module Layout

src/
├── main.rs       # CLI: read input, call compile(), write output
├── ast.rs        # Expr enum and Display impl
├── parser.rs     # nom parsers → Vec<Expr>
├── analyser.rs   # scope checking and form validation
├── codegen.rs    # AST → C string
└── error.rs      # CompileError enum

The compile Function

Show the top-level function signature the reader will implement in §16:

pub fn compile(source: &str) -> Result<String, CompileError> {
    let exprs = parser::parse(source)?;
    let exprs = analyser::analyse(exprs)?;
    let c_source = codegen::generate(exprs);
    Ok(c_source)
}

This makes explicit that parsing and analysis are fallible but code generation is not.
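
To see the early-return behaviour of ? in isolation, the same wiring can be tried with placeholder stages. In this sketch the stages are free functions rather than modules, Vec<String> stands in for Vec<Expr>, and the stub bodies and error variants are invented purely so the example compiles on its own.

```rust
/// Hypothetical error variants, one per fallible stage.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    Syntax(String),
    UndefinedSymbol(String),
}

// Placeholder parser: each whitespace-separated word becomes one "expression".
fn parse(source: &str) -> Result<Vec<String>, CompileError> {
    if source.contains(')') {
        return Err(CompileError::Syntax("unexpected `)`".to_string()));
    }
    Ok(source.split_whitespace().map(str::to_string).collect())
}

// Placeholder analyser: only the name `known` counts as defined.
fn analyse(exprs: Vec<String>) -> Result<Vec<String>, CompileError> {
    for name in &exprs {
        if name != "known" {
            return Err(CompileError::UndefinedSymbol(name.clone()));
        }
    }
    Ok(exprs)
}

// Placeholder generator: no `Result` in the return type, so no `?` at the call site.
fn generate(exprs: Vec<String>) -> String {
    format!("/* {} top-level forms */\n", exprs.len())
}

pub fn compile(source: &str) -> Result<String, CompileError> {
    let exprs = parse(source)?; // a syntax error returns here
    let exprs = analyse(exprs)?; // a semantic error returns here
    Ok(generate(exprs))
}
```

With these stubs, compile("known )") returns the syntax error and the analyser never runs, while compile("other") gets past parsing and fails in analysis.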

Error Reporting Philosophy

The compiler reports the first error it encounters and stops. It does not attempt to recover and continue after a syntax error. nom's cut combinator is used at commit points to produce better error messages. A production compiler would collect multiple errors — this is a deliberate simplification.
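
The difference between the two strategies can be sketched with plain iterators. The check below is a made-up validity rule (names may not start with a digit) chosen only to have something that fails; it is not part of the compiler, and nom's cut is not shown here.

```rust
/// Hypothetical per-form check: names starting with a digit are rejected.
fn check(name: &str) -> Result<(), String> {
    if name.starts_with(|c: char| c.is_ascii_digit()) {
        Err(format!("invalid name `{name}`"))
    } else {
        Ok(())
    }
}

/// Fail-fast, as in this compiler: `try_for_each` stops at the first `Err`.
pub fn check_fail_fast(names: &[&str]) -> Result<(), String> {
    names.iter().try_for_each(|n| check(n))
}

/// What a production compiler might do instead: keep going and gather every error.
pub fn check_collect_all(names: &[&str]) -> Vec<String> {
    names.iter().filter_map(|n| check(n).err()).collect()
}
```

For the input ["a", "1b", "2c"], the fail-fast version reports only `1b`, while the collecting version reports both `1b` and `2c`. The fail-fast form composes directly with ?, which is one reason it is the simpler choice for a teaching compiler.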

Errors include enough context to be actionable:

  • Syntax errors: what character was unexpected and approximately where
  • Semantic errors: the name of the undefined symbol or the malformed form

How Sections Map to the Diagram

Tell the reader: Sections 4–9 fill in the parser box. Sections 10–11 fill in the analyser box. Sections 12–15 fill in the code generator box. Section 16 wires them together.

Style notes

  • Open with the pipeline diagram; it's the most information-dense single element in the section
  • Keep prose tight — the diagram does the heavy lifting
  • The compile function signature is the key insight: two fallible stages, one infallible stage