Writing a Lisp-to-C Compiler in Rust

This course walks you through building a complete, working compiler from scratch. You will write every component yourself — a lexer, a parser, a semantic analyser, and a code generator — ending with a program that reads MiniLisp source code and emits valid C. The compiler is written in Rust and uses the nom parser-combinator library for all parsing work. Sections marked 🚧 are stubs whose full content is tracked in an nbd ticket.


Table of Contents

Part 1 — Foundations

  1. Introduction: What We're Building
  2. MiniLisp Language Specification
  3. Compiler Architecture: The Pipeline

Part 2 — Parsing with nom

  4. Introduction to nom: Parser Combinators
  5. Setting Up the Project
  6. Recognizing Atoms: Integers, Booleans, Strings, Symbols
  7. The Abstract Syntax Tree
  8. Parsing Atoms with nom
  9. Parsing S-Expressions and Special Forms

Part 3 — Semantic Analysis

  10. Symbol Tables and Scope
  11. Checking Special Forms

Part 4 — Code Generation

  12. The C Runtime Preamble
  13. Generating C: Atoms and Expressions
  14. Generating C: Definitions and Functions
  15. Generating C: Control Flow and Sequencing

Part 5 — Putting It Together

  16. The Compilation Pipeline
  17. Testing the Compiler
  18. What's Next: Extensions and Further Reading

Part 1 — Foundations

1. Introduction: What We're Building

A compiler is a program that transforms source code written in one language into equivalent code in another. By the end of this course you will have written one that accepts MiniLisp — a small, clean dialect of Lisp — and produces human-readable C that you can compile and run with any standard C compiler. Along the way you will implement each classic compiler stage from scratch: lexical analysis, parsing, semantic analysis, and code generation.

🚧 Full content tracked in [nbd:e8da8b].


2. MiniLisp Language Specification

MiniLisp is the source language of our compiler. It is a minimal Lisp dialect with integers, booleans, strings, first-class functions, lexical scope, and a small set of built-in operators. This section defines every syntactic form precisely, gives the grammar in EBNF, and shows a complete example program so you know exactly what the compiler must handle before you write a single line of Rust.

🚧 Full content tracked in [nbd:a93829].


3. Compiler Architecture: The Pipeline

Our compiler is a classic multi-stage pipeline: source text passes through a parser, producing an AST; the AST passes through a semantic analyser, which validates scope and form usage; the validated AST passes through a code generator, which emits C. This section maps that pipeline onto the module structure you will build and explains how data and errors flow between stages.
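
As a rough sketch of how data and errors might flow between the stages, each stage can be a function returning `Result`, with `?` forwarding the first failure. The names `Ast`, `CompileError`, `parse`, `analyse` and `generate` are placeholders assumed for illustration, and the stage bodies are stubs, not the course's implementation.

```rust
use std::fmt;

// Hypothetical placeholder for the real AST defined later in the course.
#[derive(Debug, PartialEq)]
pub struct Ast(pub String);

// One error type shared by all stages, so `?` can forward any failure.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    Parse(String),
    Semantic(String),
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompileError::Parse(msg) => write!(f, "parse error: {}", msg),
            CompileError::Semantic(msg) => write!(f, "semantic error: {}", msg),
        }
    }
}

// Stage 1: source text -> AST (stub: only rejects empty input).
fn parse(src: &str) -> Result<Ast, CompileError> {
    let trimmed = src.trim();
    if trimmed.is_empty() {
        return Err(CompileError::Parse("empty input".to_string()));
    }
    Ok(Ast(trimmed.to_string()))
}

// Stage 2: validate the AST (stub: rejects the bare word "undefined").
fn analyse(ast: Ast) -> Result<Ast, CompileError> {
    if ast.0 == "undefined" {
        return Err(CompileError::Semantic("undefined symbol".to_string()));
    }
    Ok(ast)
}

// Stage 3: validated AST -> C text (stub: wraps the input in a C comment).
fn generate(ast: &Ast) -> String {
    format!("/* {} */\n", ast.0)
}

// The pipeline: each stage consumes the previous stage's output.
pub fn compile(src: &str) -> Result<String, CompileError> {
    let ast = parse(src)?;
    let checked = analyse(ast)?;
    Ok(generate(&checked))
}
```

Because the semantic stage takes and returns an `Ast`, the code generator in this sketch only ever receives a tree that has passed validation.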

🚧 Full content tracked in [nbd:3aeb62].


Part 2 — Parsing with nom

4. Introduction to nom: Parser Combinators

nom is a parser-combinator library: instead of writing a grammar file and running a generator, you write small Rust functions that each recognise a fragment of input, then combine them into larger parsers. This section introduces the core IResult<I, O, E> type, walks through the essential combinators (tag, char, alt, many0, map, tuple, delimited, preceded), and shows how to write, compose, and test parsers before you apply any of this to MiniLisp.
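
The combinator idea can be illustrated without the library. The sketch below uses hand-rolled stand-ins written against the standard library only. `PResult`, `tag`, `alt2`, `many0` and `map` here are assumptions that borrow nom's vocabulary; they are not nom's actual types or signatures.

```rust
// Hand-rolled stand-ins, NOT nom's API. A parser takes the input and returns
// the remaining input plus a value, or an error message.
type PResult<'a, O> = Result<(&'a str, O), String>;

// Match a fixed prefix and return it.
fn tag<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

// Try the first parser; if it fails, try the second on the same input.
fn alt2<'a, O>(
    first: impl Fn(&'a str) -> PResult<'a, O>,
    second: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| first(input).or_else(|_| second(input))
}

// Apply a parser zero or more times, collecting the results.
fn many0<'a, O>(
    parser: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Vec<O>> {
    move |mut input: &'a str| {
        let mut values = Vec::new();
        while let Ok((rest, value)) = parser(input) {
            if rest.len() == input.len() {
                break; // no progress: stop rather than loop forever
            }
            values.push(value);
            input = rest;
        }
        Ok((input, values))
    }
}

// Transform the value a parser produces.
fn map<'a, A, B>(
    parser: impl Fn(&'a str) -> PResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| parser(input).map(|(rest, value)| (rest, f(value)))
}
```

With these pieces, a boolean-literal parser is a composition rather than new parsing code: `alt2(map(tag("#t"), |_| true), map(tag("#f"), |_| false))`.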

🚧 Full content tracked in [nbd:5835e9].


5. Setting Up the Project

You will create a new Rust binary crate for the compiler, add nom and any other dependencies to Cargo.toml, and lay out the module structure that the rest of the course fills in. By the end of this section you will have a project that compiles, a src/main.rs that reads from stdin, and placeholder modules for each compiler stage.
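
One possible starting layout is sketched below, shown as inline modules for brevity. The module names `parser`, `analyser` and `codegen` and the helper `read_source` are assumptions, not names fixed by the course.

```rust
use std::io::{self, Read};

// Placeholder modules, one per compiler stage; later sections would fill them in.
mod parser {}
mod analyser {}
mod codegen {}

// Read an entire program from any reader. Taking a generic reader instead of
// stdin directly keeps the function easy to test.
pub fn read_source<R: Read>(mut reader: R) -> io::Result<String> {
    let mut source = String::new();
    reader.read_to_string(&mut source)?;
    Ok(source)
}
```

In `src/main.rs`, `main` would call `read_source(io::stdin())` and hand the resulting string to the first stage.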

🚧 Full content tracked in [nbd:3dc36b].


6. Recognizing Atoms: Integers, Booleans, Strings, Symbols

Before building the full parser, you need nom parsers for each atomic value in MiniLisp: signed integers, boolean literals #t and #f, double-quoted strings with escape sequences, and symbol identifiers. This section develops each atom parser in isolation, explains the nom combinators used, and provides exercises to test your understanding before the parts are assembled into the full parser.
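
To make the four atom shapes concrete, here is a std-only sketch of one recogniser per atom. It deliberately does not use nom. The exact syntax is assumed for illustration: integers are an optional `-` followed by digits, the supported escapes are `\n`, `\t`, `\\` and `\"`, and a symbol is any run of characters other than whitespace, parentheses and double quotes that does not start with a digit.

```rust
// Each recogniser returns (remaining input, value) on success.
type Recognised<'a, T> = Option<(&'a str, T)>;

fn integer(input: &str) -> Recognised<'_, i64> {
    let digits_start = if input.starts_with('-') { 1 } else { 0 };
    let digits_len = input[digits_start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .count();
    if digits_len == 0 {
        return None; // a lone "-" is not an integer
    }
    let end = digits_start + digits_len;
    let value = input[..end].parse::<i64>().ok()?; // None on overflow
    Some((&input[end..], value))
}

fn boolean(input: &str) -> Recognised<'_, bool> {
    if let Some(rest) = input.strip_prefix("#t") {
        Some((rest, true))
    } else if let Some(rest) = input.strip_prefix("#f") {
        Some((rest, false))
    } else {
        None
    }
}

fn string(input: &str) -> Recognised<'_, String> {
    let body = input.strip_prefix('"')?;
    let mut out = String::new();
    let mut chars = body.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => return Some((&body[i + 1..], out)), // closing quote
            '\\' => match chars.next()?.1 {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                '\\' => out.push('\\'),
                '"' => out.push('"'),
                _ => return None, // unknown escape
            },
            other => out.push(other),
        }
    }
    None // unterminated string
}

fn symbol(input: &str) -> Recognised<'_, &str> {
    let is_symbol_char = |c: char| !c.is_whitespace() && !"()\"".contains(c);
    let end = input.find(|c: char| !is_symbol_char(c)).unwrap_or(input.len());
    let first = input.chars().next()?;
    if end == 0 || first.is_ascii_digit() {
        return None;
    }
    Some((&input[end..], &input[..end]))
}
```

Each function leaves the unconsumed input untouched on failure, which is what lets a caller try the next alternative on the same text.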

🚧 Full content tracked in [nbd:685f5e].


7. The Abstract Syntax Tree

The parser's output is an Abstract Syntax Tree — a Rust data structure that captures the meaning of a MiniLisp program without the syntactic noise of parentheses and whitespace. This section defines the Expr enum and its variants, discusses why the tree is structured the way it is, and implements Display so you can inspect parse results during development.
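
A minimal sketch of what such an enum and its `Display` implementation could look like follows. The variant set is an assumption: it covers atoms and a generic list only, and the special forms the course introduces later are left out.

```rust
use std::fmt;

// Assumed variant set for illustration; the course's Expr may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Bool(bool),
    Str(String),
    Symbol(String),
    List(Vec<Expr>),
}

impl fmt::Display for Expr {
    // Print the tree back in Lisp-like surface syntax for debugging.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Int(n) => write!(f, "{}", n),
            Expr::Bool(true) => write!(f, "#t"),
            Expr::Bool(false) => write!(f, "#f"),
            // Uses Rust's own escaping, which may not match MiniLisp's exactly.
            Expr::Str(s) => write!(f, "{:?}", s),
            Expr::Symbol(name) => write!(f, "{}", name),
            Expr::List(items) => {
                write!(f, "(")?;
                for (i, item) in items.iter().enumerate() {
                    if i > 0 {
                        write!(f, " ")?;
                    }
                    write!(f, "{}", item)?; // recurses into nested lists
                }
                write!(f, ")")
            }
        }
    }
}
```

Deriving `PartialEq` and `Debug` as well is a convenience for unit tests, where parse results are compared with `assert_eq!`.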

🚧 Full content tracked in [nbd:a1a827].


8. Parsing Atoms with nom

With atom parsers and the AST defined, this section assembles them into a single parse_atom function that recognises any MiniLisp atom and returns the corresponding Expr variant. You will use alt to try each alternative in turn, learn how nom reports errors and how to interpret them, and write unit tests that verify correct parsing of every atom type.
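
The try-each-alternative strategy can be sketched with plain `Result` chaining, standing in for what `alt` does in nom. This is a std-only illustration: the `ParseError` struct, the helper names, and the reduced `Expr` (strings omitted for brevity) are all assumptions.

```rust
#[derive(Debug, PartialEq)]
pub enum Expr {
    Int(i64),
    Bool(bool),
    Symbol(String),
}

// Hypothetical error type: what was expected, and the input where it failed.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub expected: &'static str,
    pub at: String,
}

type PResult<'a> = Result<(&'a str, Expr), ParseError>;

// Split off the next token: everything up to whitespace or a parenthesis.
fn token(input: &str) -> (&str, &str) {
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    input.split_at(end)
}

fn fail<'a>(expected: &'static str, input: &str) -> PResult<'a> {
    Err(ParseError { expected, at: input.to_string() })
}

fn int_atom(input: &str) -> PResult<'_> {
    let (tok, rest) = token(input);
    match tok.parse::<i64>() {
        Ok(n) => Ok((rest, Expr::Int(n))),
        Err(_) => fail("integer", input),
    }
}

fn bool_atom(input: &str) -> PResult<'_> {
    match token(input) {
        ("#t", rest) => Ok((rest, Expr::Bool(true))),
        ("#f", rest) => Ok((rest, Expr::Bool(false))),
        _ => fail("boolean", input),
    }
}

fn symbol_atom(input: &str) -> PResult<'_> {
    let (tok, rest) = token(input);
    if tok.is_empty() {
        return fail("symbol", input);
    }
    Ok((rest, Expr::Symbol(tok.to_string())))
}

// Try each alternative in turn; order matters because "42" would also be
// accepted by the permissive symbol recogniser.
pub fn parse_atom(input: &str) -> PResult<'_> {
    int_atom(input)
        .or_else(|_| bool_atom(input))
        .or_else(|_| symbol_atom(input))
        .or_else(|_| fail("atom", input))
}
```

Replacing the last alternative's error with a general "atom" message is one simple way to avoid reporting only the failure of whichever alternative happened to run last.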

🚧 Full content tracked in [nbd:b6c9ad].


9. Parsing S-Expressions and Special Forms

S-expressions are parenthesised lists: the heart of Lisp syntax. This section extends the parser to handle arbitrarily nested lists, whitespace between elements, and comments. It then lifts special forms — define, if, lambda, let, begin — out of the generic list parser so they become distinct AST variants, and covers how to handle recursive parsers in nom without running into borrow-checker problems.
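
The recursive structure can be sketched in std-only Rust, where a plain `fn` may simply call itself. This is not the nom version the section develops. The sketch assumes comments start with `;` and run to the end of the line, keeps atoms as unclassified strings, and does not lift special forms into their own variants.

```rust
#[derive(Debug, PartialEq)]
pub enum Expr {
    Atom(String),
    List(Vec<Expr>),
}

// Skip whitespace and `;` line comments (the comment syntax is an assumption).
fn skip_trivia(mut input: &str) -> &str {
    loop {
        input = input.trim_start();
        match input.strip_prefix(';') {
            Some(comment) => {
                let end = comment.find('\n').unwrap_or(comment.len());
                input = &comment[end..];
            }
            None => return input,
        }
    }
}

// Parse one expression; returns the remaining input and the tree.
pub fn parse_expr(input: &str) -> Result<(&str, Expr), String> {
    let input = skip_trivia(input);
    if let Some(mut rest) = input.strip_prefix('(') {
        let mut items = Vec::new();
        loop {
            rest = skip_trivia(rest);
            if let Some(after) = rest.strip_prefix(')') {
                return Ok((after, Expr::List(items)));
            }
            if rest.is_empty() {
                return Err("unclosed '('".to_string());
            }
            let (next, item) = parse_expr(rest)?; // recursion handles nesting
            items.push(item);
            rest = next;
        }
    }
    // Not a list: read an atom up to the next delimiter.
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')' || c == ';')
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("unexpected input: {:?}", input));
    }
    Ok((&input[end..], Expr::Atom(input[..end].to_string())))
}
```

A later pass could inspect the first element of each `List` to turn forms such as `define` or `if` into dedicated variants.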

🚧 Full content tracked in [nbd:a4c9f8].


Part 3 — Semantic Analysis

10. Symbol Tables and Scope

A symbol table maps names to their definitions. This section walks through a scope-aware traversal of the AST that builds a symbol table, resolves every symbol reference to its definition, and reports helpful errors for undefined names or names used outside their scope. You will implement a simple environment chain — the standard technique for representing nested lexical scopes.
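
An environment chain can be sketched as a map per scope plus a reference to the enclosing scope. The `Env` type below is an illustration under assumed names. It maps each name to a `usize` standing in for whatever definition record the real analyser would keep.

```rust
use std::collections::HashMap;

// One lexical scope plus a link to the scope that encloses it.
pub struct Env<'p> {
    names: HashMap<String, usize>, // name -> hypothetical definition id
    parent: Option<&'p Env<'p>>,
}

impl<'p> Env<'p> {
    // The outermost (global) scope has no parent.
    pub fn root() -> Self {
        Env { names: HashMap::new(), parent: None }
    }

    // A nested scope, e.g. for a lambda body or a let block.
    pub fn child(parent: &'p Env<'p>) -> Self {
        Env { names: HashMap::new(), parent: Some(parent) }
    }

    pub fn define(&mut self, name: &str, id: usize) {
        self.names.insert(name.to_string(), id);
    }

    // Walk outwards from the innermost scope; the first match wins, so inner
    // definitions shadow outer ones.
    pub fn resolve(&self, name: &str) -> Result<usize, String> {
        let mut scope = Some(self);
        while let Some(env) = scope {
            if let Some(&id) = env.names.get(name) {
                return Ok(id);
            }
            scope = env.parent;
        }
        Err(format!("undefined symbol: {}", name))
    }
}
```

Borrowing the parent immutably means a child scope can read outer definitions but cannot add to them, which matches how nested lexical scopes behave during a single traversal.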

🚧 Full content tracked in [nbd:d0b9f8].


11. Checking Special Forms

Special forms have fixed shapes: if needs exactly three sub-expressions; define needs a name and a body; lambda needs a parameter list and at least one body expression. This section adds arity and shape checks for each special form so that malformed programs produce clear error messages rather than mysterious C output.
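
The three shapes stated above translate fairly directly into slice patterns. The sketch below assumes special forms are still generic lists whose first element is a symbol. The reduced `Expr` and the error wording are illustrative only.

```rust
#[derive(Debug, PartialEq)]
pub enum Expr {
    Symbol(String),
    Int(i64),
    List(Vec<Expr>),
}

// Check the shape of one list form; `items[0]` is the head of the list.
pub fn check_form(items: &[Expr]) -> Result<(), String> {
    let head = match items.first() {
        Some(Expr::Symbol(name)) => name.as_str(),
        _ => return Ok(()), // not headed by a symbol: nothing to check here
    };
    let args = &items[1..];
    match head {
        // if: exactly three sub-expressions.
        "if" if args.len() != 3 => Err(format!(
            "if: expected 3 sub-expressions, found {}",
            args.len()
        )),
        // define: a name and a body.
        "define" => match args {
            [Expr::Symbol(_), _body] => Ok(()),
            _ => Err("define: expected a name and a body".to_string()),
        },
        // lambda: a list of parameter symbols and at least one body expression.
        "lambda" => match args {
            [Expr::List(params), _first_body, ..]
                if params.iter().all(|p| matches!(p, Expr::Symbol(_))) =>
            {
                Ok(())
            }
            _ => Err(
                "lambda: expected a parameter list and at least one body expression"
                    .to_string(),
            ),
        },
        _ => Ok(()), // well-formed `if`, or an ordinary call
    }
}
```

A full checker would also recurse into sub-expressions; this sketch validates a single form only.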

🚧 Full content tracked in [nbd:6d40a7].


Part 4 — Code Generation

12. The C Runtime Preamble

Every MiniLisp program compiles to a C file that begins with a standard preamble: #include directives, type aliases, boolean constants, and thin wrappers for built-in operations like display and newline. This section designs the preamble, explains why each piece is there, and shows how the code generator emits it before any user-defined code.
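
Emitting the preamble can be as simple as prepending a constant string. The contents below are an assumed example, not the course's actual preamble: the `lisp_int` alias, the `TRUE`/`FALSE` macros and the `display_int` wrapper are hypothetical choices.

```rust
// Hypothetical preamble contents; the real preamble may differ.
pub const PREAMBLE: &str = r#"#include <stdio.h>

typedef long lisp_int;
#define TRUE 1
#define FALSE 0

static void display_int(lisp_int n) { printf("%ld", n); }
static void newline(void) { putchar('\n'); }
"#;

// Emit the preamble first, then whatever the code generator produced.
pub fn emit_program(user_code: &str) -> String {
    let mut out = String::with_capacity(PREAMBLE.len() + user_code.len() + 1);
    out.push_str(PREAMBLE);
    out.push('\n'); // blank line between preamble and user code
    out.push_str(user_code);
    out
}
```

Keeping the preamble in a raw string literal means the C text can be read and edited without Rust-level escaping.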

🚧 Full content tracked in [nbd:3e1250].


13. Generating C: Atoms and Expressions

This section implements the expression code generator — the recursive function that turns an Expr into a C expression string. Integers become C integer literals; booleans become TRUE and FALSE; strings become string literals; arithmetic and comparison operations become C operators; function calls become C function-call syntax. You will also handle name-mangling: turning Lisp symbols like my-var into valid C identifiers.
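
A small sketch of both ideas, name-mangling and recursive expression generation, follows. The mangling scheme (a `lisp_` prefix, `-` mapped to `_`, other characters hex-escaped) is an assumption and can still produce collisions. The reduced `Expr` omits strings, and only two-argument uses of a few operators are mapped to C operators.

```rust
pub enum Expr {
    Int(i64),
    Bool(bool),
    Symbol(String),
    Call(String, Vec<Expr>),
}

// Turn a Lisp symbol into a valid C identifier (hypothetical scheme).
// The prefix avoids clashes with C keywords and leading digits.
pub fn mangle(symbol: &str) -> String {
    let mut out = String::from("lisp_");
    for c in symbol.chars() {
        match c {
            'a'..='z' | 'A'..='Z' | '0'..='9' => out.push(c),
            '-' => out.push('_'),
            other => out.push_str(&format!("_x{:02X}_", other as u32)),
        }
    }
    out
}

// Recursively turn an expression into a C expression string.
pub fn gen_expr(expr: &Expr) -> String {
    match expr {
        Expr::Int(n) => n.to_string(),
        Expr::Bool(true) => "TRUE".to_string(),
        Expr::Bool(false) => "FALSE".to_string(),
        Expr::Symbol(name) => mangle(name),
        // Binary arithmetic and comparison become parenthesised C operators.
        Expr::Call(op, args)
            if args.len() == 2 && matches!(op.as_str(), "+" | "-" | "*" | "<" | "=") =>
        {
            let c_op = if op == "=" { "==" } else { op.as_str() };
            format!("({} {} {})", gen_expr(&args[0]), c_op, gen_expr(&args[1]))
        }
        // Everything else becomes an ordinary C function call.
        Expr::Call(name, args) => {
            let args: Vec<String> = args.iter().map(gen_expr).collect();
            format!("{}({})", mangle(name), args.join(", "))
        }
    }
}
```

Parenthesising every operator expression sidesteps C precedence questions at the cost of some redundant brackets in the output.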

🚧 Full content tracked in [nbd:1eb794].


14. Generating C: Definitions and Functions

Top-level define forms and lambda expressions compile to C function and variable declarations. This section covers how to emit forward declarations (so mutual recursion works), how to turn a MiniLisp parameter list into a C function signature, how lambda compiles to a named C function, and how top-level definitions are ordered in the output file.
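
The forward-declaration idea amounts to two passes over the same list of functions: prototypes first, definitions second. In this sketch the `Function` struct is hypothetical, every value is assumed to be a C `long`, names are assumed to be mangled already, and each body is a single pre-generated C expression.

```rust
// Hypothetical record of a compiled function.
pub struct Function {
    pub name: String,        // already a valid C identifier
    pub params: Vec<String>, // already valid C identifiers
    pub body: String,        // a C expression produced elsewhere
}

// Build the C signature shared by the prototype and the definition.
fn signature(f: &Function) -> String {
    let params = if f.params.is_empty() {
        "void".to_string()
    } else {
        f.params
            .iter()
            .map(|p| format!("long {}", p))
            .collect::<Vec<_>>()
            .join(", ")
    };
    format!("long {}({})", f.name, params)
}

pub fn emit_functions(functions: &[Function]) -> String {
    let mut out = String::new();
    // Pass 1: prototypes, so any function may call any other.
    for f in functions {
        out.push_str(&signature(f));
        out.push_str(";\n");
    }
    out.push('\n');
    // Pass 2: the definitions themselves, in source order.
    for f in functions {
        out.push_str(&format!("{} {{\n    return {};\n}}\n", signature(f), f.body));
    }
    out
}
```

With all prototypes emitted up front, a pair such as `is_even` and `is_odd` can call each other regardless of which definition appears first.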

🚧 Full content tracked in [nbd:cbc6e3].


15. Generating C: Control Flow and Sequencing

if, begin, and let each require their own code-generation strategy. if becomes a C ternary expression or an if/else statement depending on context; begin becomes a sequence of C statements with the last value forwarded; let introduces a C block with local variable declarations. This section works through each form and resolves the practical question of when to emit expressions versus statements.
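
The expression-versus-statement question can be illustrated for `if` with two generator functions, one per context. This sketch assumes a reduced `Expr`, treats "statement context" as the tail position of a function body, and leaves `begin` and `let` out.

```rust
pub enum Expr {
    Int(i64),
    Var(String),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

// Expression context: `if` becomes a C ternary.
pub fn gen_expr(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Var(name) => name.clone(),
        Expr::If(cond, then, els) => format!(
            "({} ? {} : {})",
            gen_expr(cond),
            gen_expr(then),
            gen_expr(els)
        ),
    }
}

// Statement context (tail position): `if` becomes if/else, and each branch
// returns its own value. `indent` is the number of leading spaces.
pub fn gen_return(e: &Expr, indent: usize) -> String {
    let pad = " ".repeat(indent);
    match e {
        Expr::If(cond, then, els) => format!(
            "{pad}if ({cond}) {{\n{then}{pad}}} else {{\n{els}{pad}}}\n",
            pad = pad,
            cond = gen_expr(cond),
            then = gen_return(then, indent + 4),
            els = gen_return(els, indent + 4)
        ),
        other => format!("{}return {};\n", pad, gen_expr(other)),
    }
}
```

The condition is always generated in expression context, so a nested `if` inside a condition still comes out as a ternary.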

🚧 Full content tracked in [nbd:de82f1].


Part 5 — Putting It Together

16. The Compilation Pipeline

With all stages implemented, this section wires them into a single compile function and builds a CLI entry point that reads MiniLisp from a file or stdin and writes C to stdout or a file. You will add basic error reporting that shows the source location of each failure and trace a complete example — a recursive factorial function — through every stage.
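
Reporting a source location usually comes down to converting a byte offset into a line and column. The helpers below are a sketch under assumed names (`line_col`, `report`) and an assumed message layout. They expect the offset to fall on a character boundary.

```rust
// Convert a byte offset into a 1-based (line, column) pair.
// Columns count characters, not bytes.
pub fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset.min(source.len())];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() + 1;
    (line, col)
}

// Format an error with the offending line and a caret under the column.
pub fn report(source: &str, offset: usize, message: &str) -> String {
    let (line, col) = line_col(source, offset);
    let text = source.lines().nth(line - 1).unwrap_or("");
    format!(
        "error at {}:{}: {}\n  {}\n  {}^",
        line,
        col,
        message,
        text,
        " ".repeat(col - 1)
    )
}
```

For this to work end to end, each stage has to carry byte offsets along with its errors, for example by storing the offset at which a token or AST node began.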

🚧 Full content tracked in [nbd:58b37a].


17. Testing the Compiler

Good tests are what turn a working prototype into a reliable tool. This section adds unit tests for each compiler stage and integration tests that compile MiniLisp programs, feed the C output to cc, run the binary, and assert on stdout. You will build a small test corpus of MiniLisp programs covering all language features and ensure the compiler handles both valid and invalid input gracefully.

🚧 Full content tracked in [nbd:8fa47a].


18. What's Next: Extensions and Further Reading

The compiler you have built is deliberately minimal — a solid foundation. This final section surveys the directions you can take it further: tail-call optimisation, closures and lambda lifting, a garbage collector, hygienic macros, a type system, an interactive REPL, and a self-hosting MiniLisp standard library. It closes with a curated reading list for going deeper into compiler theory and Lisp implementation.

🚧 Full content tracked in [nbd:1d16da].