# Writing a Lisp-to-C Compiler in Rust

This course walks you through building a complete, working compiler from scratch. You will write every component yourself — a lexer, a parser, a semantic analyser, and a code generator — ending with a program that reads **MiniLisp** source code and emits valid C.

The compiler is written in Rust and uses the [nom](https://github.com/rust-bakery/nom) parser-combinator library for all parsing work.

Sections marked 🚧 are stubs whose full content is tracked in an `nbd` ticket.

---

## Table of Contents

**Part 1 — Foundations**

1. [Introduction: What We're Building](#1-introduction-what-were-building)
2. [MiniLisp Language Specification](#2-minilisp-language-specification)
3. [Compiler Architecture: The Pipeline](#3-compiler-architecture-the-pipeline)

**Part 2 — Parsing with nom**

4. [Introduction to nom: Parser Combinators](#4-introduction-to-nom-parser-combinators)
5. [Setting Up the Project](#5-setting-up-the-project)
6. [Recognizing Atoms: Integers, Booleans, Strings, Symbols](#6-recognizing-atoms-integers-booleans-strings-symbols)
7. [The Abstract Syntax Tree](#7-the-abstract-syntax-tree)
8. [Parsing Atoms with nom](#8-parsing-atoms-with-nom)
9. [Parsing S-Expressions and Special Forms](#9-parsing-s-expressions-and-special-forms)

**Part 3 — Semantic Analysis**

10. [Symbol Tables and Scope](#10-symbol-tables-and-scope)
11. [Checking Special Forms](#11-checking-special-forms)

**Part 4 — Code Generation**

12. [The C Runtime Preamble](#12-the-c-runtime-preamble)
13. [Generating C: Atoms and Expressions](#13-generating-c-atoms-and-expressions)
14. [Generating C: Definitions and Functions](#14-generating-c-definitions-and-functions)
15. [Generating C: Control Flow and Sequencing](#15-generating-c-control-flow-and-sequencing)

**Part 5 — Putting It Together**

16. [The Compilation Pipeline](#16-the-compilation-pipeline)
17. [Testing the Compiler](#17-testing-the-compiler)
18. [What's Next: Extensions and Further Reading](#18-whats-next-extensions-and-further-reading)

---

## Part 1 — Foundations

### 1. Introduction: What We're Building

A compiler is a program that transforms source code written in one language into equivalent code in another. By the end of this course you will have written one that accepts MiniLisp — a small, clean dialect of Lisp — and produces human-readable C that you can compile and run with any standard C compiler. Along the way you will implement each classic compiler stage from scratch: lexical analysis, parsing, semantic analysis, and code generation.

🚧 Full content tracked in [nbd:e8da8b].

---

### 2. MiniLisp Language Specification

MiniLisp is the source language of our compiler. It is a minimal Lisp dialect with integers, booleans, strings, first-class functions, lexical scope, and a small set of built-in operators. This section defines every syntactic form precisely, gives the grammar in EBNF, and shows a complete example program so you know exactly what the compiler must handle before you write a single line of Rust.

🚧 Full content tracked in [nbd:a93829].

---

### 3. Compiler Architecture: The Pipeline

Our compiler is a classic multi-stage pipeline: source text passes through a parser, producing an AST; the AST passes through a semantic analyser, which validates scope and form usage; the validated AST passes through a code generator, which emits C. This section maps that pipeline onto the module structure you will build and explains how data and errors flow between stages.

🚧 Full content tracked in [nbd:3aeb62].

---

## Part 2 — Parsing with nom

### 4. Introduction to nom: Parser Combinators

nom is a parser-combinator library: instead of writing a grammar file and running a generator, you write small Rust functions that each recognise a fragment of input, then combine them into larger parsers.
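
The idea of building large parsers out of small ones can be sketched without nom at all. The block below is a hand-rolled illustration using only the standard library, not nom's API: `PResult`, `literal`, `either`, and `boolean` are hypothetical names chosen for this sketch. Each parser takes the input and returns the remaining input plus the matched fragment, or `None` on failure.

```rust
// Hand-rolled illustration of the combinator idea; this is NOT nom's API.
// A parser returns Some((remaining_input, matched_fragment)) or None.
type PResult<'a> = Option<(&'a str, &'a str)>;

/// Build a parser that recognises one fixed string at the start of the input.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| {
        input
            .strip_prefix(expected)
            .map(|rest| (rest, &input[..expected.len()]))
    }
}

/// Combine two parsers: try the first, fall back to the second if it fails.
fn either<'a>(
    first: impl Fn(&'a str) -> PResult<'a>,
    second: impl Fn(&'a str) -> PResult<'a>,
) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| first(input).or_else(|| second(input))
}

/// A larger parser assembled from smaller ones: a MiniLisp boolean literal.
fn boolean(input: &str) -> PResult<'_> {
    either(literal("#t"), literal("#f"))(input)
}
```

Calling `boolean("#t rest")` yields `Some((" rest", "#t"))`, while `boolean("42")` yields `None`. In this sketch `literal` and `either` play roughly the roles that `tag` and `alt` play in nom, with far less machinery for error reporting.
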
This section introduces the core `IResult` type, walks through the essential combinators (`tag`, `char`, `alt`, `many0`, `map`, `tuple`, `delimited`, `preceded`), and shows how to write, compose, and test parsers before you apply any of this to MiniLisp.

🚧 Full content tracked in [nbd:5835e9].

---

### 5. Setting Up the Project

You will create a new Rust binary crate for the compiler, add nom and any other dependencies to `Cargo.toml`, and lay out the module structure that the rest of the course fills in. By the end of this section you will have a project that compiles, a `src/main.rs` that reads from stdin, and placeholder modules for each compiler stage.

🚧 Full content tracked in [nbd:3dc36b].

---

### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols

Before building the full parser, you need nom parsers for each atomic value in MiniLisp: signed integers, boolean literals `#t` and `#f`, double-quoted strings with escape sequences, and symbol identifiers. This section develops each atom parser in isolation, explains the nom combinators used, and provides exercises to test your understanding before the parts are assembled into the full parser.

🚧 Full content tracked in [nbd:685f5e].

---

### 7. The Abstract Syntax Tree

The parser's output is an **Abstract Syntax Tree** — a Rust data structure that captures the meaning of a MiniLisp program without the syntactic noise of parentheses and whitespace. This section defines the `Expr` enum and its variants, discusses why the tree is structured the way it is, and implements `Display` so you can inspect parse results during development.

🚧 Full content tracked in [nbd:a1a827].

---

### 8. Parsing Atoms with nom

With atom parsers and the AST defined, this section assembles them into a single `parse_atom` function that recognises any MiniLisp atom and returns the corresponding `Expr` variant.
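
As a rough picture of what "try each alternative and return the matching variant" means, here is a standard-library sketch that deliberately avoids nom so it runs on its own. The `Expr` variants, the token delimiters, and the rule that a symbol may not start with `#` or `"` are all assumptions made for illustration; string literals are omitted to keep it short.

```rust
// Hypothetical atom variants; the course's real `Expr` may differ.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Bool(bool),
    Symbol(String),
}

/// Split off the leading token, stopping at whitespace or a parenthesis
/// (assumed delimiters). Returns (token, remaining_input).
fn split_token(input: &str) -> (&str, &str) {
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    input.split_at(end)
}

fn parse_int(tok: &str) -> Option<Expr> {
    tok.parse::<i64>().ok().map(Expr::Int)
}

fn parse_bool(tok: &str) -> Option<Expr> {
    match tok {
        "#t" => Some(Expr::Bool(true)),
        "#f" => Some(Expr::Bool(false)),
        _ => None,
    }
}

fn parse_symbol(tok: &str) -> Option<Expr> {
    if tok.is_empty() || tok.starts_with('#') || tok.starts_with('"') {
        None
    } else {
        Some(Expr::Symbol(tok.to_string()))
    }
}

/// Ordered choice: the first alternative that succeeds wins.
/// Returns (remaining_input, expr), or None if no atom starts here.
fn parse_atom(input: &str) -> Option<(&str, Expr)> {
    let (tok, rest) = split_token(input);
    parse_int(tok)
        .or_else(|| parse_bool(tok))
        .or_else(|| parse_symbol(tok))
        .map(|expr| (rest, expr))
}
```

The order of alternatives matters: because integers are tried first, `-42` becomes `Expr::Int(-42)`, while a lone `-` fails the integer parser and falls through to `Expr::Symbol`. The same ordering concern applies when the alternatives are nom parsers.
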
You will use `alt` to try each alternative in turn, learn how nom reports errors and how to interpret them, and write unit tests that verify correct parsing of every atom type.

🚧 Full content tracked in [nbd:b6c9ad].

---

### 9. Parsing S-Expressions and Special Forms

S-expressions are parenthesised lists: the heart of Lisp syntax. This section extends the parser to handle arbitrarily nested lists, whitespace between elements, and comments. It then lifts special forms — `define`, `if`, `lambda`, `let`, `begin` — out of the generic list parser so they become distinct AST variants, and covers how to handle recursive parsers in nom without running into borrow-checker problems.

🚧 Full content tracked in [nbd:a4c9f8].

---

## Part 3 — Semantic Analysis

### 10. Symbol Tables and Scope

A symbol table maps names to their definitions. This section walks through a scope-aware traversal of the AST that builds a symbol table, resolves every symbol reference to its definition, and reports helpful errors for undefined names or names used outside their scope. You will implement a simple environment chain — the standard technique for representing nested lexical scopes.

🚧 Full content tracked in [nbd:d0b9f8].

---

### 11. Checking Special Forms

Special forms have fixed shapes: `if` needs exactly three sub-expressions; `define` needs a name and a body; `lambda` needs a parameter list and at least one body expression. This section adds arity and shape checks for each special form so that malformed programs produce clear error messages rather than mysterious C output.

🚧 Full content tracked in [nbd:6d40a7].

---

## Part 4 — Code Generation

### 12. The C Runtime Preamble

Every MiniLisp program compiles to a C file that begins with a standard preamble: `#include` directives, type aliases, boolean constants, and thin wrappers for built-in operations like `display` and `newline`.
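
One plausible shape for such a preamble, kept as a Rust string constant that the generator writes before anything else, is sketched here. The names `ml_int`, `ml_display_int`, and `ml_newline`, the 64-bit integer width, and the integer-only `display` wrapper are assumptions for illustration, not the course's final design; `emit_program` is likewise a hypothetical helper.

```rust
/// C text emitted at the top of every output file.
/// Assumption: `ml_int`, `ml_display_int`, and `ml_newline` are illustrative
/// names, and MiniLisp integers are taken to be 64-bit.
const PREAMBLE: &str = r#"#include <stdint.h>
#include <stdio.h>

/* Type alias for MiniLisp integers. */
typedef int64_t ml_int;

/* Boolean constants referenced by generated expressions. */
#define TRUE 1
#define FALSE 0

/* Thin wrappers for the built-ins `display` and `newline`. */
static void ml_display_int(ml_int n) { printf("%lld", (long long)n); }
static void ml_newline(void) { putchar('\n'); }
"#;

/// Place the preamble ahead of the generated user code (hypothetical helper).
fn emit_program(user_code: &str) -> String {
    let mut out = String::with_capacity(PREAMBLE.len() + user_code.len() + 1);
    out.push_str(PREAMBLE);
    out.push('\n'); // blank line between preamble and user code
    out.push_str(user_code);
    out
}
```

Keeping the preamble as a single constant means the generated code can rely on every helper already being declared, whatever the user program contains.
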
This section designs the preamble, explains why each piece is there, and shows how the code generator emits it before any user-defined code.

🚧 Full content tracked in [nbd:3e1250].

---

### 13. Generating C: Atoms and Expressions

This section implements the expression code generator — the recursive function that turns an `Expr` into a C expression string. Integers become C integer literals; booleans become `TRUE` and `FALSE`; strings become string literals; arithmetic and comparison operations become C operators; function calls become C function-call syntax. You will also handle name-mangling: turning Lisp symbols like `my-var` into valid C identifiers.

🚧 Full content tracked in [nbd:1eb794].

---

### 14. Generating C: Definitions and Functions

Top-level `define` forms and `lambda` expressions compile to C function and variable declarations. This section covers how to emit forward declarations (so mutual recursion works), how to turn a MiniLisp parameter list into a C function signature, how `lambda` compiles to a named C function, and how top-level definitions are ordered in the output file.

🚧 Full content tracked in [nbd:cbc6e3].

---

### 15. Generating C: Control Flow and Sequencing

`if`, `begin`, and `let` each require their own code-generation strategy. `if` becomes a C ternary expression or an `if`/`else` statement depending on context; `begin` becomes a sequence of C statements with the last value forwarded; `let` introduces a C block with local variable declarations. This section works through each form and resolves the practical question of when to emit expressions versus statements.

🚧 Full content tracked in [nbd:de82f1].

---

## Part 5 — Putting It Together

### 16. The Compilation Pipeline

With all stages implemented, this section wires them into a single `compile` function and builds a CLI entry point that reads MiniLisp from a file or stdin and writes C to stdout or a file.
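
The wiring itself can be sketched with stub stages. Everything here apart from the name `compile` is an assumption: `CompileError`, `Ast`, `parse`, `analyse`, and `generate` are hypothetical placeholders standing in for the real stages, and the CLI entry point is left out.

```rust
use std::fmt;

/// Error that records which stage failed (hypothetical shape for this sketch).
#[derive(Debug)]
struct CompileError {
    stage: &'static str,
    message: String,
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} error: {}", self.stage, self.message)
    }
}

/// Placeholder AST; the real compiler would carry its `Expr` tree here.
struct Ast {
    source_len: usize,
}

// Stub stage: stands in for the nom-based parser.
fn parse(source: &str) -> Result<Ast, CompileError> {
    let trimmed = source.trim();
    if trimmed.is_empty() {
        return Err(CompileError {
            stage: "parse",
            message: "empty input".to_string(),
        });
    }
    Ok(Ast { source_len: trimmed.len() })
}

// Stub stage: stands in for the scope and special-form checks.
fn analyse(ast: Ast) -> Result<Ast, CompileError> {
    Ok(ast)
}

// Stub stage: stands in for the C code generator.
fn generate(ast: &Ast) -> String {
    format!("/* generated from {} bytes of MiniLisp */\n", ast.source_len)
}

/// Run every stage in order; `?` returns early at the first failure.
fn compile(source: &str) -> Result<String, CompileError> {
    let ast = parse(source)?;
    let checked = analyse(ast)?;
    Ok(generate(&checked))
}
```

Because every stage returns the same error type, the caller only needs one `match` on the result of `compile`: print the C on success, or print the error and exit with a non-zero status on failure.
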
You will add basic error reporting that shows the source location of each failure and trace a complete example — a recursive factorial function — through every stage.

🚧 Full content tracked in [nbd:58b37a].

---

### 17. Testing the Compiler

Good tests are what turn a working prototype into a reliable tool. This section adds unit tests for each compiler stage and integration tests that compile MiniLisp programs, feed the C output to `cc`, run the binary, and assert on stdout. You will build a small test corpus of MiniLisp programs covering all language features and ensure the compiler handles both valid and invalid input gracefully.

🚧 Full content tracked in [nbd:8fa47a].

---

### 18. What's Next: Extensions and Further Reading

The compiler you have built is deliberately minimal — a solid foundation. This final section surveys the directions you can take it further: tail-call optimisation, closures and lambda lifting, a garbage collector, hygienic macros, a type system, an interactive REPL, and a self-hosting MiniLisp standard library. It closes with a curated reading list for going deeper into compiler theory and Lisp implementation.

🚧 Full content tracked in [nbd:1d16da].