+++
title = "§18 What's Next: Extensions and Further Reading"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++
## §18 What's Next: Extensions and Further Reading — Stub to fill
File: `edu/src/lisp-compiler.md`, section `### 18. What's Next: Extensions and Further Reading`
Replace the stub line with full content. Target 600 to 800 words. Survey the directions the compiler can be taken and provide a curated reading list. Reading-only, no code.
## Learning objectives
- Understand what limitations the current compiler has and why
- Know the conceptual approaches for each major extension
- Have a reading list for going deeper into compiler theory and Lisp implementation
## Content to write
### Congratulations — and what you skipped
Open by acknowledging what the reader has built: a complete compiler with a lexer, parser, semantic analyser, code generator, and test suite. Then honestly catalog what was left out:
### Extension 1: Closures and Lambda Lifting
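The current compiler does not support closures: lambdas cannot capture variables from enclosing functions. One way to add them is **lambda lifting**: transforming each lambda that captures free variables into a top-level function that takes those variables as extra parameters, and rewriting every call site to pass them. This works when all call sites are known at compile time. For lambdas that escape as first-class values (returned from a function or stored in a data structure), real Lisp runtimes use **closure records** (a struct containing the function pointer and the captured values) allocated on the heap, a transformation usually called closure conversion.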
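For the writer's reference (the published section itself stays code-free), here is a minimal sketch of what a closure record could look like in generated C. It assumes MiniLisp numbers map to `long` and uses `(define (make-adder n) (lambda (x) (+ x n)))` as the example. The names `Closure`, `adder_code`, `make_adder`, and `call_closure` are hypothetical and not part of the existing runtime.

```c
#include <stdlib.h>

/* Hypothetical closure record: a code pointer plus the captured values. */
typedef struct Closure {
    long (*code)(struct Closure *self, long arg); /* the lifted lambda body */
    long captured_n;                              /* the free variable n */
} Closure;

/* The lambda body (+ x n), lifted to top level. Its free variable n is
   read from the closure record instead of an enclosing stack frame. */
static long adder_code(Closure *self, long x) {
    return x + self->captured_n;
}

/* What (make-adder n) might compile to: allocate a record on the heap
   so the captured value outlives the call that created it. */
Closure *make_adder(long n) {
    Closure *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->code = adder_code;
    c->captured_n = n;
    return c;
}

/* Calling a closure: pass the record itself as a hidden first argument. */
long call_closure(Closure *c, long arg) {
    return c->code(c, arg);
}
```

With these definitions, `call_closure(make_adder(5), 10)` evaluates `((make-adder 5) 10)`. The record is never freed in this sketch, which is one reason closures and a garbage collector tend to arrive together.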
### Extension 2: Tail-Call Optimisation (TCO)
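`(define (loop n) (if (= n 0) n (loop (- n 1))))` will overflow the stack for large `n` in the current compiler because each recursive call pushes a new C stack frame. TCO transforms tail calls into jumps. C does not guarantee this: GCC and Clang usually eliminate tail calls only when optimisation is enabled (for example with `-O2`, or per function with GCC's `__attribute__((optimize("O2")))`, which is a function attribute rather than a pragma), and the C standard does not require it. A trampoline pattern works on any compiler at some runtime cost. A proper solution requires detecting tail-call position during code generation and emitting a `goto` loop for self-recursive tail calls; mutually recursive tail calls still need a trampoline or a similar mechanism.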
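For the writer's reference (the published section itself stays code-free), the `goto` transformation for the `loop` example could look roughly like this. It is a sketch that assumes MiniLisp numbers map to `long`; the function names `loop_naive` and `loop_tco` are hypothetical and do not reflect what the current code generator emits.

```c
/* Naive translation: every call pushes a new C stack frame unless the
   C compiler happens to optimise the tail call away. */
long loop_naive(long n) {
    return (n == 0) ? n : loop_naive(n - 1);
}

/* Tail-call-aware translation: the self-call in tail position becomes
   an update of the parameter followed by a jump back to the top. */
long loop_tco(long n) {
top:
    if (n == 0) {
        return n;
    }
    n = n - 1;  /* evaluate the new argument */
    goto top;   /* reuse the current frame instead of calling */
}
```

For functions with several parameters, the new argument values would need to be computed into temporaries before any parameter is overwritten, since a later argument expression may read an earlier parameter.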
### Extension 3: A Type System
Add type inference (Hindley-Milner, or a simpler bidirectional checker that relies on some annotations) so that type errors are caught before C is emitted. This would allow the code generator to choose the correct `ml_display_*` variant and generate proper C function signatures for string-returning functions.
### Extension 4: Pairs, Lists, and a Runtime
`(cons a b)`, `(car p)`, `(cdr p)` require heap-allocated pair objects, each represented as a C struct with two fields. This opens the door to real Lisp list processing. Once you have heap allocation, you need a garbage collector. The simplest GC is reference counting, which cannot reclaim cyclic structures on its own; a more robust approach is mark-and-sweep.
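For the writer's reference (the published section itself stays code-free), a pair representation could be sketched as a small tagged value type. Everything here is an assumption for illustration: the `Value` layout, the `Tag` enum, `NIL_VALUE`, and the helper names are hypothetical, integers are assumed to map to `long`, and nothing is ever freed because there is no collector yet.

```c
#include <stdlib.h>

/* Hypothetical runtime value: a tag plus either an integer or a pair. */
typedef enum { TAG_NIL, TAG_INT, TAG_PAIR } Tag;

typedef struct Value Value;
struct Value {
    Tag tag;
    union {
        long integer;
        struct { Value *car; Value *cdr; } pair;
    } as;
};

/* A single shared empty-list value. */
static Value NIL_VALUE = { .tag = TAG_NIL };

static Value *alloc_value(Tag tag) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->tag = tag;
    return v;
}

Value *make_int(long n) {
    Value *v = alloc_value(TAG_INT);
    v->as.integer = n;
    return v;
}

/* (cons a d): allocate a fresh pair on the heap. */
Value *cons(Value *a, Value *d) {
    Value *v = alloc_value(TAG_PAIR);
    v->as.pair.car = a;
    v->as.pair.cdr = d;
    return v;
}

/* (car p) and (cdr p): abort on a non-pair rather than read garbage. */
Value *car(Value *p) {
    if (p->tag != TAG_PAIR) abort();
    return p->as.pair.car;
}

Value *cdr(Value *p) {
    if (p->tag != TAG_PAIR) abort();
    return p->as.pair.cdr;
}

/* Walk the cdr chain and count the pairs in the list. */
long list_length(Value *list) {
    long n = 0;
    while (list->tag == TAG_PAIR) {
        n++;
        list = list->as.pair.cdr;
    }
    return n;
}
```

The list `(1 2 3)` would then be built as `cons(make_int(1), cons(make_int(2), cons(make_int(3), &NIL_VALUE)))`. A reference-counting or mark-and-sweep collector would add a header field (a count or a mark bit) to `Value`.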
### Extension 5: Macros
Lisp macros transform code before compilation. A simple approach is **syntax transformers**: functions that run at compile time and return transformed AST nodes. This requires a small interpreter for the macro language. Hygienic macros (as in Scheme) are significantly more complex.
### Extension 6: A REPL
A read-eval-print loop compiles and runs one expression at a time. This requires either an interpreter (easier) or incremental native code emission (harder). An interpreter over the AST is a natural extension once the parser is complete — it's essentially the code generator replaced with a recursive evaluator.
### Extension 7: Self-Hosting
The ultimate milestone: rewrite the MiniLisp compiler in MiniLisp itself. This requires the language to be expressive enough (strings, I/O, some form of list processing) and the compiler to be complete enough to compile itself. Self-hosting is the proof that you've really built something.
### Further Reading
**Compiler theory:**
- *Crafting Interpreters* by Robert Nystrom: free online; builds a language in two complete implementations (tree-walking and bytecode)
- *Modern Compiler Implementation in ML/Java/C* by Andrew Appel: classic academic compiler textbook, published in separate ML, Java, and C editions
- *Engineering a Compiler* by Cooper & Torczon: comprehensive modern treatment
**Lisp implementation:**
- *Structure and Interpretation of Computer Programs* (SICP) by Abelson & Sussman: chapters 4 and 5 cover interpreters and compilers for Scheme
- *Lisp in Small Pieces* by Christian Queinnec: 11 interpreters and 2 compilers, building up from a basic evaluator to compilation
- *Build Your Own Lisp* by Daniel Holden: free online, C implementation
**Parsing:**
- *Parsing Techniques* by Grune & Jacobs: comprehensive reference (first edition available as a free PDF)
- nom documentation and recipes: https://github.com/rust-bakery/nom/tree/main/doc
**Rust and compilers:**
- The `cranelift` crate: a code generator backend (Rust, used in Wasmtime)
- The `inkwell` crate: safe Rust bindings to LLVM for native code generation
## Style notes
- Open warmly — the reader has accomplished something real
- Each extension should be a paragraph: what it is, why it is non-trivial, and the key technique
- The reading list is the most durable part of this section; keep it current and annotated
- Close the course with encouragement: the concepts learned here (parsing, AST manipulation, code generation) apply to every compiler, transpiler, and language tool the reader will ever build