+++
title = "§10 Symbol Tables and Scope"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++

## §10 Symbol Tables and Scope: Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 10. Symbol Tables and Scope`

Replace the stub line with full content. Target 700–900 words. Build the environment chain that represents lexical scope, then write the scope-checking traversal. Reading-heavy with moderate code.

## Learning objectives

- Understand what a symbol table is and why it is needed
- Implement an environment chain (linked scope structure) in Rust
- Write an AST traversal that resolves all symbol references
- Produce clear `SemanticError` messages for undefined names

## Content to write

### What is a symbol table?

In general, a symbol table maps names to information about them: where they are defined, their type (if the language has types), and any other metadata. Ours is simpler: a set of the names currently in scope. There is no type information to track, so we only need to know *whether* a name is defined, not *what* it is.

### Lexical scope

In MiniLisp, scope is lexical (also called static): a name's binding is determined by the syntactic structure of the program, not by the runtime call stack. When you write `(lambda (x) x)`, the parameter `x` is in scope throughout the lambda body and shadows any `x` bound in the surrounding context.

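### The environment chain

Lexical scopes nest, so represent scope as a chain of `HashSet<String>`, one per scope level, where each level links to its enclosing scope. Looking up a name means searching from the innermost scope outward and stopping at the first match; the name is undefined only if no level contains it.
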
```rust
use std::collections::HashSet;

/// A chain of scopes representing the current lexical environment.
pub struct Env<'a> {
    /// Names defined directly in this scope.
    names: HashSet<String>,
    /// The enclosing scope, or `None` for the global scope.
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    /// Creates an empty root scope with no parent.
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }

    /// Creates an empty scope nested inside `self`.
    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }

    /// Adds `name` to this scope.
    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    /// Returns true if `name` is defined in this scope or any enclosing scope.
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}
```

Explain the lifetime `'a`: the child env holds a shared borrow of its parent, so the parent must outlive it. This always holds here, because the traversal creates a child env when it enters a `lambda` or `let` form and drops it when it leaves that form (at the closing `)`), while the parent is still alive.

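A possible usage sketch to accompany the lifetime explanation. It repeats the `Env` definition so that it compiles on its own; `nested_scope_demo` is a hypothetical helper, not part of the planned API. It models the scopes of `(lambda (x) (lambda (y) ...))` and shows that lookups walk outward through the chain but never inward.

```rust
use std::collections::HashSet;

/// Same shape as the `Env` above, repeated so the sketch is self-contained.
pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }

    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }

    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

/// Hypothetical helper: builds three nested scopes and reports which names
/// resolve from where.
pub fn nested_scope_demo() -> (bool, bool, bool, bool) {
    let mut global = Env::new();
    global.define("+");

    let mut outer = global.child(); // scope of (lambda (x) ...)
    outer.define("x");

    let mut inner = outer.child(); // scope of (lambda (y) ...)
    inner.define("y");

    (
        inner.is_defined("y"), // found in the innermost scope
        inner.is_defined("x"), // found one level up
        inner.is_defined("+"), // found in the global scope
        outer.is_defined("y"), // inner bindings are not visible to the parent
    )
    // Locals are dropped in reverse order: `inner`, then `outer`, then `global`,
    // so every child is gone before the parent it borrows from.
}
```

Calling `nested_scope_demo()` should yield `(true, true, true, false)`. Once `outer` has been borrowed by `inner`, the borrow checker rejects further calls to `outer.define(...)` until `inner` is no longer used, which matches the rule that all of a scope's names are registered before its body is checked.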
### Pre-populating the global environment

Built-in operators and functions (`+`, `-`, `*`, `/`, `=`, `<`, `>`, `<=`, `>=`, `not`, `display`, `newline`, `error`) must be defined in the global env from the start, because they are always available without a `define`.

```rust
pub fn global_env() -> Env<'static> {
    let mut env = Env::new();
    for name in ["+", "-", "*", "/", "=", "<", ">", "<=", ">=",
                 "not", "display", "newline", "error"] {
        env.define(name);
    }
    env
}
```

### The scope-checking traversal

Walk the top-level `Vec<Expr>` and call `check_expr` on each expression. `check_expr` pattern-matches on the `Expr` variant and recurses into its subexpressions:

```rust
pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Symbol(name) => {
            if !env.is_defined(name) {
                return Err(CompileError::SemanticError(
                    format!("undefined symbol: `{}`", name)
                ));
            }
            Ok(())
        }
        Expr::Define { value, .. } => {
            check_expr(value, env)?;
            // Note: we don't add the defined name to env here because top-level
            // defines are processed in a first pass (see below).
            Ok(())
        }
        Expr::Lambda { params, body } => {
            // Parameters live in a fresh child scope that covers only the body.
            let mut child = env.child();
            for p in params { child.define(p); }
            for e in body { check_expr(e, &child)?; }
            Ok(())
        }
        Expr::If { cond, then, else_ } => {
            check_expr(cond, env)?;
            check_expr(then, env)?;
            check_expr(else_, env)
        }
        // ... Let, Begin, and Call need their own arms that recurse into their
        // subexpressions; atoms other than Symbol always pass.
        _ => Ok(())
    }
}
```

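The arms elided by the `// ...` placeholder could look roughly like the sketch below. The field layouts of `Let`, `Begin`, `Call`, and `Number` are assumptions made for illustration (the ticket only names the variants), as is the `CompileError` definition; `Define` and `If` are left out because they appear above. The sketch also assumes plain `let` semantics, where initialisers are checked in the outer scope and the new names are visible only in the body.

```rust
use std::collections::HashSet;

// Assumed AST shape: the field layouts of Let, Begin, Call and Number are
// hypothetical; only the variant names come from the ticket.
pub enum Expr {
    Number(f64),
    Symbol(String),
    Lambda { params: Vec<String>, body: Vec<Expr> },
    Let { bindings: Vec<(String, Expr)>, body: Vec<Expr> },
    Begin(Vec<Expr>),
    Call { func: Box<Expr>, args: Vec<Expr> },
}

// Assumed error type with a single string-carrying variant.
#[derive(Debug, PartialEq)]
pub enum CompileError {
    SemanticError(String),
}

pub struct Env<'a> { names: HashSet<String>, parent: Option<&'a Env<'a>> }

impl<'a> Env<'a> {
    pub fn new() -> Self { Env { names: HashSet::new(), parent: None } }
    pub fn child(&'a self) -> Env<'a> { Env { names: HashSet::new(), parent: Some(self) } }
    pub fn define(&mut self, name: &str) { self.names.insert(name.to_string()); }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        // Atoms other than Symbol always pass.
        Expr::Number(_) => Ok(()),
        Expr::Symbol(name) => {
            if env.is_defined(name) {
                Ok(())
            } else {
                Err(CompileError::SemanticError(format!("undefined symbol: `{}`", name)))
            }
        }
        Expr::Lambda { params, body } => {
            let mut child = env.child();
            for p in params { child.define(p); }
            body.iter().try_for_each(|e| check_expr(e, &child))
        }
        Expr::Let { bindings, body } => {
            // Assumption: plain `let`, so initialisers see only the outer scope.
            let mut child = env.child();
            for (name, init) in bindings {
                check_expr(init, env)?;
                child.define(name);
            }
            body.iter().try_for_each(|e| check_expr(e, &child))
        }
        // Begin introduces no scope: check each expression in the current env.
        Expr::Begin(exprs) => exprs.iter().try_for_each(|e| check_expr(e, env)),
        Expr::Call { func, args } => {
            // The callee is an ordinary expression and is resolved like any other.
            check_expr(func, env)?;
            args.iter().try_for_each(|a| check_expr(a, env))
        }
    }
}
```

If MiniLisp's `let` is meant to behave like Scheme's `let*` or `letrec`, the `Let` arm changes: for `let*`, each initialiser would be checked against the child env built so far; for `letrec`, all names would be defined in the child before any initialiser is checked.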
### Two-pass processing for mutual recursion

Top-level `define` forms can reference each other mutually (e.g., `even?` calling `odd?` and vice versa). A single left-to-right pass would reject the first function, because its body refers to the second, which has not been defined yet.

Solution: a two-pass approach.
1. First pass: scan all top-level `Expr::Define` forms and add their names to the global env.
2. Second pass: check every expression with the fully populated global env.

Show this in the `analyse` entry point:

```rust
pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut env = global_env();
    // First pass: register all top-level names
    for expr in &exprs {
        if let Expr::Define { name, .. } = expr {
            env.define(name);
        }
    }
    // Second pass: check all expressions
    for expr in &exprs {
        check_expr(expr, &env)?;
    }
    Ok(exprs)
}
```

### Unit tests

Test: an undefined symbol is rejected, mutually recursive defines are accepted, lambda parameters are not visible outside the lambda body, and `let` bindings are in scope inside the `let` body.

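One way the four tests could be laid out is sketched below. Everything outside the test module is a trimmed-down stand-in so the block compiles on its own: the `Expr` field layouts, the `CompileError` definition, and the `sym`, `call`, `define`, and `lambda` builder helpers are all hypothetical, and `analyse` starts from an empty env rather than `global_env()` to keep the sketch short.

```rust
use std::collections::HashSet;

// Hypothetical, trimmed-down AST: just enough variants to drive the tests.
pub enum Expr {
    Number(f64),
    Symbol(String),
    Define { name: String, value: Box<Expr> },
    Lambda { params: Vec<String>, body: Vec<Expr> },
    Let { bindings: Vec<(String, Expr)>, body: Vec<Expr> },
    Call { func: Box<Expr>, args: Vec<Expr> },
}

#[derive(Debug, PartialEq)]
pub enum CompileError { SemanticError(String) }

pub struct Env<'a> { names: HashSet<String>, parent: Option<&'a Env<'a>> }

impl<'a> Env<'a> {
    pub fn new() -> Self { Env { names: HashSet::new(), parent: None } }
    pub fn child(&'a self) -> Env<'a> { Env { names: HashSet::new(), parent: Some(self) } }
    pub fn define(&mut self, name: &str) { self.names.insert(name.to_string()); }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Number(_) => Ok(()),
        Expr::Symbol(n) if env.is_defined(n) => Ok(()),
        Expr::Symbol(n) => Err(CompileError::SemanticError(format!("undefined symbol: `{}`", n))),
        Expr::Define { value, .. } => check_expr(value, env),
        Expr::Lambda { params, body } => {
            let mut child = env.child();
            for p in params { child.define(p); }
            body.iter().try_for_each(|e| check_expr(e, &child))
        }
        Expr::Let { bindings, body } => {
            let mut child = env.child();
            for (name, init) in bindings {
                check_expr(init, env)?; // plain `let`: initialisers see the outer scope
                child.define(name);
            }
            body.iter().try_for_each(|e| check_expr(e, &child))
        }
        Expr::Call { func, args } => {
            check_expr(func, env)?;
            args.iter().try_for_each(|a| check_expr(a, env))
        }
    }
}

/// Two-pass entry point. Simplification: starts from an empty env, not `global_env()`.
pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut env = Env::new();
    for expr in &exprs {
        if let Expr::Define { name, .. } = expr { env.define(name); }
    }
    for expr in &exprs { check_expr(expr, &env)?; }
    Ok(exprs)
}

// Hypothetical AST-building helpers that keep the tests readable.
pub fn sym(s: &str) -> Expr { Expr::Symbol(s.to_string()) }
pub fn call(f: &str, args: Vec<Expr>) -> Expr { Expr::Call { func: Box::new(sym(f)), args } }
pub fn define(n: &str, v: Expr) -> Expr { Expr::Define { name: n.to_string(), value: Box::new(v) } }
pub fn lambda(params: &[&str], body: Vec<Expr>) -> Expr {
    Expr::Lambda { params: params.iter().map(|p| p.to_string()).collect(), body }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn undefined_symbol_is_rejected() {
        let err = CompileError::SemanticError("undefined symbol: `nope`".to_string());
        assert_eq!(analyse(vec![sym("nope")]).err(), Some(err));
    }

    #[test]
    fn mutually_recursive_defines_are_accepted() {
        let even = define("even?", lambda(&["n"], vec![call("odd?", vec![sym("n")])]));
        let odd = define("odd?", lambda(&["n"], vec![call("even?", vec![sym("n")])]));
        assert!(analyse(vec![even, odd]).is_ok());
    }

    #[test]
    fn lambda_params_do_not_leak() {
        // `(lambda (x) x)` is fine, but a later top-level `x` is undefined.
        assert!(analyse(vec![lambda(&["x"], vec![sym("x")]), sym("x")]).is_err());
    }

    #[test]
    fn let_bindings_are_visible_in_body() {
        let e = Expr::Let { bindings: vec![("y".to_string(), Expr::Number(1.0))], body: vec![sym("y")] };
        assert!(analyse(vec![e]).is_ok());
    }
}
```

The mutual-recursion test is the one that exercises the first pass: with the registration loop removed from `analyse`, the reference to `odd?` inside `even?` would be reported as undefined.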
## Style notes

- Motivate the environment chain before defining it; readers who have not seen this technique before will find it conceptually elegant once explained
- The two-pass trick is a genuine insight, so give it appropriate emphasis
- Note that we return `Ok(exprs)` unchanged: the analyser is purely a checker and does not transform the AST