---
# edu-h3yx
title: §10 Symbol Tables and Scope
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:02Z
updated_at: 2026-03-10T23:30:02Z
---
## §10 Symbol Tables and Scope — Stub to fill
File: `edu/src/lisp-compiler.md`, section `### 10. Symbol Tables and Scope`
Replace the stub line with full content. Target 700-900 words. Build the environment chain that represents lexical scope, then write the scope-checking traversal. Reading-heavy with moderate code.
## Learning objectives
- Understand what a symbol table is and why it is needed
- Implement an environment chain (linked scope structure) in Rust
- Write an AST traversal that resolves all symbol references
- Produce clear `SemanticError` messages for undefined names
## Content to write
### What is a symbol table?
A symbol table maps names to information about them — where they are defined, their type (if we had types), and any other metadata. Our symbol table is simple: a set of names that are currently in scope. We just need to know *whether* a name is defined; we do not need to know *what* it is (no type information).
### Lexical scope
In MiniLisp, scope is lexical (also called static): a name's binding is determined by the syntactic structure of the program, not by the runtime call stack. When you write `(lambda (x) x)`, `x` is in scope inside the lambda body regardless of what `x` means in the surrounding context.
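A single flat set of names stops working once scopes nest. The minimal sketch below shows what goes wrong. `FlatTable`, its `undefine` method and `global_x_survives_lambda` are hypothetical names used only for illustration. The sketch handles `(define x 1)` followed by `(lambda (x) x)` by adding the parameter on entry and removing it on exit, and it loses the global `x` in the process:

```rust
use std::collections::HashSet;

/// Hypothetical flat symbol table: one set of names for the whole program.
/// Shown only to illustrate why nested scopes need more structure.
pub struct FlatTable {
    names: HashSet<String>,
}

impl FlatTable {
    pub fn new() -> Self {
        FlatTable { names: HashSet::new() }
    }
    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }
    pub fn undefine(&mut self, name: &str) {
        self.names.remove(name);
    }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name)
    }
}

/// Simulates checking `(define x 1)` then `(lambda (x) x)` with a flat table.
/// Returns whether the global `x` is still visible afterwards.
pub fn global_x_survives_lambda() -> bool {
    let mut table = FlatTable::new();
    table.define("x"); // (define x 1)
    table.define("x"); // enter the lambda: a set cannot record a second `x`
    table.undefine("x"); // leave the lambda: remove its parameter...
    table.is_defined("x") // ...which also removed the global `x`
}
```

`global_x_survives_lambda()` returns `false`. The set cannot tell the parameter apart from the global it shadows. Keeping one set per scope level removes the problem, because leaving a scope then just discards that level's set.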
### The environment chain
Represent scope as a chain of `HashSet<String>`, one per scope level. Looking up a name means searching from innermost to outermost.
```rust
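use std::collections::HashSet;

/// A chain of scopes representing the current lexical environment.
pub struct Env<'a> {
    /// Names bound at this scope level only.
    names: HashSet<String>,
    /// The enclosing scope, or `None` for the global env.
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    /// Creates an empty scope with no parent (the global scope).
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }

    /// Creates an empty scope nested inside `self`.
    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }

    /// Binds `name` at this scope level.
    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    /// Searches this scope, then each enclosing scope, innermost first.
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}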
```
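Explain the lifetime `'a`: a child env holds a shared borrow of its parent, so the parent must outlive the child. This holds by construction. Each child is a local in `check_expr`, created on entering a `lambda` or `let` body and dropped once that body has been checked (at the closing `)`). The borrow checker enforces this. It also means the parent cannot be mutated (no `define` into it) while any child is alive.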
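Here is a short walk-through of the chain. It assumes the `Env` definition above, copied in unchanged; `shadowing_demo` is a hypothetical name. The demo enters the scope of `(lambda (x y) ...)` written after a top-level `(define x ...)` and shadows `x`. It then reports what the global scope can see once the lambda's scope has been dropped:

```rust
use std::collections::HashSet;

// `Env` as defined above, repeated so this sketch compiles on its own.
pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }
    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }
    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

/// Hypothetical walk-through of `(define x ...)` followed by `(lambda (x y) ...)`.
/// Returns whether the global scope can see `x` and `y` after the lambda.
pub fn shadowing_demo() -> (bool, bool) {
    let mut global = Env::new();
    global.define("x");
    {
        // Entering the lambda: its parameters go into a fresh child scope.
        let mut body = global.child();
        body.define("x"); // shadows the global `x` without touching its set
        body.define("y");
        // global.define("z"); // rejected: `body` still holds a borrow of `global`
        assert!(body.is_defined("x") && body.is_defined("y"));
    } // `body` is dropped here, at the lambda's closing `)`
    (global.is_defined("x"), global.is_defined("y"))
}
```

`shadowing_demo()` returns `(true, false)`. The shadowing `x` lived only in the child's own set, so the global `x` is untouched, and `y` disappeared along with the child. Uncommenting the `global.define` line produces a borrow error: this is the lifetime argument above, enforced by the compiler.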
### Pre-populating the global environment
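Built-in operators and functions (`+`, `-`, `*`, `/`, `=`, `<`, `>`, `<=`, `>=`, `not`, `display`, `newline`, `error`) must be defined in the global env from the start: they are always available without a `define`. `global_env` returns an `Env<'static>` because the global scope has no parent and therefore borrows nothing.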
```rust
pub fn global_env() -> Env<'static> {
    let mut env = Env::new();
    for name in ["+", "-", "*", "/", "=", "<", ">", "<=", ">=",
                 "not", "display", "newline", "error"] {
        env.define(name);
    }
    env
}
```
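The checker needs no special case for built-ins. A lookup from any nested scope falls through the chain to the global env. The minimal check below assumes the `Env` and `global_env` definitions above; `visible_from_nested_scope` is a hypothetical helper. A name outside the list, such as `car`, is still rejected:

```rust
use std::collections::HashSet;

// `Env` and `global_env` as defined above, repeated so this sketch compiles on its own.
pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }
    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }
    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

pub fn global_env() -> Env<'static> {
    let mut env = Env::new();
    for name in ["+", "-", "*", "/", "=", "<", ">", "<=", ">=",
                 "not", "display", "newline", "error"] {
        env.define(name);
    }
    env
}

/// Hypothetical helper: looks `name` up from two scope levels below the
/// global env, as the checker would inside a lambda nested in another lambda.
pub fn visible_from_nested_scope(name: &str) -> bool {
    let global = global_env();
    let outer = global.child();
    let inner = outer.child();
    inner.is_defined(name)
}
```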
### The scope-checking traversal
Walk the `Vec<Expr>` and call `check_expr` on each. The `check_expr` function pattern-matches on each `Expr` variant:
```rust
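pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Symbol(name) => {
            if !env.is_defined(name) {
                return Err(CompileError::SemanticError(
                    format!("undefined symbol: `{}`", name)
                ));
            }
            Ok(())
        }
        Expr::Define { value, .. } => {
            check_expr(value, env)?;
            // Note: we don't add the defined name to env here because top-level
            // defines are registered in a first pass (see below). A `define`
            // nested inside a lambda body is not registered by this code.
            Ok(())
        }
        Expr::Lambda { params, body } => {
            // Parameters live in a fresh child scope that is dropped at the end
            // of this arm, so they never leak into the enclosing scope.
            let mut child = env.child();
            for p in params {
                child.define(p);
            }
            for e in body {
                check_expr(e, &child)?;
            }
            Ok(())
        }
        Expr::If { cond, then, else_ } => {
            check_expr(cond, env)?;
            check_expr(then, env)?;
            check_expr(else_, env)
        }
        // Let, Begin and Call are elided here. Each must recurse into its
        // sub-expressions (Let checks its body in a child env holding the bound
        // names). Only atoms other than Symbol should reach this catch-all;
        // otherwise undefined names inside the elided forms are silently accepted.
        _ => Ok(()),
    }
}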
```
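The elided arms can be filled in along the lines of the sketch below. It rests on assumptions. The `Symbol`, `Define`, `Lambda` and `If` arms follow the stub. The `Number` atom (standing in for all literals) and the shapes of `Let`, `Begin` and `Call` are guesses at the AST. `Let` follows plain `let` semantics: each initialiser is checked in the enclosing scope, and only the body sees the bound names. `check_all` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Hypothetical AST. `Symbol`, `Define`, `Lambda` and `If` follow the stub above;
/// `Number` (standing in for all literal atoms), `Let`, `Begin` and `Call` are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Symbol(String),
    Define { name: String, value: Box<Expr> },
    Lambda { params: Vec<String>, body: Vec<Expr> },
    If { cond: Box<Expr>, then: Box<Expr>, else_: Box<Expr> },
    Let { bindings: Vec<(String, Expr)>, body: Vec<Expr> },
    Begin(Vec<Expr>),
    Call { func: Box<Expr>, args: Vec<Expr> },
}

#[derive(Debug, PartialEq)]
pub enum CompileError {
    SemanticError(String),
}

// `Env` as defined earlier in this section, repeated so the block compiles alone.
pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }
    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }
    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

/// Checks a sequence of expressions (a body or an argument list) in one scope.
fn check_all(exprs: &[Expr], env: &Env) -> Result<(), CompileError> {
    exprs.iter().try_for_each(|e| check_expr(e, env))
}

pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Symbol(name) if env.is_defined(name) => Ok(()),
        Expr::Symbol(name) => Err(CompileError::SemanticError(format!(
            "undefined symbol: `{}`",
            name
        ))),
        Expr::Define { value, .. } => check_expr(value, env),
        Expr::Lambda { params, body } => {
            let mut child = env.child();
            for p in params {
                child.define(p);
            }
            check_all(body, &child)
        }
        Expr::If { cond, then, else_ } => {
            check_expr(cond, env)?;
            check_expr(then, env)?;
            check_expr(else_, env)
        }
        Expr::Let { bindings, body } => {
            // Plain `let`: initialisers are checked in the enclosing scope...
            for (_, init) in bindings {
                check_expr(init, env)?;
            }
            // ...and only the body sees the bound names.
            let mut child = env.child();
            for (name, _) in bindings {
                child.define(name);
            }
            check_all(body, &child)
        }
        Expr::Begin(body) => check_all(body, env),
        Expr::Call { func, args } => {
            check_expr(func, env)?;
            check_all(args, env)
        }
        // Literal atoms reference no names, so they always pass.
        Expr::Number(_) => Ok(()),
    }
}
```

With this version, `(let ((y 1)) y)` passes. `(let ((y 1) (z y)) z)` is rejected because `z`'s initialiser is checked outside the `let`'s own scope, where `y` is not yet bound.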
### Two-pass processing for mutual recursion
Top-level `define` forms can reference each other mutually (e.g., `even?` calling `odd?` and vice versa). A single left-to-right pass would reject the first of the pair: when it checks the body of `even?`, `odd?` has not been defined yet.
Solution: a two-pass approach.
1. First pass: scan all top-level `Expr::Define` forms and add their names to the global env.
2. Second pass: check every expression with the fully-populated global env.
Show this in the `analyse` entry point:
```rust
pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut env = global_env();
    // First pass: register all top-level names
    for expr in &exprs {
        if let Expr::Define { name, .. } = expr {
            env.define(name);
        }
    }
    // Second pass: check all expressions
    for expr in &exprs {
        check_expr(expr, &env)?;
    }
    Ok(exprs)
}
```
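The sketch below makes the difference concrete. It contrasts `analyse` with a hypothetical single-pass `analyse_one_pass` on the pair `(define even? (lambda (n) (if (= n 0) 1 (odd? (- n 1)))))` and its mirror image `odd?`. The AST is a reduced, assumed shape with only the variants this example needs. `Env`, `global_env` and the relevant `check_expr` arms are repeated so the block compiles on its own. The single-pass variant registers each name just before checking its own definition. That is enough for self-recursion but not for mutual recursion.

```rust
use std::collections::HashSet;

/// Reduced, hypothetical AST: only the variants the even?/odd? example needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Symbol(String),
    Define { name: String, value: Box<Expr> },
    Lambda { params: Vec<String>, body: Vec<Expr> },
    If { cond: Box<Expr>, then: Box<Expr>, else_: Box<Expr> },
    Call { func: Box<Expr>, args: Vec<Expr> },
}

#[derive(Debug, PartialEq)]
pub enum CompileError { SemanticError(String) }

// `Env` as defined earlier in this section, written compactly.
pub struct Env<'a> { names: HashSet<String>, parent: Option<&'a Env<'a>> }

impl<'a> Env<'a> {
    pub fn new() -> Self { Env { names: HashSet::new(), parent: None } }
    pub fn child(&'a self) -> Env<'a> { Env { names: HashSet::new(), parent: Some(self) } }
    pub fn define(&mut self, name: &str) { self.names.insert(name.to_string()); }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

pub fn global_env() -> Env<'static> {
    let mut env = Env::new();
    for name in ["+", "-", "*", "/", "=", "<", ">", "<=", ">=",
                 "not", "display", "newline", "error"] {
        env.define(name);
    }
    env
}

pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Number(_) => Ok(()),
        Expr::Symbol(name) if env.is_defined(name) => Ok(()),
        Expr::Symbol(name) => Err(CompileError::SemanticError(format!("undefined symbol: `{}`", name))),
        Expr::Define { value, .. } => check_expr(value, env),
        Expr::Lambda { params, body } => {
            let mut child = env.child();
            for p in params {
                child.define(p);
            }
            body.iter().try_for_each(|e| check_expr(e, &child))
        }
        Expr::If { cond, then, else_ } => {
            check_expr(cond, env)?;
            check_expr(then, env)?;
            check_expr(else_, env)
        }
        Expr::Call { func, args } => {
            check_expr(func, env)?;
            args.iter().try_for_each(|a| check_expr(a, env))
        }
    }
}

/// Hypothetical single-pass variant, for contrast: each name is registered
/// just before its own definition is checked.
pub fn analyse_one_pass(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut env = global_env();
    for expr in &exprs {
        if let Expr::Define { name, .. } = expr {
            env.define(name);
        }
        check_expr(expr, &env)?;
    }
    Ok(exprs)
}

/// The two-pass `analyse` entry point, as above.
pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut env = global_env();
    for expr in &exprs {
        if let Expr::Define { name, .. } = expr {
            env.define(name);
        }
    }
    for expr in &exprs {
        check_expr(expr, &env)?;
    }
    Ok(exprs)
}
```

On the `even?`/`odd?` pair, `analyse_one_pass` fails while checking the body of `even?` because `odd?` is not yet registered. `analyse` accepts both definitions and returns them unchanged.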
### Unit tests
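Write tests covering:
- an undefined symbol is rejected with a `SemanticError`
- mutually recursive top-level defines are accepted
- lambda scope is isolated: a parameter is not visible outside its lambda body
- `let` bindings are in scope inside the body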
## Style notes
- Motivate the environment chain before defining it — readers who have not seen this technique before will find it conceptually elegant once explained
- The two-pass trick is a genuine insight — give it appropriate emphasis
- Note that we return `Ok(exprs)` unchanged — the analyser is purely a checker; it does not transform the AST