title: §10 Symbol Tables and Scope
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:02Z
updated_at: 2026-03-10T23:30:02Z

§10 Symbol Tables and Scope — Stub to fill

File: edu/src/lisp-compiler.md, section ### 10. Symbol Tables and Scope

Replace the stub line with full content. Target 700 to 900 words. Build the environment chain that represents lexical scope, then write the scope-checking traversal. Reading-heavy with moderate code.

Learning objectives

  • Understand what a symbol table is and why it is needed
  • Implement an environment chain (linked scope structure) in Rust
  • Write an AST traversal that resolves all symbol references
  • Produce clear SemanticError messages for undefined names

Content to write

What is a symbol table?

A symbol table maps names to information about them: where they are defined, their type (if we had types), and any other metadata. Ours is deliberately simple: a set of the names currently in scope. Because MiniLisp tracks no type information, the checker only needs to know whether a name is defined, not what it refers to.

Lexical scope

In MiniLisp, scope is lexical (also called static): a name's binding is determined by the syntactic structure of the program, not by the runtime call stack. When you write (lambda (x) x), x is in scope inside the lambda body regardless of what x means in the surrounding context.

The environment chain

Represent scope as a chain of HashSet<String>, one per scope level. Looking up a name means searching from innermost to outermost.

use std::collections::HashSet;

/// A chain of scopes representing the current lexical environment.
pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }

    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }

    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

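Explain the lifetime 'a: a child Env holds a shared reference to its parent, so the parent must outlive the child. This always holds, because a child scope ends at the closing parenthesis of its lambda or let, before the enclosing scope does. The borrow checker verifies it at compile time, so no reference counting is needed.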
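To make the lookup order concrete, the following is a minimal sketch of three nested scopes, roughly the shape of `(lambda (x) (lambda (y) ...))` evaluated under a global scope. The `Env` type is repeated from above so the block compiles on its own; `nested_scope_demo` is a hypothetical helper introduced only for illustration.

```rust
use std::collections::HashSet;

/// Same shape as the `Env` above, repeated so this sketch is self-contained.
pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self {
        Env { names: HashSet::new(), parent: None }
    }

    pub fn child(&'a self) -> Env<'a> {
        Env { names: HashSet::new(), parent: Some(self) }
    }

    pub fn define(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

/// Hypothetical helper: builds global -> outer -> inner and reports
/// which names are visible from which level.
pub fn nested_scope_demo() -> [bool; 4] {
    let mut global = Env::new();
    global.define("+");

    let mut outer = global.child(); // scope of (lambda (x) ...)
    outer.define("x");

    let mut inner = outer.child(); // scope of (lambda (y) ...)
    inner.define("y");

    [
        inner.is_defined("x"),  // found one level up, in `outer`
        inner.is_defined("+"),  // found two levels up, in `global`
        outer.is_defined("y"),  // inner names do not leak outward
        global.is_defined("x"), // nor do lambda parameters reach the global scope
    ]
}
```

Lookups only ever walk towards the root, which is why a name defined in an inner scope is invisible to every enclosing scope.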

Pre-populating the global environment

Built-in operators and functions (+, -, *, /, =, <, >, <=, >=, not, display, newline, error) must be defined in the global env from the start — they are always available without a define.

pub fn global_env() -> Env<'static> {
    let mut env = Env::new();
    for name in ["+", "-", "*", "/", "=", "<", ">", "<=", ">=",
                 "not", "display", "newline", "error"] {
        env.define(name);
    }
    env
}

The scope-checking traversal

Walk the top-level Vec<Expr> and call check_expr on each expression. check_expr pattern-matches on the Expr variant and recurses into any sub-expressions:

pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Symbol(name) => {
            if !env.is_defined(name) {
                return Err(CompileError::SemanticError(
                    format!("undefined symbol: `{}`", name)
                ));
            }
            Ok(())
        }
        Expr::Define { value, .. } => {
            // Only the value is checked here. The defined name is not added
            // to `env`: top-level defines are registered in a first pass
            // (see below), and `env` is an immutable borrow at this point.
            check_expr(value, env)
        }
        Expr::Lambda { params, body } => {
            let mut child = env.child();
            for p in params { child.define(p); }
            for e in body { check_expr(e, &child)?; }
            Ok(())
        }
        Expr::If { cond, then, else_ } => {
            check_expr(cond, env)?;
            check_expr(then, env)?;
            check_expr(else_, env)
        }
        // Placeholder: Let, Begin, and Call still need arms that recurse into
        // their sub-expressions. Atoms other than Symbol always pass.
        _ => Ok(())
    }
}
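One possible way to fill in the remaining arms is sketched below. Everything about the AST here is an assumption: the `Number`, `Let`, `Begin`, and `Call` variant shapes, and the single-variant `CompileError`, are hypothetical stand-ins for whatever the parser chapter actually defines. The sketch also assumes plain (non-recursive) `let` semantics, where initialisers are checked in the enclosing scope and only the body sees the new names.

```rust
use std::collections::HashSet;

pub struct Env<'a> {
    names: HashSet<String>,
    parent: Option<&'a Env<'a>>,
}

impl<'a> Env<'a> {
    pub fn new() -> Self { Env { names: HashSet::new(), parent: None } }
    pub fn child(&'a self) -> Env<'a> { Env { names: HashSet::new(), parent: Some(self) } }
    pub fn define(&mut self, name: &str) { self.names.insert(name.to_string()); }
    pub fn is_defined(&self, name: &str) -> bool {
        self.names.contains(name) || self.parent.map_or(false, |p| p.is_defined(name))
    }
}

#[derive(Debug, PartialEq)]
pub enum CompileError {
    SemanticError(String),
}

// ASSUMPTION: hypothetical AST shapes; the real `Expr` may differ.
#[derive(Debug)]
pub enum Expr {
    Number(f64),
    Symbol(String),
    Define { name: String, value: Box<Expr> },
    Lambda { params: Vec<String>, body: Vec<Expr> },
    If { cond: Box<Expr>, then: Box<Expr>, else_: Box<Expr> },
    Let { bindings: Vec<(String, Expr)>, body: Vec<Expr> },
    Begin(Vec<Expr>),
    Call { func: Box<Expr>, args: Vec<Expr> },
}

pub fn check_expr(expr: &Expr, env: &Env) -> Result<(), CompileError> {
    match expr {
        Expr::Number(_) => Ok(()),
        Expr::Symbol(name) => {
            if env.is_defined(name) {
                Ok(())
            } else {
                Err(CompileError::SemanticError(format!("undefined symbol: `{}`", name)))
            }
        }
        Expr::Define { value, .. } => check_expr(value, env),
        Expr::Lambda { params, body } => {
            let mut child = env.child();
            for p in params { child.define(p); }
            for e in body { check_expr(e, &child)?; }
            Ok(())
        }
        Expr::If { cond, then, else_ } => {
            check_expr(cond, env)?;
            check_expr(then, env)?;
            check_expr(else_, env)
        }
        Expr::Let { bindings, body } => {
            // ASSUMPTION: plain `let`, so initialisers see only the outer scope.
            for (_, init) in bindings { check_expr(init, env)?; }
            let mut child = env.child();
            for (name, _) in bindings { child.define(name); }
            for e in body { check_expr(e, &child)?; }
            Ok(())
        }
        Expr::Begin(exprs) => {
            for e in exprs { check_expr(e, env)?; }
            Ok(())
        }
        Expr::Call { func, args } => {
            // The callee is an expression too, so `(undefined-fn 1)` is rejected.
            check_expr(func, env)?;
            for a in args { check_expr(a, env)?; }
            Ok(())
        }
    }
}
```

Listing every variant instead of ending with a `_` arm means the compiler reports a non-exhaustive match if a new `Expr` variant is added later, so a new form cannot silently skip scope checking.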

Two-pass processing for mutual recursion

Top-level define forms can reference each other mutually (e.g., even? calling odd? and vice versa). A single left-to-right pass would reject the first function, because its body refers to the second, which has not been defined yet.

Solution: a two-pass approach.

  1. First pass: scan all top-level Expr::Define forms and add their names to the global env.
  2. Second pass: check every expression with the fully-populated global env.

Show this in the analyse entry point:

pub fn analyse(exprs: Vec<Expr>) -> Result<Vec<Expr>, CompileError> {
    let mut env = global_env();
    // First pass: register all top-level names
    for expr in &exprs {
        if let Expr::Define { name, .. } = expr {
            env.define(name);
        }
    }
    // Second pass: check all expressions
    for expr in &exprs {
        check_expr(expr, &env)?;
    }
    Ok(exprs)
}
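The difference between the two strategies can be seen in isolation with a deliberately simplified model. This sketch does not use the real AST: `Def`, `single_pass`, and `two_pass` are hypothetical names, and each top-level define is reduced to its name plus the list of symbols its body mentions.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a top-level define: its name and the
/// symbols referenced by its body.
pub struct Def<'a> {
    pub name: &'a str,
    pub refs: &'a [&'a str],
}

/// One pass: a name becomes visible only once its own define is reached.
pub fn single_pass(defs: &[Def]) -> Result<(), String> {
    let mut scope: HashSet<&str> = HashSet::new();
    for d in defs {
        scope.insert(d.name); // registered first, so self-recursion still works
        for r in d.refs {
            if !scope.contains(r) {
                return Err(format!("undefined symbol: `{}`", r));
            }
        }
    }
    Ok(())
}

/// Two passes: register every name first, then check every body.
pub fn two_pass(defs: &[Def]) -> Result<(), String> {
    let scope: HashSet<&str> = defs.iter().map(|d| d.name).collect();
    for d in defs {
        for r in d.refs {
            if !scope.contains(r) {
                return Err(format!("undefined symbol: `{}`", r));
            }
        }
    }
    Ok(())
}
```

Given `even?` (which references `odd?`) followed by `odd?` (which references `even?`), `single_pass` fails while checking `even?`, since `odd?` has not been registered at that point, whereas `two_pass` accepts both definitions. Truly undefined names are still rejected by either strategy.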

Unit tests

Test: undefined symbol rejected, mutually recursive defines accepted, lambda scope is isolated, let bindings are in scope inside body.

Style notes

  • Motivate the environment chain before defining it — readers who have not seen this technique before will find it conceptually elegant once explained
  • The two-pass trick is a genuine insight — give it appropriate emphasis
  • Note that we return Ok(exprs) unchanged — the analyser is purely a checker; it does not transform the AST