+++
title = "§9 Parsing S-Expressions and Special Forms"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++

§9 Parsing S-Expressions and Special Forms — Stub to fill

File: edu/src/lisp-compiler.md, section ### 9. Parsing S-Expressions and Special Forms

Replace the stub line with full content. Target 1000-1300 words. This is the hardest parsing section: it covers recursive parsers, special-form recognition, and the top-level parse entry point.

Learning objectives

  • Write a recursive parser in nom (handling the recursion challenge)
  • Distinguish special forms from generic calls during parsing and produce typed AST variants
  • Parse define, lambda, if, let, begin into the correct Expr variants
  • Implement the top-level parse function
  • Understand when to use cut to commit to a parse branch

Content to write

The Recursion Problem in nom

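nom combinators return closures whose concrete types are fixed at compile time and embed the types of the parsers they wrap. A parser for S-expressions is recursive: an expression is either an atom or a list of expressions. Built purely from closures or combinator-returning helpers, such a parser would need a type that contains itself, which rustc rejects (typically as a recursive opaque type or a cycle error).

Solution: define the recursive parsers as named functions rather than closures. Rust resolves module-level items regardless of declaration order, so parse_expr can refer to parse_list before it is defined, and a named function has a concrete, known-size type no matter what its body calls.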

pub fn parse_expr(input: &str) -> IResult<&str, Expr> {
    ws(alt((
        parse_list,
        parse_atom,
    )))(input)
}

parse_list calls parse_expr recursively. Because parse_expr is a named function (not a closure), it has a concrete type that coerces to the function pointer fn(&str) -> IResult<&str, Expr>, which has a known size, so the recursion is fine.
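To see the recursion in isolation, here is a minimal std-only sketch that does not use nom. The `Sexp` type, the `PResult` alias, and the three function bodies are illustrative assumptions, not the MiniLisp `Expr` or its parsers; the point is only that named functions can call each other recursively, in any declaration order, without type trouble.

```rust
/// Hypothetical stand-in for the real AST: just atoms and lists.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

/// Remaining input plus parsed value, loosely mirroring nom's IResult.
type PResult<'a, T> = Result<(&'a str, T), String>;

/// Entry point: refers to parse_list and parse_atom, which are defined below.
fn parse_expr(input: &str) -> PResult<'_, Sexp> {
    let input = input.trim_start();
    if input.starts_with('(') {
        parse_list(input)
    } else {
        parse_atom(input)
    }
}

/// Parses "( expr* )" and calls parse_expr recursively for each element.
fn parse_list(input: &str) -> PResult<'_, Sexp> {
    let mut rest = input
        .strip_prefix('(')
        .ok_or_else(|| "expected '('".to_string())?;
    let mut items = Vec::new();
    loop {
        rest = rest.trim_start();
        if let Some(after) = rest.strip_prefix(')') {
            return Ok((after, Sexp::List(items)));
        }
        if rest.is_empty() {
            return Err("unclosed list".to_string());
        }
        let (next, item) = parse_expr(rest)?; // the recursive call
        items.push(item);
        rest = next;
    }
}

/// Parses a run of characters up to whitespace or a parenthesis.
fn parse_atom(input: &str) -> PResult<'_, Sexp> {
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected an atom at {:?}", input));
    }
    Ok((&input[end..], Sexp::Atom(input[..end].to_string())))
}
```

Calling `parse_expr("(+ 1 (* 2 3))")` returns a nested `Sexp::List`, and an unbalanced input such as `"(+ 1"` returns an error instead of looping.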

Parsing Generic Lists → Calls

A generic list (func arg1 arg2 ...) is parsed into Expr::Call:

fn parse_call(input: &str) -> IResult<&str, Expr> {
    let (input, exprs) = delimited(
        ws(char('(')),
        many1(ws(parse_expr)),
        ws(char(')')),
    )(input)?;
    let mut iter = exprs.into_iter();
    let func = iter.next().unwrap(); // safe: many1 guarantees >= 1
    let args = iter.collect();
    Ok((input, Expr::Call { func: Box::new(func), args }))
}
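The head/tail split can also be written without `unwrap`. A small generic sketch follows; `split_call` is a hypothetical helper name, and it works on any `Vec<T>` so that it runs without the `Expr` type.

```rust
/// Splits a non-empty list into (head, tail). Returns None for an empty
/// list, which is the case that many1 rules out in the nom parser.
fn split_call<T>(exprs: Vec<T>) -> Option<(T, Vec<T>)> {
    let mut iter = exprs.into_iter();
    let func = iter.next()?; // None if the list was empty
    let args: Vec<T> = iter.collect();
    Some((func, args))
}
```

With many1 in front the `None` case is unreachable, so the `unwrap` in parse_call is sound; the `Option` version is only worth it if the list parser is later relaxed to accept `()`.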

Recognizing Special Forms

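Special forms are lists that begin with a specific keyword. Recognize them inside the list parser by checking the first token. The cleanest approach: try each special-form parser in an alt before falling back to parse_call. alt tries its branches in order and returns the first success, so parse_call must come last; placed earlier, it could accept a special form as an ordinary call.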

fn parse_list(input: &str) -> IResult<&str, Expr> {
    alt((
        parse_define,
        parse_lambda,
        parse_if,
        parse_let,
        parse_begin,
        parse_call,
    ))(input)
}
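The dispatch decision itself can be sketched without nom by looking at the head token of the list. `FormKind` and `classify` are hypothetical names used only for this illustration; the real parser makes the same decision implicitly through the order of branches in alt.

```rust
/// Which parser a list should be routed to (illustrative only).
#[derive(Debug, PartialEq)]
enum FormKind {
    Define,
    Lambda,
    If,
    Let,
    Begin,
    Call,
}

/// Looks at the first token after '(' and picks a form.
/// Returns None when the input does not start with a list.
fn classify(src: &str) -> Option<FormKind> {
    let inner = src.trim_start().strip_prefix('(')?.trim_start();
    // The head token ends at whitespace or a parenthesis.
    let end = inner
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(inner.len());
    Some(match &inner[..end] {
        "define" => FormKind::Define,
        "lambda" => FormKind::Lambda,
        "if" => FormKind::If,
        "let" => FormKind::Let,
        "begin" => FormKind::Begin,
        _ => FormKind::Call, // anything else is a generic call
    })
}
```

Comparing the whole head token matters. nom's `tag("define")` succeeds on any input that starts with those six characters, so a call to a function named `defined?` would also enter the define branch unless the keyword parser checks what follows the keyword.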

Parsing define

Two shapes: (define name expr) and (define (name params...) body...). Parse both; the second desugars into a Define wrapping a Lambda.

fn parse_define(input: &str) -> IResult<&str, Expr> {
    let (input, _) = ws(char('('))(input)?;
    let (input, _) = ws(tag("define"))(input)?;
    // Use cut here: we've seen "(define", so commit to this branch
    cut(|input| {
        alt((
            // Function shorthand: (define (name params...) body...)
            |input| {
                let (input, _) = ws(char('('))(input)?;
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, params) = many0(ws(parse_symbol_str))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let (input, body) = many1(ws(parse_expr))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let lambda = Expr::Lambda { params, body };
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(lambda) }))
            },
            // Variable binding: (define name expr)
            |input| {
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, value) = ws(parse_expr)(input)?;
                let (input, _) = ws(char(')'))(input)?;
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(value) }))
            },
        ))(input)
    })(input)
}

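Explain cut: after matching (define, we are committed to this branch. If the rest of the form is malformed, cut converts the recoverable Err::Error into an unrecoverable Err::Failure. alt stops at a Failure instead of trying the next branch, so the parser reports the problem inside the define form rather than backtracking to parse_call and returning a misleading error or a bogus Call.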
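The effect of committing to a branch can be modeled in a few lines of std-only Rust. Everything here (`ParseErr`, `alt2`, this `cut`, and the two toy form parsers) is a hypothetical imitation of the idea, not nom's implementation, and the toy grammar only accepts `(define name value)` with single-token values.

```rust
#[derive(Debug, PartialEq)]
enum ParseErr {
    Recoverable(String), // plays the role of nom's Err::Error
    Fatal(String),       // plays the role of nom's Err::Failure
}

type Res<T> = Result<T, ParseErr>;

/// Upgrades a recoverable error to a fatal one; leaves everything else alone.
fn cut<T>(result: Res<T>) -> Res<T> {
    result.map_err(|e| match e {
        ParseErr::Recoverable(msg) => ParseErr::Fatal(msg),
        fatal => fatal,
    })
}

/// Ordered choice: tries `second` only if `first` failed recoverably.
fn alt2<T>(
    first: impl Fn(&str) -> Res<T>,
    second: impl Fn(&str) -> Res<T>,
    input: &str,
) -> Res<T> {
    match first(input) {
        Err(ParseErr::Recoverable(_)) => second(input),
        other => other,
    }
}

fn define_body(rest: &str) -> Res<String> {
    let inner = rest
        .strip_suffix(')')
        .ok_or_else(|| ParseErr::Recoverable("missing ')'".to_string()))?;
    let mut parts = inner.split_whitespace();
    match (parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(value), None) => Ok(format!("define {} = {}", name, value)),
        _ => Err(ParseErr::Recoverable("expected (define name value)".to_string())),
    }
}

fn define_form(input: &str) -> Res<String> {
    let rest = input
        .strip_prefix("(define ")
        .ok_or_else(|| ParseErr::Recoverable("not a define form".to_string()))?;
    // Past this point the keyword has matched, so any error is final.
    cut(define_body(rest))
}

fn call_form(input: &str) -> Res<String> {
    if input.starts_with('(') && input.ends_with(')') {
        Ok(format!("call {}", &input[1..input.len() - 1]))
    } else {
        Err(ParseErr::Recoverable("not a list".to_string()))
    }
}

fn parse_form(input: &str) -> Res<String> {
    alt2(define_form, call_form, input)
}
```

In this model `parse_form("(define x)")` yields a `Fatal` error that names the define form. Without the `cut` call the same input would fall through to `call_form` and be accepted as a call to `define`.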

Parsing lambda, if, let, begin

Show each parser in similar style. Key details:

lambda: (lambda (params...) body...) — use many0 for params (zero-parameter functions are valid), many1 for body.

if: (if cond then else) — exactly three sub-expressions; the third (else) is required in MiniLisp.

let: (let ((name expr)...) body...) — parse a list of (name expr) pairs, collect into Vec<(String, Expr)>.

begin: (begin expr...) — one or more expressions.
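The shape rules above can be written down as checks over an already-parsed generic list. This is a std-only sketch, not the nom parsers the section should show: `Sexp`, `Form`, `atom`, and `check_shape` are hypothetical names, and the real code would build `Expr` variants directly while parsing.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

#[derive(Debug, PartialEq)]
enum Form {
    Lambda { params: Vec<String>, body: Vec<Sexp> },
    If { cond: Sexp, then_branch: Sexp, else_branch: Sexp },
    Let { bindings: Vec<(String, Sexp)>, body: Vec<Sexp> },
    Begin(Vec<Sexp>),
}

/// Convenience constructor for tests and examples.
fn atom(s: &str) -> Sexp {
    Sexp::Atom(s.to_string())
}

/// Checks the elements of one list against the four special-form shapes.
fn check_shape(items: &[Sexp]) -> Result<Form, String> {
    let (head, rest) = match items {
        [Sexp::Atom(head), rest @ ..] => (head.as_str(), rest),
        _ => return Err("expected a keyword at the head of the list".to_string()),
    };
    match (head, rest) {
        // Zero or more params, one or more body expressions.
        ("lambda", [Sexp::List(params), body @ ..]) if !body.is_empty() => {
            let params = params
                .iter()
                .map(|p| match p {
                    Sexp::Atom(name) => Ok(name.clone()),
                    Sexp::List(_) => Err("lambda parameters must be symbols".to_string()),
                })
                .collect::<Result<Vec<_>, _>>()?;
            Ok(Form::Lambda { params, body: body.to_vec() })
        }
        // Exactly three sub-expressions; the else branch is required.
        ("if", [cond, then_branch, else_branch]) => Ok(Form::If {
            cond: cond.clone(),
            then_branch: then_branch.clone(),
            else_branch: else_branch.clone(),
        }),
        // A list of (name expr) pairs, then one or more body expressions.
        ("let", [Sexp::List(pairs), body @ ..]) if !body.is_empty() => {
            let mut bindings = Vec::new();
            for pair in pairs {
                let kv: &[Sexp] = match pair {
                    Sexp::List(kv) => kv.as_slice(),
                    Sexp::Atom(_) => &[],
                };
                match kv {
                    [Sexp::Atom(name), value] => bindings.push((name.clone(), value.clone())),
                    _ => return Err("each let binding must be (name expr)".to_string()),
                }
            }
            Ok(Form::Let { bindings, body: body.to_vec() })
        }
        // One or more expressions.
        ("begin", [_, ..]) => Ok(Form::Begin(rest.to_vec())),
        (other, _) => Err(format!("malformed or unknown form: {}", other)),
    }
}
```

Each match arm corresponds to one of the key details: the `if` arm only matches a slice of length three, and the `lambda` arm accepts an empty parameter list but not an empty body.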

Comments in the expression parser

Comments must be silently consumed wherever whitespace is allowed. Update ws (or create a separate skip combinator) to skip both whitespace and comments:

fn skip(input: &str) -> IResult<&str, ()> {
    value((), many0(alt((
        value((), multispace1),
        value((), pair(char(';'), opt(is_not("\n\r")))),
    ))))(input)
}

Then use skip in place of multispace0 in the ws wrapper.
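For comparison, the same skipping behavior can be sketched with std string methods only. `skip_ws_and_comments` is a hypothetical name, and one difference is worth flagging as an assumption: `trim_start` removes all Unicode whitespace, whereas nom's `multispace1` only matches spaces, tabs, and line endings.

```rust
/// Drops leading whitespace and ';' line comments, returning what is left.
fn skip_ws_and_comments(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        match trimmed.strip_prefix(';') {
            // A comment runs up to (not including) the next line ending.
            Some(comment) => {
                input = match comment.find(|c: char| c == '\n' || c == '\r') {
                    Some(idx) => &comment[idx..],
                    None => "", // comment reaches end of input
                };
            }
            // No comment in front: nothing more to skip.
            None => return trimmed,
        }
    }
}
```

The loop matters because comments and blank lines can alternate any number of times before the next expression, which is what the many0 in skip expresses.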

The top-level parse function

/// Parse a complete MiniLisp program (zero or more top-level expressions).
pub fn parse(source: &str) -> Result<Vec<Expr>, crate::error::CompileError> {
    let (remaining, exprs) = many0(ws(parse_expr))(source)
        .map_err(|e| crate::error::CompileError::ParseError(e.to_string()))?;
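    if !remaining.trim().is_empty() {
        // Truncate by chars, not bytes: slicing at a byte offset that is not a
        // char boundary would panic on non-ASCII input.
        let preview: String = remaining.chars().take(20).collect();
        return Err(crate::error::CompileError::ParseError(
            format!("unexpected input: {:?}", preview)
        ));
    }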
    Ok(exprs)
}
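The sketch below shows, on a toy grammar of whitespace-separated integers, why the leftover check matters and how to build a short preview of the leftover text without cutting through a multi-byte character. `many0_ints`, `preview`, and `parse_all` are hypothetical names; `many0_ints` imitates the way a many0-style loop stops quietly at the first item it cannot parse.

```rust
/// Parses as many integers as possible and returns the unparsed remainder.
/// Like many0, it succeeds even if it stops early.
fn many0_ints(mut input: &str) -> (&str, Vec<i64>) {
    let mut out = Vec::new();
    loop {
        let trimmed = input.trim_start();
        let end = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        match trimmed[..end].parse::<i64>() {
            Ok(n) => {
                out.push(n);
                input = &trimmed[end..];
            }
            Err(_) => return (input, out), // stop without reporting anything
        }
    }
}

/// First `max_chars` characters of `rest`, safe for any UTF-8 input.
fn preview(rest: &str, max_chars: usize) -> String {
    rest.chars().take(max_chars).collect()
}

/// Top-level entry point: rejects input that the loop left unconsumed.
fn parse_all(source: &str) -> Result<Vec<i64>, String> {
    let (remaining, items) = many0_ints(source);
    if !remaining.trim().is_empty() {
        return Err(format!(
            "unexpected input: {:?}",
            preview(remaining.trim_start(), 20)
        ));
    }
    Ok(items)
}
```

On `"1 2 oops 3"` the inner loop happily returns `[1, 2]`; only the check in `parse_all` turns the ignored `oops 3` into an error. The preview is taken by characters because a byte-index slice such as `&rest[..20]` panics when byte 20 falls inside a multi-byte character.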

Unit tests

#[test]
fn test_parse_if() {
    let src = "(if #t 1 2)";
    let result = parse(src).unwrap();
    assert_eq!(result.len(), 1);
    assert!(matches!(result[0], Expr::If { .. }));
}

#[test]
fn test_parse_define_fn() {
    let src = "(define (add a b) (+ a b))";
    let result = parse(src).unwrap();
    assert!(matches!(&result[0], Expr::Define { name, .. } if name == "add"));
}

#[test]
fn test_nested_calls() {
    let src = "(display (* 2 (+ 3 4)))";
    assert!(parse(src).is_ok());
}

#[test]
fn test_comments_skipped() {
    let src = "; this is a comment\n(define x 42)";
    assert!(parse(src).is_ok());
}

Style notes

  • The recursion problem is the hardest conceptual moment — explain it thoroughly before showing the solution
  • cut is essential for good error messages; explain why each use of cut is there
  • The top-level parse function must check for unconsumed input — show why (trailing garbage would otherwise be silently ignored)
  • End with a checkpoint: parse the complete factorial example and print the AST using the Display impl from §7