vibed/edu/.nbd/tickets/a4c9f8.md

+++
title = "§9 Parsing S-Expressions and Special Forms"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++

## §9 Parsing S-Expressions and Special Forms — Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 9. Parsing S-Expressions and Special Forms`

Replace the stub line with full content. Target 1000–1300 words. This is the hardest parsing section — recursive parsers, special-form recognition, and the top-level `parse` entry point.

## Learning objectives

- Write a recursive parser in nom (handling the recursion challenge)
- Distinguish special forms from generic calls during parsing and produce typed AST variants
- Parse `define`, `lambda`, `if`, `let`, `begin` into the correct `Expr` variants
- Implement the top-level `parse` function
- Understand when to use `cut` to commit to a parse branch

## Content to write

### The Recursion Problem in nom

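nom parsers must have concrete types that are known at compile time, but a parser for S-expressions is recursive: an expression is either an atom or a list of expressions. A parser built as a closure or combinator value cannot refer to itself, because its type would have to contain its own type; Rust rejects such definitions as recursive or infinitely sized types.

Solution: write each recursive parser as a named function rather than a closure. Function items can refer to each other regardless of declaration order, so no forward declaration is needed, and a named function's type is fixed by its signature alone, no matter what its body calls.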

```rust
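// Imports for this section's snippets (nom 7 API); omit any already in scope.
use nom::{
    branch::alt, bytes::complete::{is_not, tag}, character::complete::{char, multispace1},
    combinator::{cut, opt, value}, multi::{many0, many1}, sequence::{delimited, pair}, IResult,
};

pub fn parse_expr(input: &str) -> IResult<&str, Expr> {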
    ws(alt((
        parse_list,
        parse_atom,
    )))(input)
}
```

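`parse_list` calls `parse_expr` recursively. Because `parse_expr` is a named function (not a closure), its type does not depend on its body: it can be passed to combinators by name and coerces to the function pointer type `fn(&str) -> IResult<&str, Expr>`, which has a known size, so the recursion is fine.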
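
To see why named functions sidestep the problem, here is a minimal, dependency-free sketch (no nom) of two mutually recursive parsers. The `Sexp` type and the `parse_sexp` / `parse_items` names are hypothetical stand-ins for the chapter's `Expr`, `parse_expr`, and `parse_list`. The point is only that each function refers to the other by name, so neither type contains the other.

```rust
/// Hypothetical miniature AST: an atom or a list of nested expressions.
#[derive(Debug, PartialEq)]
pub enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

/// Parse one expression; returns the remaining input and the parsed value.
pub fn parse_sexp(input: &str) -> Option<(&str, Sexp)> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix('(') {
        // Recursion goes through a named function, not a closure.
        let (rest, items) = parse_items(rest)?;
        let rest = rest.trim_start().strip_prefix(')')?;
        Some((rest, Sexp::List(items)))
    } else {
        // An atom is a non-empty run of characters up to whitespace or a paren.
        let end = input
            .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
            .unwrap_or(input.len());
        if end == 0 {
            return None;
        }
        Some((&input[end..], Sexp::Atom(input[..end].to_string())))
    }
}

/// Parse zero or more expressions, stopping at the first thing that is not one.
fn parse_items(mut input: &str) -> Option<(&str, Vec<Sexp>)> {
    let mut items = Vec::new();
    while let Some((rest, item)) = parse_sexp(input) {
        items.push(item);
        input = rest;
    }
    Some((input, items))
}
```

Calling `parse_sexp("(+ 1 (f 2))")` walks into the inner list through the same two functions, which is the recursion pattern the nom version relies on.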

### Parsing Generic Lists → Calls

A generic list `(func arg1 arg2 ...)` is parsed into `Expr::Call`:

```rust
fn parse_call(input: &str) -> IResult<&str, Expr> {
    let (input, exprs) = delimited(
        ws(char('(')),
        many1(ws(parse_expr)),
        ws(char(')')),
    )(input)?;
    let mut iter = exprs.into_iter();
    let func = iter.next().unwrap(); // safe: many1 guarantees >= 1
    let args = iter.collect();
    Ok((input, Expr::Call { func: Box::new(func), args }))
}
```

### Recognizing Special Forms

Special forms are lists that begin with a specific keyword. Recognize them *inside* the list parser by checking the first token after the opening parenthesis. The cleanest approach: try each special-form parser in an `alt` before falling back to `parse_call`. Order matters: `parse_call` accepts any non-empty list, so it must come last.

```rust
fn parse_list(input: &str) -> IResult<&str, Expr> {
    alt((
        parse_define,
        parse_lambda,
        parse_if,
        parse_let,
        parse_begin,
        parse_call,
    ))(input)
}
```

### Parsing `define`

Two shapes: `(define name expr)` and `(define (name params...) body...)`. Parse both; the second desugars into a `Define` wrapping a `Lambda`.

```rust
fn parse_define(input: &str) -> IResult<&str, Expr> {
    let (input, _) = ws(char('('))(input)?;
    let (input, _) = ws(tag("define"))(input)?;
    // Use cut here: we've seen "(define", so commit to this branch
    cut(|input| {
        alt((
            // Function shorthand: (define (name params...) body...)
            |input| {
                let (input, _) = ws(char('('))(input)?;
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, params) = many0(ws(parse_symbol_str))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let (input, body) = many1(ws(parse_expr))(input)?;
                let (input, _) = ws(char(')'))(input)?;
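                // `parse_symbol_str` yields `&str`; this assumes `Expr::Lambda.params`
                // is `Vec<String>`, matching the owned `String` used for `Define.name`.
                let params = params.into_iter().map(str::to_string).collect();
                let lambda = Expr::Lambda { params, body };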
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(lambda) }))
            },
            // Variable binding: (define name expr)
            |input| {
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, value) = ws(parse_expr)(input)?;
                let (input, _) = ws(char(')'))(input)?;
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(value) }))
            },
        ))(input)
    })(input)
}
```
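
One caveat with keyword matching: nom's `tag` accepts any input that starts with the given string, so `tag("define")` also matches the first six characters of `defined`. A keyword matcher normally also checks that the keyword ends at a token boundary. The following dependency-free sketch shows the check. The `keyword` helper and its notion of a delimiter (whitespace, parentheses, `;`, or end of input) are assumptions, not part of the chapter's code.

```rust
/// Hypothetical helper: match `kw` at the start of `input`, but only if it is
/// followed by a delimiter, so that `define` does not match `defined`.
/// Returns the input remaining after the keyword.
pub fn keyword<'a>(input: &'a str, kw: &str) -> Option<&'a str> {
    let rest = input.strip_prefix(kw)?;
    match rest.chars().next() {
        // End of input: the keyword is a complete token.
        None => Some(rest),
        // A delimiter follows: the keyword is a complete token.
        Some(c) if c.is_whitespace() || matches!(c, '(' | ')' | ';') => Some(rest),
        // Any other character means the keyword is only a prefix of a longer symbol.
        Some(_) => None,
    }
}
```

With nom, the same check is usually written by following `tag` with a non-consuming lookahead such as `peek`.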

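Explain `cut`: after matching `(define`, we are committed to this branch. If the rest of the form is malformed, `cut` converts the recoverable `Err::Error` into an `Err::Failure`, which `alt` and `many0` propagate immediately instead of trying another alternative. This produces an error that points at the malformed `define` and prevents backtracking to `parse_call`, which could otherwise accept a broken form such as `(define x)` as an ordinary call.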
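
The effect of `cut` on backtracking can be modeled without nom. In this sketch the `PErr` enum mirrors nom's distinction between `Err::Error` (recoverable) and `Err::Failure` (committed). `cut` and `alt2` are simplified, hypothetical versions of the real combinators, and `define_like` and `call_like` are toy branches invented for the illustration.

```rust
/// Simplified model of nom's error kinds.
#[derive(Debug, PartialEq)]
pub enum PErr {
    Error(String),   // recoverable: `alt2` may try the next branch
    Failure(String), // committed: propagate immediately
}

pub type PResult<'a> = Result<(&'a str, String), PErr>;

/// Upgrade a recoverable error to a failure; pass everything else through.
pub fn cut(result: PResult<'_>) -> PResult<'_> {
    result.map_err(|e| match e {
        PErr::Error(msg) => PErr::Failure(msg),
        failure => failure,
    })
}

/// Try `first`; fall back to `second` only on a recoverable error.
pub fn alt2<'a>(
    input: &'a str,
    first: fn(&'a str) -> PResult<'a>,
    second: fn(&'a str) -> PResult<'a>,
) -> PResult<'a> {
    match first(input) {
        Err(PErr::Error(_)) => second(input),
        other => other,
    }
}

/// Toy `define` branch: once "(define" is seen, a name and a value are mandatory.
pub fn define_like(input: &str) -> PResult<'_> {
    let rest = input
        .strip_prefix("(define")
        .ok_or_else(|| PErr::Error("not a define".to_string()))?;
    // Committed from here on: a malformed rest is a failure, not a backtrack.
    cut(match rest.trim_start().split_once(' ') {
        Some((name, tail)) if !name.is_empty() => Ok((tail, format!("define {}", name))),
        _ => Err(PErr::Error("define: expected a name and a value".to_string())),
    })
}

/// Toy fallback branch: accepts any parenthesized list as a call.
pub fn call_like(input: &str) -> PResult<'_> {
    match input.strip_prefix('(') {
        Some(rest) => Ok((rest, "call".to_string())),
        None => Err(PErr::Error("not a list".to_string())),
    }
}
```

With these definitions, `alt2("(define)", define_like, call_like)` returns a `Failure` from the `define` branch. Without the `cut` call it would fall through to `call_like` and succeed, hiding the mistake.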

### Parsing `lambda`, `if`, `let`, `begin`

Show each parser in similar style. Key details:

**`lambda`**: `(lambda (params...) body...)` — use `many0` for params (zero-parameter functions are valid), `many1` for body.

**`if`**: `(if cond then else)` — exactly three sub-expressions; the third (`else`) is required in MiniLisp.

**`let`**: `(let ((name expr)...) body...)` — parse a list of `(name expr)` pairs, collect into `Vec<(String, Expr)>`.

**`begin`**: `(begin expr...)` — one or more expressions.
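
The shape rules above can be summarized independently of nom. The sketch below is not the chapter's parser: it checks the same rules against an already-parsed list, using a hypothetical `Sexp` tree and `Form` enum in place of the real `Expr`, purely to make the arity requirements concrete. In the real parser an unrecognized head falls through to `parse_call` rather than producing an error.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

#[derive(Debug, PartialEq)]
pub enum Form {
    Lambda { params: Vec<String>, body: Vec<Sexp> },
    If { cond: Sexp, then: Sexp, otherwise: Sexp },
    Let { bindings: Vec<(String, Sexp)>, body: Vec<Sexp> },
    Begin(Vec<Sexp>),
}

/// Extract a symbol name, rejecting nested lists.
fn atom_name(s: &Sexp) -> Result<String, String> {
    match s {
        Sexp::Atom(name) => Ok(name.clone()),
        Sexp::List(_) => Err("expected a symbol".to_string()),
    }
}

/// Check the contents of one list (head keyword first) against the shape rules.
pub fn special_form(items: &[Sexp]) -> Result<Form, String> {
    let (head, rest) = items.split_first().ok_or("empty list")?;
    match (atom_name(head)?.as_str(), rest) {
        // (lambda (params...) body...): zero or more params, one or more body forms.
        ("lambda", [Sexp::List(params), body @ ..]) if !body.is_empty() => Ok(Form::Lambda {
            params: params.iter().map(atom_name).collect::<Result<Vec<_>, _>>()?,
            body: body.to_vec(),
        }),
        // (if cond then else): exactly three sub-expressions.
        ("if", [cond, then, otherwise]) => Ok(Form::If {
            cond: cond.clone(),
            then: then.clone(),
            otherwise: otherwise.clone(),
        }),
        // (let ((name expr)...) body...): each binding is a two-element list.
        ("let", [Sexp::List(pairs), body @ ..]) if !body.is_empty() => {
            let mut bindings = Vec::new();
            for pair in pairs {
                match pair {
                    Sexp::List(p) if p.len() == 2 => {
                        bindings.push((atom_name(&p[0])?, p[1].clone()))
                    }
                    _ => return Err("let: each binding must be (name expr)".to_string()),
                }
            }
            Ok(Form::Let { bindings, body: body.to_vec() })
        }
        // (begin expr...): one or more expressions.
        ("begin", body) if !body.is_empty() => Ok(Form::Begin(body.to_vec())),
        // Sketch only: the real parser would treat an unknown head as a call.
        (kw, _) => Err(format!("malformed or unknown form: {}", kw)),
    }
}
```

The `many0` / `many1` choices in the nom parsers correspond to the "zero or more" and `!body.is_empty()` checks here.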

### Comments in the expression parser

Comments must be silently consumed wherever whitespace is allowed. Update `ws` (or create a separate `skip` combinator) to skip both whitespace and comments:

```rust
fn skip(input: &str) -> IResult<&str, ()> {
    value((), many0(alt((
        value((), multispace1),
        value((), pair(char(';'), opt(is_not("\n\r")))),
    ))))(input)
}
```

Then use `skip` in place of `multispace0` in the `ws` wrapper.
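
As a reference for what `skip` should consume, here is a dependency-free sketch of the same behaviour: whitespace and `;` line comments are dropped repeatedly until real input is reached. The name `skip_trivia` is hypothetical and the function is only a model of the combinator, not a replacement for it.

```rust
/// Drop leading whitespace and `;` line comments; return the remaining input.
pub fn skip_trivia(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        match trimmed.strip_prefix(';') {
            // A comment runs to the end of the line (or to the end of input).
            Some(comment) => {
                input = match comment.find(|c: char| c == '\n' || c == '\r') {
                    // Keep the line break; the next iteration trims it.
                    Some(end) => &comment[end..],
                    None => "",
                };
            }
            // No comment here: whatever is left is real input.
            None => return trimmed,
        }
    }
}
```

One difference worth noting: `trim_start` accepts any Unicode whitespace, slightly more than nom's `multispace1`, which matches only spaces, tabs, carriage returns, and line feeds.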

### The top-level `parse` function

```rust
/// Parse a complete MiniLisp program (zero or more top-level expressions).
pub fn parse(source: &str) -> Result<Vec<Expr>, crate::error::CompileError> {
    let (remaining, exprs) = many0(ws(parse_expr))(source)
        .map_err(|e| crate::error::CompileError::ParseError(e.to_string()))?;
    if !remaining.trim().is_empty() {
        return Err(crate::error::CompileError::ParseError(
            // Take 20 chars, not 20 bytes: slicing inside a multi-byte character would panic.
            format!("unexpected input: {:?}", remaining.chars().take(20).collect::<String>())
        ));
    }
    Ok(exprs)
}
```

### Unit tests

```rust
#[test]
fn test_parse_if() {
    let src = "(if #t 1 2)";
    let result = parse(src).unwrap();
    assert_eq!(result.len(), 1);
    assert!(matches!(result[0], Expr::If { .. }));
}

#[test]
fn test_parse_define_fn() {
    let src = "(define (add a b) (+ a b))";
    let result = parse(src).unwrap();
    assert!(matches!(&result[0], Expr::Define { name, .. } if name == "add"));
}

#[test]
fn test_nested_calls() {
    let src = "(display (* 2 (+ 3 4)))";
    assert!(parse(src).is_ok());
}

#[test]
fn test_comments_skipped() {
    let src = "; this is a comment\n(define x 42)";
    assert!(parse(src).is_ok());
}
```

## Style notes

- The recursion problem is the hardest conceptual moment — explain it thoroughly before showing the solution
- `cut` is essential for good error messages; explain why each use of `cut` is there
- The top-level `parse` function must check for unconsumed input — show why (trailing garbage would otherwise be silently ignored)
- End with a checkpoint: parse the complete factorial example and print the AST using the `Display` impl from §7