---
# edu-tzzh
title: §9 Parsing S-Expressions and Special Forms
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:02Z
updated_at: 2026-03-10T23:30:02Z
---

## §9 Parsing S-Expressions and Special Forms: Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 9. Parsing S-Expressions and Special Forms`

Replace the stub line with full content. Target 1000–1300 words. This is the hardest parsing section: it covers recursive parsers, special-form recognition, and the top-level `parse` entry point.

## Learning objectives

- Write a recursive parser in nom (handling the recursion challenge)
- Distinguish special forms from generic calls during parsing and produce typed AST variants
- Parse `define`, `lambda`, `if`, `let`, `begin` into the correct `Expr` variants
- Implement the top-level `parse` function
- Understand when to use `cut` to commit to a parse branch

## Content to write

### The Recursion Problem in nom

nom parsers must have known types at compile time, but a parser for S-expressions is recursive: an expression is either an atom or a list of expressions. If the parser is built as a single combinator value or closure, its type would have to contain itself, and Rust rejects it with a recursive or "infinite type" style error.

Solution: define the parser as a named function rather than a closure, so the recursive reference goes through the function's name. This works in Rust because a named function has a fixed type of known size that does not embed the types of the parsers it calls.

```rust
pub fn parse_expr(input: &str) -> IResult<&str, Expr> {
    ws(alt((
        parse_list,
        parse_atom,
    )))(input)
}
```

`parse_list` calls `parse_expr` recursively. Because `parse_expr` is a named function (not a closure), it can be used as a `fn(&str) -> IResult<&str, Expr>`, which has a known size, so the recursion is fine.
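The recursion pattern can be tried in isolation before nom is involved. The sketch below is a hand-rolled stand-in that uses only the standard library; `Sexp`, `PResult`, and the function bodies are hypothetical and are not the `Expr` type or the nom parsers this section asks for. It only shows that named functions may call each other recursively without any type-level cycle.

```rust
// Sketch only: stdlib stand-in for the nom parsers. `Sexp` and `PResult`
// are hypothetical names, not part of the MiniLisp compiler.

#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

// Same shape as nom's IResult: the remaining input plus the parsed value.
type PResult<'a> = Result<(&'a str, Sexp), String>;

// Named function: recursion goes through its name, so no parser type
// ever has to contain itself.
fn parse_expr(input: &str) -> PResult<'_> {
    let input = input.trim_start();
    if input.starts_with('(') {
        parse_list(input)
    } else {
        parse_atom(input)
    }
}

fn parse_atom(input: &str) -> PResult<'_> {
    // An atom runs until whitespace or a parenthesis.
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected atom at {:?}", input));
    }
    Ok((&input[end..], Sexp::Atom(input[..end].to_string())))
}

fn parse_list(input: &str) -> PResult<'_> {
    let mut rest = input
        .strip_prefix('(')
        .ok_or_else(|| "expected '('".to_string())?
        .trim_start();
    let mut items = Vec::new();
    loop {
        if let Some(after) = rest.strip_prefix(')') {
            return Ok((after, Sexp::List(items)));
        }
        if rest.is_empty() {
            return Err("unclosed list".to_string());
        }
        // Recursive call back into parse_expr.
        let (next, item) = parse_expr(rest)?;
        items.push(item);
        rest = next.trim_start();
    }
}
```

Calling `parse_expr("(+ 1 (* 2 3))")` returns a `Sexp::List` whose third element is itself a `Sexp::List`, with an empty remainder. The nom version has the same call graph; only the bodies change.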
### Parsing Generic Lists → Calls

A generic list `(func arg1 arg2 ...)` is parsed into `Expr::Call`:

```rust
fn parse_call(input: &str) -> IResult<&str, Expr> {
    let (input, exprs) = delimited(
        ws(char('(')),
        many1(ws(parse_expr)),
        ws(char(')')),
    )(input)?;
    let mut iter = exprs.into_iter();
    let func = iter.next().unwrap(); // safe: many1 guarantees >= 1
    let args = iter.collect();
    Ok((input, Expr::Call { func: Box::new(func), args }))
}
```

### Recognizing Special Forms

Special forms are lists that begin with a specific keyword. Recognize them *inside* the list parser by peeking at the first token. The cleanest approach: try each special-form parser in an `alt` before falling back to `parse_call`.

```rust
fn parse_list(input: &str) -> IResult<&str, Expr> {
    alt((
        parse_define,
        parse_lambda,
        parse_if,
        parse_let,
        parse_begin,
        parse_call,
    ))(input)
}
```

### Parsing `define`

Two shapes: `(define name expr)` and `(define (name params...) body...)`. Parse both; the second desugars into a `Define` wrapping a `Lambda`.

```rust
fn parse_define(input: &str) -> IResult<&str, Expr> {
    let (input, _) = ws(char('('))(input)?;
    let (input, _) = ws(tag("define"))(input)?;
    // Use cut here: we've seen "(define", so commit to this branch
    cut(|input| {
        alt((
            // Function shorthand: (define (name params...) body...)
            |input| {
                let (input, _) = ws(char('('))(input)?;
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, params) = many0(ws(parse_symbol_str))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let (input, body) = many1(ws(parse_expr))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let lambda = Expr::Lambda { params, body };
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(lambda) }))
            },
            // Variable binding: (define name expr)
            |input| {
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, value) = ws(parse_expr)(input)?;
                let (input, _) = ws(char(')'))(input)?;
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(value) }))
            },
        ))(input)
    })(input)
}
```

Explain `cut`: after matching `(define`, we are committed to this branch. If the body is malformed, `cut` converts recoverable errors to failures, producing better error messages and preventing backtracking to `parse_call`.

### Parsing `lambda`, `if`, `let`, `begin`

Show each parser in similar style. Key details:

**`lambda`**: `(lambda (params...) body...)`. Use `many0` for params (zero-parameter functions are valid), `many1` for body.

**`if`**: `(if cond then else)`. Exactly three sub-expressions; the third (`else`) is required in MiniLisp.

**`let`**: `(let ((name expr)...) body...)`. Parse a list of `(name expr)` pairs, collect into `Vec<(String, Expr)>`.

**`begin`**: `(begin expr...)`. One or more expressions.

### Comments in the expression parser

Comments must be silently consumed wherever whitespace is allowed. Update `ws` (or create a separate `skip` combinator) to skip both whitespace and comments:

```rust
fn skip(input: &str) -> IResult<&str, ()> {
    value((), many0(alt((
        value((), multispace1),
        value((), pair(char(';'), opt(is_not("\n\r")))),
    ))))(input)
}
```

Then use `skip` in place of `multispace0` in the `ws` wrapper.

### The top-level `parse` function

```rust
/// Parse a complete MiniLisp program (zero or more top-level expressions).
pub fn parse(source: &str) -> Result<Vec<Expr>, crate::error::CompileError> {
    let (remaining, exprs) = many0(ws(parse_expr))(source)
        .map_err(|e| crate::error::CompileError::ParseError(e.to_string()))?;
    if !remaining.trim().is_empty() {
        // Preview by characters, not bytes: a byte slice could split a
        // multi-byte UTF-8 character and panic.
        let preview: String = remaining.chars().take(20).collect();
        return Err(crate::error::CompileError::ParseError(
            format!("unexpected input: {:?}", preview)
        ));
    }
    Ok(exprs)
}
```

### Unit tests

```rust
#[test]
fn test_parse_if() {
    let src = "(if #t 1 2)";
    let result = parse(src).unwrap();
    assert_eq!(result.len(), 1);
    assert!(matches!(result[0], Expr::If { .. }));
}

#[test]
fn test_parse_define_fn() {
    let src = "(define (add a b) (+ a b))";
    let result = parse(src).unwrap();
    assert!(matches!(&result[0], Expr::Define { name, .. } if name == "add"));
}

#[test]
fn test_nested_calls() {
    let src = "(display (* 2 (+ 3 4)))";
    assert!(parse(src).is_ok());
}

#[test]
fn test_comments_skipped() {
    let src = "; this is a comment\n(define x 42)";
    assert!(parse(src).is_ok());
}
```

## Style notes

- The recursion problem is the hardest conceptual moment: explain it thoroughly before showing the solution
- `cut` is essential for good error messages; explain why each use of `cut` is there
- The top-level `parse` function must check for unconsumed input; show why (trailing garbage would otherwise be silently ignored)
- End with a checkpoint: parse the complete factorial example and print the AST using the `Display` impl from §7
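The unconsumed-input check named in the style notes can be exercised on its own, which may help when writing the "show why" paragraph. The sketch below is an assumption-laden stand-in: `check_fully_consumed` is a hypothetical helper, and a plain `String` replaces `CompileError::ParseError` so the block runs without the rest of the crate.

```rust
// Sketch only: `check_fully_consumed` is a hypothetical helper; the String
// error stands in for CompileError::ParseError.

/// Returns Ok(()) when only whitespace remains after parsing; otherwise
/// reports up to 20 characters of the leftover input.
fn check_fully_consumed(remaining: &str) -> Result<(), String> {
    let leftover = remaining.trim();
    if leftover.is_empty() {
        return Ok(());
    }
    // Take whole characters rather than a byte slice, so the preview cannot
    // split a multi-byte UTF-8 character.
    let preview: String = leftover.chars().take(20).collect();
    Err(format!("unexpected input: {:?}", preview))
}
```

Without such a check, a source like `(define x 42) )` would yield one valid expression and quietly drop the stray `)`; with it, the caller gets an error that quotes the leftover text.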