+++
title = "§9 Parsing S-Expressions and Special Forms"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
§9 Parsing S-Expressions and Special Forms — Stub to fill
File: edu/src/lisp-compiler.md, section ### 9. Parsing S-Expressions and Special Forms
Replace the stub line with full content. Target 1000–1300 words. This is the hardest parsing section — recursive parsers, special-form recognition, and the top-level parse entry point.
Learning objectives
- Write a recursive parser in nom (handling the recursion challenge)
- Distinguish special forms from generic calls during parsing and produce typed AST variants
- Parse define, lambda, if, let, begin into the correct Expr variants
- Implement the top-level parse function
- Understand when to use cut to commit to a parse branch
Content to write
The Recursion Problem in nom
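nom builds parsers by composing combinators, and each composition has a concrete type that the compiler must know in full. A parser for S-expressions is recursive: an expression is either an atom or a list of expressions. A parser value that would have to contain itself (a closure that refers to itself, or a function returning impl Fn whose result embeds its own return type) has no finite type, so rustc rejects it, typically with a recursive or infinitely sized type error.
Solution: write each parser as a named function that takes the input and returns an IResult, rather than as a closure or a stored combinator value. Rust resolves function items regardless of definition order, so parse_expr can refer to parse_list before parse_list is defined, and every such function coerces to a function pointer of known size. The recursion then happens at call time instead of inside the type.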
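// Imports for the parsers in this section (assumes nom 7's function-style combinators).
use nom::{branch::alt, bytes::complete::{is_not, tag}, character::complete::{char, multispace1}};
use nom::{combinator::{cut, opt, value}, multi::{many0, many1}, sequence::{delimited, pair}, IResult};

// ws and parse_atom are assumed to be defined in the earlier parsing sections.
pub fn parse_expr(input: &str) -> IResult<&str, Expr> {
    ws(alt((
        parse_list,
        parse_atom,
    )))(input)
}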
parse_list calls parse_expr recursively. Because parse_expr is a named function (not a closure), its type is fn(&str) -> IResult<&str, Expr> — a known size — so the recursion is fine.
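The same idea can be seen without nom at all. The following is a minimal hand-rolled sketch, not the tutorial's parser: Sexp, ParseResult, read_expr, read_atom and read_list are hypothetical names introduced only for illustration. Every parser is a plain named function with the same signature, and read_list calls back into read_expr with no type-level cycle.

```rust
// Hypothetical stand-in for the tutorial's Expr type, kept tiny on purpose.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

// Each parser below is a named fn with this result type, so each one
// coerces to a function pointer whose size is known.
type ParseResult<'a> = Option<(&'a str, Sexp)>;

fn read_expr(input: &str) -> ParseResult<'_> {
    let input = input.trim_start();
    // Try the list parser first, then fall back to an atom.
    read_list(input).or_else(|| read_atom(input))
}

fn read_atom(input: &str) -> ParseResult<'_> {
    // An atom runs until whitespace or a parenthesis.
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    if end == 0 {
        return None; // nothing that looks like an atom
    }
    Some((&input[end..], Sexp::Atom(input[..end].to_string())))
}

fn read_list(input: &str) -> ParseResult<'_> {
    let mut rest = input.strip_prefix('(')?;
    let mut items = Vec::new();
    loop {
        rest = rest.trim_start();
        if let Some(after) = rest.strip_prefix(')') {
            return Some((after, Sexp::List(items)));
        }
        // Recursive call back into read_expr: legal because it is a named fn.
        let (next, item) = read_expr(rest)?;
        items.push(item);
        rest = next;
    }
}
```

Calling read_expr on "(a (b c))" yields a nested Sexp::List, and an unclosed list such as "(a b" yields None rather than looping.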
Parsing Generic Lists → Calls
A generic list (func arg1 arg2 ...) is parsed into Expr::Call:
fn parse_call(input: &str) -> IResult<&str, Expr> {
    let (input, exprs) = delimited(
        ws(char('(')),
        many1(ws(parse_expr)),
        ws(char(')')),
    )(input)?;
    let mut iter = exprs.into_iter();
    let func = iter.next().unwrap(); // safe: many1 guarantees >= 1
    let args = iter.collect();
    Ok((input, Expr::Call { func: Box::new(func), args }))
}
Recognizing Special Forms
Special forms are lists whose first element is a reserved keyword. Recognize them inside the list parser by looking at the first token. A straightforward approach: try each special-form parser in an alt before falling back to parse_call. Order matters here, because parse_call accepts any non-empty list and must therefore come last.
fn parse_list(input: &str) -> IResult<&str, Expr> {
    alt((
        parse_define,
        parse_lambda,
        parse_if,
        parse_let,
        parse_begin,
        parse_call,
    ))(input)
}
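One detail worth checking when recognizing keywords is the token boundary. A bare prefix match on define also matches the start of a longer symbol such as defined, so the keyword should be compared against the whole head symbol. The sketch below shows that check with the standard library only; head_symbol and is_special_form are hypothetical helpers, not part of the tutorial's code, and the delimiter set is an assumption about MiniLisp's lexical rules.

```rust
/// Returns the head symbol of a list form, e.g. "define" for "(define x 1)".
/// Returns None if the input is not a list or the list has no head symbol.
fn head_symbol(input: &str) -> Option<&str> {
    let rest = input.trim_start().strip_prefix('(')?.trim_start();
    // Assumed delimiters: whitespace and parentheses end a symbol.
    let end = rest
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some(&rest[..end])
    }
}

/// True only when the head is exactly one of the five special-form keywords.
fn is_special_form(input: &str) -> bool {
    matches!(
        head_symbol(input),
        Some("define" | "lambda" | "if" | "let" | "begin")
    )
}
```

With this check, "(define x 1)" is classified as a special form while "(defined x)" falls through to the generic call path. In a nom parser the same effect could be had by requiring a delimiter after the keyword before committing.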
Parsing define
Two shapes: (define name expr) and (define (name params...) body...). Parse both; the second desugars into a Define wrapping a Lambda.
fn parse_define(input: &str) -> IResult<&str, Expr> {
    let (input, _) = ws(char('('))(input)?;
    let (input, _) = ws(tag("define"))(input)?;
    // Use cut here: we've seen "(define", so commit to this branch
    cut(|input| {
        alt((
            // Function shorthand: (define (name params...) body...)
            |input| {
                let (input, _) = ws(char('('))(input)?;
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, params) = many0(ws(parse_symbol_str))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let (input, body) = many1(ws(parse_expr))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                // Assumes Expr::Lambda stores owned parameter names (Vec<String>), like Define's name.
                let params = params.into_iter().map(str::to_string).collect();
                let lambda = Expr::Lambda { params, body };
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(lambda) }))
            },
            // Variable binding: (define name expr)
            |input| {
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, value) = ws(parse_expr)(input)?;
                let (input, _) = ws(char(')'))(input)?;
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(value) }))
            },
        ))(input)
    })(input)
}
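Explain cut: once ( and define have matched, the input can only be a define form, so we commit to this branch. cut turns any recoverable Err::Error from the wrapped parser into an Err::Failure, which alt propagates immediately instead of trying the next alternative. A malformed body is therefore reported where the define form went wrong, rather than backtracking into parse_call and surfacing a less relevant error from there.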
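The interaction between cut and alt can be modelled in a few lines of plain Rust. This is a simplified model, not nom's implementation: ParseErr, cut_model, alt_model and the toy_* functions are hypothetical, and the real nom::Err also has an Incomplete variant that is left out here.

```rust
// Minimal model of the two error levels that matter for backtracking.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Error(String),   // recoverable: the alt model may try the next branch
    Failure(String), // unrecoverable: the alt model stops immediately
}

type Res<T> = Result<T, ParseErr>;

// Model of cut: upgrade a recoverable Error to a Failure.
fn cut_model<T>(result: Res<T>) -> Res<T> {
    result.map_err(|e| match e {
        ParseErr::Error(msg) => ParseErr::Failure(msg),
        failure => failure,
    })
}

// Model of alt over two branches: backtrack only on a recoverable Error.
fn alt_model<T>(first: Res<T>, second: impl FnOnce() -> Res<T>) -> Res<T> {
    match first {
        Err(ParseErr::Error(_)) => second(),
        other => other,
    }
}

// Toy define branch: once the keyword matches, a malformed body is fatal.
fn toy_define(src: &str) -> Res<&'static str> {
    let body = src
        .strip_prefix("(define ")
        .ok_or_else(|| ParseErr::Error("not a define form".into()))?;
    // Toy check only: a name or a parameter list must follow the keyword.
    let well_formed = body.starts_with(|c: char| c.is_alphabetic() || c == '(');
    cut_model(if well_formed {
        Ok("Define")
    } else {
        Err(ParseErr::Error("expected a name after define".into()))
    })
}

// Toy fallback branch that accepts any parenthesised form.
fn toy_call(src: &str) -> Res<&'static str> {
    if src.starts_with('(') {
        Ok("Call")
    } else {
        Err(ParseErr::Error("not a list".into()))
    }
}

fn toy_list(src: &str) -> Res<&'static str> {
    alt_model(toy_define(src), || toy_call(src))
}
```

Here toy_list("(+ 1 2)") falls through to the call branch, while toy_list("(define 1 2)") stops with a Failure carrying the define-specific message. Without cut_model the second input would be accepted as a call to a function named define.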
Parsing lambda, if, let, begin
Show each parser in similar style. Key details:
lambda: (lambda (params...) body...) — use many0 for params (zero-parameter functions are valid), many1 for body.
if: (if cond then else) — exactly three sub-expressions; the third (else) is required in MiniLisp.
let: (let ((name expr)...) body...) — parse a list of (name expr) pairs, collect into Vec<(String, Expr)>.
begin: (begin expr...) — one or more expressions.
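The shape rules above can be made concrete with a small std-only sketch. Note that this swaps in a different technique from the one the section teaches: instead of nom parsers, it classifies an already-read generic list into typed variants in a second pass. All names (Sexp, the Expr fields, lower, lower_binding, lower_all, symbol_name) are hypothetical, and the real Expr from §7 may be laid out differently.

```rust
// Hypothetical, simplified types for illustration only.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Symbol(String),
    Lambda { params: Vec<String>, body: Vec<Expr> },
    If { cond: Box<Expr>, then_branch: Box<Expr>, else_branch: Box<Expr> },
    Let { bindings: Vec<(String, Expr)>, body: Vec<Expr> },
    Begin(Vec<Expr>),
    Call { func: Box<Expr>, args: Vec<Expr> },
}

fn lower(sexp: &Sexp) -> Result<Expr, String> {
    let items = match sexp {
        Sexp::Atom(name) => return Ok(Expr::Symbol(name.clone())),
        Sexp::List(items) => items,
    };
    let (head, rest) = items.split_first().ok_or("empty list")?;
    match head {
        Sexp::Atom(k) if k == "lambda" => match rest {
            // Zero parameters are fine; the body needs at least one expression.
            [Sexp::List(params), body @ ..] if !body.is_empty() => {
                let params: Vec<String> =
                    params.iter().map(symbol_name).collect::<Result<_, _>>()?;
                Ok(Expr::Lambda { params, body: lower_all(body)? })
            }
            _ => Err("lambda expects a parameter list and a body".into()),
        },
        Sexp::Atom(k) if k == "if" => match rest {
            // Exactly three sub-expressions: the else branch is required.
            [c, t, e] => Ok(Expr::If {
                cond: Box::new(lower(c)?),
                then_branch: Box::new(lower(t)?),
                else_branch: Box::new(lower(e)?),
            }),
            _ => Err(format!("if expects 3 sub-expressions, got {}", rest.len())),
        },
        Sexp::Atom(k) if k == "let" => match rest {
            [Sexp::List(pairs), body @ ..] if !body.is_empty() => {
                let bindings: Vec<(String, Expr)> =
                    pairs.iter().map(lower_binding).collect::<Result<_, _>>()?;
                Ok(Expr::Let { bindings, body: lower_all(body)? })
            }
            _ => Err("let expects a binding list and a body".into()),
        },
        Sexp::Atom(k) if k == "begin" => {
            if rest.is_empty() {
                return Err("begin expects at least one expression".into());
            }
            Ok(Expr::Begin(lower_all(rest)?))
        }
        // Anything else is a generic call.
        _ => Ok(Expr::Call { func: Box::new(lower(head)?), args: lower_all(rest)? }),
    }
}

fn symbol_name(sexp: &Sexp) -> Result<String, String> {
    match sexp {
        Sexp::Atom(name) => Ok(name.clone()),
        Sexp::List(_) => Err("parameter must be a symbol".into()),
    }
}

fn lower_binding(pair: &Sexp) -> Result<(String, Expr), String> {
    match pair {
        Sexp::List(items) => match items.as_slice() {
            [Sexp::Atom(name), value] => Ok((name.clone(), lower(value)?)),
            _ => Err("binding must be (name expr)".into()),
        },
        Sexp::Atom(_) => Err("binding must be a list".into()),
    }
}

fn lower_all(items: &[Sexp]) -> Result<Vec<Expr>, String> {
    items.iter().map(lower).collect()
}
```

The slice patterns state the arity rules directly: [c, t, e] for if, and a leading list followed by a non-empty body for lambda and let. The nom parsers should enforce the same shapes, using many0 and many1 where the slice patterns use body @ .. with an emptiness check.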
Comments in the expression parser
Comments must be silently consumed wherever whitespace is allowed. Update ws (or create a separate skip combinator) to skip both whitespace and comments:
fn skip(input: &str) -> IResult<&str, ()> {
    value((), many0(alt((
        value((), multispace1),
        value((), pair(char(';'), opt(is_not("\n\r")))),
    ))))(input)
}
Then use skip in place of multispace0 in the ws wrapper.
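To make the intended behaviour of skip easy to test in isolation, here is a std-only sketch of the same loop: strip whitespace, strip one line comment, repeat until neither applies. skip_trivia is a hypothetical name, and trim_start removes all Unicode whitespace, which is slightly broader than the space, tab, and newline characters that multispace1 accepts.

```rust
/// Strips leading whitespace and `;` line comments, returning the rest.
fn skip_trivia(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        match trimmed.strip_prefix(';') {
            // Drop everything up to (not including) the end of the line;
            // the newline itself is whitespace and is removed on the next pass.
            Some(comment) => {
                let end = comment
                    .find(|c: char| c == '\n' || c == '\r')
                    .unwrap_or(comment.len());
                input = &comment[end..];
            }
            // No comment ahead: whatever is left is real input.
            None => return trimmed,
        }
    }
}
```

A comment at the very end of the input, with no trailing newline, is consumed completely, and a semicolon that appears after real input is left alone because only leading trivia is skipped.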
The top-level parse function
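/// Parse a complete MiniLisp program (zero or more top-level expressions).
pub fn parse(source: &str) -> Result<Vec<Expr>, crate::error::CompileError> {
    let (remaining, exprs) = many0(ws(parse_expr))(source)
        .map_err(|e| crate::error::CompileError::ParseError(e.to_string()))?;
    // Skip trailing whitespace and comments so that a comment-only tail (or file) is not reported as garbage.
    let remaining = skip(remaining).map(|(rest, _)| rest).unwrap_or(remaining);
    if !remaining.is_empty() {
        // Take chars, not bytes: slicing at a fixed byte offset can panic inside a multi-byte character.
        let preview: String = remaining.chars().take(20).collect();
        return Err(crate::error::CompileError::ParseError(
            format!("unexpected input: {:?}", preview)
        ));
    }
    Ok(exprs)
}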
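The unconsumed-input check matters because a repetition such as many0 stops quietly at the first thing it cannot parse: for a source like "(define x 1) )" it would return one expression and leave the stray parenthesis behind. The sketch below isolates that check with the standard library only. preview and check_fully_consumed are hypothetical helpers, and the error is a plain String rather than the tutorial's CompileError.

```rust
/// Returns at most `max_chars` characters from the front of `s`.
/// Counting chars (not bytes) means the cut never lands inside a
/// multi-byte UTF-8 sequence, which a fixed byte slice can do.
fn preview(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // shorter than the limit: return everything
    }
}

/// Succeeds only if nothing but whitespace is left after parsing.
fn check_fully_consumed(remaining: &str) -> Result<(), String> {
    let rest = remaining.trim();
    if rest.is_empty() {
        Ok(())
    } else {
        Err(format!("unexpected input: {:?}", preview(rest, 20)))
    }
}
```

The error message quotes only the first few characters of the leftover input, which is usually enough to locate the problem without dumping the rest of the file.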
Unit tests
#[test]
fn test_parse_if() {
    let src = "(if #t 1 2)";
    let result = parse(src).unwrap();
    assert_eq!(result.len(), 1);
    assert!(matches!(result[0], Expr::If { .. }));
}

#[test]
fn test_parse_define_fn() {
    let src = "(define (add a b) (+ a b))";
    let result = parse(src).unwrap();
    assert!(matches!(&result[0], Expr::Define { name, .. } if name == "add"));
}

#[test]
fn test_nested_calls() {
    let src = "(display (* 2 (+ 3 4)))";
    assert!(parse(src).is_ok());
}

#[test]
fn test_comments_skipped() {
    let src = "; this is a comment\n(define x 42)";
    assert!(parse(src).is_ok());
}
Style notes
- The recursion problem is the hardest conceptual moment — explain it thoroughly before showing the solution
- cut is essential for good error messages; explain why each use of cut is there
- The top-level parse function must check for unconsumed input; show why (trailing garbage would otherwise be silently ignored)
- End with a checkpoint: parse the complete factorial example and print the AST using the Display impl from §7