+++
title = "§9 Parsing S-Expressions and Special Forms"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++

## §9 Parsing S-Expressions and Special Forms: Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 9. Parsing S-Expressions and Special Forms`

Replace the stub line with full content. Target 1000–1300 words. This is the hardest parsing section: it covers recursive parsers, special-form recognition, and the top-level `parse` entry point.

## Learning objectives

- Write a recursive parser in nom (handling the recursion challenge)
- Distinguish special forms from generic calls during parsing and produce typed AST variants
- Parse `define`, `lambda`, `if`, `let`, `begin` into the correct `Expr` variants
- Implement the top-level `parse` function
- Understand when to use `cut` to commit to a parse branch

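## Content to write

### The Recursion Problem in nom

nom combinators return values of opaque, compiler-generated types, and each parser's type must be fully known at compile time. A parser for S-expressions is recursive: an expression is either an atom or a list of expressions. If the whole parser is built as a single combinator value (a closure, or a function returning `impl FnMut(...)` that contains itself), its type would have to contain itself, and Rust rejects that with a recursive or infinitely sized type error.

Solution: write the recursive parser as a named function rather than a closure, and have the other parsers call it by name. A named function can be referenced before its body is type-checked, and its type does not embed the types of the parsers it calls, so the cycle never appears in any type.
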
```rust
// Imports assume nom 7, where combinators are called as functions.
use nom::{branch::alt, IResult};

pub fn parse_expr(input: &str) -> IResult<&str, Expr> {
    ws(alt((
        parse_list,
        parse_atom,
    )))(input)
}
```

`parse_list` calls `parse_expr` recursively. Because `parse_expr` is a named function (not a closure), it can be used wherever a `fn(&str) -> IResult<&str, Expr>` is expected. That type has a known size and says nothing about what the function body calls, so the recursion is fine.

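To see the shape of the recursion without nom's types in the way, here is a minimal std-only sketch. It is not the section's nom parser: `Sexp`, `sexp`, `atom`, and `list` are hypothetical names, and atoms are simplified to runs of non-delimiter characters. It only illustrates that named functions can call each other in a cycle.

```rust
/// Hypothetical, simplified tree type used only for this sketch.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

/// Parse one expression: a list if it starts with '(', otherwise an atom.
/// Returns the remaining input and the parsed node, or None on failure.
fn sexp(input: &str) -> Option<(&str, Sexp)> {
    let input = input.trim_start();
    if input.starts_with('(') {
        list(input)
    } else {
        atom(input)
    }
}

/// Parse an atom: one or more characters up to whitespace or a parenthesis.
fn atom(input: &str) -> Option<(&str, Sexp)> {
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((&input[end..], Sexp::Atom(input[..end].to_string())))
}

/// Parse "(" expr* ")". Calls `sexp`, which may call `list` again.
fn list(input: &str) -> Option<(&str, Sexp)> {
    let mut rest = input.strip_prefix('(')?;
    let mut items = Vec::new();
    loop {
        rest = rest.trim_start();
        if let Some(after) = rest.strip_prefix(')') {
            return Some((after, Sexp::List(items)));
        }
        let (next, item) = sexp(rest)?;
        items.push(item);
        rest = next;
    }
}
```

`sexp` and `list` refer to each other by name, and neither signature mentions the other, which is the same property that lets `parse_expr` and `parse_list` recurse in the nom version.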
### Parsing Generic Lists → Calls

A generic list `(func arg1 arg2 ...)` is parsed into `Expr::Call`. The first expression is the callee and the remaining expressions are its arguments:

```rust
use nom::{character::complete::char, multi::many1, sequence::delimited};

fn parse_call(input: &str) -> IResult<&str, Expr> {
    let (input, exprs) = delimited(
        ws(char('(')),
        many1(ws(parse_expr)),
        ws(char(')')),
    )(input)?;
    // Split the parsed items into the callee and its arguments.
    let mut iter = exprs.into_iter();
    let func = iter.next().unwrap(); // safe: many1 guarantees >= 1
    let args = iter.collect();
    Ok((input, Expr::Call { func: Box::new(func), args }))
}
```

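The head/arguments split can be tried in isolation. The sketch below uses an assumed, cut-down `Expr` (the real enum comes from an earlier section and has more variants; `Number` and `Symbol` are guesses at its atom variants) and a hypothetical helper `call_from_items` that performs the same split without `unwrap`.

```rust
/// Assumed, simplified AST for this sketch; the real `Expr` has more variants.
#[derive(Debug, PartialEq)]
enum Expr {
    Number(i64),
    Symbol(String),
    Call { func: Box<Expr>, args: Vec<Expr> },
}

/// Split parsed list items into callee and arguments, as `parse_call` does.
/// Returns `None` for an empty list instead of relying on `unwrap`.
fn call_from_items(items: Vec<Expr>) -> Option<Expr> {
    let mut iter = items.into_iter();
    let func = iter.next()?;
    let args = iter.collect();
    Some(Expr::Call { func: Box::new(func), args })
}
```

In `parse_call` the `unwrap` is safe because `many1` never returns an empty vector; the `Option` version makes that precondition visible in the type instead.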
### Recognizing Special Forms

Special forms are lists whose first element is a reserved keyword. Recognize them *inside* the list parser by looking at the first token. The cleanest approach: try each special-form parser in an `alt`, and put `parse_call` last. `parse_call` accepts any non-empty list, so if it came first it would shadow every special form.

```rust
fn parse_list(input: &str) -> IResult<&str, Expr> {
    alt((
        parse_define,
        parse_lambda,
        parse_if,
        parse_let,
        parse_begin,
        parse_call,
    ))(input)
}
```

### Parsing `define`

Two shapes: `(define name expr)` and `(define (name params...) body...)`. Parse both; the second desugars into a `Define` wrapping a `Lambda`.

```rust
use nom::{bytes::complete::tag, character::complete::one_of, multi::many0};
use nom::{combinator::{cut, peek}, sequence::terminated};

fn parse_define(input: &str) -> IResult<&str, Expr> {
    let (input, _) = ws(char('('))(input)?;
    // Require a delimiter after the keyword, so a symbol such as
    // `define-record` is not mistaken for `define`.
    let (input, _) = ws(terminated(tag("define"), peek(one_of(" \t\r\n();"))))(input)?;
    // Use cut here: we've seen "(define", so commit to this branch
    cut(|input| {
        alt((
            // Function shorthand: (define (name params...) body...)
            |input| {
                let (input, _) = ws(char('('))(input)?;
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, params) = many0(ws(parse_symbol_str))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                let (input, body) = many1(ws(parse_expr))(input)?;
                let (input, _) = ws(char(')'))(input)?;
                // Assumes `Expr::Lambda` stores parameter names as `String`.
                let params = params.into_iter().map(String::from).collect();
                let lambda = Expr::Lambda { params, body };
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(lambda) }))
            },
            // Variable binding: (define name expr)
            |input| {
                let (input, name) = ws(parse_symbol_str)(input)?;
                let (input, value) = ws(parse_expr)(input)?;
                let (input, _) = ws(char(')'))(input)?;
                Ok((input, Expr::Define { name: name.to_string(), value: Box::new(value) }))
            },
        ))(input)
    })(input)
}
```

Explain `cut`: after matching `(define`, the parser is committed to this branch. If the rest of the form is malformed, `cut` converts the recoverable `Err::Error` into `Err::Failure`. The enclosing `alt` then stops instead of backtracking to `parse_call`, and the reported error points at the malformed `define` rather than at a failed generic call.

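The difference between a recoverable error and a committed failure can be modelled without nom. The following std-only sketch is only an illustration: `ParseErr`, `first_of`, `commit`, and the toy `define` and `call` parsers are hypothetical stand-ins for nom's `Err`, `alt`, `cut`, and the real parsers.

```rust
/// Two-level error, modelled loosely on nom's `Err::Error` / `Err::Failure`.
#[derive(Debug, PartialEq)]
enum ParseErr {
    /// Recoverable: the caller may try another alternative.
    Error(String),
    /// Fatal: stop trying alternatives and report this message.
    Failure(String),
}

type Res<'a> = Result<(&'a str, &'static str), ParseErr>;
type Parser = for<'a> fn(&'a str) -> Res<'a>;

/// Try each parser in order. Move on after `Error`, stop at `Failure`.
fn first_of<'a>(parsers: &[Parser], input: &'a str) -> Res<'a> {
    let mut last = ParseErr::Error("no alternatives".to_string());
    for parser in parsers {
        match parser(input) {
            Err(ParseErr::Error(msg)) => last = ParseErr::Error(msg),
            other => return other, // success or fatal failure
        }
    }
    Err(last)
}

/// Upgrade a recoverable error to a fatal one, in the spirit of `cut`.
fn commit(result: Res<'_>) -> Res<'_> {
    result.map_err(|e| match e {
        ParseErr::Error(msg) => ParseErr::Failure(msg),
        failure => failure,
    })
}

/// Toy `define` parser: once "(define " matches, the rest is committed.
fn define(input: &str) -> Res<'_> {
    let rest = input
        .strip_prefix("(define ")
        .ok_or_else(|| ParseErr::Error("not a define".to_string()))?;
    commit(match rest.strip_suffix(')') {
        Some(body) if !body.trim().is_empty() => Ok(("", "define")),
        _ => Err(ParseErr::Error("malformed define body".to_string())),
    })
}

/// Toy fallback that accepts any parenthesized list.
fn call(input: &str) -> Res<'_> {
    if input.starts_with('(') && input.ends_with(')') {
        Ok(("", "call"))
    } else {
        Err(ParseErr::Error("not a list".to_string()))
    }
}
```

With `first_of(&[define, call], "(define )")`, the malformed body is reported as a `Failure` from `define`. Without `commit`, the same input would fall through to `call` and be accepted as a call to a function named `define`.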
### Parsing `lambda`, `if`, `let`, `begin`

Show each parser in the same style as `parse_define`. Key details:

**`lambda`**: `(lambda (params...) body...)`. Use `many0` for params (zero-parameter functions are valid) and `many1` for body.

**`if`**: `(if cond then else)`. Exactly three sub-expressions; the third (`else`) is required in MiniLisp.

**`let`**: `(let ((name expr)...) body...)`. Parse a list of `(name expr)` pairs and collect them into `Vec<(String, Expr)>`.

**`begin`**: `(begin expr...)`. One or more expressions.

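The shape rules above can be summarised as operand counts. This is a small std-only sketch with a hypothetical helper `arity_ok`; it is not part of the parser, and the `let` rule assumes at least one body expression, which the list above does not state explicitly.

```rust
/// Check the operand count for a special form, per the shapes listed above.
/// `operands` counts the items after the keyword, e.g. 3 for `(if c t e)`.
/// Returns `None` for heads that are not one of these four special forms.
fn arity_ok(keyword: &str, operands: usize) -> Option<bool> {
    match keyword {
        // (lambda (params...) body...): parameter list plus >= 1 body expression
        "lambda" => Some(operands >= 2),
        // (if cond then else): exactly three; `else` is required
        "if" => Some(operands == 3),
        // (let ((name expr)...) body...): binding list plus >= 1 body (assumed)
        "let" => Some(operands >= 2),
        // (begin expr...): at least one expression
        "begin" => Some(operands >= 1),
        _ => None,
    }
}
```

In the nom parsers these counts are enforced structurally: `many1` rejects an empty body, and `parse_if` calls `parse_expr` exactly three times before expecting the closing parenthesis.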
### Comments in the expression parser

Comments must be silently consumed wherever whitespace is allowed. Update `ws` (or create a separate `skip` combinator) to skip both whitespace and comments:

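```rust
use nom::{bytes::complete::is_not, character::complete::multispace1};
use nom::{combinator::{opt, value}, sequence::pair};

/// Consume any run of whitespace and `;` line comments.
fn skip(input: &str) -> IResult<&str, ()> {
    value((), many0(alt((
        value((), multispace1),
        // A comment runs from `;` to the end of the line.
        value((), pair(char(';'), opt(is_not("\n\r")))),
    ))))(input)
}
```
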
Then use `skip` in place of `multispace0` in the `ws` wrapper.

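As a cross-check of what `skip` should consume, here is a std-only sketch of the same behaviour. `skip_trivia` is a hypothetical name, and it uses `trim_start`, which also trims Unicode whitespace that nom's `multispace1` does not match.

```rust
/// Skip leading whitespace and `;` line comments, returning the rest.
/// Std-only stand-in for the nom `skip` parser (hypothetical helper).
fn skip_trivia(mut input: &str) -> &str {
    loop {
        let trimmed = input.trim_start();
        match trimmed.strip_prefix(';') {
            // Comment: drop everything up to (not including) the line break.
            Some(comment) => {
                let end = comment
                    .find(|c: char| c == '\n' || c == '\r')
                    .unwrap_or(comment.len());
                input = &comment[end..];
            }
            // Neither whitespace nor a comment: done.
            None => return trimmed,
        }
    }
}
```

The loop mirrors `many0(alt((whitespace, comment)))`: it keeps alternating between the two until neither matches, so consecutive comment lines and blank lines are all consumed.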
### The top-level `parse` function

```rust
/// Parse a complete MiniLisp program (zero or more top-level expressions).
pub fn parse(source: &str) -> Result<Vec<Expr>, crate::error::CompileError> {
    let (remaining, exprs) = many0(ws(parse_expr))(source)
        .map_err(|e| crate::error::CompileError::ParseError(e.to_string()))?;
    // Drop trailing whitespace and comments, so a comment-only source is accepted.
    let remaining = skip(remaining).map_or(remaining, |(rest, _)| rest);
    // Anything still left is trailing garbage: report it instead of ignoring it.
    if !remaining.trim().is_empty() {
        // Take 20 chars, not 20 bytes: a byte slice can split a UTF-8 character and panic.
        let snippet: String = remaining.chars().take(20).collect();
        return Err(crate::error::CompileError::ParseError(
            format!("unexpected input: {:?}", snippet)
        ));
    }
    Ok(exprs)
}
```

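Two details of the final check are easy to get wrong: the leftover-input test itself, and building the error snippet safely. A minimal std-only sketch, with hypothetical helpers `snippet` and `check_fully_consumed` standing in for the tail of `parse`:

```rust
/// Return at most `max_chars` characters from the front of `s`.
/// Counting chars (not bytes) avoids panics on multi-byte UTF-8 input.
fn snippet(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}

/// Hypothetical stand-in for the final check in `parse`: succeed only if
/// nothing but whitespace is left after the last expression.
fn check_fully_consumed(remaining: &str) -> Result<(), String> {
    if remaining.trim().is_empty() {
        Ok(())
    } else {
        Err(format!("unexpected input: {:?}", snippet(remaining, 20)))
    }
}
```

A byte-indexed slice of the remaining input panics whenever the cut-off index falls inside a multi-byte character, which is why the snippet is built from `chars()`. Without the check, input such as `(define x 42) )` would parse successfully and the stray `)` would be silently dropped.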
### Unit tests

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parse_if() {
        let src = "(if #t 1 2)";
        let result = parse(src).unwrap();
        assert_eq!(result.len(), 1);
        assert!(matches!(result[0], Expr::If { .. }));
    }

    #[test]
    fn test_parse_define_fn() {
        let src = "(define (add a b) (+ a b))";
        let result = parse(src).unwrap();
        assert!(matches!(&result[0], Expr::Define { name, .. } if name == "add"));
    }

    #[test]
    fn test_nested_calls() {
        let src = "(display (* 2 (+ 3 4)))";
        assert!(parse(src).is_ok());
    }

    #[test]
    fn test_comments_skipped() {
        let src = "; this is a comment\n(define x 42)";
        assert!(parse(src).is_ok());
    }

    #[test]
    fn test_trailing_garbage_rejected() {
        // The stray ")" must not be silently ignored.
        assert!(parse("(define x 42) )").is_err());
    }
}
```

## Style notes

- The recursion problem is the hardest conceptual moment; explain it thoroughly before showing the solution
- `cut` is essential for good error messages; explain why each use of `cut` is there
- The top-level `parse` function must check for unconsumed input; show why (trailing garbage would otherwise be silently ignored)
- End with a checkpoint: parse the complete factorial example and print the AST using the `Display` impl from §7