| title | status | type | priority | created_at | updated_at |
|---|---|---|---|---|---|
| §4 Introduction to nom: Parser Combinators | completed | task | normal | 2026-03-10T23:30:00Z | 2026-03-10T23:30:00Z |
§4 Introduction to nom: Parser Combinators — Stub to fill
File: edu/src/lisp-compiler.md, section ### 4. Introduction to nom: Parser Combinators
Replace the stub line with full content. Target 900–1200 words. This is the conceptual and practical foundation for all parsing in the course. The reader needs to understand nom well enough to write parsers without hand-holding by §8.
Learning objectives
- Understand what a parser combinator is and why it is better than hand-rolling a recursive descent parser for our purposes
- Understand IResult<I, O, E> and what its three outcomes (success, recoverable error, unrecoverable failure) mean
- Know and be able to use: tag, char, alpha1, digit1, multispace0, alt, many0, map, map_res, tuple, delimited, preceded, terminated, opt, recognize, verify, cut
- Know how to write a parser function, call it, and test it
- Know how to use the ws whitespace-wrapper pattern
Content to write
What is a parser combinator?
A parser combinator is a function that takes one or more parsers and returns a new parser. Individual parsers handle small fragments of input; combinators compose them into larger parsers. The result is a parser written entirely in the host language (Rust), with no grammar files, no code generation, and no build-time magic.
Contrast with traditional parser generators (ANTLR, yacc): those require a separate grammar file, a code-generation step, and often a bespoke DSL for semantic actions. nom parsers are plain Rust functions.
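To make the definition concrete, here is a minimal dependency-free sketch of the idea. It deliberately does not use nom: PResult, literal, and either are hypothetical names invented for this illustration, standing in loosely for what nom provides as IResult, tag, and alt.

```rust
// Std-only sketch of a parser combinator. `PResult`, `literal`, and `either`
// are hypothetical names for illustration, not nom APIs.

/// On success: (remaining input, matched output). On failure: a message.
type PResult<'a> = Result<(&'a str, &'a str), String>;

/// A small parser: succeeds if the input starts with `expected`.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

/// A combinator: takes two parsers and returns a new parser that tries the
/// first and falls back to the second if the first fails.
fn either<'a>(
    first: impl Fn(&'a str) -> PResult<'a>,
    second: impl Fn(&'a str) -> PResult<'a>,
) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| first(input).or_else(|_| second(input))
}

#[test]
fn either_builds_a_boolean_parser() {
    // The composed parser is an ordinary Rust value built from smaller ones.
    let boolean = either(literal("true"), literal("false"));
    assert_eq!(boolean("false)"), Ok((")", "false")));
    assert!(boolean("maybe").is_err());
}
```

Nothing here needs a grammar file or a build step: the grammar is the expression either(literal("true"), literal("false")).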
The IResult Type
type IResult<I, O, E = nom::error::Error<I>> = Result<(I, O), nom::Err<E>>;
On success: Ok((remaining_input, output)). The parser consumed some input and produced a value; remaining_input is whatever was left.
On failure (recoverable): Err(nom::Err::Error(e)). The parser tried and failed; the caller can try an alternative.
On failure (unrecoverable): Err(nom::Err::Failure(e)). The parser is committed — no alternatives should be tried. Triggered by cut.
A third error variant, Err(nom::Err::Incomplete(needed)), is returned only by streaming parsers when more input is required; the complete parsers used in this course do not produce it.
The key insight: parsers return the remaining input. This is what makes composition work, because one parser's remaining input becomes the next parser's input.
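The threading of remaining input can be sketched without any library. The block below is a std-only model, not nom code; PResult, take_while1, and digits_then_letters are hypothetical helpers chosen for the illustration.

```rust
// Std-only model of how the remaining input is threaded between parsers.
// `PResult`, `take_while1`, and `digits_then_letters` are hypothetical names.

type PResult<'a, O> = Result<(&'a str, O), String>;

/// Consume one or more leading characters that satisfy `pred`.
fn take_while1<'a>(
    input: &'a str,
    pred: fn(char) -> bool,
    what: &str,
) -> PResult<'a, &'a str> {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected {} at {:?}", what, input));
    }
    // Remaining input first, output second: the same order as IResult.
    Ok((&input[end..], &input[..end]))
}

/// Sequencing: the `rest` returned by one step is the input of the next.
fn digits_then_letters<'a>(input: &'a str) -> PResult<'a, (&'a str, &'a str)> {
    let (rest, digits) = take_while1(input, |c: char| c.is_ascii_digit(), "a digit")?;
    let (rest, letters) = take_while1(rest, |c: char| c.is_ascii_alphabetic(), "a letter")?;
    Ok((rest, (digits, letters)))
}

#[test]
fn threads_remaining_input() {
    assert_eq!(digits_then_letters("42abc!"), Ok(("!", ("42", "abc"))));
    assert!(digits_then_letters("abc42").is_err());
}
```

Sequencing combinators such as delimited, preceded, and terminated automate exactly this hand-off.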
Writing a Parser
Show the anatomy of a parser function:
use nom::{IResult, bytes::complete::tag};

fn parse_hello(input: &str) -> IResult<&str, &str> {
    tag("hello")(input)
}

#[test]
fn test_parse_hello() {
    assert_eq!(parse_hello("hello world"), Ok((" world", "hello")));
    assert!(parse_hello("goodbye").is_err());
}
Essential Combinators
Work through each combinator with a small standalone example:
tag(s) — match a literal string.
tag("(")(input) // matches the literal "("
char(c) — match a single character.
char('(')(input)
alpha1, digit1, alphanumeric1 — match one or more letters/digits/alphanumerics.
multispace0, multispace1 — match zero/one or more whitespace characters.
alt((p1, p2, ...)) — try each parser in order; return the first success.
alt((tag("true"), tag("false"))).parse(input) // requires use nom::Parser;
many0(p) — apply p zero or more times; return Vec<O>.
map(p, f) — transform a parser's output.
map(digit1, |s: &str| s.parse::<i64>().unwrap())
map_res(p, f) — like map but f returns Result; propagates errors.
map_res(digit1, |s: &str| s.parse::<i64>())
tuple((p1, p2, ...)) — run parsers in sequence; collect outputs as a tuple.
In nom 8, tuple is deprecated because a tuple of parsers implements Parser directly: (p1, p2).parse(input).
delimited(open, inner, close) — parse open, inner, close; return only inner's output. Perfect for parenthesised expressions.
delimited(char('('), inner_parser, char(')')).parse(input) // requires use nom::Parser;
preceded(prefix, inner) — parse prefix then inner; return only inner.
terminated(inner, suffix) — parse inner then suffix; return only inner.
opt(p) — make p optional; returns Option<O>.
recognize(p) — run p but return the input slice it consumed rather than its output. Useful for building string slices from composed parsers.
verify(p, pred) — run p, then apply predicate pred; fail if predicate returns false.
cut(p) — mark this branch as committed; convert recoverable errors into unrecoverable ones. Use after a discriminating tag (e.g., after matching (define, commit to parsing a define form).
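The difference between a recoverable Error and a committed Failure is easiest to see in a small model. The sketch below is std-only and does not reproduce nom's implementation; PErr, literal, alt2, commit, and define_form are hypothetical stand-ins for nom::Err, tag, alt, cut, and a define-form parser.

```rust
// Std-only model of recoverable vs committed errors. All names here are
// hypothetical stand-ins, not nom APIs.

#[derive(Debug, PartialEq)]
enum PErr {
    /// Recoverable: the caller may try another branch.
    Error(String),
    /// Unrecoverable: stop trying alternatives.
    Failure(String),
}

type PResult<'a> = Result<(&'a str, &'a str), PErr>;

/// Match a literal prefix; a mismatch is a recoverable error.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(PErr::Error(format!("expected {:?}", expected))),
    }
}

/// Try `first`; fall back to `second` only on a recoverable error.
fn alt2<'a>(
    first: impl Fn(&'a str) -> PResult<'a>,
    second: impl Fn(&'a str) -> PResult<'a>,
) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| match first(input) {
        Err(PErr::Error(_)) => second(input),
        other => other,
    }
}

/// Upgrade a recoverable error from `inner` into a committed failure.
fn commit<'a>(inner: impl Fn(&'a str) -> PResult<'a>) -> impl Fn(&'a str) -> PResult<'a> {
    move |input: &'a str| {
        inner(input).map_err(|err| match err {
            PErr::Error(msg) => PErr::Failure(msg),
            failure => failure,
        })
    }
}

/// Once "(define" has matched, the following space is mandatory.
fn define_form<'a>(input: &'a str) -> PResult<'a> {
    let (rest, keyword) = literal("(define")(input)?;
    let (rest, _) = commit(literal(" "))(rest)?;
    Ok((rest, keyword))
}

#[test]
fn commit_stops_backtracking() {
    let form = alt2(define_form, literal("("));
    // "(define" did not match, so the second branch is tried.
    assert_eq!(form("(lambda"), Ok(("lambda", "(")));
    assert_eq!(form("(define x"), Ok(("x", "(define")));
    // "(define" matched but the rest is malformed: no fallback to "(".
    assert!(matches!(form("(define"), Err(PErr::Failure(_))));
}
```

Without commit, the malformed define form would silently fall through to the generic "(" branch and the error would surface far from its cause.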
The ws Combinator Pattern
Whitespace appears between any two tokens in Lisp. Define a helper that strips whitespace before and after any parser:
use nom::{Parser, IResult, character::complete::multispace0, sequence::delimited};
use nom::error::ParseError;

pub fn ws<'a, O, E, F>(inner: F) -> impl Parser<&'a str, Output = O, Error = E>
where
    E: ParseError<&'a str>,
    F: Parser<&'a str, Output = O, Error = E>,
{
    delimited(multispace0, inner, multispace0)
}
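For readers who want to see the behaviour of the wrapper in isolation, the following std-only model mimics it without nom. ws_model and word are hypothetical names, and the model is only an approximation: nom's multispace0 matches spaces, tabs, carriage returns, and line feeds, whereas trim_start strips any Unicode whitespace.

```rust
// Std-only model of the whitespace wrapper. `ws_model` and `word` are
// hypothetical names; this approximates, but is not, the nom-based `ws`.

type PResult<'a, O> = Result<(&'a str, O), String>;

/// Wrap `inner` so that whitespace before and after it is skipped.
fn ws_model<'a, O>(
    inner: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| {
        let (rest, out) = inner(input.trim_start())?;
        Ok((rest.trim_start(), out))
    }
}

/// Consume one or more ASCII letters.
fn word<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected a letter at {:?}", input));
    }
    Ok((&input[end..], &input[..end]))
}

#[test]
fn ws_model_skips_surrounding_whitespace() {
    let token = ws_model(word);
    assert_eq!(token("  define \n x"), Ok(("x", "define")));
    assert_eq!(token("define"), Ok(("", "define")));
    assert!(token("  (").is_err());
}
```

The wrapped parser returns only the inner output; the whitespace is consumed but never appears in the result.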
Testing parsers
Show the pattern: use assert_eq! on Ok((remaining, output)) for success cases, assert!(result.is_err()) for failure cases. Note that remaining input is part of the assertion — it is easy to accidentally under-consume.
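The point about under-consumption can be demonstrated with a tiny std-only parser. leading_digits is a hypothetical helper written for this sketch, not a nom function; the same assertion style applies unchanged to nom parsers that return IResult.

```rust
// Std-only sketch of the testing pattern. `leading_digits` is a hypothetical
// parser used only to show what the assertions catch.

/// Consume one or more ASCII digits; return (remaining input, digits).
fn leading_digits(input: &str) -> Result<(&str, &str), String> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err(format!("expected a digit at {:?}", input));
    }
    Ok((&input[end..], &input[..end]))
}

#[test]
fn weak_assertion_hides_leftover_input() {
    // Passes, but says nothing about the unparsed "px".
    assert!(leading_digits("12px").is_ok());
}

#[test]
fn strong_assertion_pins_remaining_input() {
    // Success case: both the leftover and the output are checked.
    assert_eq!(leading_digits("12px"), Ok(("px", "12")));
    // Fully consumed input leaves an empty remainder.
    assert_eq!(leading_digits("12"), Ok(("", "12")));
    // Failure case: only the fact of failure is checked.
    assert!(leading_digits("px").is_err());
}
```

The first test would keep passing even if the parser stopped too early, which is why the success cases compare the whole Ok((remaining, output)) pair.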
nom 8 API note
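nom 8 changed the parser API: most combinators now return a type that implements the Parser trait rather than a closure, so the nom 7 call style combinator(args)(input) no longer compiles for them. Call combinator(args).parse(input) instead, with the trait brought into scope by use nom::Parser;. Plain parser functions of the form fn(&str) -> IResult<&str, O> can still be called directly. Reference: nom changelog.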
Key references
- nom README: https://github.com/rust-bakery/nom
- nom::bytes::complete module (tag, take_while, take_until, is_not)
- nom::character::complete module (char, alpha1, digit1, multispace0)
- nom::sequence module (delimited, preceded, terminated, tuple, pair)
- nom::multi module (many0, many1, separated_list0)
- nom::combinator module (map, map_res, opt, recognize, verify, cut, value)
- nom::branch module (alt)
- Recipes: https://github.com/rust-bakery/nom/blob/main/doc/nom_recipes.md
Exercises to include
- Write a parser for #t and #f booleans using alt and tag
- Write a parser for a C-style identifier (starts with letter or _, then alphanumeric or _)
- Write a parser for a decimal integer using recognize, opt(char('-')), and digit1
- Compose the above three into an alt that returns a string slice matching any of them
Each exercise should have a collapsible reference solution.
Style notes
- Introduce IResult before showing any combinator: readers need to understand the return type to understand what combinators are doing
- Show every combinator with a working code snippet, not just a description
- Make the ws wrapper a "save this: you will use it throughout" moment