+++
title = "§4 Introduction to nom: Parser Combinators"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++

## §4 Introduction to nom: Parser Combinators (stub to fill)

File: `edu/src/lisp-compiler.md`, section `### 4. Introduction to nom: Parser Combinators`

Replace the stub line with full content. Target 900–1200 words. This is the conceptual and practical foundation for all parsing in the course. By §8 the reader must know nom well enough to write parsers without hand-holding.

## Learning objectives

- Understand what a parser combinator is and why it is better than hand-rolling a recursive descent parser for our purposes
- Understand `IResult` and what its three outcomes mean (success, recoverable error, unrecoverable failure)
- Know and be able to use: `tag`, `char`, `alpha1`, `digit1`, `multispace0`, `alt`, `many0`, `map`, `map_res`, `tuple`, `delimited`, `preceded`, `terminated`, `opt`, `recognize`, `verify`, `cut`
- Know how to write a parser function, call it, and test it
- Know how to use the `ws` whitespace-wrapper pattern

## Content to write

### What is a parser combinator?

A parser combinator is a function that takes one or more parsers and returns a new parser. Individual parsers handle small fragments of input; combinators compose them into larger parsers. The result is a parser written entirely in the host language (Rust), with no grammar files, no code generation, and no build-time magic.

Contrast with traditional parser generators (ANTLR, yacc): those require a separate grammar file, a code-generation step, and often a bespoke DSL for semantic actions. nom parsers are plain Rust functions.

### The `IResult` Type

```rust
type IResult<I, O, E = nom::error::Error<I>> = Result<(I, O), nom::Err<E>>;
```

On success: `Ok((remaining_input, output))`. The parser consumed some input and produced a value; `remaining_input` is whatever was left.

On failure (recoverable): `Err(nom::Err::Error(e))`. The parser tried and failed; the caller can try an alternative.
On failure (unrecoverable): `Err(nom::Err::Failure(e))`. The parser is committed, so no alternatives should be tried. Triggered by `cut`.

A third error variant, `nom::Err::Incomplete`, is produced only by streaming parsers; the `complete` modules used in this course never return it.

The key insight: parsers return the *remaining* input. This is what makes composition work: one parser's remaining input is the next parser's input.

### Writing a Parser

Show the anatomy of a parser function:

```rust
use nom::{bytes::complete::tag, IResult, Parser};

fn parse_hello(input: &str) -> IResult<&str, &str> {
    // `.parse(input)` comes from the `Parser` trait.
    tag("hello").parse(input)
}

#[test]
fn test_parse_hello() {
    // Success: the remaining input comes first, then the matched output.
    assert_eq!(parse_hello("hello world"), Ok((" world", "hello")));
    assert!(parse_hello("goodbye").is_err());
}
```

### Essential Combinators

Work through each combinator with a small standalone example. The fragments below assume `input: &str` and `use nom::Parser;` in scope, so that `.parse(input)` is available.

**`tag(s)`**: match a literal string.

```rust
tag("(").parse(input) // matches the literal "("
```

**`char(c)`**: match a single character.

```rust
char('(').parse(input)
```

**`alpha1`, `digit1`, `alphanumeric1`**: match one or more letters/digits/alphanumerics.

**`multispace0`, `multispace1`**: match zero or more / one or more whitespace characters.

**`alt((p1, p2, ...))`**: try each parser in order; return the first success.

```rust
alt((tag("true"), tag("false"))).parse(input)
```

**`many0(p)`**: apply `p` zero or more times; return a `Vec<O>`, where `O` is the output type of `p`.

**`map(p, f)`**: transform a parser's output. Note that the `unwrap` below panics if the digits overflow the target type, which is why `map_res` is usually the better choice for fallible conversions.

```rust
// `i64` is an illustrative choice of target type.
map(digit1, |s: &str| s.parse::<i64>().unwrap())
```

**`map_res(p, f)`**: like `map`, but `f` returns a `Result`; an `Err` from `f` becomes a parse error.

```rust
// `i64` is an illustrative choice of target type.
map_res(digit1, |s: &str| s.parse::<i64>())
```

**`tuple((p1, p2, ...))`**: run parsers in sequence; collect outputs as a tuple. In nom 8 `tuple` is deprecated, because a plain tuple of parsers `(p1, p2, ...)` implements `Parser` directly.

**`delimited(open, inner, close)`**: parse `open`, `inner`, `close`; return only `inner`'s output. Perfect for parenthesised expressions.

```rust
delimited(char('('), inner_parser, char(')')).parse(input)
```

**`preceded(prefix, inner)`**: parse `prefix` then `inner`; return only `inner`'s output.

**`terminated(inner, suffix)`**: parse `inner` then `suffix`; return only `inner`'s output.

**`opt(p)`**: make `p` optional; returns `Option<O>`, where `O` is the output type of `p`.
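
`preceded`, `terminated`, and `opt` are described above without snippets. As a rough mental model of how they thread the remaining input from one parser to the next, here is a hand-rolled, std-only sketch. It is not nom's implementation: `PResult`, `lit`, `preceded_by`, `terminated_by`, and `optional` are hypothetical stand-ins invented for this illustration, and errors are plain `String`s rather than `nom::Err`.

```rust
// Illustrative stand-ins only; these are NOT nom's types or functions.
// A "parser" is any function from input to Ok((remaining, output)) or Err(message).
type PResult<'a, O> = Result<(&'a str, O), String>;

/// Simplified model of `tag`: match a literal prefix.
fn lit<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

/// Model of `preceded`: run `prefix`, drop its output, run `inner` on what is left.
fn preceded_by<'a, A, B>(
    prefix: impl Fn(&'a str) -> PResult<'a, A>,
    inner: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| {
        let (rest, _) = prefix(input)?;
        inner(rest)
    }
}

/// Model of `terminated`: run `inner`, then `suffix`; keep only `inner`'s output.
fn terminated_by<'a, A, B>(
    inner: impl Fn(&'a str) -> PResult<'a, A>,
    suffix: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, A> {
    move |input: &'a str| {
        let (rest, out) = inner(input)?;
        let (rest, _) = suffix(rest)?;
        Ok((rest, out))
    }
}

/// Model of `opt`: a failure becomes `None` and consumes no input.
fn optional<'a, O>(
    p: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, Option<O>> {
    move |input: &'a str| match p(input) {
        Ok((rest, out)) => Ok((rest, Some(out))),
        Err(_) => Ok((input, None)),
    }
}
```

For example, `optional(lit("-"))("42")` succeeds with `None` and leaves `"42"` untouched, which is the behaviour that makes an optional sign in front of `digit1` work.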
**`recognize(p)`**: run `p` but return the input slice it consumed rather than its output. Useful for building string slices from composed parsers.

**`verify(p, pred)`**: run `p`, then apply predicate `pred` to its output; fail if the predicate returns false.

**`cut(p)`**: mark this branch as committed; convert recoverable errors into unrecoverable ones. Use after a discriminating tag (e.g., after matching `(define`, commit to parsing a define form).

### The `ws` Combinator Pattern

Whitespace can appear between any two tokens in Lisp. Define a helper that strips whitespace before and after any parser:

```rust
use nom::error::ParseError;
use nom::{character::complete::multispace0, sequence::delimited, Parser};

/// Wrap `inner` so that leading and trailing whitespace is consumed and discarded.
pub fn ws<'a, O, E, F>(inner: F) -> impl Parser<&'a str, Output = O, Error = E>
where
    E: ParseError<&'a str>,
    F: Parser<&'a str, Output = O, Error = E>,
{
    delimited(multispace0, inner, multispace0)
}
```

### Testing parsers

Show the pattern: use `assert_eq!` on `Ok((remaining, output))` for success cases and `assert!(result.is_err())` for failure cases. The remaining input is part of the assertion, which catches parsers that accidentally under-consume.

### nom 8 API note

nom 8 changed the parser API: combinators such as `alt`, `delimited`, and `map` now return types that implement `Parser` rather than closures. Call `.parse(input)` on them; the nom 7 style `combinator(args)(input)` no longer compiles for these combinators. Bring the trait into scope with `use nom::Parser;`. Reference: [nom changelog](https://github.com/rust-bakery/nom/blob/main/CHANGELOG.md).
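
The difference between a recoverable `Error` and an unrecoverable `Failure`, and what `cut` does to an enclosing `alt`, is easier to see in a small model. The sketch below is std-only and hand-rolled: `ParseErr`, `PResult`, `lit`, `commit`, `either`, and `define_form` are hypothetical names made up for this illustration, standing in for `nom::Err`, `IResult`, `tag`, `cut`, a two-branch `alt`, and a toy `(define x` parser. It assumes nothing about nom's internals.

```rust
// Simplified model for illustration; nom's real variants are
// `nom::Err::Error` and `nom::Err::Failure`.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Error(String),   // recoverable: an enclosing alternative may try the next branch
    Failure(String), // unrecoverable: stop immediately
}

type PResult<'a, O> = Result<(&'a str, O), ParseErr>;

/// Simplified model of `tag`: match a literal prefix, failing recoverably.
fn lit<'a>(expected: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(ParseErr::Error(format!("expected {:?}", expected))),
    }
}

/// Model of `cut`: upgrade a recoverable Error to a Failure.
fn commit<'a, O>(
    p: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| {
        p(input).map_err(|e| match e {
            ParseErr::Error(msg) => ParseErr::Failure(msg),
            failure => failure,
        })
    }
}

/// Model of a two-branch `alt`: fall through to `second` only on a recoverable Error.
fn either<'a, O>(
    first: impl Fn(&'a str) -> PResult<'a, O>,
    second: impl Fn(&'a str) -> PResult<'a, O>,
) -> impl Fn(&'a str) -> PResult<'a, O> {
    move |input: &'a str| match first(input) {
        Err(ParseErr::Error(_)) => second(input),
        other => other,
    }
}

/// Toy define form: once "(define " has matched, the name `x` is mandatory.
fn define_form<'a>(input: &'a str) -> PResult<'a, &'a str> {
    let (rest, _) = lit("(define ")(input)?;
    commit(lit("x"))(rest)
}
```

With this model, `either(define_form, lit("(+"))("(+ 1 2)")` falls through to the second branch because the keyword never matched, while `either(define_form, lit("(define"))("(define 42)")` stops with a `Failure`: the keyword matched, so the error is reported at the missing name instead of being masked by the next alternative.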

## Key references

- nom README: https://github.com/rust-bakery/nom
- `nom::bytes::complete` module (tag, take_while, take_until, is_not)
- `nom::character::complete` module (char, alpha1, digit1, multispace0)
- `nom::sequence` module (delimited, preceded, terminated, tuple, pair)
- `nom::multi` module (many0, many1, separated_list0)
- `nom::combinator` module (map, map_res, opt, recognize, verify, cut, value)
- `nom::branch` module (alt)
- Recipes: https://github.com/rust-bakery/nom/blob/main/doc/nom_recipes.md

## Exercises to include

1. Write a parser for `#t` and `#f` booleans using `alt` and `tag`
2. Write a parser for a C-style identifier (starts with letter or `_`, then alphanumeric or `_`)
3. Write a parser for a decimal integer using `recognize`, `opt(char('-'))`, and `digit1`
4. Compose the above three into an `alt` that returns a string slice matching any of them

Each exercise should have a collapsible reference solution.

## Style notes

- Introduce `IResult` before showing any combinator: readers need to understand the return type to understand what combinators are doing
- Show every combinator with a working code snippet, not just a description
- Make the `ws` wrapper a "save this, you will use it throughout" moment