+++
title = "§8 Parsing Atoms with nom"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
§8 Parsing Atoms with nom — Stub to fill
File: edu/src/lisp-compiler.md, section ### 8. Parsing Atoms with nom
Replace the stub line with full content. Target 600–800 words. This section takes the individual atom parsers from §6 and the AST from §7 and combines them into a single parse_atom function that returns IResult<&str, Expr>. Includes tests.
Learning objectives
- Combine individual atom parsers into a single alt using map to produce Expr values
- Understand how to add src/parser.rs to the project properly
- Write comprehensive unit tests for atom parsing
- Handle the ordering constraint in alt: integers before symbols
Content to write
The parse_atom function
In src/parser.rs, import the atom parsers from §6 and the Expr type from src/ast.rs, then combine them:
use nom::{IResult, branch::alt, combinator::map};
use crate::ast::Expr;

/// Parse any MiniLisp atom: integer, boolean, string, or symbol.
pub fn parse_atom(input: &str) -> IResult<&str, Expr> {
    alt((
        map(parse_integer, Expr::Int),
        map(parse_bool, Expr::Bool),
        map(parse_string, Expr::Str),
        map(parse_symbol, |s: &str| Expr::Symbol(s.to_string())),
    ))(input)
}
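The map calls above work because a tuple-variant constructor such as Expr::Int is itself a function from its payload to Expr. The sketch below shows the same lifting with Option::map from the standard library instead of nom's map, so it runs without dependencies. The Expr definition and its i64 payload are assumptions inferred from the code above (the real type lives in src/ast.rs), and lift_int and lift_symbol are hypothetical helper names.

```rust
// Assumed shape of Expr; the real definition is in src/ast.rs (§7).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Bool(bool),
    Str(String),
    Symbol(String),
}

/// Expr::Int has type fn(i64) -> Expr, so it can be passed to map directly.
fn lift_int(raw: Option<i64>) -> Option<Expr> {
    raw.map(Expr::Int)
}

/// A closure is needed here: the payload is &str but the variant stores a String.
fn lift_symbol(raw: Option<&str>) -> Option<Expr> {
    raw.map(|s| Expr::Symbol(s.to_string()))
}
```

This is why three of the four alt branches pass the bare constructor while the symbol branch needs a closure to convert &str into String.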
Explain the ordering:
- Integer before symbol: -7 must match as an integer, not as a symbol starting with -. alt returns the first alternative that succeeds, so parse_integer consumes the full -7 before parse_symbol is ever tried.
- Boolean before symbol: #t and #f are not valid symbols (# is not a symbol-start character), so the order does not affect correctness, but it is cleaner to try booleans first.
- String: no overlap with the other atoms, since only strings start with a double quote ("), so its position in alt does not matter.
Module organisation
Show the complete src/parser.rs header at this point:
//! Parser for MiniLisp source code.
//!
//! Entry point: [`parse`] which accepts a `&str` and returns `Vec<Expr>`.
use nom::{
    IResult,
    branch::alt,
    bytes::complete::{escaped_transform, is_not, tag, take_while, take_while_m_n},
    character::complete::{char, digit1, multispace0, line_ending},
    combinator::{map, map_res, opt, recognize, value},
    sequence::{delimited, pair},
};
use crate::ast::Expr;
Whitespace-aware atom parser
Wrap parse_atom in the ws combinator so callers do not have to think about surrounding whitespace:
pub fn parse_atom_ws(input: &str) -> IResult<&str, Expr> {
    ws(parse_atom)(input)
}
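The ws combinator itself is defined elsewhere and is not reproduced in this ticket. As a rough model of what such a wrapper does, the sketch below assumes ws skips whitespace on both sides of the inner parser. It uses Option and str::trim_start from the standard library rather than nom, so it runs without dependencies. The digits parser is a hypothetical toy, and trim_start strips any Unicode whitespace, which is broader than nom's multispace0.

```rust
/// Model of a whitespace wrapper: skip leading whitespace, run the inner
/// parser, then skip trailing whitespace. Assumed behaviour, not the real ws.
fn ws<'a, T>(
    inner: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> impl Fn(&'a str) -> Option<(&'a str, T)> {
    move |input: &'a str| {
        let (rest, value) = inner(input.trim_start())?;
        Some((rest.trim_start(), value))
    }
}

/// Hypothetical toy parser: one or more ASCII digits as a u32.
fn digits(input: &str) -> Option<(&str, u32)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let value: u32 = input[..end].parse().ok()?;
    Some((&input[end..], value))
}
```

With this model, ws(digits)("  42  rest") yields Some(("rest", 42)): the caller sees neither the leading nor the trailing whitespace, which is the convenience parse_atom_ws is meant to provide.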
Unit tests
Write a #[cfg(test)] module in src/parser.rs testing every atom type with multiple cases:
#[cfg(test)]
mod tests {
    use super::*;
    use crate::ast::Expr;

    #[test]
    fn test_integer_atom() {
        assert_eq!(parse_atom("42"), Ok(("", Expr::Int(42))));
        assert_eq!(parse_atom("-7 "), Ok((" ", Expr::Int(-7))));
        assert_eq!(parse_atom("0"), Ok(("", Expr::Int(0))));
    }

    #[test]
    fn test_bool_atom() {
        assert_eq!(parse_atom("#t"), Ok(("", Expr::Bool(true))));
        assert_eq!(parse_atom("#f"), Ok(("", Expr::Bool(false))));
    }

    #[test]
    fn test_string_atom() {
        assert_eq!(parse_atom(r#""hello""#), Ok(("", Expr::Str("hello".into()))));
        assert_eq!(parse_atom(r#""a\nb""#), Ok(("", Expr::Str("a\nb".into()))));
    }

    #[test]
    fn test_symbol_atom() {
        assert_eq!(parse_atom("my-var"), Ok(("", Expr::Symbol("my-var".into()))));
        assert_eq!(parse_atom("+"), Ok(("", Expr::Symbol("+".into()))));
        assert_eq!(parse_atom("factorial rest"), Ok((" rest", Expr::Symbol("factorial".into()))));
    }

    #[test]
    fn test_negative_integer_vs_symbol() {
        // -7 must be an integer, not a symbol
        assert_eq!(parse_atom("-7"), Ok(("", Expr::Int(-7))));
        // lone - is a symbol
        assert_eq!(parse_atom("- "), Ok((" ", Expr::Symbol("-".into()))));
    }
}
Run the tests
cargo test parser
All tests should pass before proceeding to §9.
Style notes
- The ordering section is the most important teaching moment here; make it explicit
- Show how map is used to lift a primitive value into an Expr variant
- The test for -7 vs - (lone minus) is critical; flag it as something to get right