+++
title = "§8 Parsing Atoms with nom"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++

## §8 Parsing Atoms with nom: Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 8. Parsing Atoms with nom`

Replace the stub line with full content. Target 600–800 words.

This section takes the individual atom parsers from §6 and the AST from §7 and combines them into a single `parse_atom` function that returns `IResult<&str, Expr>`. It then covers that function with unit tests.

## Learning objectives

- Combine the individual atom parsers into a single `alt`, using `map` to turn each parser's output into an `Expr` value
- Understand how to add `src/parser.rs` to the project properly
- Write comprehensive unit tests for atom parsing
- Handle the ordering constraint in `alt`: integers before symbols

## Content to write

### The `parse_atom` function

In `src/parser.rs`, combine the atom parsers from §6 with the `Expr` type from `src/ast.rs`. The snippet lists only the imports `parse_atom` itself needs. In the finished file these imports are part of the module header shown under "Module organisation", so do not add them a second time (importing the same name twice in one module is a compile error).

```rust
use nom::{IResult, branch::alt, combinator::map};

use crate::ast::Expr;

/// Parse any MiniLisp atom: integer, boolean, string, or symbol.
pub fn parse_atom(input: &str) -> IResult<&str, Expr> {
    alt((
        // Each `map` lifts a primitive parser result into an `Expr` variant.
        map(parse_integer, Expr::Int),
        map(parse_bool, Expr::Bool),
        map(parse_string, Expr::Str),
        map(parse_symbol, |s: &str| Expr::Symbol(s.to_string())),
    ))(input)
}
```

Explain the ordering. Start with how `alt` works: it tries its alternatives left to right and returns the first one that succeeds. An alternative that fails with a recoverable error passes the same input on to the next one.

1. **Integer before symbol**: `-7` must match as an integer, not as a symbol starting with `-`. Because `parse_integer` is tried first, it consumes the full `-7` and `parse_symbol` never runs. If the order were reversed, `parse_symbol` could claim `-7` as a symbol (assuming the §6 symbol rules allow digits after a leading `-`), and the integer parser would never be reached. A lone `-` has no digit after it, so `parse_integer` fails and `alt` falls through to `parse_symbol`, which returns the symbol `-`.
2. **Boolean before symbol**: `#t` and `#f` are not valid symbols because `#` is not a symbol-start character. Ordering therefore does not matter here, but trying booleans first is cleaner.
3. **String anywhere**: strings are the only atoms that start with `"`. As a result, `parse_string` cannot overlap with the other parsers, and its position in the `alt` does not affect the result.

### Module organisation

Show the complete `src/parser.rs` header at this point:

```rust
//! Parser for MiniLisp source code.
//!
//! Entry point: [`parse`], which accepts a `&str` and returns `Vec<Expr>`.

use nom::{
    IResult,
    branch::alt,
    bytes::complete::{escaped_transform, is_not, tag, take_while, take_while_m_n},
    character::complete::{char, digit1, multispace0, line_ending},
    combinator::{map, map_res, opt, recognize, value},
    sequence::{delimited, pair},
};

use crate::ast::Expr;
```

### Whitespace-aware atom parser

Wrap `parse_atom` in the `ws` combinator so callers do not have to think about surrounding whitespace:

```rust
/// `parse_atom`, ignoring the whitespace around the atom.
pub fn parse_atom_ws(input: &str) -> IResult<&str, Expr> {
    ws(parse_atom)(input)
}
```

### Unit tests

Write a `#[cfg(test)]` module in `src/parser.rs` that tests every atom type with multiple cases:

```rust
#[cfg(test)]
mod tests {
    // `super::*` brings in `parse_atom` and the parent module's `Expr` import.
    use super::*;

    #[test]
    fn test_integer_atom() {
        assert_eq!(parse_atom("42"), Ok(("", Expr::Int(42))));
        // Only the atom is consumed; trailing input is returned unparsed.
        assert_eq!(parse_atom("-7 "), Ok((" ", Expr::Int(-7))));
        assert_eq!(parse_atom("0"), Ok(("", Expr::Int(0))));
    }

    #[test]
    fn test_bool_atom() {
        assert_eq!(parse_atom("#t"), Ok(("", Expr::Bool(true))));
        assert_eq!(parse_atom("#f"), Ok(("", Expr::Bool(false))));
    }

    #[test]
    fn test_string_atom() {
        assert_eq!(parse_atom(r#""hello""#), Ok(("", Expr::Str("hello".into()))));
        // The raw input contains a backslash-n escape; the parsed value holds a real newline.
        assert_eq!(parse_atom(r#""a\nb""#), Ok(("", Expr::Str("a\nb".into()))));
    }

    #[test]
    fn test_symbol_atom() {
        assert_eq!(parse_atom("my-var"), Ok(("", Expr::Symbol("my-var".into()))));
        assert_eq!(parse_atom("+"), Ok(("", Expr::Symbol("+".into()))));
        assert_eq!(
            parse_atom("factorial rest"),
            Ok((" rest", Expr::Symbol("factorial".into())))
        );
    }

    #[test]
    fn test_negative_integer_vs_symbol() {
        // -7 must be an integer, not a symbol
        assert_eq!(parse_atom("-7"), Ok(("", Expr::Int(-7))));
        // A lone - is a symbol: parse_integer fails, so alt falls through to parse_symbol
        assert_eq!(parse_atom("- "), Ok((" ", Expr::Symbol("-".into()))));
    }
}
```

### Run the tests

```sh
cargo test parser
```

The `parser` argument is a test-name filter. It runs every test whose path contains `parser`, which includes everything in `parser::tests`. All tests should pass before proceeding to §9.
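`cargo test parser` only finds these tests if `src/parser.rs` is actually part of the crate. The following minimal sketch shows that wiring, assuming a binary crate rooted at `src/main.rs`. The inline `mod ast` and `mod parser` blocks stand in for the two files; in the real project, `mod ast;` and `mod parser;` declare them instead. The integer-only `parse_atom` here is a hypothetical std-only placeholder, not the nom version above.

```rust
// Sketch of the module wiring. In the real project these two inline modules are
// separate files, declared in src/main.rs (or src/lib.rs) with:
//     mod ast;     // loads src/ast.rs
//     mod parser;  // loads src/parser.rs

mod ast {
    /// Hypothetical, trimmed-down stand-in for the §7 AST.
    #[derive(Debug, PartialEq)]
    pub enum Expr {
        Int(i64),
    }
}

mod parser {
    // The same path src/parser.rs uses to reach the AST type.
    use crate::ast::Expr;

    /// Hypothetical integer-only placeholder for `parse_atom` (std only, no nom).
    /// It must be `pub` so code outside the `parser` module can call it.
    pub fn parse_atom(input: &str) -> Option<(&str, Expr)> {
        // Byte length of the leading prefix: an optional '-' followed by digits.
        let end = input
            .char_indices()
            .find(|&(i, c)| !(c.is_ascii_digit() || (i == 0 && c == '-')))
            .map_or(input.len(), |(i, _)| i);
        let n: i64 = input[..end].parse().ok()?;
        Some((&input[end..], Expr::Int(n)))
    }

    // Unit tests live inside the module, so their paths start with `parser::`,
    // which is what the `cargo test parser` filter matches.
    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_integer_atom() {
            assert_eq!(parse_atom("42"), Some(("", Expr::Int(42))));
        }
    }
}

fn main() {
    // Callers outside the module go through the module path.
    println!("{:?}", parser::parse_atom("42 rest"));
}
```

If `mod parser;` is missing from the crate root, the file is never compiled. In that case `cargo test parser` reports zero tests instead of failing, which makes the missing declaration easy to overlook.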
## Style notes

- The ordering section is the most important teaching moment here; make it explicit
- Show how `map` is used to lift a primitive value into an `Expr` variant
- The test for `-7` vs `-` (lone minus) is critical: flag it as something to get right
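To show the ordering constraint in isolation, the following std-only sketch does not use nom at all. The hypothetical `first_match` helper mimics `alt`'s first-success rule. The toy parsers `toy_integer` and `toy_symbol` are simplified, hypothetical stand-ins for the §6 parsers. The toy symbol rule accepts any run of characters other than whitespace and parentheses. It is an assumption, chosen so that `-7` is also a valid symbol, which is exactly the overlap the integer-before-symbol rule guards against.

```rust
/// Trimmed-down stand-in for the §7 `Expr` type (hypothetical: only two variants).
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Symbol(String),
}

/// A toy parser: on success, returns (remaining input, parsed value).
type Parser = fn(&str) -> Option<(&str, Expr)>;

/// Toy integer parser: an optional leading '-', then one or more ASCII digits.
fn toy_integer(input: &str) -> Option<(&str, Expr)> {
    let start = if input.starts_with('-') { 1 } else { 0 };
    let digits = input[start..].bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return None; // no digits: fail so the next alternative can run
    }
    let end = start + digits;
    let n: i64 = input[..end].parse().ok()?;
    Some((&input[end..], Expr::Int(n)))
}

/// Toy symbol parser (assumed rule): any non-empty run of characters that are
/// not whitespace or parentheses. Under this rule "-7" is also a valid symbol.
fn toy_symbol(input: &str) -> Option<(&str, Expr)> {
    let end = input
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((&input[end..], Expr::Symbol(input[..end].to_string())))
}

/// Ordered choice in the spirit of nom's `alt`: try each parser in order and
/// return the first success.
fn first_match<'a>(parsers: &[Parser], input: &'a str) -> Option<(&'a str, Expr)> {
    parsers.iter().find_map(|p| p(input))
}

fn main() {
    let integer_first: [Parser; 2] = [toy_integer, toy_symbol];
    let symbol_first: [Parser; 2] = [toy_symbol, toy_integer];

    // Correct order: "-7" is an integer; a lone "-" falls through to the symbol parser.
    println!("{:?}", first_match(&integer_first, "-7"));
    println!("{:?}", first_match(&integer_first, "- "));

    // Wrong order: the symbol parser succeeds first, so the integer parser never runs.
    println!("{:?}", first_match(&symbol_first, "-7"));
}
```

With `integer_first`, `-7` comes back as `Int(-7)`, and `- ` comes back as `Symbol("-")` with `" "` left over. With `symbol_first`, `-7` comes back as `Symbol("-7")`. That wrong result is the mistake the `-7` vs `-` test exists to catch.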