---
# edu-16fy
title: '§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols'
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-10T23:30:01Z
---

## §6 Recognizing Atoms: Integers, Booleans, Strings, Symbols (Stub to fill)

File: `edu/src/lisp-compiler.md`, section `### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols`

Replace the stub line with full content. Target 800–1100 words.

This is a hands-on section that builds one atom parser at a time. Each parser is developed in isolation before being combined in §8.

## Learning objectives

- Write a nom parser for each MiniLisp atom type
- Use `map_res`, `recognize`, `opt`, `alt`, `tag`, `char`, `take_while1`, `is_not`, `escaped_transform`
- Understand how to test parsers with `assert_eq!` on the full `IResult`
- Know the tricky cases: negative integers vs symbol `-`, `#t`/`#f` ambiguity, string escapes

## Content to write

Work through each atom parser in a subsection with: explanation, full code, tricky cases, and a test block.

### Integer parser

A signed decimal integer: optional `-`, then one or more digits, converted to `i64`.

```rust
use nom::{
    IResult,
    character::complete::{char, digit1},
    combinator::{map_res, opt, recognize},
    sequence::pair,
};

pub fn parse_integer(input: &str) -> IResult<&str, i64> {
    map_res(
        // Match an optional sign plus at least one digit, keeping the matched slice.
        recognize(pair(opt(char('-')), digit1)),
        // Convert the slice; this step fails only if the value overflows i64.
        |s: &str| s.parse::<i64>(),
    )(input)
}
```

Tricky case: the symbol `-` and negative integers. On the input `-`, `opt(char('-'))` consumes the sign, but `digit1` then requires at least one digit, so `parse_integer("-")` fails inside `pair` before `map_res` ever runs. This is fine: the failure is recoverable and `alt` in the atom parser will fall through to the symbol parser. The ordering constraint comes from the other direction: `-` is a valid symbol start and digits are valid symbol continuations, so the symbol parser would happily consume `-7` as a symbol. The integer parser must therefore be tried *before* the symbol parser in the `alt`.

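To see where the integer parser rejects a lone `-`, it can help to look at the two steps without nom in the way. The sketch below is a dependency-free stand-in, not the section's parser: `recognize_signed_digits` and `parse_integer_sketch` are hypothetical names that mimic `recognize(pair(opt(char('-')), digit1))` and the `map_res` conversion respectively, and `Option` stands in for `IResult`.

```rust
/// Hypothetical stand-in for `recognize(pair(opt(char('-')), digit1))`.
/// Returns `(matched, rest)`, or `None` when no digit follows the optional sign.
fn recognize_signed_digits(input: &str) -> Option<(&str, &str)> {
    // Optional leading minus sign (one byte in UTF-8).
    let sign_len = if input.starts_with('-') { 1 } else { 0 };
    // Count the ASCII digits that follow the sign.
    let digits_len = input[sign_len..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digits_len == 0 {
        // Mirrors `digit1` failing: at least one digit is required.
        return None;
    }
    Some(input.split_at(sign_len + digits_len))
}

/// Hypothetical stand-in for the full integer parser: returns `(rest, value)`.
fn parse_integer_sketch(input: &str) -> Option<(&str, i64)> {
    let (matched, rest) = recognize_signed_digits(input)?;
    // Mirrors the `map_res` step: the only way this fails is i64 overflow.
    let value = matched.parse::<i64>().ok()?;
    Some((rest, value))
}
```

With these definitions, `parse_integer_sketch("-")` is rejected by the digit check, while an over-long literal such as `99999999999999999999` passes the digit check and is rejected by the conversion.
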
Tests:

```rust
assert_eq!(parse_integer("42 rest"), Ok((" rest", 42)));
assert_eq!(parse_integer("-7"), Ok(("", -7)));
assert!(parse_integer("abc").is_err());
```

### Boolean parser

```rust
use nom::{IResult, branch::alt, bytes::complete::tag, combinator::value};

pub fn parse_bool(input: &str) -> IResult<&str, bool> {
    alt((
        value(true, tag("#t")),
        value(false, tag("#f")),
    ))(input)
}
```

Explain `value(output, parser)`: it runs the parser, discards its output, and returns a fixed value instead. This avoids a `map` that ignores its argument.

Tricky case: `#` must not be a valid symbol character; otherwise input such as `#t` could be read as either a boolean or a symbol. Confirm that `#` is not in the symbol character set (per §2).

### Symbol parser

Symbols start with a `sym_start` character and continue with zero or more `sym_cont` characters. Use `recognize` to return the matched input slice.

```rust
use nom::{
    IResult,
    bytes::complete::{take_while, take_while_m_n},
    combinator::recognize,
    sequence::pair,
};

fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

pub fn parse_symbol(input: &str) -> IResult<&str, &str> {
    recognize(pair(
        // Exactly one start character...
        take_while_m_n(1, 1, is_sym_start),
        // ...followed by zero or more continuation characters.
        take_while(is_sym_cont),
    ))(input)
}
```

Tricky case: `+`, `-`, `*`, `/`, `=`, `<`, `>` are valid single-character symbols (used as operator names). The parser must handle them.

Tests:

```rust
assert_eq!(parse_symbol("my-var rest"), Ok((" rest", "my-var")));
assert_eq!(parse_symbol("+"), Ok(("", "+")));
assert!(parse_symbol("42").is_err());
```

### String parser

Double-quoted strings with escape sequences `\"`, `\\`, `\n`, `\t`.

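Before wiring the escapes into nom, it can help to pin down the expansion itself. The following is a dependency-free sketch, assuming the four escapes listed above are the only ones MiniLisp accepts; `unescape_body` is a hypothetical helper that works on the text between the quotes and is not part of the section's parser.

```rust
/// Hypothetical helper: expands `\"`, `\\`, `\n` and `\t` in a string body
/// (the text between the quotes). Returns `None` for an unknown escape or a
/// backslash at the very end of the body.
fn unescape_body(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy it through unchanged.
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the four known escape letters.
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

Because expansion replaces two input characters with one output character, the result cannot be a slice of the input and has to be an owned `String`. The parser for this section expresses the same mapping with nom combinators:
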
```rust
use nom::{
    IResult,
    branch::alt,
    bytes::complete::{escaped_transform, is_not},
    character::complete::char,
    combinator::{map, opt},
    sequence::delimited,
};

pub fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        // `opt` covers the empty body `""`, where `escaped_transform` has
        // nothing to match and can report an error.
        map(
            opt(escaped_transform(
                is_not("\\\""),
                '\\',
                alt((
                    map(char('"'), |_| "\""),
                    map(char('\\'), |_| "\\"),
                    map(char('n'), |_| "\n"),
                    map(char('t'), |_| "\t"),
                )),
            )),
            |body: Option<String>| body.unwrap_or_default(),
        ),
        char('"'),
    )(input)
}
```

Note: `escaped_transform` returns `String` (owned), not `&str`, because it must allocate when escape sequences are expanded.

Tricky case: an empty string `""`. `is_not` requires at least one character, so with nothing between the quotes `escaped_transform` has no input to match and can fail. Wrapping it in `opt` and defaulting to an empty `String` lets `""` parse. Test it explicitly.

Tests:

```rust
assert_eq!(parse_string(r#""hello""#), Ok(("", "hello".to_string())));
assert_eq!(parse_string(r#""a\nb""#), Ok(("", "a\nb".to_string())));
assert_eq!(parse_string(r#""""#), Ok(("", "".to_string())));
```

### Comment parser

Comments are consumed and discarded; they produce no AST node.

```rust
use nom::{
    IResult,
    bytes::complete::is_not,
    character::complete::char,
    combinator::{opt, value},
    sequence::pair,
};

pub fn parse_comment(input: &str) -> IResult<&str, ()> {
    // A `;` followed by everything up to (but not including) the line ending.
    value((), pair(char(';'), opt(is_not("\n\r"))))(input)
}
```

## Exercises

1. Extend the integer parser to also recognise hexadecimal literals prefixed with `0x`. Use `alt` and `map_res` with `i64::from_str_radix`.
2. Extend the symbol parser to reject input that starts with `-` followed immediately by a digit (since that should be parsed as a negative integer).

Both exercises should have collapsible reference solutions.

## Style notes

- One subsection per atom type, in the order they will appear in the `alt` in §8
- Every code block must be self-contained with `use` statements
- Show tricky cases and why they are tricky before showing the solution. The reader should understand the pitfall, not just copy the fix
- nom version note: use `nom::bytes::complete` (not `nom::bytes::streaming`) throughout
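
As a starting point for the Exercise 1 reference solution, the conversion step can be sketched without nom. This is only an illustration under stated assumptions: `hex_literal_to_i64` is a hypothetical helper, and the choice that a minus sign goes before the `0x` prefix (as in `-0x10`) is an assumption, since the exercise does not say how negative hexadecimal literals are written.

```rust
/// Hypothetical helper for Exercise 1: converts a literal such as `0x1F`
/// or `-0x10` to an `i64`. Assumes any sign precedes the `0x` prefix.
fn hex_literal_to_i64(literal: &str) -> Option<i64> {
    // Split off an optional leading minus sign.
    let (negative, unsigned) = match literal.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, literal),
    };
    // The `0x` prefix is mandatory; plain decimal input is not handled here.
    let digits = unsigned.strip_prefix("0x")?;
    // `i64::from_str_radix` itself accepts a leading `+` or `-`, so make sure
    // only hex digits reach it; otherwise `0x-1` would be accepted.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Overflowing magnitudes are reported as `None`.
    let magnitude = i64::from_str_radix(digits, 16).ok()?;
    Some(if negative { -magnitude } else { magnitude })
}
```

In a nom version, the digit check would typically be done by the recognizing parser, leaving `map_res` to call `i64::from_str_radix` on a slice that is already known to contain only hex digits.
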