title: §6 Recognizing Atoms: Integers, Booleans, Strings, Symbols
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-10T23:30:01Z

§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols — Stub to fill

File: edu/src/lisp-compiler.md, section ### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols

Replace the stub line with full content. Target 800-1100 words. This is a hands-on section that builds one atom parser at a time. Each parser is developed in isolation before being combined in §8.

Learning objectives

  • Write a nom parser for each MiniLisp atom type
  • Use map_res, recognize, opt, alt, tag, char, take_while1, is_not, escaped_transform
  • Understand how to test parsers with assert_eq! on the full IResult
  • Know the tricky cases: negative integers vs symbol -, #t/#f ambiguity, string escapes

Content to write

Work through each atom parser in a subsection with: explanation, full code, tricky cases, and a test block.

Integer parser

A signed decimal integer: optional -, then one or more digits, converted to i64.

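use nom::{IResult, combinator::{map_res, recognize, opt}, character::complete::{char, digit1}, sequence::pair};

pub fn parse_integer(input: &str) -> IResult<&str, i64> {
    map_res(
        // recognize yields the matched slice (e.g. "-7") rather than the (sign, digits) pair.
        recognize(pair(opt(char('-')), digit1)),
        // A conversion failure (overflow) becomes a recoverable parse error.
        |s: &str| s.parse::<i64>()
    )(input)
}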

Tricky case: the symbol - and negative integers. opt(char('-')) accepts a leading -, but digit1 still requires at least one digit, so parse_integer("-") fails at digit1, before map_res ever runs. map_res only matters when the digits do not fit in an i64. The failure is recoverable, so alt in the atom parser will fall through to the symbol parser. The ordering still matters: - is a valid symbol start character and digits are valid continuation characters, so the symbol parser would accept -7 as a symbol. The integer parser must therefore be tried before the symbol parser in the alt.

Tests:

assert_eq!(parse_integer("42 rest"), Ok((" rest", 42)));
assert_eq!(parse_integer("-7"), Ok(("", -7)));
assert!(parse_integer("abc").is_err());
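
To make the failure points concrete, here is a minimal hand-written sketch of the same three steps (optional sign, one or more digits, conversion) using only the standard library. It is not the nom parser and does not use nom; the name integer_prefix is hypothetical and Option stands in for IResult.

```rust
/// Hand-written stand-in for opt('-') + digit1 + map_res(parse::<i64>).
/// Returns the remaining input and the value, or None on any failure.
fn integer_prefix(input: &str) -> Option<(&str, i64)> {
    // Optional leading minus sign (the opt(char('-')) step).
    let sign_len = if input.starts_with('-') { 1 } else { 0 };
    // One or more ASCII digits (the digit1 step).
    let digit_len = input[sign_len..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digit_len == 0 {
        // "-" and "abc" both stop here, before any conversion is attempted.
        return None;
    }
    let end = sign_len + digit_len;
    // Conversion (the map_res step). With the shape already checked,
    // the only way this can fail is overflow.
    let value = input[..end].parse::<i64>().ok()?;
    Some((&input[end..], value))
}
```

A lone "-" is rejected by the digit check, while a value one past i64::MAX passes the digit check and is rejected only by the conversion.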

Boolean parser

use nom::{IResult, branch::alt, bytes::complete::tag, combinator::value};

pub fn parse_bool(input: &str) -> IResult<&str, bool> {
    alt((
        value(true,  tag("#t")),
        value(false, tag("#f")),
    ))(input)
}

Explain value(output, parser) — discards the parser's output and returns a fixed value instead. This avoids a map that ignores its argument.

Tricky case: # must not be a valid symbol character, otherwise #t and #f would be ambiguous with symbols that start with #. Confirm that # is not in the symbol character set (per §2).

Symbol parser

Symbols start with a sym_start character and continue with zero or more sym_cont characters. Use recognize to return the input slice.

use nom::{IResult, combinator::recognize, sequence::pair,
          bytes::complete::{take_while, take_while_m_n}};

// First character: a letter or one of the operator/punctuation characters.
fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

// Later characters: the same set plus digits.
fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

pub fn parse_symbol(input: &str) -> IResult<&str, &str> {
    // recognize returns the whole matched slice instead of the (start, rest) pair.
    recognize(pair(
        take_while_m_n(1, 1, is_sym_start), // exactly one start character
        take_while(is_sym_cont),            // zero or more continuation characters
    ))(input)
}

Tricky case: +, *, /, =, <, > are valid single-character symbols (used as operator names). The parser must handle them.

Tests:

assert_eq!(parse_symbol("my-var rest"), Ok((" rest", "my-var")));
assert_eq!(parse_symbol("+"), Ok(("", "+")));
assert!(parse_symbol("42").is_err());
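
The character rules can be checked without nom. The sketch below reuses the two predicates from this subsection and adds a hand-written scanner (the name symbol_prefix is hypothetical, and Option stands in for IResult). It is meant only to show which inputs the rules accept, in particular that -7 is a well-formed symbol under these rules and that # is rejected.

```rust
fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

/// Hand-written stand-in for the symbol rule:
/// one start character, then zero or more continuation characters.
/// Returns (remaining input, matched symbol).
fn symbol_prefix(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !is_sym_start(first) {
        return None;
    }
    // Byte offset of the first character that cannot continue the symbol,
    // or the end of the input if every character can.
    let end = chars
        .find(|&(_, c)| !is_sym_cont(c))
        .map_or(input.len(), |(i, _)| i);
    Some((&input[end..], &input[..end]))
}
```

Because symbol_prefix("-7") succeeds with the whole input, a symbol parser placed ahead of the integer parser would swallow negative literals. That is the reason for the ordering constraint on the alt.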

String parser

Double-quoted strings with escape sequences \", \\, \n, \t.

use nom::{IResult, branch::alt, sequence::delimited,
          combinator::{map, opt, value},
          bytes::complete::{escaped_transform, is_not},
          character::complete::char};

pub fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        // opt + unwrap_or_default: in nom 7, escaped_transform on its own
        // rejects an empty body, so "" would otherwise fail to parse.
        map(
            opt(escaped_transform(
                is_not("\\\""), // any run of characters except \ and "
                '\\',           // the escape character
                alt((
                    value("\"", char('"')),
                    value("\\", char('\\')),
                    value("\n", char('n')),
                    value("\t", char('t')),
                )),
            )),
            Option::unwrap_or_default,
        ),
        char('"'),
    )(input)
}

Note: escaped_transform returns String (owned), not &str, because it must allocate when escape sequences are expanded.

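Tricky case: the empty string "". is_not requires at least one character, so in nom 7 escaped_transform returns an error when the closing quote immediately follows the opening quote. The parser has to allow for an empty body, for example by wrapping escaped_transform in opt and falling back to an empty String. Test it explicitly.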

Tests:

assert_eq!(parse_string(r#""hello""#), Ok(("", "hello".to_string())));
assert_eq!(parse_string(r#""a\nb""#), Ok(("", "a\nb".to_string())));
assert_eq!(parse_string(r#""""#), Ok(("", "".to_string())));
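
The reason the result must be an owned String can be seen in a small std-only sketch of escape expansion. This is not how escaped_transform is implemented; the name unescape is hypothetical, and the function works on the text between the quotes rather than on the full literal.

```rust
/// Expands the four escapes \" \\ \n \t in the body of a string literal.
/// Returns None on an unknown escape or a backslash at the very end.
fn unescape(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copied through unchanged.
            out.push(c);
            continue;
        }
        // Escape: the character after the backslash selects the replacement.
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

The two source characters \ and n collapse into a single newline, so the output is shorter than the input and contains a byte that never appeared in it. No slice of the original input can represent that, which is why a fresh allocation is needed.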

Comment parser

Comments are consumed and discarded — they produce no AST node.

use nom::{IResult, bytes::complete::is_not, sequence::pair,
          character::complete::char, combinator::{opt, value}};

pub fn parse_comment(input: &str) -> IResult<&str, ()> {
    // ';' followed by everything up to, but not including, the line ending.
    // opt covers a bare ';' at the end of a line or of the input.
    value((), pair(char(';'), opt(is_not("\n\r"))))(input)
}
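
One detail worth testing is where the comment rule stops. The std-only sketch below mirrors the rule as written (a semicolon, then any run of characters other than \n and \r). The name skip_comment is hypothetical; it returns only the remaining input, since a comment produces no value.

```rust
/// Hand-written stand-in for the comment rule: ';' then everything up to,
/// but not including, the next '\n' or '\r'. Returns the remaining input.
fn skip_comment(input: &str) -> Option<&str> {
    // Anything that does not begin with ';' is not a comment.
    let rest = input.strip_prefix(';')?;
    // Stop at the first line-ending character, or at the end of the input.
    let end = rest
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(rest.len());
    Some(&rest[end..])
}
```

The line ending itself is left in the remaining input, so whatever skips whitespace between tokens also has to consume it.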

Exercises

  1. Extend the integer parser to also recognise hexadecimal literals prefixed with 0x — use alt and map_res with i64::from_str_radix.
  2. Extend the symbol parser to reject the single character - followed immediately by a digit (since that should be parsed as a negative integer).

Both exercises should have collapsible reference solutions.
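
As a starting point for the reference solution to exercise 1, here is a std-only sketch of the conversion step alone. It does not use nom, and the name hex_literal_value is hypothetical. It takes a complete literal rather than a prefix of the input, and it assumes negative hex literals are out of scope, which the exercise does not specify.

```rust
/// Converts a complete 0x-prefixed literal such as "0x1F" to an i64.
/// Returns None if the prefix is missing, the digits are malformed,
/// or the value does not fit in an i64.
fn hex_literal_value(literal: &str) -> Option<i64> {
    let digits = literal.strip_prefix("0x")?;
    // i64::from_str_radix also accepts a leading '+' or '-', so reject
    // anything that is not a plain hex digit before converting.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    i64::from_str_radix(digits, 16).ok()
}
```

In the nom version the hex branch should come before the decimal branch in the alt: digit1 would otherwise match the leading 0 of 0x1F and leave x1F unconsumed.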

Style notes

  • One subsection per atom type, in the order they will appear in the alt in §8
  • Every code block must be self-contained with use statements
  • Show tricky cases and why they are tricky before showing the solution — the reader should understand the pitfall, not just copy the fix
  • nom version note: use nom::bytes::complete (not nom::bytes::streaming) throughout