title	status	type	priority	created_at	updated_at
§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols	completed	task	normal	2026-03-10T23:30:01Z	2026-03-10T23:30:01Z

§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols — Stub to fill

File: edu/src/lisp-compiler.md, section ### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols

Replace the stub line with full content. Target 800–1100 words. This is a hands-on section that builds one atom parser at a time. Each parser is developed in isolation before being combined in §8.

Learning objectives

Write a nom parser for each MiniLisp atom type
Use map_res, recognize, opt, alt, value, tag, char, pair, delimited, take_while, take_while_m_n, is_not, escaped_transform
Understand how to test parsers with assert_eq! on the full IResult
Know the tricky cases: negative integers vs symbol -, #t/#f ambiguity, string escapes

Content to write

Work through each atom parser in a subsection with: explanation, full code, tricky cases, and a test block.

Integer parser

A signed decimal integer: optional -, then one or more digits, converted to i64.

use nom::{IResult, combinator::{map_res, recognize, opt}, character::complete::{char, digit1}, sequence::pair};

pub fn parse_integer(input: &str) -> IResult<&str, i64> {
    map_res(
        recognize(pair(opt(char('-')), digit1)),
        |s: &str| s.parse::<i64>()
    )(input)
}

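Tricky case: the symbol - and negative integers. opt(char('-')) accepts a leading -, but digit1 still requires at least one digit, so parse_integer("-") fails at digit1 with a recoverable Err::Error and alt in the atom parser falls through to the symbol parser. The ordering constraint comes from the other direction: - is a valid symbol start and digits are valid symbol continuations, so parse_symbol("-7") succeeds and returns the symbol -7. The integer parser must therefore be tried before the symbol parser in the alt. map_res covers a different failure: a digit run that overflows i64 matches the pattern but fails the conversion, which is also reported as a recoverable error.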

Tests:

assert_eq!(parse_integer("42 rest"), Ok((" rest", 42)));
assert_eq!(parse_integer("-7"), Ok(("", -7)));
assert!(parse_integer("abc").is_err());
assert!(parse_integer("-").is_err()); // lone minus: no digits follow
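
A possible aid for the tricky-case discussion: the conversion handed to map_res is plain str::parse, so its behaviour can be shown without nom. The sketch below isolates that step; the name to_i64 is hypothetical and exists only for illustration.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for the closure given to `map_res` in `parse_integer`:
/// it receives the slice matched by `recognize` and converts it to `i64`.
pub fn to_i64(s: &str) -> Result<i64, ParseIntError> {
    s.parse::<i64>()
}
```

Under this sketch, to_i64("-7") is Ok(-7) and to_i64("-9223372036854775808") is Ok(i64::MIN), while to_i64("9223372036854775808") is an error because the value exceeds i64::MAX. str::parse also accepts a leading +, but the recognize pattern never hands it one, so +7 stays a symbol.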

Boolean parser

use nom::{IResult, branch::alt, bytes::complete::tag, combinator::value};

pub fn parse_bool(input: &str) -> IResult<&str, bool> {
    alt((
        value(true,  tag("#t")),
        value(false, tag("#f")),
    ))(input)
}

Explain value(output, parser) — discards the parser's output and returns a fixed value instead. This avoids a map that ignores its argument.

Tricky case: # must not be a valid symbol character; otherwise a token such as #t could also be read as a symbol and the grammar would be ambiguous. Confirm that # is not in the symbol character set (per §2).
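
The Boolean subsection has no test block yet. As a starting point, here is a dependency-free model of what alt, value, and tag compute together. It is a hand-written sketch, not nom code, and the name bool_prefix is hypothetical. It also exposes a second pitfall worth mentioning: tag matches a prefix only.

```rust
/// Hand-written model of `alt((value(true, tag("#t")), value(false, tag("#f"))))`.
/// Returns the remaining input and the boolean, or `None` if neither literal matches.
pub fn bool_prefix(input: &str) -> Option<(&str, bool)> {
    if let Some(rest) = input.strip_prefix("#t") {
        Some((rest, true))
    } else if let Some(rest) = input.strip_prefix("#f") {
        Some((rest, false))
    } else {
        None
    }
}
```

Because only a prefix is checked, the input #true yields true with rue left over. The nom version behaves the same way: parse_bool("#true") returns Ok(("rue", true)). Whether that should be an error is a question for the delimiter handling in the combined parser.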

Symbol parser

Symbols start with a sym_start character and continue with zero or more sym_cont characters. Use recognize to return the input slice.

use nom::{IResult, combinator::recognize, sequence::pair,
          bytes::complete::{take_while, take_while_m_n}};

// First character: a letter or one of the operator/punctuation characters.
fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

// Later characters: as above, plus digits.
fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

pub fn parse_symbol(input: &str) -> IResult<&str, &str> {
    recognize(pair(
        take_while_m_n(1, 1, is_sym_start), // exactly one start character
        take_while(is_sym_cont),            // zero or more continuation characters
    ))(input)
}

Tricky case: +, *, /, =, <, > are valid single-character symbols (used as operator names). The parser must handle them.

Tests:

assert_eq!(parse_symbol("my-var rest"), Ok((" rest", "my-var")));
assert_eq!(parse_symbol("+"), Ok(("", "+")));
assert!(parse_symbol("42").is_err());
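
The two character predicates can be checked on their own, without nom. The sketch below copies them from the parser above and adds a hypothetical whole-string helper, is_symbol, purely for illustration.

```rust
fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

/// Hypothetical helper: true when the entire string is one valid symbol.
pub fn is_symbol(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // One start character, then only continuation characters.
        Some(first) => is_sym_start(first) && chars.all(is_sym_cont),
        None => false,
    }
}
```

Three results are worth calling out in the text. is_symbol("#t") is false, so booleans and symbols do not collide. is_symbol("-7") is true, which is why the integer parser has to run first. is_symbol("λ") is also true, because char::is_alphabetic is Unicode-aware; whether §2 intends non-ASCII symbols is an open assumption.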

String parser

Double-quoted strings with escape sequences \", \\, \n, \t.

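use nom::{IResult, branch::alt, bytes::complete::{escaped_transform, is_not},
          character::complete::char, combinator::{map, opt, value},
          sequence::delimited};

pub fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        // opt + unwrap_or_default: an empty body yields "" instead of an error.
        map(
            opt(escaped_transform(
                is_not("\\\""), // normal run: anything except \ and "
                '\\',           // escape introducer
                alt((
                    value("\"", char('"')),
                    value("\\", char('\\')),
                    value("\n", char('n')),
                    value("\t", char('t')),
                )),
            )),
            |s: Option<String>| s.unwrap_or_default(),
        ),
        char('"'),
    )(input)
}

Note: escaped_transform returns String (owned), not &str, because it must allocate when escape sequences are expanded.

Tricky case: an empty string "". is_not requires at least one character, so with nothing between the quotes escaped_transform has no input to accept and can return an error instead of an empty String. Wrapping it in opt and defaulting to an empty String makes "" parse; an invalid escape such as \q still fails, because the closing quote is then not found. Test the empty case explicitly.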

Tests:

assert_eq!(parse_string(r#""hello""#), Ok(("", "hello".to_string())));
assert_eq!(parse_string(r#""a\nb""#), Ok(("", "a\nb".to_string())));
assert_eq!(parse_string(r#""""#), Ok(("", "".to_string())));
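
To make the expected values in these tests easier to explain, the escape handling can be modelled without nom. The function below is a hand-written sketch of the four escapes listed above, not nom's implementation; the name unescape is hypothetical, and it takes only the text between the quotes.

```rust
/// Hand-written model of the escape expansion (not nom's implementation).
/// `body` is the text between the quotes. Returns `None` on an unknown
/// escape or on a backslash with nothing after it.
pub fn unescape(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the four supported characters.
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

Note the raw-string inputs in the tests: r"a\nb" holds four characters (a, backslash, n, b), while the expected "a\nb" holds three, the middle one a real newline. Spelling this out helps readers who are unsure which side of the assertion is escaped.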

Comment parser

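A comment runs from ; to the end of the line. Comments are consumed and discarded; they produce no AST node. The parser stops before the line ending, which stays in the remaining input.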

use nom::{IResult, bytes::complete::is_not, sequence::pair,
          character::complete::char, combinator::{opt, value}};

pub fn parse_comment(input: &str) -> IResult<&str, ()> {
    value((), pair(char(';'), opt(is_not("\n\r"))))(input)
}
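
The comment subsection also lacks tests. A dependency-free model of the same behaviour is sketched below. The name skip_comment is hypothetical, and the function returns only the remaining input, since the parser's output is ().

```rust
/// Hand-written model of `pair(char(';'), opt(is_not("\n\r")))`.
/// Returns the input left after the comment, or `None` if the input
/// does not start with `;`.
pub fn skip_comment(input: &str) -> Option<&str> {
    let body = input.strip_prefix(';')?;
    // Stop at the first line-ending character, or at end of input.
    let end = body
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(body.len());
    Some(&body[end..])
}
```

Two cases deserve a mention in the text: a bare ; at end of input is a valid (empty) comment, and the newline is not consumed. The nom version behaves the same way, so parse_comment("; hi\nnext") returns Ok(("\nnext", ())).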

Exercises

Extend the integer parser to also recognise hexadecimal literals prefixed with 0x — use alt and map_res with i64::from_str_radix.
Extend the symbol parser to reject the single character - followed immediately by a digit (since that should be parsed as a negative integer).

Both exercises should have collapsible reference solutions.
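
For the first exercise, the reference solution will need a conversion step to pass to map_res. A minimal sketch of that step alone follows, assuming the 0x prefix has already been consumed by the parser; the name hex_to_i64 is hypothetical.

```rust
use std::num::ParseIntError;

/// Hypothetical conversion for exercise 1. `digits` is the text that
/// follows the `0x` prefix; the prefix itself must not be included.
pub fn hex_to_i64(digits: &str) -> Result<i64, ParseIntError> {
    i64::from_str_radix(digits, 16)
}
```

i64::from_str_radix does not understand the 0x prefix, so hex_to_i64("0xff") is an error while hex_to_i64("ff") is Ok(255). In the alt, the hexadecimal branch probably has to come before the decimal one; otherwise the decimal parser matches the leading 0 and stops at x.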

Style notes

One subsection per atom type, in the order they will appear in the alt in §8
Every code block must be self-contained with use statements
Show tricky cases and why they are tricky before showing the solution — the reader should understand the pitfall, not just copy the fix
nom version note: use nom::bytes::complete (not nom::bytes::streaming) throughout
