+++
title = "§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols — Stub to fill
File: edu/src/lisp-compiler.md, section ### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols
Replace the stub line with full content. Target 800–1100 words. This is a hands-on section that builds one atom parser at a time. Each parser is developed in isolation before being combined in §8.
Learning objectives
- Write a nom parser for each MiniLisp atom type
- Use map_res, recognize, opt, alt, tag, char, take_while1, is_not, escaped_transform
- Understand how to test parsers with assert_eq! on the full IResult
- Know the tricky cases: negative integers vs symbol -, #t/#f ambiguity, string escapes
Content to write
Work through each atom parser in a subsection with: explanation, full code, tricky cases, and a test block.
Integer parser
A signed decimal integer: optional -, then one or more digits, converted to i64.
use nom::{IResult, combinator::{map_res, recognize, opt}, character::complete::{char, digit1}, sequence::pair};

pub fn parse_integer(input: &str) -> IResult<&str, i64> {
    map_res(
        recognize(pair(opt(char('-')), digit1)),
        |s: &str| s.parse::<i64>(),
    )(input)
}
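Tricky case: the symbol - and negative integers. The leading - is optional, but digit1 still requires at least one digit, so parse_integer("-") fails at digit1, inside recognize, before map_res ever runs. map_res itself fails only when the recognized slice does not fit in an i64, for example on a 20-digit literal. Both failures are recoverable (Err::Error), so alt in the atom parser falls through to the symbol parser. The ordering still matters: - is a valid symbol start character and digits are valid continuation characters, so if the symbol parser ran first, -7 would be consumed as a symbol. The integer parser must therefore be tried before the symbol parser in the alt.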
Tests:
assert_eq!(parse_integer("42 rest"), Ok((" rest", 42)));
assert_eq!(parse_integer("-7"), Ok(("", -7)));
assert!(parse_integer("abc").is_err());
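The conversion that map_res wraps is ordinary str::parse::<i64>. As a nom-free sketch (to_i64 is a hypothetical name used only here, not part of the parser), the following isolates that step so its accept and reject cases can be checked on their own. Out-of-range literals are the inputs that pass recognize and are rejected only at conversion time.

```rust
/// Nom-free stand-in for the closure passed to `map_res` in `parse_integer`.
/// `to_i64` is a hypothetical helper for illustration only.
pub fn to_i64(slice: &str) -> Option<i64> {
    // `str::parse` reports both malformed input and overflow as `Err`;
    // `.ok()` folds both into `None`.
    slice.parse::<i64>().ok()
}
```

For example, to_i64("-7") yields Some(-7), while to_i64("-") and to_i64("99999999999999999999") both yield None.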
Boolean parser
use nom::{IResult, branch::alt, bytes::complete::tag, combinator::value};

pub fn parse_bool(input: &str) -> IResult<&str, bool> {
    alt((
        value(true, tag("#t")),
        value(false, tag("#f")),
    ))(input)
}
Explain value(output, parser) — discards the parser's output and returns a fixed value instead. This avoids a map that ignores its argument.
Tricky case: # must not be a valid symbol character; otherwise input such as #t could also be read as a symbol and the grammar would be ambiguous. Confirm that # is not in the symbol character set (per §2).
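One cheap way to confirm this is a check on the character predicates themselves. The sketch below copies is_sym_start and is_sym_cont from the symbol parser in this ticket and adds is_symbol_char, a hypothetical helper that exists only for this check. It assumes the predicates in the ticket match the character set defined in §2.

```rust
// Predicates copied from the symbol parser in this ticket.
fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

/// Hypothetical helper: true when `c` may appear anywhere in a symbol.
pub fn is_symbol_char(c: char) -> bool {
    is_sym_start(c) || is_sym_cont(c)
}
```

With these definitions is_symbol_char('#') is false, so no symbol can begin with or contain #, and the boolean and symbol parsers never compete for the same input.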
Symbol parser
Symbols start with a sym_start character and continue with zero or more sym_cont characters. Use recognize to return the input slice.
use nom::{IResult, combinator::recognize, sequence::pair,
    bytes::complete::{take_while, take_while_m_n}};

fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

pub fn parse_symbol(input: &str) -> IResult<&str, &str> {
    recognize(pair(
        // Exactly one start character.
        take_while_m_n(1, 1, is_sym_start),
        // Zero or more continuation characters.
        take_while(is_sym_cont),
    ))(input)
}
Tricky case: +, *, /, =, <, > are valid single-character symbols (used as operator names). The parser must handle them.
Tests:
assert_eq!(parse_symbol("my-var rest"), Ok((" rest", "my-var")));
assert_eq!(parse_symbol("+"), Ok(("", "+")));
assert!(parse_symbol("42").is_err());
String parser
Double-quoted strings with escape sequences \", \\, \n, \t.
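use nom::{IResult, branch::alt, bytes::complete::{escaped_transform, is_not},
    character::complete::char, combinator::{map, opt}, sequence::delimited};

pub fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        // In nom 7, escaped_transform reports an error when it matches nothing,
        // so opt plus unwrap_or_default maps the empty body of "" to String::new().
        map(
            opt(escaped_transform(
                is_not("\\\""),
                '\\',
                alt((
                    map(char('"'), |_| "\""),
                    map(char('\\'), |_| "\\"),
                    map(char('n'), |_| "\n"),
                    map(char('t'), |_| "\t"),
                )),
            )),
            |body: Option<String>| body.unwrap_or_default(),
        ),
        char('"'),
    )(input)
}
Note: escaped_transform returns String (owned), not &str, because it must allocate when escape sequences are expanded.
Tricky case: an empty string "". is_not requires at least one character, so on an empty body escaped_transform matches nothing and (in nom 7) returns an error instead of an empty String. Wrapping it in opt and defaulting to an empty String handles this. Test it explicitly.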
Tests:
assert_eq!(parse_string(r#""hello""#), Ok(("", "hello".to_string())));
assert_eq!(parse_string(r#""a\nb""#), Ok(("", "a\nb".to_string())));
assert_eq!(parse_string(r#""""#), Ok(("", "".to_string())));
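To make the escape table concrete, here is a hand-rolled, nom-free sketch of the expansion that the transform branch performs. unescape is a hypothetical helper, not nom's escaped_transform; it works on the string body only (the text between the quotes) and assumes exactly the four escapes listed above.

```rust
/// Hypothetical nom-free helper: expand the escapes \" \\ \n \t in a string
/// body. Returns None for an unknown escape or a trailing backslash.
pub fn unescape(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy it through unchanged.
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the four known escape characters.
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

An empty body yields an empty String here, which is the behaviour the empty-string test expects from the real parser.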
Comment parser
Comments are consumed and discarded — they produce no AST node.
use nom::{IResult, bytes::complete::is_not, character::complete::char,
    combinator::{opt, value}, sequence::pair};

pub fn parse_comment(input: &str) -> IResult<&str, ()> {
    // Consume ';' and everything up to, but not including, the line ending.
    value((), pair(char(';'), opt(is_not("\n\r"))))(input)
}
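The comment subsection has no tricky case listed, but one is worth noting: is_not("\n\r") stops before the line ending, so the newline stays in the remaining input. The nom-free sketch below mirrors that rule with a hypothetical helper, skip_comment, which returns the input left over after a comment; it is an illustration of the behaviour, not a replacement for parse_comment.

```rust
/// Hypothetical nom-free mirror of `parse_comment`: returns the input that
/// remains after a `;` comment, or None if the input does not start with `;`.
pub fn skip_comment(input: &str) -> Option<&str> {
    let body = input.strip_prefix(';')?;
    // Stop at the first line-ending character; it is left unconsumed.
    let end = body
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(body.len());
    Some(&body[end..])
}
```

Because the line ending is left in place, whatever skips whitespace around atoms has to consume it. A comment at the very end of the input, with no trailing newline, is still accepted.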
Exercises
- Extend the integer parser to also recognise hexadecimal literals prefixed with 0x: use alt and map_res with i64::from_str_radix.
- Extend the symbol parser to reject the single character - followed immediately by a digit (since that should be parsed as a negative integer).
Both exercises should have collapsible reference solutions.
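As a starting point for those reference solutions, the sketch below shows only the nom-free core of each exercise: the conversion a map_res closure could call, and the lookahead test a symbol parser could use to back off. Both function names are hypothetical, and accepting a sign before 0x (as in -0x10) is an assumption the exercise text does not settle.

```rust
/// Hypothetical conversion for exercise 1: accepts `42`, `-7`, `0x1F`, `-0x10`.
pub fn literal_to_i64(slice: &str) -> Option<i64> {
    // Split off an optional leading minus sign.
    let (negative, unsigned) = match slice.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, slice),
    };
    // Choose the radix from the `0x` prefix.
    let (radix, digits) = match unsigned.strip_prefix("0x") {
        Some(hex) => (16, hex),
        None => (10, unsigned),
    };
    // from_str_radix accepts its own leading sign, so allow only digits here.
    if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    // Re-attach the sign before converting so that i64::MIN is representable.
    let signed = if negative {
        format!("-{}", digits)
    } else {
        digits.to_string()
    };
    i64::from_str_radix(&signed, radix).ok()
}

/// Hypothetical lookahead for exercise 2: true when the input starts with
/// `-` immediately followed by an ASCII digit.
pub fn starts_negative_integer(input: &str) -> bool {
    let mut chars = input.chars();
    chars.next() == Some('-') && chars.next().map_or(false, |c| c.is_ascii_digit())
}
```

In a nom solution, literal_to_i64 would sit inside map_res after a recognize that matches either form, and starts_negative_integer would guard the symbol parser so that -7 is left for the integer parser while a lone - remains a symbol.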
Style notes
- One subsection per atom type, in the order they will appear in the alt in §8
- Every code block must be self-contained with use statements
- Show tricky cases and why they are tricky before showing the solution: the reader should understand the pitfall, not just copy the fix
- nom version note: use nom::bytes::complete (not nom::bytes::streaming) throughout