---
# edu-16fy
title: '§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols'
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-10T23:30:01Z
---

## §6 Recognizing Atoms: Integers, Booleans, Strings, Symbols — Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols`

Replace the stub line with full content. Target 800–1100 words. This is a hands-on section that builds one atom parser at a time. Each parser is developed in isolation before being combined in §8.

## Learning objectives

- Write a nom parser for each MiniLisp atom type
- Use `map_res`, `recognize`, `opt`, `alt`, `tag`, `char`, `take_while1`, `is_not`, `escaped_transform`
- Understand how to test parsers with `assert_eq!` on the full `IResult`
- Know the tricky cases: negative integers vs symbol `-`, `#t`/`#f` ambiguity, string escapes

## Content to write

Work through each atom parser in a subsection with: explanation, full code, tricky cases, and a test block.

### Integer parser

A signed decimal integer: optional `-`, then one or more digits, converted to `i64`.

```rust
use nom::{
    character::complete::{char, digit1},
    combinator::{map_res, opt, recognize},
    sequence::pair,
    IResult,
};

pub fn parse_integer(input: &str) -> IResult<&str, i64> {
    map_res(
        // Match an optional sign followed by digits, keeping the matched slice.
        recognize(pair(opt(char('-')), digit1)),
        // Convert the slice; an out-of-range value becomes a recoverable error.
        |s: &str| s.parse::<i64>(),
    )(input)
}
```

Tricky case: the symbol `-` versus negative integers. For `parse_integer("-")`, `opt(char('-'))` consumes the `-`, then `digit1` fails because no digits follow, so the parser returns a recoverable `Err::Error` without ever reaching `map_res`. This is fine: `alt` in the atom parser will fall through to the symbol parser. The ordering still matters, though. Because `-` is a valid symbol start and digits are valid symbol continuation characters, the symbol parser would accept `-7` as a symbol, so the integer parser must be tried *before* the symbol parser in the `alt`.

Tests:

```rust
assert_eq!(parse_integer("42 rest"), Ok((" rest", 42)));
assert_eq!(parse_integer("-7"), Ok(("", -7)));
assert!(parse_integer("abc").is_err());
```

### Boolean parser

```rust
use nom::{branch::alt, bytes::complete::tag, combinator::value, IResult};

pub fn parse_bool(input: &str) -> IResult<&str, bool> {
    alt((
        value(true, tag("#t")),
        value(false, tag("#f")),
    ))(input)
}
```

Explain `value(output, parser)`: it discards the parser's output and returns a fixed value instead. This avoids a `map` whose closure ignores its argument.

Tricky case: `#` must not be a valid symbol character, otherwise input starting with `#t` or `#f` could be read as either a boolean or a symbol. Confirm that `#` is not in the symbol character set (per §2).

### Symbol parser

Symbols start with a `sym_start` character and continue with zero or more `sym_cont` characters. Use `recognize` to return the matched input slice.

```rust
use nom::{
    bytes::complete::{take_while, take_while_m_n},
    combinator::recognize,
    sequence::pair,
    IResult,
};

fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

pub fn parse_symbol(input: &str) -> IResult<&str, &str> {
    recognize(pair(
        // Exactly one start character, then any number of continuation characters.
        take_while_m_n(1, 1, is_sym_start),
        take_while(is_sym_cont),
    ))(input)
}
```

Tricky case: `+`, `*`, `/`, `=`, `<`, `>` are valid single-character symbols (used as operator names). The parser handles them because `take_while` accepts zero continuation characters.

Tests:

```rust
assert_eq!(parse_symbol("my-var rest"), Ok((" rest", "my-var")));
assert_eq!(parse_symbol("+"), Ok(("", "+")));
assert!(parse_symbol("42").is_err());
```

### String parser

Double-quoted strings with escape sequences `\"`, `\\`, `\n`, `\t`.

```rust
use nom::{
    branch::alt,
    bytes::complete::{escaped_transform, is_not},
    character::complete::char,
    combinator::{map, opt, value},
    sequence::delimited,
    IResult,
};

pub fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        // `opt` lets an empty body through (see the tricky case below).
        map(
            opt(escaped_transform(
                is_not("\\\""),
                '\\',
                alt((
                    value("\"", char('"')),
                    value("\\", char('\\')),
                    value("\n", char('n')),
                    value("\t", char('t')),
                )),
            )),
            |body: Option<String>| body.unwrap_or_default(),
        ),
        char('"'),
    )(input)
}
```

Note: `escaped_transform` returns `String` (owned), not `&str`, because it must allocate when escape sequences are expanded.

Tricky case: an empty string `""`. `is_not` requires at least one character, and in nom 7 `escaped_transform` reports an error when nothing matches at the very start of its input, so a bare `escaped_transform` between the quotes rejects `""` (verify this against the nom version in use). Wrapping it in `opt` and defaulting to an empty `String` accepts the empty body. Test it explicitly.

Tests:

```rust
assert_eq!(parse_string(r#""hello""#), Ok(("", "hello".to_string())));
assert_eq!(parse_string(r#""a\nb""#), Ok(("", "a\nb".to_string())));
assert_eq!(parse_string(r#""""#), Ok(("", "".to_string())));
```

### Comment parser

Comments are consumed and discarded: they produce no AST node.

```rust
use nom::{
    bytes::complete::is_not,
    character::complete::char,
    combinator::{opt, value},
    sequence::pair,
    IResult,
};

pub fn parse_comment(input: &str) -> IResult<&str, ()> {
    // Consume `;` and everything up to, but not including, the line ending.
    value((), pair(char(';'), opt(is_not("\n\r"))))(input)
}
```

## Exercises

1. Extend the integer parser to also recognise hexadecimal literals prefixed with `0x`: use `alt` and `map_res` with `i64::from_str_radix`.
2. Extend the symbol parser to reject input that begins with `-` followed immediately by a digit (since that should be parsed as a negative integer).

Both exercises should have collapsible reference solutions.

## Style notes

- One subsection per atom type, in the order they will appear in the `alt` in §8
- Every code block must be self-contained with `use` statements
- Show tricky cases and why they are tricky before showing the solution: the reader should understand the pitfall, not just copy the fix
- nom version note: use `nom::bytes::complete` (not `nom::bytes::streaming`) throughout