vibed/edu/.beans/archive/edu-16fy--6-recognizing-ato...

---
# edu-16fy
title: '§6 Recognizing Atoms: Integers, Booleans, Strings, Symbols'
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:01Z
updated_at: 2026-03-10T23:30:01Z
---

## §6 Recognizing Atoms: Integers, Booleans, Strings, Symbols — Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 6. Recognizing Atoms: Integers, Booleans, Strings, Symbols`

Replace the stub line with full content. Target 800–1100 words. This is a hands-on section that builds one atom parser at a time. Each parser is developed in isolation before being combined in §8.

## Learning objectives

- Write a nom parser for each MiniLisp atom type
- Use `map_res`, `recognize`, `opt`, `alt`, `value`, `tag`, `char`, `take_while`, `take_while_m_n`, `is_not`, `escaped_transform`
- Understand how to test parsers with `assert_eq!` on the full `IResult`
- Know the tricky cases: negative integers vs symbol `-`, `#t`/`#f` ambiguity, string escapes

## Content to write

Work through each atom parser in a subsection with: explanation, full code, tricky cases, and a test block.

### Integer parser

A signed decimal integer: optional `-`, then one or more digits, converted to `i64`.

```rust
use nom::{IResult, combinator::{map_res, recognize, opt}, character::complete::{char, digit1}, sequence::pair};

pub fn parse_integer(input: &str) -> IResult<&str, i64> {
    map_res(
        recognize(pair(opt(char('-')), digit1)),
        |s: &str| s.parse::<i64>()
    )(input)
}
```

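Tricky case: the symbol `-` versus negative integers. On the input `"-"`, `opt(char('-'))` consumes the minus sign and `digit1` then fails because no digits follow, so `recognize` never succeeds and `map_res` is never reached. (`map_res` only fails when the recognized digits do not fit in an `i64`.) The failure is a recoverable `Err::Error`, so the `alt` in the atom parser falls through to the symbol parser. The integer parser must still be tried *before* the symbol parser in the `alt`: `-` is a valid `sym_start` character and digits are valid `sym_cont` characters, so the symbol parser would otherwise accept `-7` as a symbol.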

Tests:
```rust
assert_eq!(parse_integer("42 rest"), Ok((" rest", 42)));
assert_eq!(parse_integer("-7"), Ok(("", -7)));
assert!(parse_integer("abc").is_err());
```
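
The closure passed to `map_res` is plain `str::parse::<i64>`, so which slices it accepts can be checked without nom. A minimal std-only sketch follows; the helper name `convert` is hypothetical and only stands in for that closure.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: the same conversion the `map_res` closure applies
/// to the slice returned by `recognize`.
pub fn convert(s: &str) -> Result<i64, ParseIntError> {
    s.parse::<i64>()
}
```

`convert("42")` and `convert("-7")` succeed, and `convert("-9223372036854775808")` yields `i64::MIN`. `convert("9223372036854775808")` (one past `i64::MAX`) returns an error even though it consists only of digits, which is the overflow case a test block could cover. `convert("-")` also returns an error.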

### Boolean parser

```rust
use nom::{IResult, branch::alt, bytes::complete::tag, combinator::value};

pub fn parse_bool(input: &str) -> IResult<&str, bool> {
    alt((
        value(true,  tag("#t")),
        value(false, tag("#f")),
    ))(input)
}
```

Explain `value(output, parser)` — discards the parser's output and returns a fixed value instead. This avoids a `map` that ignores its argument.

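Tricky case: `#` must not be a valid symbol character; otherwise input such as `#t` could be read either as a boolean or as a symbol starting with `#`. Confirm that `#` is not in the symbol character set (per §2). Note also that `tag("#t")` is a prefix match: on `#true` it succeeds and leaves `rue` as the remaining input, so the parser alone does not enforce that a delimiter follows.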

### Symbol parser

Symbols start with a `sym_start` character and continue with zero or more `sym_cont` characters. Use `recognize` to return the input slice.

```rust
use nom::{IResult, combinator::recognize, sequence::pair,
          bytes::complete::{take_while, take_while_m_n}};

fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

pub fn parse_symbol(input: &str) -> IResult<&str, &str> {
    recognize(pair(
        take_while_m_n(1, 1, is_sym_start), // exactly one `sym_start` character
        take_while(is_sym_cont),            // zero or more `sym_cont` characters
    ))(input)
}
```

Tricky case: `+`, `*`, `/`, `=`, `<`, `>` are valid single-character symbols (used as operator names). The parser must handle them.

Tests:
```rust
assert_eq!(parse_symbol("my-var rest"), Ok((" rest", "my-var")));
assert_eq!(parse_symbol("+"), Ok(("", "+")));
assert!(parse_symbol("42").is_err());
```
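
The overlap between negative integers and symbols is visible from the character classes alone. The sketch below is a std-only scanner that applies the same two predicates; `scan_symbol` is a hypothetical name, it is not the nom parser, and it returns `(symbol, rest)` rather than nom's `(rest, output)` order.

```rust
fn is_sym_start(c: char) -> bool {
    c.is_alphabetic() || "-_?!+*/=<>".contains(c)
}

fn is_sym_cont(c: char) -> bool {
    c.is_alphanumeric() || "-_?!+*/=<>".contains(c)
}

/// Hypothetical std-only scanner: splits `input` into (symbol, rest) using
/// the same character classes as `parse_symbol`. Returns `None` when the
/// first character is not a `sym_start` character.
pub fn scan_symbol(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !is_sym_start(first) {
        return None;
    }
    // The symbol ends at the first character that is not `sym_cont`,
    // or at the end of the input if every remaining character qualifies.
    let end = chars
        .find(|&(_, c)| !is_sym_cont(c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some(input.split_at(end))
}
```

`scan_symbol("-7")` returns `Some(("-7", ""))`: by character class alone, `-7` is a well-formed symbol. This is why the order of alternatives in the atom parser matters.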

### String parser

Double-quoted strings with escape sequences `\"`, `\\`, `\n`, `\t`.

```rust
use nom::{IResult, bytes::complete::{escaped_transform, is_not},
          branch::alt, combinator::{map, opt, value},
          character::complete::char, sequence::delimited};

pub fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        // `escaped_transform` returns an error on an empty body, so wrap it
        // in `opt` and map the `None` case to an empty `String`.
        map(
            opt(escaped_transform(
                is_not("\\\""),
                '\\',
                alt((
                    value("\"", char('"')),
                    value("\\", char('\\')),
                    value("\n", char('n')),
                    value("\t", char('t')),
                )),
            )),
            |s: Option<String>| s.unwrap_or_default(),
        ),
        char('"'),
    )(input)
}
```

Note: `escaped_transform` returns `String` (owned), not `&str`, because it must allocate when escape sequences are expanded.
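
To see why the result must be owned: the two-character input sequence `\n` becomes a single newline character, so the output is generally not a slice of the input. A std-only sketch of the same transformation is shown below; the name `unescape` is hypothetical and this is not how nom implements `escaped_transform`.

```rust
/// Hypothetical std-only helper: expands the four MiniLisp escapes in the
/// text between the quotes. Returns `None` on an unknown escape or on a
/// backslash at the very end of the body.
pub fn unescape(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the known escape characters.
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            _ => return None,
        }
    }
    Some(out)
}
```

For example, `unescape(r"a\nb")` takes four input characters and produces the three-character string `"a\nb"`, while `unescape(r"bad\q")` returns `None`.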

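Tricky case: an empty string `""`. `is_not` requires at least one character, so `escaped_transform` on its own fails when the closing quote immediately follows the opening quote. The body parser has to be made optional (for example with `opt`, mapping `None` to an empty `String`). Test it explicitly.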

Tests:
```rust
assert_eq!(parse_string(r#""hello""#), Ok(("", "hello".to_string())));
assert_eq!(parse_string(r#""a\nb""#), Ok(("", "a\nb".to_string())));
assert_eq!(parse_string(r#""""#), Ok(("", "".to_string())));
```

### Comment parser

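Comments start with `;` and run to the end of the line. They are consumed and discarded and produce no AST node. The parser below stops before the line ending and leaves it in the input.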

```rust
use nom::{IResult, bytes::complete::is_not, sequence::pair,
          character::complete::char, combinator::{opt, value}};

pub fn parse_comment(input: &str) -> IResult<&str, ()> {
    value((), pair(char(';'), opt(is_not("\n\r"))))(input)
}
```

## Exercises

1. Extend the integer parser to also recognise hexadecimal literals prefixed with `0x` — use `alt` and `map_res` with `i64::from_str_radix`.
2. Extend the symbol parser to reject the single character `-` followed immediately by a digit (since that should be parsed as a negative integer).

Both exercises should have collapsible reference solutions.
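
As a starting point for exercise 1, the conversion step can be sketched with the standard library alone. The helper name `convert_literal` is hypothetical, and the sketch assumes a lowercase `0x` prefix with no sign handling for hex literals. In the nom version the `0x` branch would sit inside `alt`, with the digits restricted by a hex-digit recognizer such as `hex_digit1` before `map_res` calls `i64::from_str_radix`.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: converts an already-recognized literal to `i64`.
/// `0x...` is read as hexadecimal, anything else as signed decimal.
pub fn convert_literal(s: &str) -> Result<i64, ParseIntError> {
    match s.strip_prefix("0x") {
        // `from_str_radix` itself accepts a leading sign, so the recognizer
        // should only hand it hex digits.
        Some(hex) => i64::from_str_radix(hex, 16),
        None => s.parse::<i64>(),
    }
}
```

`convert_literal("0x1F")` returns `Ok(31)` and `convert_literal("42")` returns `Ok(42)`. A bare `0x` with no digits is an error, which is a useful extra test case for the reference solution.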

## Style notes

- One subsection per atom type, in the order they will appear in the `alt` in §8
- Every code block must be self-contained with `use` statements
- Show tricky cases and why they are tricky before showing the solution — the reader should understand the pitfall, not just copy the fix
- nom version note: use `nom::bytes::complete` (not `nom::bytes::streaming`) throughout