---
# edu-mmbr
title: §8 Parsing Atoms with nom
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:02Z
updated_at: 2026-03-10T23:30:02Z
---

## §8 Parsing Atoms with nom: Stub to fill

File: `edu/src/lisp-compiler.md`, section `### 8. Parsing Atoms with nom`

Replace the stub line with full content. Target 600–800 words. This section takes the individual atom parsers from §6 and the AST from §7 and combines them into a single `parse_atom` function that returns `IResult<&str, Expr>`. The section also includes unit tests.

## Learning objectives

- Combine the individual atom parsers into a single `alt`, using `map` to produce `Expr` values
- Add `src/parser.rs` to the project as a proper module
- Write comprehensive unit tests for atom parsing
- Handle the ordering constraint in `alt`: integers before symbols

## Content to write

### The `parse_atom` function

In `src/parser.rs`, with the atom parsers from §6 in scope, import the `Expr` type from `src/ast.rs` and combine the parsers:

```rust
use nom::{IResult, branch::alt, combinator::map};
use crate::ast::Expr;

/// Parse any MiniLisp atom: integer, boolean, string, or symbol.
pub fn parse_atom(input: &str) -> IResult<&str, Expr> {
    // `alt` returns the first alternative that succeeds, so order matters:
    // integers must be tried before symbols.
    alt((
        map(parse_integer, Expr::Int),
        map(parse_bool, Expr::Bool),
        map(parse_string, Expr::Str),
        // `parse_symbol` yields a `&str`; convert it to an owned `String`.
        map(parse_symbol, |s: &str| Expr::Symbol(s.to_string())),
    ))(input)
}
```

Explain the ordering. `alt` tries its alternatives left to right and returns the first one that succeeds, so the order of the list is part of the grammar:

1. **Integer before symbol**: `-7` must parse as an integer, not as a symbol starting with `-`. Because `parse_integer` is tried first and consumes the full `-7`, `parse_symbol` never sees it. A lone `-` has no digits, so `parse_integer` fails and `alt` falls through to `parse_symbol`.
2. **Boolean before symbol**: `#t` and `#f` are not valid symbols (`#` is not a symbol-start character), so the relative order does not affect correctness; trying booleans first is simply easier to read.
3. **String before symbol**: strings start with `"`, which no other atom can start with, so the position of the string parser does not affect correctness.

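The ordering rule can be checked without nom. The sketch below is a dependency-free model, not the tutorial's parsers: `Expr` is cut down to two variants, `Option` stands in for `IResult`, and the symbol grammar (`is_symbol_start`, `is_symbol_char`) is an assumption made for illustration. `atom_int_first` mirrors the order used in `parse_atom`; `atom_symbol_first` shows what goes wrong when the order is flipped.

```rust
// Simplified stand-in for the MiniLisp AST; the real `Expr` lives in src/ast.rs.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Symbol(String),
}

// (remaining input, parsed value); `None` models a recoverable parse failure.
type PResult<'a> = Option<(&'a str, Expr)>;

// Hypothetical integer parser: optional '-', then one or more ASCII digits.
fn parse_integer(input: &str) -> PResult<'_> {
    let digits_start = if input.starts_with('-') { 1 } else { 0 };
    let digits_len = input[digits_start..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digits_len == 0 {
        return None; // no digits (e.g. a lone "-"): let the next parser try
    }
    let end = digits_start + digits_len;
    let value = input[..end].parse().ok()?;
    Some((&input[end..], Expr::Int(value)))
}

// Assumed symbol grammar: a letter or operator character, then letters,
// operator characters, or digits.
fn is_symbol_start(c: char) -> bool {
    c.is_ascii_alphabetic() || "+-*/<>=!?_".contains(c)
}

fn is_symbol_char(c: char) -> bool {
    is_symbol_start(c) || c.is_ascii_digit()
}

fn parse_symbol(input: &str) -> PResult<'_> {
    let first = input.chars().next()?;
    if !is_symbol_start(first) {
        return None;
    }
    let end = input
        .find(|c: char| !is_symbol_char(c))
        .unwrap_or(input.len());
    Some((&input[end..], Expr::Symbol(input[..end].to_string())))
}

// Same order as `parse_atom`: integer first, symbol as the fallback.
fn atom_int_first(input: &str) -> PResult<'_> {
    parse_integer(input).or_else(|| parse_symbol(input))
}

// Flipped order, shown only to demonstrate the failure mode.
fn atom_symbol_first(input: &str) -> PResult<'_> {
    parse_symbol(input).or_else(|| parse_integer(input))
}
```

Under these assumptions, `atom_int_first("-7")` yields `Expr::Int(-7)` and `atom_int_first("- ")` yields `Expr::Symbol("-")`, while `atom_symbol_first("-7")` yields `Expr::Symbol("-7")`. That last result is the bug the ordering prevents.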
### Module organisation

Show the complete `src/parser.rs` header at this point:

```rust
//! Parser for MiniLisp source code.
//!
//! Entry point: [`parse`] which accepts a `&str` and returns `Vec<Expr>`.

use nom::{
    IResult,
    branch::alt,
    bytes::complete::{escaped_transform, is_not, tag, take_while, take_while_m_n},
    character::complete::{char, digit1, multispace0, line_ending},
    combinator::{map, map_res, opt, recognize, value},
    sequence::{delimited, pair},
};

use crate::ast::Expr;
```

### Whitespace-aware atom parser

Wrap `parse_atom` in the `ws` combinator so that callers do not have to handle surrounding whitespace themselves:

```rust
pub fn parse_atom_ws(input: &str) -> IResult<&str, Expr> {
    ws(parse_atom)(input)
}
```

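The `ws` helper is taken as given here. As a rough model of what such a wrapper does, the sketch below skips whitespace on both sides of an inner parser. It is written against plain `Option` rather than nom's `IResult` so that it runs without dependencies; the `digits` parser and the decision to trim on both sides are assumptions for illustration. With nom, a wrapper of this shape is commonly built as `delimited(multispace0, inner, multispace0)`.

```rust
/// Wrap `inner` so that whitespace before and after its match is skipped.
/// `None` models a parse failure. Note: `trim_start` also strips Unicode
/// whitespace, which is broader than nom's `multispace0`.
fn ws<'a, T>(
    inner: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> impl Fn(&'a str) -> Option<(&'a str, T)> {
    move |input: &'a str| {
        // Skip leading whitespace, then run the wrapped parser.
        let (rest, value) = inner(input.trim_start())?;
        // Skip trailing whitespace so the next parser starts on a token.
        Some((rest.trim_start(), value))
    }
}

/// Hypothetical inner parser: one or more ASCII digits as a `u32`.
fn digits(input: &str) -> Option<(&str, u32)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    // An empty digit run fails to parse, which becomes `None`.
    let value = input[..end].parse().ok()?;
    Some((&input[end..], value))
}
```

With these definitions, `ws(digits)("  42  rest")` returns `Some(("rest", 42))`, whereas calling `digits` directly on the same input fails on the leading spaces.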
### Unit tests

Write a `#[cfg(test)]` module in `src/parser.rs` that tests every atom type with multiple cases:

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use crate::ast::Expr;

    #[test]
    fn test_integer_atom() {
        assert_eq!(parse_atom("42"), Ok(("", Expr::Int(42))));
        assert_eq!(parse_atom("-7 "), Ok((" ", Expr::Int(-7))));
        assert_eq!(parse_atom("0"), Ok(("", Expr::Int(0))));
    }

    #[test]
    fn test_bool_atom() {
        assert_eq!(parse_atom("#t"), Ok(("", Expr::Bool(true))));
        assert_eq!(parse_atom("#f"), Ok(("", Expr::Bool(false))));
    }

    #[test]
    fn test_string_atom() {
        assert_eq!(parse_atom(r#""hello""#), Ok(("", Expr::Str("hello".into()))));
        assert_eq!(parse_atom(r#""a\nb""#), Ok(("", Expr::Str("a\nb".into()))));
    }

    #[test]
    fn test_symbol_atom() {
        assert_eq!(parse_atom("my-var"), Ok(("", Expr::Symbol("my-var".into()))));
        assert_eq!(parse_atom("+"), Ok(("", Expr::Symbol("+".into()))));
        assert_eq!(parse_atom("factorial rest"), Ok((" rest", Expr::Symbol("factorial".into()))));
    }

    #[test]
    fn test_negative_integer_vs_symbol() {
        // -7 must be an integer, not a symbol
        assert_eq!(parse_atom("-7"), Ok(("", Expr::Int(-7))));
        // lone - is a symbol
        assert_eq!(parse_atom("- "), Ok((" ", Expr::Symbol("-".into()))));
    }
}
```

### Run the tests

```sh
cargo test parser
```

The `parser` argument is a name filter: only tests whose path contains `parser` are run. All tests should pass before proceeding to §9.

## Style notes

- The ordering section is the most important teaching moment here, so make it explicit
- Show how `map` is used to lift a primitive value into an `Expr` variant
- The test for `-7` vs `-` (lone minus) is critical; flag it as something to get right
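
For the `map` note, one way to make the lifting concrete is to show that a tuple-variant constructor such as `Expr::Int` is an ordinary function, so it can be handed to any `map`. The sketch below uses the standard library's `Result::map` and `Option::map` as stand-ins for nom's `map` combinator. The cut-down `Expr` and the helper names `lift_int`, `lift_symbol`, and `int_constructor` are assumptions for illustration.

```rust
// Cut-down stand-in for the AST in src/ast.rs; the `i64` payload is assumed.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Symbol(String),
}

// A tuple variant is a function: `Expr::Int` has type `fn(i64) -> Expr`.
fn int_constructor() -> fn(i64) -> Expr {
    Expr::Int
}

// The payload type matches, so the constructor is passed to `map` directly.
fn lift_int(raw: Result<i64, String>) -> Result<Expr, String> {
    raw.map(Expr::Int)
}

// The parser output (`&str`) differs from the payload (`String`), so a
// closure performs the conversion before wrapping, as in `parse_atom`.
fn lift_symbol(raw: Option<&str>) -> Option<Expr> {
    raw.map(|s| Expr::Symbol(s.to_string()))
}
```

Failures pass through untouched: `lift_int(Err(..))` stays an `Err` and `lift_symbol(None)` stays `None`, which is also how nom's `map` treats a failing inner parser.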