---
# edu-mmbr
title: §8 Parsing Atoms with nom
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:02Z
updated_at: 2026-03-10T23:30:02Z
---
## §8 Parsing Atoms with nom — Stub to fill
File: `edu/src/lisp-compiler.md`, section `### 8. Parsing Atoms with nom`
Replace the stub line with full content. Target 600-800 words. This section takes the individual atom parsers from §6 and the AST from §7 and combines them into a single `parse_atom` function that returns `IResult<&str, Expr>`. Includes tests.
## Learning objectives
- Combine individual atom parsers into a single `alt` using `map` to produce `Expr` values
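- Add `src/parser.rs` to the project correctly: declare it with `mod parser;` in the crate root (`src/main.rs` or `src/lib.rs`), next to `mod ast;`, so that Cargo compiles the module and its tests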
- Write comprehensive unit tests for atom parsing
- Handle the ordering constraint in `alt`: integers before symbols
## Content to write
### The `parse_atom` function
In `src/parser.rs`, import the atom parsers from §6 and the `Expr` type from `src/ast.rs`, then combine them:
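```rust
use nom::{IResult, branch::alt, combinator::map};
use crate::ast::Expr;

// `parse_integer`, `parse_bool`, `parse_string` and `parse_symbol` are the
// atom parsers written in §6.

/// Parse any MiniLisp atom: integer, boolean, string, or symbol.
pub fn parse_atom(input: &str) -> IResult<&str, Expr> {
    // `alt` tries each alternative left to right and returns the result of
    // the first one that succeeds, so the order of this tuple matters.
    alt((
        // A tuple-variant constructor is a plain function, so it can be
        // passed to `map` directly to lift the parsed value into an `Expr`.
        map(parse_integer, Expr::Int),
        map(parse_bool, Expr::Bool),
        map(parse_string, Expr::Str),
        // `parse_symbol` yields a borrowed `&str`; copy it into an owned `String`.
        map(parse_symbol, |s: &str| Expr::Symbol(s.to_string())),
    ))(input) // nom 7 call style; nom 8 combinators are invoked with `.parse(input)`.
}
```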
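The `map` calls rely on one Rust detail worth making explicit: a tuple-variant constructor such as `Expr::Int` is an ordinary function, so it can be handed to `map` as-is, while `Symbol` needs a closure to turn a borrowed `&str` into an owned `String`. The sketch below models nom's `map` without the nom dependency so the lifting step can be run in isolation. `PResult`, `digits`, `letters` and the two-variant `Expr` (with an assumed `i64` payload) are hypothetical stand-ins, not the project's real types.

```rust
// Dependency-free model of nom's `map`, for illustration only.
// `PResult` loosely mirrors nom's `IResult`: Ok((remaining_input, value)).
type PResult<'a, T> = Result<(&'a str, T), ()>;

// Hypothetical two-variant `Expr`; the `i64` payload is an assumption.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Symbol(String),
}

/// Run `parser`, then apply `f` to its output; the remaining input is unchanged.
fn map<'a, A, B>(
    parser: impl Fn(&'a str) -> PResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| parser(input).map(|(rest, a)| (rest, f(a)))
}

/// Toy primitive parser: one or more ASCII digits, as an `i64`.
fn digits(input: &str) -> PResult<'_, i64> {
    let end = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if end == 0 {
        return Err(());
    }
    let n = input[..end].parse::<i64>().map_err(|_| ())?;
    Ok((&input[end..], n))
}

/// Toy primitive parser: a run of ASCII letters, borrowed from the input.
fn letters(input: &str) -> PResult<'_, &str> {
    let end = input.bytes().take_while(|b| b.is_ascii_alphabetic()).count();
    if end == 0 {
        return Err(());
    }
    Ok((&input[end..], &input[..end]))
}

/// `Expr::Int` is a function `fn(i64) -> Expr`, so it is passed to `map` directly.
fn int_expr(input: &str) -> PResult<'_, Expr> {
    map(digits, Expr::Int)(input)
}

/// The borrowed `&str` must become an owned `String`, hence the closure.
fn symbol_expr(input: &str) -> PResult<'_, Expr> {
    map(letters, |s: &str| Expr::Symbol(s.to_string()))(input)
}
```

For example, `int_expr("42 rest")` yields `Ok((" rest", Expr::Int(42)))`: `map` only transforms the value and leaves the unconsumed input for the next parser, which is exactly the role it plays in `parse_atom`.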
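Explain the ordering. `alt` tries its alternatives left to right and commits to the first one that succeeds, so any input that two parsers could both accept goes to whichever comes first:
1. **Integer before symbol**: `-` is a valid symbol-start character, so if `parse_symbol` ran first it would claim `-7`, or at least its leading `-`, as a symbol. With `parse_integer` first, `-7` becomes `Expr::Int(-7)`, while a lone `-` still reaches `parse_symbol` because `parse_integer` fails when no digit follows the sign.
2. **Boolean before symbol**: `#t` and `#f` can never match as symbols (`#` is not a symbol-start character), so this order does not affect correctness; trying booleans early simply keeps the literal atoms together.
3. **String position is flexible**: strings are the only atoms that start with `"`, so `parse_string` never competes with the other parsers. It sits third here, but any position gives the same results.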
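A small dependency-free model makes the integer-before-symbol rule concrete. `first_of` imitates the first-success behaviour of `alt`, and `int` and `sym` are hypothetical stand-ins for `parse_integer` and `parse_symbol`; the symbol character set used here is an assumption, not the §6 grammar. Running the same input through both orders shows the difference:

```rust
// Dependency-free model of ordered choice (`nom::branch::alt`), for illustration only.
// `PResult` loosely mirrors nom's `IResult`: Ok((remaining_input, value)).
type PResult<'a, T> = Result<(&'a str, T), ()>;

#[derive(Debug, PartialEq)]
enum Atom {
    Int(i64),
    Symbol(String),
}

type AtomParser = for<'x> fn(&'x str) -> PResult<'x, Atom>;

/// Optional leading '-' followed by one or more ASCII digits.
fn int(input: &str) -> PResult<'_, Atom> {
    let digits_start = if input.starts_with('-') { 1 } else { 0 };
    let digits_len = input[digits_start..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digits_len == 0 {
        return Err(()); // a lone "-" is not an integer
    }
    let end = digits_start + digits_len;
    let n = input[..end].parse::<i64>().map_err(|_| ())?;
    Ok((&input[end..], Atom::Int(n)))
}

/// Hypothetical symbol grammar: the first char is a letter or one of `+-*/<=>!?`,
/// later chars may also be digits.
fn sym(input: &str) -> PResult<'_, Atom> {
    let is_start = |c: char| c.is_ascii_alphabetic() || "+-*/<=>!?".contains(c);
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if is_start(c) => {}
        _ => return Err(()),
    }
    let end = chars
        .find(|&(_, c)| !(is_start(c) || c.is_ascii_digit()))
        .map_or(input.len(), |(i, _)| i);
    Ok((&input[end..], Atom::Symbol(input[..end].to_string())))
}

/// Try each parser left to right and return the first success.
fn first_of<'a>(parsers: &[AtomParser], input: &'a str) -> PResult<'a, Atom> {
    for parser in parsers {
        if let Ok(result) = parser(input) {
            return Ok(result);
        }
    }
    Err(())
}

/// Same order as `parse_atom`: integer before symbol.
fn atom_int_first(input: &str) -> PResult<'_, Atom> {
    first_of(&[int, sym], input)
}

/// Reversed order, to show what goes wrong.
fn atom_sym_first(input: &str) -> PResult<'_, Atom> {
    first_of(&[sym, int], input)
}
```

Only the negative literal changes meaning when the order flips: `42` cannot start a symbol, so it reaches `int` in either order, whereas `sym` swallows `-7` whole if it runs first. This is the same `-7` versus `-` distinction the unit tests pin down.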
### Module organisation
Show the complete `src/parser.rs` header at this point:
```rust
//! Parser for MiniLisp source code.
//!
//! Entry point: [`parse`], which accepts a `&str` and returns `Vec<Expr>`.

use nom::{
    IResult,
    branch::alt,
    bytes::complete::{escaped_transform, is_not, tag, take_while, take_while_m_n},
    character::complete::{char, digit1, multispace0, line_ending},
    combinator::{map, map_res, opt, recognize, value},
    sequence::{delimited, pair},
};
use crate::ast::Expr;
```
### Whitespace-aware atom parser
Wrap `parse_atom` in the `ws` combinator so callers do not have to think about surrounding whitespace:
```rust
/// Parse an atom, skipping any whitespace before and after it.
pub fn parse_atom_ws(input: &str) -> IResult<&str, Expr> {
    ws(parse_atom)(input)
}
```
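The `ws` combinator is not defined in this section and is assumed to exist already; nom's own recipes define it as `delimited(multispace0, inner, multispace0)`, which fits the `delimited` and `multispace0` imports in the module header. To show what such a wrapper does without depending on nom, here is a dependency-free model. `PResult` is a simplified stand-in for `IResult` and `digits` is a hypothetical toy parser. One difference to keep in mind: `str::trim_start` strips all Unicode whitespace, whereas `multispace0` accepts only spaces, tabs, `\r` and `\n`.

```rust
// Dependency-free model of a `ws` wrapper, for illustration only.
type PResult<'a, T> = Result<(&'a str, T), ()>;

/// Skip leading whitespace, run `inner`, then skip trailing whitespace:
/// the same shape as `delimited(multispace0, inner, multispace0)` in nom.
fn ws<'a, T>(
    inner: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |input: &'a str| {
        let (rest, value) = inner(input.trim_start())?;
        Ok((rest.trim_start(), value))
    }
}

/// Toy atom parser: one or more ASCII digits, as an `i64`.
fn digits(input: &str) -> PResult<'_, i64> {
    let end = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if end == 0 {
        return Err(());
    }
    let n = input[..end].parse::<i64>().map_err(|_| ())?;
    Ok((&input[end..], n))
}
```

With this wrapper, `ws(digits)("  42  rest")` returns `Ok(("rest", 42))`: the whitespace on both sides of the atom is consumed, so the next parser starts directly at `rest`. Consuming trailing whitespace (rather than only leading) means a sequence of wrapped parsers never has to skip whitespace itself.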
### Unit tests
Write a `#[cfg(test)]` module in `src/parser.rs` testing every atom type with multiple cases:
```rust
#[cfg(test)]
mod tests {
    use super::*;
    use crate::ast::Expr;

    #[test]
    fn test_integer_atom() {
        assert_eq!(parse_atom("42"), Ok(("", Expr::Int(42))));
        assert_eq!(parse_atom("-7 "), Ok((" ", Expr::Int(-7))));
        assert_eq!(parse_atom("0"), Ok(("", Expr::Int(0))));
    }

    #[test]
    fn test_bool_atom() {
        assert_eq!(parse_atom("#t"), Ok(("", Expr::Bool(true))));
        assert_eq!(parse_atom("#f"), Ok(("", Expr::Bool(false))));
    }

    #[test]
    fn test_string_atom() {
        assert_eq!(parse_atom(r#""hello""#), Ok(("", Expr::Str("hello".into()))));
        // The two-character escape `\n` in the source becomes a real newline.
        assert_eq!(parse_atom(r#""a\nb""#), Ok(("", Expr::Str("a\nb".into()))));
    }

    #[test]
    fn test_symbol_atom() {
        assert_eq!(parse_atom("my-var"), Ok(("", Expr::Symbol("my-var".into()))));
        assert_eq!(parse_atom("+"), Ok(("", Expr::Symbol("+".into()))));
        assert_eq!(parse_atom("factorial rest"), Ok((" rest", Expr::Symbol("factorial".into()))));
    }

    #[test]
    fn test_negative_integer_vs_symbol() {
        // -7 must be an integer, not a symbol
        assert_eq!(parse_atom("-7"), Ok(("", Expr::Int(-7))));
        // lone - is a symbol
        assert_eq!(parse_atom("- "), Ok((" ", Expr::Symbol("-".into()))));
    }
}
```
### Run the tests
```sh
cargo test parser
```
All tests should pass before proceeding to §9.
## Style notes
- The ordering section is the most important teaching moment here — make it explicit
- Show how `map` is used to lift a primitive value into an `Expr` variant
- The test for `-7` vs `-` (lone minus) is critical — flag it as something to get right