+++
title = "§15 Generating C: Control Flow and Sequencing"
priority = 5
status = "todo"
ticket_type = "task"
dependencies = []
+++
## §15 Generating C: Control Flow and Sequencing — Stub to fill
File: `edu/src/lisp-compiler.md`, section `### 15. Generating C: Control Flow and Sequencing`
Replace the stub line with full content. Target 700-900 words. Handle the remaining forms: `let`, `begin`, and `display`/`newline`/`error` as statements. This section introduces the expression-vs-statement distinction in code generation.
## Learning objectives
- Understand when to emit C expressions vs. C statements
- Implement `gen_stmt` for side-effecting expressions
- Generate `let` as a C block with local variable declarations
- Generate `begin` as a sequence of statements with the last value forwarded
- Generate `display`, `newline`, `error` as C function calls
## Content to write
### The expression-vs-statement problem
`gen_expr` from §13 generates C *expressions* — code that produces a value. But some MiniLisp constructs are used for their *side effects*: `display` prints something; `begin` sequences multiple expressions; `let` introduces a new scope. These map more naturally to C *statements*.
The solution: introduce `gen_stmt(expr: &Expr) -> String` that generates a C statement (terminated with `;` or wrapped in `{}`) for forms that are used in statement position. `gen_expr` handles forms in expression position. Some forms (like `if`) can appear in either position and need both paths.
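To make the "both paths" point concrete, here is a minimal sketch of `if` generated in each position. The three-variant `Expr` and the `ml_` prefix mangling are simplified stand-ins assumed for illustration; the course's real AST and `mangle` function may differ.

```rust
// Simplified stand-in for the course's `Expr` type (assumption: the real
// AST has more variants and may name these differently).
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Symbol(String),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Expression position: emit C code that produces a value.
pub fn gen_expr(expr: &Expr) -> String {
    match expr {
        Expr::Int(n) => n.to_string(),
        // Assumed mangling scheme: prefix user symbols with `ml_`.
        Expr::Symbol(s) => format!("ml_{}", s),
        // `if` used as a value maps to C's conditional operator.
        Expr::If(c, t, e) => {
            format!("({} ? {} : {})", gen_expr(c), gen_expr(t), gen_expr(e))
        }
    }
}

/// Statement position: emit C code that is run for its effect.
pub fn gen_stmt(expr: &Expr) -> String {
    match expr {
        // `if` used as a statement maps to C's if/else, with each branch
        // generated as a statement in turn.
        Expr::If(c, t, e) => format!(
            "if ({}) {{ {} }} else {{ {} }}",
            gen_expr(c),
            gen_stmt(t),
            gen_stmt(e)
        ),
        // Anything else: evaluate as an expression and discard the value.
        other => format!("(void){};", gen_expr(other)),
    }
}
```

For `(if c 1 2)`, the expression path yields `(ml_c ? 1 : 2)` while the statement path yields `if (ml_c) { (void)1; } else { (void)2; }`. The statement path matters once a branch contains something that only makes sense as a statement.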
### `gen_stmt` — the statement generator
```rust
/// Generate a C statement from a MiniLisp expression.
///
/// Used for: body expressions in functions, let bodies, begin sequences.
pub fn gen_stmt(expr: &Expr) -> String {
    match expr {
        // Side-effecting built-ins
        Expr::Call { func, args } if is_builtin_stmt(func) => gen_display_stmt(func, args),
        // Everything else: evaluate as an expression and discard the value
        _ => format!("(void){};", gen_expr(expr)),
    }
}

fn is_builtin_stmt(func: &Expr) -> bool {
    matches!(func, Expr::Symbol(s) if matches!(s.as_str(), "display" | "newline" | "error"))
}
```
### Generating `display`, `newline`, `error`
```rust
fn gen_display_stmt(func: &Expr, args: &[Expr]) -> String {
    match func {
        Expr::Symbol(s) => match s.as_str() {
            "display" => {
                // Pick the runtime function from the argument's static form:
                // string and boolean literals get their own variants;
                // everything else (integers, symbols, calls) uses ml_display_int.
                // Assumes arity was checked earlier; `args[0]` panics on `(display)`.
                let arg = gen_expr(&args[0]);
                match &args[0] {
                    Expr::Str(_) => format!("ml_display_str({});", arg),
                    Expr::Bool(_) => format!("ml_display_bool({});", arg),
                    _ => format!("ml_display_int({});", arg),
                }
            }
            "newline" => "ml_newline();".to_string(),
            // Same arity assumption as `display`.
            "error" => format!("ml_error({});", gen_expr(&args[0])),
            // `gen_stmt` only dispatches here when `is_builtin_stmt` holds.
            _ => unreachable!(),
        },
        _ => unreachable!(),
    }
}
```
Note the simplification: `display` picks the C variant based on the *static* form of the argument. `(display x)` where `x` is a symbol always emits `ml_display_int(ml_x)`, even if `x` holds a boolean at runtime. For the programs in this course, this is acceptable. A production compiler would use a tagged union or a format string approach.
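The limitation is easy to see in isolation. The sketch below pulls out only the dispatch decision, using a cut-down `Expr` assumed for illustration (the course's type has more variants); `display_variant` is a hypothetical helper name, not part of the compiler above.

```rust
// Simplified stand-in for the course's `Expr` (assumption: the real type
// has more variants, e.g. `Let` and `Begin`).
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Bool(bool),
    Str(String),
    Symbol(String),
    Call { func: Box<Expr>, args: Vec<Expr> },
}

/// Pick the runtime display function from the argument's static form only.
/// (Hypothetical helper that isolates the decision made for `display`.)
pub fn display_variant(arg: &Expr) -> &'static str {
    match arg {
        Expr::Str(_) => "ml_display_str",
        Expr::Bool(_) => "ml_display_bool",
        // Integers, symbols, and calls all land here, whatever they
        // evaluate to at runtime.
        _ => "ml_display_int",
    }
}
```

A boolean literal is routed to `ml_display_bool`, but a symbol `x` bound to a boolean, or a call such as `(> 1 2)`, is routed to `ml_display_int` because the generator never looks past the syntactic form of the argument.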
### Generating `let`
`let` compiles to a C block with local variable declarations:
```lisp
(let ((x 1) (y 2)) (+ x y))
```
```c
({
    ml_int ml_x = 1;
    ml_int ml_y = 2;
    (ml_x + ml_y);
})
```
This uses the GNU C *statement expression* extension: `({ ... })` is a block whose value is that of its last expression statement. GCC and Clang support it, but it is not part of any ISO C standard (C99 included), so the generated code is not portable to compilers such as MSVC. Discuss the trade-off and the alternative (using a helper function per `let`).
```rust
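fn gen_let(bindings: &[(String, Expr)], body: &[Expr]) -> String {
    // The block needs a final value, so the body must be non-empty.
    let (last, init) = body.split_last().expect("let body must not be empty");
    let mut out = String::from("({\n");
    for (name, val) in bindings {
        out.push_str(&format!("    ml_int {} = {};\n", mangle(name), gen_expr(val)));
    }
    for expr in init {
        // gen_stmt already terminates its output, so no extra `;` here.
        out.push_str(&format!("    {}\n", gen_stmt(expr)));
    }
    // The last expression statement supplies the value of the whole block.
    out.push_str(&format!("    {};\n", gen_expr(last)));
    out.push_str("})");
    out
}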
```
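The helper-function alternative mentioned above can be sketched as follows. This is a rough illustration, not the course's implementation: `gen_let_helper` and the `ml_let_N` naming scheme are hypothetical, the function takes already-generated C strings rather than `Expr` values to stay self-contained, and it ignores free variables from the enclosing scope, which a real version would have to pass as extra parameters.

```rust
/// Emit a standard-C helper function plus the call that replaces the `let`.
///
/// `bindings` pairs each already-mangled C name with the C expression for
/// its initial value; `body` is the C expression for the `let` body.
/// Returns `(helper_definition, call_expression)`.
///
/// Assumption: the body refers only to the `let`-bound names. Variables
/// from the enclosing scope are not visible inside the helper.
pub fn gen_let_helper(id: usize, bindings: &[(&str, &str)], body: &str) -> (String, String) {
    // Hypothetical naming scheme: one numbered helper per `let`.
    let name = format!("ml_let_{}", id);
    let params: Vec<String> = bindings
        .iter()
        .map(|(n, _)| format!("ml_int {}", n))
        .collect();
    let args: Vec<&str> = bindings.iter().map(|(_, v)| *v).collect();
    // An empty parameter list is spelled `void` in a C definition.
    let params = if params.is_empty() {
        "void".to_string()
    } else {
        params.join(", ")
    };
    let def = format!(
        "static ml_int {}({}) {{\n    return {};\n}}\n",
        name, params, body
    );
    let call = format!("{}({})", name, args.join(", "));
    (def, call)
}
```

For `(let ((x 1) (y 2)) (+ x y))` this produces the call `ml_let_0(1, 2)` and a helper whose body is `return (ml_x + ml_y);`. The output is portable C, at the cost of hoisting a definition to file scope and threading captured variables through the parameter list.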
### Generating `begin`
`begin` in expression position uses the C comma operator; in statement position it is a sequence of statements:
```rust
fn gen_begin_expr(exprs: &[Expr]) -> String {
    // Comma operator: (e1, e2, ..., eN) evaluates all, returns eN
    let parts: Vec<String> = exprs.iter().map(gen_expr).collect();
    format!("({})", parts.join(", "))
}
```
In `gen_expr`, add:
```rust
Expr::Begin(exprs) => gen_begin_expr(exprs),
Expr::Let { bindings, body } => gen_let(bindings, body),
```
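The statement-position half of `begin` might look like the sketch below, shown next to the comma-operator form for comparison. The `Expr` here is a cut-down stand-in assumed for illustration (in particular, `Display` is modelled as its own variant rather than as a `Call`), and wrapping the sequence in braces is one possible choice, not something the section prescribes.

```rust
// Simplified stand-in for the course's `Expr` (assumption: `display` is a
// dedicated variant here only to keep the example short).
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Display(Box<Expr>),
    Begin(Vec<Expr>),
}

pub fn gen_expr(expr: &Expr) -> String {
    match expr {
        Expr::Int(n) => n.to_string(),
        Expr::Display(arg) => format!("ml_display_int({})", gen_expr(arg)),
        // Expression position: comma operator, value of the last operand.
        Expr::Begin(exprs) => {
            let parts: Vec<String> = exprs.iter().map(gen_expr).collect();
            format!("({})", parts.join(", "))
        }
    }
}

pub fn gen_stmt(expr: &Expr) -> String {
    match expr {
        Expr::Display(_) => format!("{};", gen_expr(expr)),
        // Statement position: one C statement per sub-expression, wrapped
        // in a block so the result is itself a single statement.
        Expr::Begin(exprs) => {
            let stmts: Vec<String> = exprs.iter().map(gen_stmt).collect();
            format!("{{ {} }}", stmts.join(" "))
        }
        // Anything else: evaluate and discard.
        other => format!("(void){};", gen_expr(other)),
    }
}
```

For `(begin (display 1) (display 2) 3)`, `gen_expr` gives `(ml_display_int(1), ml_display_int(2), 3)` and `gen_stmt` gives `{ ml_display_int(1); ml_display_int(2); (void)3; }`. In the statement form the final value is discarded, so a function body that needs to return it would have to treat the last expression separately.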
### Tests
```rust
#[test]
fn test_gen_let() {
    let src = "(define (f) (let ((x 1) (y 2)) (+ x y)))";
    let c = generate(parse(src).unwrap());
    assert!(c.contains("ml_int ml_x = 1"));
    assert!(c.contains("ml_int ml_y = 2"));
}

#[test]
fn test_gen_begin() {
    let src = "(define (f) (begin (display 1) (display 2) 3))";
    let c = generate(parse(src).unwrap());
    assert!(c.contains("ml_display_int(1)"));
    assert!(c.contains("ml_display_int(2)"));
    assert!(c.contains("return 3"));
}
```
## Style notes
- The expression-vs-statement distinction is the key concept here — explain it at the top before any code
- The statement expression `({...})` extension for `let` is a real trade-off — acknowledge it honestly
- The `display` type dispatch simplification should be called out clearly — readers will ask "what if I display a boolean stored in a variable?"
- End with a checkpoint: generate C for the complete factorial example; it should be correct and compilable