title: §15 Generating C: Control Flow and Sequencing
status: completed
type: task
priority: normal
created_at: 2026-03-10T23:30:02Z
updated_at: 2026-03-10T23:30:02Z

§15 Generating C: Control Flow and Sequencing — Stub to fill

File: edu/src/lisp-compiler.md, section ### 15. Generating C: Control Flow and Sequencing

Replace the stub line with full content. Target 700-900 words. Handle the remaining forms: let, begin, and display/newline/error as statements. Introduce the expression-vs-statement distinction in code generation.

Learning objectives

  • Understand when to emit C expressions vs. C statements
  • Implement gen_stmt for side-effecting expressions
  • Generate let as a C block with local variable declarations
  • Generate begin as a sequence of statements with the last value forwarded
  • Generate display, newline, error as C function calls

Content to write

The expression-vs-statement problem

gen_expr from §13 generates C expressions — code that produces a value. But some MiniLisp constructs are used for their side effects: display prints something; begin sequences multiple expressions; let introduces a new scope. These map more naturally to C statements.

The solution: introduce gen_stmt(expr: &Expr) -> String that generates a C statement (terminated with ; or wrapped in {}) for forms that are used in statement position. gen_expr handles forms in expression position. Some forms (like if) can appear in either position and need both paths.
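
To make the "both paths" point concrete, here is a minimal sketch of if generated in each position. The cut-down Expr enum, the ml_ prefix used for symbols, and the choice of the ternary operator for expression position are assumptions for illustration; the gen_expr from §13 may handle if differently.

```rust
// Hypothetical, cut-down AST for illustration only; the course's real
// `Expr` has more variants and may represent `if` differently.
enum Expr {
    Int(i64),
    Symbol(String),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Expression position: the output must be a C expression,
/// so `if` is emitted as the ternary operator.
fn gen_expr(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        // Assumed mangling scheme: prefix every symbol with `ml_`.
        Expr::Symbol(s) => format!("ml_{}", s),
        Expr::If(c, t, f) => {
            format!("({} ? {} : {})", gen_expr(c), gen_expr(t), gen_expr(f))
        }
    }
}

/// Statement position: the output must be a complete C statement,
/// so `if` is emitted as a real if/else with statement branches.
fn gen_stmt(e: &Expr) -> String {
    match e {
        Expr::If(c, t, f) => format!(
            "if ({}) {{ {} }} else {{ {} }}",
            gen_expr(c),
            gen_stmt(t),
            gen_stmt(f)
        ),
        // Anything else: evaluate as an expression and discard the value.
        other => format!("(void){};", gen_expr(other)),
    }
}
```

The same MiniLisp form (if x 1 2) becomes (ml_x ? 1 : 2) through gen_expr and an if/else statement through gen_stmt. Only the caller knows which position it is in, which is why the two generators stay separate.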

gen_stmt — the statement generator

/// Generate a C statement from a MiniLisp expression.
///
/// Used for: body expressions in functions, let bodies, begin sequences.
pub fn gen_stmt(expr: &Expr) -> String {
    match expr {
        // Side-effecting built-ins
        Expr::Call { func, args } if is_builtin_stmt(func) => gen_display_stmt(func, args),
        // Everything else: evaluate as an expression and discard the value
        _ => format!("(void){};", gen_expr(expr)),
    }
}

fn is_builtin_stmt(func: &Expr) -> bool {
    matches!(func, Expr::Symbol(s) if matches!(s.as_str(), "display" | "newline" | "error"))
}

Generating display, newline, error

fn gen_display_stmt(func: &Expr, args: &[Expr]) -> String {
    match func {
        Expr::Symbol(s) => match s.as_str() {
            "display" => {
                // Dispatch on the literal form of the argument: string and boolean
                // literals get their own variant; everything else (symbols, calls,
                // integers) falls back to ml_display_int.
                let arg = gen_expr(&args[0]);
                match &args[0] {
                    Expr::Str(_) => format!("ml_display_str({});", arg),
                    Expr::Bool(_) => format!("ml_display_bool({});", arg),
                    _ => format!("ml_display_int({});", arg),
                }
            }
            "newline" => "ml_newline();".to_string(),
            "error"   => format!("ml_error({});", gen_expr(&args[0])),
            _ => unreachable!(),
        }
        _ => unreachable!(),
    }
}

Note the simplification: display picks the C variant based on the static form of the argument. (display x) where x is a symbol always emits ml_display_int(ml_x), even if x holds a boolean at runtime. For the programs in this course, this is acceptable. A production compiler would use a tagged union or a format string approach.
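
The limitation described above can be isolated in a few lines. This is a sketch with a hypothetical four-variant Expr and a hypothetical helper name, display_fn; it reproduces only the dispatch rule, not the full gen_display_stmt.

```rust
// Hypothetical, cut-down AST; only the variants needed for the dispatch.
#[allow(dead_code)]
enum Expr {
    Int(i64),
    Bool(bool),
    Str(String),
    Symbol(String),
}

/// Choose the runtime display function from the syntactic form of the
/// argument alone. No type information is consulted.
fn display_fn(arg: &Expr) -> &'static str {
    match arg {
        Expr::Str(_) => "ml_display_str",
        Expr::Bool(_) => "ml_display_bool",
        // Integer literals, symbols, and anything else land here,
        // whatever value they hold at runtime.
        _ => "ml_display_int",
    }
}
```

A literal #t is displayed through ml_display_bool, but a symbol bound to #t goes through ml_display_int, so the generated program would print it as an integer.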

Generating let

let compiles to a C block with local variable declarations:

(let ((x 1) (y 2)) (+ x y))

({
    ml_int ml_x = 1;
    ml_int ml_y = 2;
    (ml_x + ml_y);
})

This uses GCC's statement expression extension: ({ ... }) is a block whose value is that of its last expression statement. GCC and Clang both support it, but it is not part of any ISO C standard (C99 included), so the generated code will not build with compilers that lack the extension. Discuss the trade-off and the alternative (using a helper function per let).

fn gen_let(bindings: &[(String, Expr)], body: &[Expr]) -> String {
    let mut out = String::from("({\n");
    for (name, val) in bindings {
        out.push_str(&format!("    ml_int {} = {};\n", mangle(name), gen_expr(val)));
    }
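    // split_last avoids the underflow panic that body.len() - 1 hits on an empty body.
    if let Some((last, init)) = body.split_last() {
        for expr in init {
            // gen_stmt already returns a terminated statement; do not add another ';'.
            out.push_str(&format!("    {}\n", gen_stmt(expr)));
        }
        out.push_str(&format!("    {};\n", gen_expr(last)));
    }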
    out.push_str("})");
    out
}
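
For comparison, the portable alternative mentioned above (one helper function per let) might look roughly like the sketch below. Everything here is an assumption made for illustration: the function name gen_let_helper, the ml_let_N naming scheme, and the idea that the caller has already computed which enclosing variables the let refers to. It also assumes that no bound name shadows a captured one, and that captured variables are only read, since they are passed by value.

```rust
/// Emit a standalone C helper for one `let`, plus the call that replaces it.
///
/// - `id`: hypothetical per-compilation counter used to name the helper.
/// - `captured`: already-mangled enclosing variables the let refers to.
/// - `bindings`: (mangled name, generated C expression) pairs.
/// - `body`: generated C expression for the let body.
///
/// Returns (helper definition, call expression).
fn gen_let_helper(
    id: usize,
    captured: &[&str],
    bindings: &[(&str, &str)],
    body: &str,
) -> (String, String) {
    // C requires `void` for an empty parameter list in a prototype.
    let params = if captured.is_empty() {
        "void".to_string()
    } else {
        captured
            .iter()
            .map(|v| format!("ml_int {}", v))
            .collect::<Vec<_>>()
            .join(", ")
    };

    let mut def = format!("static ml_int ml_let_{}({}) {{\n", id, params);
    for (name, val) in bindings {
        def.push_str(&format!("    ml_int {} = {};\n", name, val));
    }
    def.push_str(&format!("    return {};\n}}\n", body));

    // The let expression itself becomes an ordinary call.
    let call = format!("ml_let_{}({})", id, captured.join(", "));
    (def, call)
}
```

The helper definitions would have to be emitted ahead of the function that calls them. The result is standard C, but it needs a free-variable analysis and a second output buffer, which is the cost the statement expression version avoids.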

Generating begin

begin in expression position uses the C comma operator; in statement position it is a sequence of statements, one per form. The expression-position generator:

fn gen_begin_expr(exprs: &[Expr]) -> String {
    // Comma operator: (e1, e2, ..., eN) evaluates all, returns eN
    let parts: Vec<String> = exprs.iter().map(gen_expr).collect();
    format!("({})", parts.join(", "))
}

In gen_expr, add:

Expr::Begin(exprs) => gen_begin_expr(exprs),
Expr::Let { bindings, body } => gen_let(bindings, body),
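
The statement-position half of begin, with the last value forwarded to return, is not shown above. One possible shape is sketched here. The three-variant Expr, the Display variant standing in for the (display e) call form, the function name gen_return, and the choice to return 0 for an empty begin are all assumptions for illustration.

```rust
// Hypothetical, cut-down AST. `Display` stands in for the (display e)
// call form so the sketch stays self-contained.
enum Expr {
    Int(i64),
    Display(Box<Expr>),
    Begin(Vec<Expr>),
}

/// Expression position: begin becomes a comma expression.
fn gen_expr(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Display(arg) => format!("ml_display_int({})", gen_expr(arg)),
        Expr::Begin(exprs) => {
            let parts: Vec<String> = exprs.iter().map(gen_expr).collect();
            format!("({})", parts.join(", "))
        }
    }
}

/// Statement position: begin becomes one statement per form.
fn gen_stmt(e: &Expr) -> String {
    match e {
        Expr::Display(arg) => format!("ml_display_int({});", gen_expr(arg)),
        Expr::Begin(exprs) => exprs.iter().map(gen_stmt).collect::<Vec<_>>().join("\n"),
        other => format!("(void){};", gen_expr(other)),
    }
}

/// Tail position of a function body: leading forms run as statements,
/// and the last form's value is forwarded to `return`.
fn gen_return(e: &Expr) -> String {
    match e {
        Expr::Begin(exprs) => match exprs.split_last() {
            Some((last, init)) => {
                let mut lines: Vec<String> = init.iter().map(gen_stmt).collect();
                // Recurse so that a nested begin in tail position also flattens.
                lines.push(gen_return(last));
                lines.join("\n")
            }
            // Assumption: an empty begin yields 0.
            None => "return 0;".to_string(),
        },
        other => format!("return {};", gen_expr(other)),
    }
}
```

For (begin (display 1) (display 2) 3) in a function body, gen_return produces two ml_display_int statements followed by return 3;, while gen_expr produces the single comma expression (ml_display_int(1), ml_display_int(2), 3).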

Tests

#[test]
fn test_gen_let() {
    let src = "(define (f) (let ((x 1) (y 2)) (+ x y)))";
    let c = generate(parse(src).unwrap());
    assert!(c.contains("ml_int ml_x = 1"));
    assert!(c.contains("ml_int ml_y = 2"));
}

#[test]
fn test_gen_begin() {
    let src = "(define (f) (begin (display 1) (display 2) 3))";
    let c = generate(parse(src).unwrap());
    assert!(c.contains("ml_display_int(1)"));
    assert!(c.contains("ml_display_int(2)"));
    assert!(c.contains("return 3"));
}

Style notes

  • The expression-vs-statement distinction is the key concept here — explain it at the top before any code
  • The statement expression ({...}) extension for let is a real trade-off — acknowledge it honestly
  • The display type dispatch simplification should be called out clearly — readers will ask "what if I display a boolean stored in a variable?"
  • End with a checkpoint: generate C for the complete factorial example; it should be correct and compilable