+++
title = "§14 Generating C: Definitions and Functions"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++

§14 Generating C: Definitions and Functions — Stub to fill

File: edu/src/lisp-compiler.md, section ### 14. Generating C: Definitions and Functions

Replace the stub line with full content. Target 700 to 900 words. Implement code generation for top-level define forms and lambda expressions, including forward declarations for mutual recursion.

Learning objectives

  • Emit forward declarations for all functions before their definitions
  • Generate a correct C function signature from a Lambda with named parameters
  • Handle variable define vs. function define
  • Understand C's requirement for forward declarations and why MiniLisp needs them

Content to write

Why forward declarations?

In C, a function must be declared before it is called. If even? calls odd? and odd? calls even?, whichever is defined first will try to call a symbol that has not yet been declared. Forward declarations — just the function signature with no body — solve this by telling the C compiler the signature exists before the definition appears.

MiniLisp makes no guarantees about definition order, so we emit forward declarations for every top-level function before any definition.

Two-pass code generation

The code generator uses two passes over the top-level Vec<Expr>:

  1. Forward declaration pass: emit ml_int ml_name(ml_int param1, ...); for every top-level define that wraps a lambda.
  2. Definition pass: emit the full function body (or variable initializer) for every top-level define.
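The two passes above can be sketched independently of the real Expr type. The following is a minimal illustration, not the section's actual generator: the Func struct, signature, and emit_two_pass are hypothetical stand-ins, the names is_even and is_odd replace even? and odd? to sidestep mangling, and the C return expressions are written by hand rather than produced by gen_expr.

```rust
/// Hypothetical stand-in for a top-level function: its name, its parameter
/// names, and an already-generated C expression to return.
struct Func {
    name: &'static str,
    params: &'static [&'static str],
    ret_expr: &'static str,
}

/// Builds the C signature shared by the prototype and the definition.
fn signature(f: &Func) -> String {
    let params: Vec<String> = f
        .params
        .iter()
        .map(|p| format!("ml_int ml_{}", p))
        .collect();
    format!("ml_int ml_{}({})", f.name, params.join(", "))
}

/// Pass 1 emits every prototype; pass 2 emits every definition.
fn emit_two_pass(funcs: &[Func]) -> String {
    let mut out = String::new();
    // Pass 1: prototypes only, so later bodies can call any function.
    for f in funcs {
        out.push_str(&signature(f));
        out.push_str(";\n");
    }
    out.push('\n');
    // Pass 2: full definitions, in source order.
    for f in funcs {
        out.push_str(&format!(
            "{} {{\n    return {};\n}}\n",
            signature(f),
            f.ret_expr
        ));
    }
    out
}
```

Because every prototype is written before any body, the definition of ml_is_even can call ml_is_odd even though ml_is_odd is defined later in the output.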

Type signatures

MiniLisp has no type annotations. All values compile to ml_int (which is int64_t). This includes:

  • Integers: trivially ml_int
  • Booleans: stored as ml_int (0 or 1)
  • Strings: a limitation — string-returning functions are declared as ml_int too, which is technically wrong but will compile for our simple programs. Acknowledge this simplification.

A more honest approach would be to use void* or a tagged union — note this in the "What's Next" section.
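For readers wondering where ml_int comes from, here is one possible shape for the PREAMBLE constant used by generate. This is an assumption for illustration only: the real preamble is defined elsewhere and may include more headers or runtime helpers, and with_preamble is a hypothetical helper.

```rust
/// Hypothetical preamble: the real PREAMBLE may contain more than this.
/// It only needs to make `ml_int` an alias for `int64_t`.
const PREAMBLE: &str = "#include <stdint.h>\n\ntypedef int64_t ml_int;\n\n";

/// Prepends the preamble to already-generated C declarations.
fn with_preamble(decls: &str) -> String {
    format!("{}{}", PREAMBLE, decls)
}
```

With the typedef in place, every generated signature and global can use ml_int without further includes.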

Generating a forward declaration

fn gen_forward_decl(name: &str, lambda: &Expr) -> String {
    if let Expr::Lambda { params, .. } = lambda {
        let c_name = mangle(name);
        let param_list: Vec<String> = params.iter()
            .map(|p| format!("ml_int {}", mangle(p)))
            .collect();
        format!("ml_int {}({});\n", c_name, param_list.join(", "))
    } else {
        String::new() // variable define; no forward declaration needed
    }
}
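To try gen_forward_decl in isolation, the sketch below pairs it with minimal stand-ins. The Expr enum here is a cut-down assumption, and mangle is hypothetical: the tests in this ticket only show that it prefixes ml_, so the handling of '-' and '?' is a guess at one reasonable scheme.

```rust
// Cut-down stand-in for the compiler's Expr; the real type has more variants.
#[allow(dead_code)]
enum Expr {
    Int(i64),
    Symbol(String),
    Lambda { params: Vec<String>, body: Vec<Expr> },
}

/// Hypothetical mangling: prefix with `ml_` and rewrite characters that are
/// not legal in C identifiers ('-' becomes '_', '?' becomes "_p").
fn mangle(name: &str) -> String {
    let mut out = String::from("ml_");
    for ch in name.chars() {
        match ch {
            '-' => out.push('_'),
            '?' => out.push_str("_p"),
            c => out.push(c),
        }
    }
    out
}

/// Same logic as the function above: a prototype for lambdas, nothing otherwise.
fn gen_forward_decl(name: &str, lambda: &Expr) -> String {
    if let Expr::Lambda { params, .. } = lambda {
        let param_list: Vec<String> = params
            .iter()
            .map(|p| format!("ml_int {}", mangle(p)))
            .collect();
        format!("ml_int {}({});\n", mangle(name), param_list.join(", "))
    } else {
        String::new() // variable define: no prototype
    }
}
```

Under these assumptions, a two-parameter lambda bound to add-two yields the prototype ml_int ml_add_two(ml_int ml_a, ml_int ml_b); and a variable define yields an empty string. One edge worth keeping in mind: a lambda with no parameters produces an empty parameter list, which C before C23 treats as "parameters unspecified" rather than "no parameters"; emitting void there would be stricter.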

Generating a function definition

fn gen_function_def(name: &str, params: &[String], body: &[Expr]) -> String {
    let c_name = mangle(name);
    let param_list: Vec<String> = params.iter()
        .map(|p| format!("ml_int {}", mangle(p)))
        .collect();
    let mut out = format!("ml_int {}({}) {{\n", c_name, param_list.join(", "));

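    // All body expressions except the last are statements (side effects).
    // Assumes `body` is non-empty (the parser is expected to reject an empty
    // lambda body); on an empty slice `body.len() - 1` underflows and panics.
    for expr in &body[..body.len() - 1] {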
        out.push_str(&format!("    {};\n", gen_stmt(expr)));
    }
    // Last body expression is the return value
    let last = body.last().unwrap();
    out.push_str(&format!("    return {};\n", gen_expr(last)));
    out.push_str("}\n");
    out
}

Explain the idiom: all but the last body expression are evaluated as statements (for side effects like display); the last is used as the return value. This mirrors Lisp's implicit return of the last expression.
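The all-but-last split can also be written with the standard slice method split_last, which returns None on an empty slice instead of panicking. The sketch below is an alternative shown for comparison, not the section's code: split_body and render_body are hypothetical helpers, and the body is represented as pre-generated C fragments rather than Expr values.

```rust
/// Splits a body into (leading statements, final expression).
/// Returns None for an empty body instead of panicking.
fn split_body<T>(body: &[T]) -> Option<(&[T], &T)> {
    // split_last yields (last, everything_before); swap into reading order.
    body.split_last().map(|(last, init)| (init, last))
}

/// Renders already-generated C fragments as the inside of a function body:
/// every fragment but the last becomes a statement, the last is returned.
fn render_body(fragments: &[&str]) -> Option<String> {
    let (stmts, ret) = split_body(fragments)?;
    let mut out = String::new();
    for s in stmts {
        out.push_str(&format!("    {};\n", s));
    }
    out.push_str(&format!("    return {};\n", ret));
    Some(out)
}
```

A single-expression body has no leading statements, so only the return line is produced; an empty body produces None, which the caller can turn into a compile error.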

Generating a variable definition

fn gen_variable_def(name: &str, value: &Expr) -> String {
    format!("ml_int {} = {};\n", mangle(name), gen_expr(value))
}

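Variable definitions at top level become global C variables. C requires file-scope initializers to be constant expressions, so this works for literals such as (define limit 10) but not for values computed by a function call; note this limitation.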

The full generate function

pub fn generate(exprs: Vec<Expr>) -> String {
    let mut out = String::new();
    out.push_str(PREAMBLE);

    // Pass 1: forward declarations for all top-level functions
    for expr in &exprs {
        if let Expr::Define { name, value } = expr {
            out.push_str(&gen_forward_decl(name, value));
        }
    }
    out.push('\n');

    // Pass 2: definitions
    for expr in &exprs {
        match expr {
            Expr::Define { name, value } => match value.as_ref() {
                Expr::Lambda { params, body } =>
                    out.push_str(&gen_function_def(name, params, body)),
                _ =>
                    out.push_str(&gen_variable_def(name, value)),
            }
            // Top-level non-define expressions: emit in main()
            _ => {} // handled in §16
        }
    }

    out
}

Tests

#[test]
fn test_simple_function() {
    let src = "(define (square x) (* x x))";
    let exprs = parse(src).unwrap();
    let c = generate(exprs);
    assert!(c.contains("ml_int ml_square(ml_int ml_x)"));
    assert!(c.contains("return (ml_x * ml_x)"));
}

#[test]
fn test_forward_decl_present() {
    let src = "(define (f x) (g x))\n(define (g x) x)";
    let c = generate(parse(src).unwrap());
    // g's forward decl must appear before f's definition, because f calls g
    let fwd_pos = c.find("ml_int ml_g(ml_int ml_x);").unwrap();
    let def_pos = c.find("ml_int ml_f(ml_int ml_x) {").unwrap();
    assert!(fwd_pos < def_pos);
}

Style notes

  • Lead with the forward declaration problem — it's the "aha" moment of this section
  • The two-pass structure is conceptually important; diagram it clearly
  • Acknowledge the "everything is ml_int" simplification explicitly; readers will notice it
  • The body[..body.len()-1] slice for all-but-last is a small Rust trick worth calling out