+++
title = "§2 MiniLisp Language Specification"
priority = 5
status = "done"
ticket_type = "task"
dependencies = []
+++
## §2 MiniLisp Language Specification — Stub to fill
File: `edu/src/lisp-compiler.md`, section `### 2. MiniLisp Language Specification`
Replace the stub line with full content. Target 700-1000 words. This is a reference section: clear, precise, complete. Define the entire language before the reader writes a single parser.
## Learning objectives
- Know every data type and its literal syntax
- Understand every special form and its evaluation rule
- Know which built-in operators and functions exist
- Know what is explicitly out of scope
- Have seen a complete, realistic MiniLisp program
## Content to write
### EBNF Grammar
Lead with the grammar as a compact reference:
```ebnf
program = expr* EOF
expr = atom | list | comment
atom = integer | boolean | string | symbol
integer = '-'? DIGIT+
boolean = '#t' | '#f'
string = '"' (char | escape)* '"'
escape = '\\' ('"' | '\\' | 'n' | 't')
symbol = sym_start sym_cont*
sym_start = ALPHA | '-' | '_' | '?' | '!' | '+' | '*' | '/' | '=' | '<' | '>'
sym_cont = sym_start | DIGIT
list = '(' expr* ')'
comment = ';' (not NEWLINE)* NEWLINE
```
Note: a symbol must not begin with `-` followed by a digit (that sequence starts a negative integer).
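The integer/symbol disambiguation above is the rule most likely to cause parser bugs, so it may help to pin it down with a small classifier. This is a minimal sketch, not the section's required implementation: the function name `classify_atom` is hypothetical, `ALPHA` is assumed to mean ASCII letters, and the input is assumed to be a single token that has already been split off by the tokenizer (strings are handled elsewhere).

```python
import re

# Character classes copied from the grammar; ALPHA is assumed to be ASCII letters.
SYM_START = r"[A-Za-z\-_?!+*/=<>]"
SYM_CONT = r"[A-Za-z0-9\-_?!+*/=<>]"

INTEGER_RE = re.compile(r"-?[0-9]+\Z")
SYMBOL_RE = re.compile(rf"{SYM_START}{SYM_CONT}*\Z")


def classify_atom(text: str) -> str:
    """Classify one already-delimited token as 'integer', 'boolean', or 'symbol'.

    Raises ValueError for any token the grammar does not allow.
    """
    if text in ("#t", "#f"):
        return "boolean"
    if INTEGER_RE.match(text):
        return "integer"
    # A leading '-' followed by a digit is reserved for negative integers,
    # so a token such as '-7x' is rejected rather than read as a symbol.
    if re.match(r"-[0-9]", text):
        raise ValueError(f"invalid token: {text!r}")
    if SYMBOL_RE.match(text):
        return "symbol"
    raise ValueError(f"invalid token: {text!r}")
```

With this ordering, `-` alone and `-x` are symbols, `-7` is an integer, and `-7x` and `7x` are errors.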
### Data Types
**Integers.** 64-bit signed. Optional `-` followed by one or more decimal digits. Examples: `42`, `-7`, `0`.
**Booleans.** `#t` (true) and `#f` (false).
**Strings.** Double-quoted. Supported escapes: `\"`, `\\`, `\n`, `\t`. Example: `"hello, world\n"`.
**Symbols.** Identifiers made of letters, digits, and `-_?!+*/=<>`. Must not begin with a digit. Must not begin with `-` followed by a digit. Examples: `x`, `my-var`, `factorial`, `zero?`, `+`.
**Comments.** `;` to end of line. No block comments.
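The four string escapes listed above can be decoded with a single left-to-right pass. The sketch below is illustrative only; `decode_string_literal` and `ESCAPES` are hypothetical names, and rejecting unknown escapes is an assumption (the spec lists the supported escapes but does not say what happens to others).

```python
# Maps the character after a backslash to the character it stands for.
ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t"}


def decode_string_literal(literal: str) -> str:
    """Turn a MiniLisp string literal (including its quotes) into its value."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string literal must be wrapped in double quotes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Assumption: any escape outside the supported four is an error.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"unsupported escape at offset {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        elif ch == '"':
            raise ValueError("unescaped double quote inside string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```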
### Special Forms
For each form: syntax pattern, evaluation rule, and one or two examples.
**`(define <name> <expr>)`** — Bind `<name>` to the value of `<expr>`. At top level, creates a global. Inside a function body, creates a local.
**`(define (<name> <param>...) <body>...)`** — Shorthand for `(define <name> (lambda (<param>...) <body>...))`. Requires at least one body expression.
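The function shorthand can be handled by rewriting it into the core `define` plus `lambda` form before any further processing. A minimal sketch follows, assuming parsed forms are nested Python lists with symbols as plain strings; that representation and the name `desugar_define` are assumptions made for illustration.

```python
def desugar_define(form: list) -> list:
    """Rewrite (define (name params...) body...) into
    (define name (lambda (params...) body...)).

    Forms are nested Python lists and symbols are plain strings; this is an
    assumption of the sketch, not a required AST shape.
    """
    # Fewer than three elements means there is no expression or body at all.
    if not form or form[0] != "define" or len(form) < 3:
        raise ValueError("malformed define")
    target = form[1]
    if isinstance(target, list):
        # Function shorthand: the first element is the name, the rest are parameters.
        if not target:
            raise ValueError("function define needs a name")
        name, params = target[0], target[1:]
        body = form[2:]
        return ["define", name, ["lambda", params, *body]]
    # Plain variable define is already in core form.
    if len(form) != 3:
        raise ValueError("variable define takes exactly one expression")
    return form
```

For example, `["define", ["square", "x"], ["*", "x", "x"]]` becomes `["define", "square", ["lambda", ["x"], ["*", "x", "x"]]]`.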
**`(lambda (<param>...) <body>...)`** — Creates a function. When the function is called, its parameters are bound to the argument values in a new scope and the body expressions are evaluated in that scope, in order. Returns the value of the last body expression. In MiniLisp, lambdas may only reference their own parameters and top-level names (no closures over enclosing function locals).
**`(if <cond> <then> <else>)`** — Evaluates `<cond>`; returns `<then>` or `<else>` depending on truthiness. Both branches are required.
**`(let ((<name> <expr>)...) <body>...)`** — Evaluates each `<expr>` in the current scope, then binds results to the corresponding `<name>` in a new inner scope, then evaluates `<body>...` in that scope. Returns the last body value.
**`(begin <expr>...)`** — Evaluates each expression in order; returns the last value. Used when multiple side effects are needed inside an `if` branch.
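The scoping rule for `let` (every initialiser is evaluated in the outer scope, not in the scope being built) is easy to get wrong, so a tiny interpreter fragment may make it concrete. This is a sketch under assumptions: forms are nested Python lists, symbols are strings, only integers are supported as literals, and a two-argument `+` is included purely so the examples have something to compute. The name `evaluate` is hypothetical.

```python
def evaluate(expr, env: dict):
    """Evaluate integers, symbols, and the forms let, begin, and binary +."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]  # symbol lookup in the current scope
    head, *rest = expr
    if head == "let":
        bindings, *body = rest
        # Every initialiser sees the OUTER scope, never a sibling binding.
        values = [evaluate(init, env) for _name, init in bindings]
        inner = dict(env)  # new inner scope; the outer one is left untouched
        for (name, _init), value in zip(bindings, values):
            inner[name] = value
        return evaluate(["begin", *body], inner)
    if head == "begin":
        result = None
        for sub in rest:
            result = evaluate(sub, env)  # evaluated in order
        return result  # value of the last expression
    if head == "+":
        left, right = rest
        return evaluate(left, env) + evaluate(right, env)
    raise ValueError(f"unsupported form: {head!r}")
```

With an outer `x` bound to 1, `(let ((x 10) (y x)) (+ x y))` evaluates to 11 here, because `y` is initialised from the outer `x`.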
### Built-in Operators
These compile directly to C operators. All are binary infix operators taking exactly two arguments, except `not`, which takes one argument and compiles to the prefix operator `!`.
| Form | C equivalent | Return type |
|---|---|---|
| `(+ a b)` | `a + b` | integer |
| `(- a b)` | `a - b` | integer |
| `(* a b)` | `a * b` | integer |
| `(/ a b)` | `a / b` | integer (truncating) |
| `(= a b)` | `a == b` | boolean |
| `(< a b)` | `a < b` | boolean |
| `(> a b)` | `a > b` | boolean |
| `(<= a b)` | `a <= b` | boolean |
| `(>= a b)` | `a >= b` | boolean |
| `(not a)` | `!a` | boolean |
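The table above maps almost one-to-one onto a code generator. The following sketch shows one possible shape, assuming nested-list forms with string symbols and handling only integer and symbol atoms. `compile_operator` and `C_BINARY` are hypothetical names, and wrapping every result in parentheses is a design choice of the sketch that avoids reasoning about C precedence.

```python
# MiniLisp operator -> C operator, taken from the table; only '=' changes spelling.
C_BINARY = {
    "+": "+", "-": "-", "*": "*", "/": "/", "=": "==",
    "<": "<", ">": ">", "<=": "<=", ">=": ">=",
}


def compile_operator(form) -> str:
    """Translate a built-in operator form into a C expression string."""
    if isinstance(form, int) and not isinstance(form, bool):
        return str(form)
    if isinstance(form, str):
        # Assumption: the symbol is already a valid C identifier.
        return form
    if not isinstance(form, list) or not form:
        raise ValueError(f"cannot compile: {form!r}")
    op, *args = form
    if op == "not":
        if len(args) != 1:
            raise ValueError("not takes exactly one argument")
        return f"(!{compile_operator(args[0])})"
    if op in C_BINARY:
        if len(args) != 2:
            raise ValueError(f"{op} takes exactly two arguments")
        left, right = (compile_operator(a) for a in args)
        return f"({left} {C_BINARY[op]} {right})"
    raise ValueError(f"not a built-in operator: {op!r}")
```

For instance, `(= (* x 2) 10)` compiles to `((x * 2) == 10)` under this scheme.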
### Built-in Functions
These compile to C function calls defined in the preamble.
| Form | Behaviour |
|---|---|
| `(display expr)` | Print value to stdout; integers with `%ld`, booleans as `true`/`false`, strings with `%s` |
| `(newline)` | Print `\n` to stdout |
| `(error msg)` | Print `msg` to stderr and call `exit(1)` |
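The `display` row packs three formatting rules into one cell. A short model of the text it should produce can serve as a test oracle for the C preamble. This is only a sketch: `display_text` is a hypothetical helper, and it models the specified output rather than the preamble's actual C code.

```python
def display_text(value) -> str:
    """Return the text that (display value) is specified to print."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # plain decimal, as C's %ld prints it
    if isinstance(value, str):
        return value  # %s prints the characters without surrounding quotes
    raise TypeError(f"display does not support {type(value).__name__}")
```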
### What Is NOT Supported
Be explicit. This sets expectations and prevents confusion:
- No floating-point numbers
- No pairs or lists (`cons`, `car`, `cdr`)
- No closures (lambdas cannot capture enclosing function locals)
- No tail-call optimisation
- No garbage collector
- No variadic functions
- No macros
- No `quote` or `quasiquote`
Section 18 discusses how these could be added.
### Complete Example Program
End with a realistic multi-function program that exercises: `define` (function and variable), `if`, recursion, `let`, `begin`, `display`, `newline`, and arithmetic. A good choice: define `even?`, `odd?`, and `collatz-length`, then print results for a few inputs.
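When drafting the example program, it can help to have expected values to check the compiled output against. The Python reference below mirrors the suggested functions using only operations MiniLisp offers (no remainder operator, so parity comes from truncating division). It assumes `collatz-length` means the number of steps needed to reach 1, which is one reasonable reading; the function names are Python spellings of the suggested MiniLisp names.

```python
def is_even(n: int) -> bool:
    # MiniLisp has no remainder operator, so parity is derived from integer
    # division: n is even exactly when 2 * (n / 2) == n. For the positive
    # inputs used here, Python's // matches C's truncating division.
    return 2 * (n // 2) == n


def is_odd(n: int) -> bool:
    return not is_even(n)


def collatz_length(n: int) -> int:
    """Number of Collatz steps needed to reach 1 from n (assumes n >= 1)."""
    if n == 1:
        return 0
    if is_even(n):
        return 1 + collatz_length(n // 2)
    return 1 + collatz_length(3 * n + 1)
```

Under that definition the values for inputs 1, 6, 7, and 27 are 0, 8, 16, and 111, and the recursion stays shallow enough for inputs of this size to be unaffected by the lack of tail-call optimisation.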
## Style notes
- Grammar first (dense reference), then prose elaboration
- Tables for operators and built-ins
- Complete example is the last thing — a reward for reading the spec
- Be precise about symbol character set; ambiguity here causes parser bugs