From 3089d71c8bfad6be4ee6bfd23afa2e70db823de5 Mon Sep 17 00:00:00 2001 From: "Jonathan D.A. Jewell" <6759885+hyperpolymath@users.noreply.github.com> Date: Mon, 11 May 2026 01:49:53 +0200 Subject: [PATCH 1/3] docs: clarify codegen environment rules for top-level const and fn bindings (closes #89) Add const_decl, extern_type_decl, extern_fn_decl to the top_level grammar in SPEC.md 2.1. Add sections 2.9 and 2.10 covering their syntax. Add section 8 (Codegen Module Environment) documenting the func_indices dual-use encoding: non-negative keys are Wasm function indices, negative keys are -(global_idx+1) for constants. Update the func_indices field comment in lib/codegen.ml to match. Co-Authored-By: Claude Sonnet 4.6 --- docs/specs/SPEC.md | 60 +++++++++++++++++++++++++++++++++++++++++++++- lib/codegen.ml | 6 ++++- 2 files changed, 64 insertions(+), 2 deletions(-) diff --git a/docs/specs/SPEC.md b/docs/specs/SPEC.md index 9c6f878..72e1a65 100644 --- a/docs/specs/SPEC.md +++ b/docs/specs/SPEC.md @@ -25,7 +25,7 @@ row_var = ".." lower_ident ### 1.2 Keywords ``` -fn let mut own ref type struct enum trait impl effect handle +fn let const extern mut own ref type struct enum trait impl effect handle resume handler match if else while for return break continue in true false where total module use pub as unsafe assume transmute forget Nat Int Bool Float String Type Row @@ -68,6 +68,7 @@ Special: \ (row restriction) ```ebnf program = [module_decl] {import_decl} {top_level} top_level = fn_decl | type_decl | trait_decl | impl_block | effect_decl + | const_decl | extern_type_decl | extern_fn_decl ``` ### 2.2 Type Declarations @@ -194,6 +195,27 @@ impl_block = "impl" [type_params] [trait_ref "for"] type_expr [where_clause] "{" {impl_item} "}" ``` +### 2.9 Constant Declarations + +```ebnf +const_decl = [visibility] "const" LOWER_IDENT ":" type_expr "=" expr ";" +``` + +Top-level `const` bindings are evaluated at compile time and emitted as +immutable WebAssembly globals. 
Both function names and const names are +registered in the codegen name environment (see §5.1). + +### 2.10 Extern Declarations + +```ebnf +extern_type_decl = "extern" "type" UPPER_IDENT ";" +extern_fn_decl = "extern" "fn" LOWER_IDENT "(" [param_list] ")" ["->" type_expr] ";" +``` + +`extern type` declares an opaque host-provided type. `extern fn` declares a +function whose implementation is supplied by the host environment at link time +(a WebAssembly import). Both are top-level only and carry no body. + ## 3. Type System ### 3.1 Judgement Forms @@ -524,6 +546,42 @@ Compiles to (ownership removed): (call $close (local.get $file))) ``` +## 8. Codegen Module Environment + +This section describes how the WebAssembly code generator (`lib/codegen.ml`) +builds its name environment. It is implementation documentation aimed at +contributors; the language semantics are fully specified in §2–4. + +### 8.1 Name Environment (`func_indices`) + +The codegen context maintains a single association list `func_indices : +(string * int) list` that maps every top-level name visible at call sites to an +integer key. Two distinct kinds of binding share this table: + +| Source declaration | Key value | Meaning | +|--------------------|-----------|---------| +| `fn f(…) { … }` | `k ≥ 0` | WebAssembly function index (import + defined functions combined) | +| `const C: T = e` | `-(g+1)` where `g` is the global index | Negative sentinel; caller must emit `global.get g` not `call k` | + +The negative-index encoding lets call-site code distinguish constants from +functions with a single sign test before emitting the appropriate instruction. + +**Population order.** Both `TopFn` and `TopConst` are processed by `gen_decl` +in declaration order (the single pass in `gen_program`). Each inserts its name +into `func_indices` before any later declaration can reference it. Forward +references to functions are therefore not supported in the current +single-pass design. 
+ +### 8.2 Extern Bindings + +`TopExternFn` declarations are added to the WebAssembly import section and +their names are registered in `func_indices` with the resulting import function +index. `TopExternType` declarations register an opaque type name and generate +no code. + +The WebAssembly module name for an `extern fn` import defaults to `"env"` unless +overridden by a `@module("…")` pragma on the declaration (not yet implemented). + ## Appendix: Grammar Reference See the full specification at `affinescript-spec.md` for: diff --git a/lib/codegen.ml b/lib/codegen.ml index dde386c..18a3c32 100644 --- a/lib/codegen.ml +++ b/lib/codegen.ml @@ -28,7 +28,11 @@ type context = { locals : (string * int) list; (** local variable name to index map *) next_local : int; (** next available local index *) loop_depth : int; (** current loop nesting depth *) - func_indices : (string * int) list; (** function name to index map *) + func_indices : (string * int) list; + (** Top-level name environment shared by functions and constants. + - [k >= 0]: Wasm function index (imports + defined functions). + - [k < 0]: Constant (global): actual global index is [-(k+1)]. + Both [TopFn] and [TopConst] insert into this table in declaration order. *) lambda_funcs : func list; (** lifted lambda functions *) next_lambda_id : int; (** next lambda function ID *) heap_ptr : int option; (** global index for heap pointer, if initialized *) From b29a1dd7442b4ceb33fb1a8c787caae7d03ad6c8 Mon Sep 17 00:00:00 2001 From: "Jonathan D.A. Jewell" <6759885+hyperpolymath@users.noreply.github.com> Date: Mon, 11 May 2026 03:43:23 +0200 Subject: [PATCH 2/3] docs: tighten SPEC.md codegen-environment section, drop unmerged extern grammar MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review pass on the #89 spec changes against the current main: - 1.2 / 2.1 / 2.10: remove `extern`, `extern_type_decl`, and `extern_fn_decl`. 
`extern` is not yet a reserved keyword on main and is not parsed by the current lib/parser.mly; the AST has no `TopExternFn` / `TopExternType` variants. Documenting these forms here advertises features that arrive with a separate, unmerged PR (#92, fix/issue-42-extern-parsing). Spec re-introduces these sections when the implementation lands. - 2.9: fix the cross-reference (was "see §5.1" which is Primitive Types; correct target is §8). Rephrase to make clear that the initializer must reduce to a Wasm constant expression in the linear-memory backend. - 8.1: fix `gen_program` -> `generate_module` (the actual fold-over-decls entry point in lib/codegen.ml). Rewrite the section to: * separate the encoding (data layout) from the lookup (instruction selection) so the reader understands what's currently implemented vs. what issue #73 still needs; * call out that `ExprApp (ExprVar _, _)` currently emits `call k` unconditionally, so the sign-test decode path for the negative sentinel is the remaining piece tracked under #73 rather than being described as already implemented; * spell out per-case `gen_decl` behaviour (TopFn registers before generating its body; TopConst registers after appending the global). - 8.2: removed alongside the extern grammar — same rationale as 2.10. The lib/codegen.ml `func_indices` field doc-comment from the original PR was accurate and stays. Co-Authored-By: Claude Sonnet 4.6 --- docs/specs/SPEC.md | 84 +++++++++++++++++++++++++--------------------- 1 file changed, 46 insertions(+), 38 deletions(-) diff --git a/docs/specs/SPEC.md b/docs/specs/SPEC.md index 72e1a65..02f8531 100644 --- a/docs/specs/SPEC.md +++ b/docs/specs/SPEC.md @@ -25,7 +25,7 @@ row_var = ".." 
lower_ident ### 1.2 Keywords ``` -fn let const extern mut own ref type struct enum trait impl effect handle +fn let const mut own ref type struct enum trait impl effect handle resume handler match if else while for return break continue in true false where total module use pub as unsafe assume transmute forget Nat Int Bool Float String Type Row @@ -68,7 +68,7 @@ Special: \ (row restriction) ```ebnf program = [module_decl] {import_decl} {top_level} top_level = fn_decl | type_decl | trait_decl | impl_block | effect_decl - | const_decl | extern_type_decl | extern_fn_decl + | const_decl ``` ### 2.2 Type Declarations @@ -201,20 +201,15 @@ impl_block = "impl" [type_params] [trait_ref "for"] type_expr const_decl = [visibility] "const" LOWER_IDENT ":" type_expr "=" expr ";" ``` -Top-level `const` bindings are evaluated at compile time and emitted as -immutable WebAssembly globals. Both function names and const names are -registered in the codegen name environment (see §5.1). +A top-level `const` binding compiles to an immutable WebAssembly global. The +initializer expression must reduce to a Wasm constant expression (a literal +or a constant arithmetic combination thereof); non-constant initializers are +not yet supported by the linear-memory backend. -### 2.10 Extern Declarations - -```ebnf -extern_type_decl = "extern" "type" UPPER_IDENT ";" -extern_fn_decl = "extern" "fn" LOWER_IDENT "(" [param_list] ")" ["->" type_expr] ";" -``` - -`extern type` declares an opaque host-provided type. `extern fn` declares a -function whose implementation is supplied by the host environment at link time -(a WebAssembly import). Both are top-level only and carry no body. +Both function names and const names are registered in the same codegen name +environment so that later top-level declarations may refer to either kind of +binding by name. See §8 (*Codegen Module Environment*) for the encoding and +the current single-pass population order. ## 3. 
Type System @@ -554,33 +549,46 @@ contributors; the language semantics are fully specified in §2–4. ### 8.1 Name Environment (`func_indices`) -The codegen context maintains a single association list `func_indices : -(string * int) list` that maps every top-level name visible at call sites to an +The codegen context maintains a single association list + +```ocaml +func_indices : (string * int) list +``` + +that maps every top-level name visible at later declaration sites to an integer key. Two distinct kinds of binding share this table: | Source declaration | Key value | Meaning | |--------------------|-----------|---------| -| `fn f(…) { … }` | `k ≥ 0` | WebAssembly function index (import + defined functions combined) | -| `const C: T = e` | `-(g+1)` where `g` is the global index | Negative sentinel; caller must emit `global.get g` not `call k` | - -The negative-index encoding lets call-site code distinguish constants from -functions with a single sign test before emitting the appropriate instruction. - -**Population order.** Both `TopFn` and `TopConst` are processed by `gen_decl` -in declaration order (the single pass in `gen_program`). Each inserts its name -into `func_indices` before any later declaration can reference it. Forward -references to functions are therefore not supported in the current -single-pass design. - -### 8.2 Extern Bindings - -`TopExternFn` declarations are added to the WebAssembly import section and -their names are registered in `func_indices` with the resulting import function -index. `TopExternType` declarations register an opaque type name and generate -no code. - -The WebAssembly module name for an `extern fn` import defaults to `"env"` unless -overridden by a `@module("…")` pragma on the declaration (not yet implemented). 
+| `fn f(…) { … }` | `k ≥ 0` | WebAssembly function index (imports + defined functions, combined) | +| `const C: T = e` | `-(g + 1)`, where `g` is the global's index in the Wasm `globals` vector | Negative sentinel reserved for constants | + +Sign-based partitioning is deliberate: `k ≥ 0` decodes directly as a Wasm +`funcidx`, and `k < 0` recovers the global index as `g = -(k + 1)`. A +single integer per name keeps the lookup uniform across both kinds of binding. + +**Population.** Top-level declarations are visited in source order by +`gen_decl`, which is folded over `prog.prog_decls` from `generate_module`. +The two relevant cases are: + +- `TopFn fd` — registers `(fd.fd_name.name, func_idx)` in `func_indices` + *before* generating the function body, so the body may recursively refer + to its own name. +- `TopConst tc` — generates the global initializer, appends the global to + `ctx.globals`, then registers `(tc.tc_name.name, -(global_idx + 1))` in + `func_indices`. + +Because population is strictly single-pass and in declaration order, +forward references (to either functions or constants declared later in the +file) are not supported by the current backend. + +**Call-site lookup.** The `ExprApp (ExprVar id, _)` branch of `gen_expr` +consults `func_indices` to translate a direct call into a Wasm `call k` +instruction. Decoding the negative sentinel back to a `global.get` — +needed to make a bare `const` identifier usable inside another top-level +declaration's body — is tracked as a known gap in issue #73. The encoding +documented in this section is the data layout the fix relies on; the +call-site decode path will land alongside that fix. ## Appendix: Grammar Reference From d7c9e6240da28a43d477e03a26f38e70b19fe198 Mon Sep 17 00:00:00 2001 From: "Jonathan D.A. 
Jewell" <6759885+hyperpolymath@users.noreply.github.com> Date: Mon, 11 May 2026 04:25:20 +0200 Subject: [PATCH 3/3] docs: restore extern coverage after rebase onto main (#90 merged) PR #90 landed `extern fn` / `extern type` parsing and codegen on main while this branch was in review. Restore the SPEC.md coverage to match what actually shipped: - 1.2: re-add `extern` to the keyword list. - 2.1: re-add `extern_fn_decl` and `extern_type_decl` to `top_level`. Note the parser uses two separate productions that both feed back into `TopFn` / `TopType` AST variants (with `FnExtern` / `TyExtern` as the body kind); the spec describes the surface grammar, not the AST shape. - 2.10: re-introduce, but with the actual parsed shape (the productions accept `type_params`, the fn form accepts `effects`, and both accept optional `visibility`) rather than the simplified form from the original PR. Clarify the runtime contract: extern fn lowers to a Wasm import, extern type generates no artifact. - 8.1 / 8.2: split the `TopFn` population case into the `FnExtern` / non-`FnExtern` variants so the description matches the guard pattern in lib/codegen.ml. Hard-coded `"env"` Wasm module name is called out explicitly. Also update the lib/codegen.ml doc-comment on `func_indices` to mention both `TopFn` paths (defined and extern) and clarify that insertion order is source order. Co-Authored-By: Claude Sonnet 4.6 --- docs/specs/SPEC.md | 64 +++++++++++++++++++++++++++++++++++++++++----- lib/codegen.ml | 5 +++- 2 files changed, 61 insertions(+), 8 deletions(-) diff --git a/docs/specs/SPEC.md b/docs/specs/SPEC.md index 02f8531..57c51ee 100644 --- a/docs/specs/SPEC.md +++ b/docs/specs/SPEC.md @@ -25,7 +25,7 @@ row_var = ".." 
lower_ident ### 1.2 Keywords ``` -fn let const mut own ref type struct enum trait impl effect handle +fn let const extern mut own ref type struct enum trait impl effect handle resume handler match if else while for return break continue in true false where total module use pub as unsafe assume transmute forget Nat Int Bool Float String Type Row @@ -68,7 +68,7 @@ Special: \ (row restriction) ```ebnf program = [module_decl] {import_decl} {top_level} top_level = fn_decl | type_decl | trait_decl | impl_block | effect_decl - | const_decl + | const_decl | extern_fn_decl | extern_type_decl ``` ### 2.2 Type Declarations @@ -211,6 +211,28 @@ environment so that later top-level declarations may refer to either kind of binding by name. See §8 (*Codegen Module Environment*) for the encoding and the current single-pass population order. +### 2.10 Extern Declarations + +```ebnf +extern_fn_decl = [visibility] "extern" "fn" LOWER_IDENT + [type_params] "(" [param_list] ")" + ["->" type_expr] ["/" effects] ";" +extern_type_decl = [visibility] "extern" "type" UPPER_IDENT + [type_params] ";" +``` + +`extern fn` declares a function whose implementation is supplied by the host +environment at link time. The linear-memory WebAssembly backend lowers each +`extern fn` to an `(import "env" "" (func …))` entry; the import slot +is registered in the codegen name environment so call sites resolve through +`call k` exactly as for locally-defined functions (see §8). + +`extern type` declares an opaque, host-provided type. It carries no runtime +representation and generates no Wasm artifact; the typechecker treats the +name as a nominal opaque type whose internal structure is unknown. + +Both forms are terminated by `;` and carry no body. + ## 3. Type System ### 3.1 Judgement Forms @@ -569,11 +591,18 @@ single integer per name keeps the lookup uniform across both kinds of binding. 
**Population.** Top-level declarations are visited in source order by `gen_decl`, which is folded over `prog.prog_decls` from `generate_module`. -The two relevant cases are: - -- `TopFn fd` — registers `(fd.fd_name.name, func_idx)` in `func_indices` - *before* generating the function body, so the body may recursively refer - to its own name. +The relevant cases are: + +- `TopFn fd` with `fd.fd_body <> FnExtern` — picks the next Wasm function + index (`import_func_count ctx + List.length ctx.funcs`), registers + `(fd.fd_name.name, func_idx)` in `func_indices` *before* generating the + body so the body may recursively refer to its own name, then appends the + emitted function to `ctx.funcs`. +- `TopFn fd` with `fd.fd_body = FnExtern` — emits a Wasm import (module + `"env"`, name `fd.fd_name.name`) and registers + `(fd.fd_name.name, import_func_idx)` in `func_indices`, where + `import_func_idx` is the number of imports before adding this one. No + function body is generated. See §8.2. - `TopConst tc` — generates the global initializer, appends the global to `ctx.globals`, then registers `(tc.tc_name.name, -(global_idx + 1))` in `func_indices`. @@ -590,6 +619,27 @@ declaration's body — is tracked as a known gap in issue #73. The encoding documented in this section is the data layout the fix relies on; the call-site decode path will land alongside that fix. +### 8.2 Extern Bindings + +An `extern fn name(…) -> Ret;` declaration produces a `TopFn` with +`fd_body = FnExtern`. Codegen lowers it to a Wasm import: + +``` +(import "env" "" (func (param …) (result …))) +``` + +The resulting import function index is non-negative (it counts among the +combined "imports + defined functions" view used by every other call +site), so the name is registered in `func_indices` with `k ≥ 0` and call +sites resolve through `call k` indistinguishably from a locally-defined +function. 
The Wasm module name is currently hard-coded to `"env"`, +matching the convention adopted by the Node-CJS host shim. + +An `extern type Name;` declaration produces a `TopType` with +`td_body = TyExtern`. It generates no Wasm artifact — opaque types are +purely a typechecker concern — and the codegen `TopType TyExtern` case +returns the unchanged context. + ## Appendix: Grammar Reference See the full specification at `affinescript-spec.md` for: diff --git a/lib/codegen.ml b/lib/codegen.ml index 18a3c32..0863a04 100644 --- a/lib/codegen.ml +++ b/lib/codegen.ml @@ -31,8 +31,11 @@ type context = { func_indices : (string * int) list; (** Top-level name environment shared by functions and constants. - [k >= 0]: Wasm function index (imports + defined functions). + Populated by both [TopFn] (defined function) and + [TopFn _ with fd_body = FnExtern] (host-supplied import). - [k < 0]: Constant (global): actual global index is [-(k+1)]. - Both [TopFn] and [TopConst] insert into this table in declaration order. *) + Populated by [TopConst]. + Entries are inserted in source declaration order by [gen_decl]. *) lambda_funcs : func list; (** lifted lambda functions *) next_lambda_id : int; (** next lambda function ID *) heap_ptr : int option; (** global index for heap pointer, if initialized *)