From 6669acb87a9e0dcdb9d78d7fe40c4704e7fd6c34 Mon Sep 17 00:00:00 2001 From: Claude Date: Tue, 12 May 2026 00:05:46 +0000 Subject: [PATCH 1/2] =?UTF-8?q?docs:=20thin=20SPEC.adoc=20=C2=A78=20to=20f?= =?UTF-8?q?orward-reference=20codegen-environment.adoc?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #94 landed the full codegen environment reference as docs/specs/codegen-environment.adoc. §8 in SPEC.adoc is now redundant; replace its body with a one-paragraph forward-reference to avoid maintaining two copies. Co-Authored-By: Claude Sonnet 4.6 --- docs/specs/SPEC.adoc | 87 ++------------------------------------------ 1 file changed, 3 insertions(+), 84 deletions(-) diff --git a/docs/specs/SPEC.adoc b/docs/specs/SPEC.adoc index 7add831..f959394 100644 --- a/docs/specs/SPEC.adoc +++ b/docs/specs/SPEC.adoc @@ -652,90 +652,9 @@ Compiles to (ownership removed): == 8. Codegen Module Environment -This section describes how the WebAssembly code generator (`lib/codegen.ml`) -builds its name environment. It is implementation documentation aimed at -contributors; the language semantics are fully specified in §2–4. - -=== 8.1 Name Environment (`func_indices`) - -The codegen context maintains a single association list - -[source,ocaml] ----- -func_indices : (string * int) list ----- - -that maps every top-level name visible at later declaration sites to an -integer key. Two distinct kinds of binding share this table: - -[cols="2,2,3", options="header"] -|=== -| Source declaration | Key value | Meaning - -| `fn f(…) { … }` -| `k ≥ 0` -| WebAssembly function index (imports + defined functions, combined) - -| `const C: T = e` -| `-(g + 1)`, where `g` is the global's index in the Wasm `globals` vector -| Negative sentinel reserved for constants -|=== - -Sign-based partitioning is deliberate: `k ≥ 0` decodes directly as a Wasm -`funcidx`, and `k < 0` recovers the global index as `g = -(k + 1)`. 
A -single integer per name keeps the lookup uniform across both kinds of binding. - -*Population.* Top-level declarations are visited in source order by -`gen_decl`, which is folded over `prog.prog_decls` from `generate_module`. -The relevant cases are: - -- `TopFn fd` with `fd.fd_body <> FnExtern` — picks the next Wasm function - index (`import_func_count ctx + List.length ctx.funcs`), registers - `(fd.fd_name.name, func_idx)` in `func_indices` _before_ generating the - body so the body may recursively refer to its own name, then appends the - emitted function to `ctx.funcs`. -- `TopFn fd` with `fd.fd_body = FnExtern` — emits a Wasm import (module - `"env"`, name `fd.fd_name.name`) and registers - `(fd.fd_name.name, import_func_idx)` in `func_indices`, where - `import_func_idx` is the number of imports before adding this one. No - function body is generated. See §8.2. -- `TopConst tc` — generates the global initializer, appends the global to - `ctx.globals`, then registers `(tc.tc_name.name, -(global_idx + 1))` in - `func_indices`. - -Because population is strictly single-pass and in declaration order, -forward references (to either functions or constants declared later in the -file) are not supported by the current backend. - -*Call-site lookup.* The `ExprApp (ExprVar id, _)` branch of `gen_expr` -consults `func_indices` to translate a direct call into a Wasm `call k` -instruction. Decoding the negative sentinel back to a `global.get` — -needed to make a bare `const` identifier usable inside another top-level -declaration's body — is tracked as a known gap in issue #73. The encoding -documented in this section is the data layout the fix relies on; the -call-site decode path will land alongside that fix. - -=== 8.2 Extern Bindings - -An `extern fn name(…) -> Ret;` declaration produces a `TopFn` with -`fd_body = FnExtern`. 
Codegen lowers it to a Wasm import: - -[source] ----- -(import "env" "" (func (param …) (result …))) ----- - -The resulting import function index is positive (it counts among the -combined "imports + defined functions" view used by every other call -site), so the name is registered in `func_indices` with `k ≥ 0` and call -sites resolve through `call k` indistinguishably from a locally-defined -function. The Wasm module name is currently hard-coded to `"env"`, -matching the convention adopted by the Node-CJS host shim. - -An `extern type Name;` declaration produces a `TopType` with -`td_body = TyExtern`. It generates no Wasm artifact — opaque types are -purely a typechecker concern — and the codegen `TopType TyExtern` case -returns the unchanged context. +See link:codegen-environment.adoc[`docs/specs/codegen-environment.adoc`] for the +full codegen module environment reference, including the `func_indices` +dual-use encoding, population order, and extern binding rules. == Appendix: Grammar Reference From 21988c548120a8ab22e371889eb6ae20ade18077 Mon Sep 17 00:00:00 2001 From: "Jonathan D.A. Jewell" <6759885+hyperpolymath@users.noreply.github.com> Date: Tue, 12 May 2026 21:13:20 +0200 Subject: [PATCH 2/2] =?UTF-8?q?docs:=20promote=20SPEC=20=C2=A78=20to=20rea?= =?UTF-8?q?l=20spec;=20expand=20codegen-environment=20to=20full=20realisat?= =?UTF-8?q?ion?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reverses direction from the prior commit on this branch (which thinned §8 to a one-paragraph forward-reference). SPEC.adoc §8 — "Top-Level Binding Environment" - Renamed from "Codegen Module Environment" (which was wrong layer — it described implementation, not language). - Now a proper, target-agnostic specification: - 8.1 Top-Level Kinds — table of all eight constructors of `top_level`, what each binds, whether it has a runtime artefact. 
- 8.2 Declaration Order and Visibility — source-order processing, no forward references, recommended ordering. - 8.3 Identifier Resolution — local → variant tag → top-level lookup order with positional rules. - 8.4 Cross-Module Bindings — fn/extern fn flow, const restriction documented as known gap. - 8.5 Conformance Criteria — six MUST clauses (C1–C6) that any code generator has to satisfy. - Voice matches §1–§7 (judgement-form clauses, target-agnostic). codegen-environment.adoc — full WebAssembly realisation reference - Reframed as "WebAssembly Realisation of SPEC §8". - Added `ctx` record reproduction with field-by-field semantics. - Promoted `func_indices` encoding to its own §3 with a decode table. - §4 walks every `gen_decl` arm (TopFn, FnExtern legacy, TopExternFn, TopConst, TopType, TopExternType, TopEffect/Trait/Impl) in implementation order, naming concrete steps and side-effects. - §5 documents `gen_imports` end-to-end (load, find, intern, import, register) plus glob expansion. - §6 documents the actual ExprVar/ExprApp resolution paths. - §7 cross-walks the SPEC §8.5 criteria C1–C6 against codegen.ml sites. - §8 per-target matrix covers js/rust/ocaml/codegen_gc/codegen_node plus the loud-fail policy for the remaining backends. - §9 worked example traces a const-then-fn program through codegen. - §10 records #73 as CLOSED, since the negative-sentinel ExprVar arm at lib/codegen.ml:442–445 resolves it. Net effect: SPEC.adoc gains a real §8 instead of a placeholder pointer; codegen-environment.adoc becomes a usable implementation manual for contributors landing new back-ends. 
--- docs/specs/SPEC.adoc | 145 ++++++++++- docs/specs/codegen-environment.adoc | 383 ++++++++++++++++++++++++---- 2 files changed, 472 insertions(+), 56 deletions(-) diff --git a/docs/specs/SPEC.adoc b/docs/specs/SPEC.adoc index f959394..7c3604a 100644 --- a/docs/specs/SPEC.adoc +++ b/docs/specs/SPEC.adoc @@ -650,11 +650,148 @@ Compiles to (ownership removed): (call $close (local.get $file))) ---- -== 8. Codegen Module Environment +== 8. Top-Level Binding Environment -See link:codegen-environment.adoc[`docs/specs/codegen-environment.adoc`] for the -full codegen module environment reference, including the `func_indices` -dual-use encoding, population order, and extern binding rules. +This section specifies how top-level declarations populate the binding +environment that subsequent declarations and their bodies resolve against. +It complements §3 (which gives the typing judgements) by fixing the +*operational* rules every conforming code generator must obey, independent +of any concrete target. + +The WebAssembly realisation of these rules is documented at +link:codegen-environment.adoc[`docs/specs/codegen-environment.adoc`]. + +=== 8.1 Top-Level Kinds + +A program is an ordered sequence of top-level declarations. 
Each kind +contributes to the binding environment as follows: + +[cols="1,2,2", options="header"] +|=== +| Declaration | Binds | Runtime artefact + +| `fn f(…) { … }` (§2.4) +| `f` as a function value +| Yes — function in the target module + +| `extern fn f(…) -> T;` (§2.10) +| `f` as a function value +| Yes — host-supplied import + +| `const c: T = e;` (§2.9) +| `c` as an immutable value of type `T` +| Yes — same-type immutable cell + +| `type T = …;` (§2.2) +| `T` as a type +| No — compile-time only + +| `extern type T;` (§2.10) +| `T` as an opaque type +| No — compile-time only + +| `effect E { … }` (§2.7) +| `E` and its operations +| No — handled by effect lowering (§7) + +| `trait Tr { … }` (§2.8) +| `Tr` as a trait +| No — compile-time only + +| `impl Tr for T { … }` (§2.8) +| Trait dictionary for `(Tr, T)` +| No — compile-time only +|=== + +=== 8.2 Declaration Order and Visibility + +Top-level declarations are processed in source order. The binding +environment in scope when a declaration is processed contains exactly +the names of declarations that *precede* it in source order, together +with the names introduced by the module's `use` clauses (§8.4). + +A reference to a name not yet bound at its use site is a static error: + +[source,affinescript] +---- +fn use_pi() -> Float { pi } // ERROR: `pi` not yet declared +const pi: Float = 3.141592; +---- + +Reordering resolves it: + +[source,affinescript] +---- +const pi: Float = 3.141592; +fn use_pi() -> Float { pi } // OK: `pi` is in scope here +---- + +The recommended source ordering — types, effects, constants, traits, +impls, functions — is sufficient (though not necessary) to avoid every +forward-reference error. + +=== 8.3 Identifier Resolution + +A bare identifier `x` in expression position resolves by the following +lookup order, in the context where the expression appears: + +. Local bindings introduced by enclosing `let`, lambda parameters, or + function parameters. +. 
Variant constructors of any in-scope `enum` type, resolved to their + tag (§2.2). +. Top-level bindings: `fn`, `extern fn`, and `const` names registered + under §8.1. + +A call expression `f(args)` resolves `f` against the same environment. +A name bound to a `const` may appear in expression position only; a +name bound to a `fn` or `extern fn` may appear in either expression or +call position. The well-formedness of these positions is established by +the type system (§3); a backend may rely on the typechecker having +rejected ill-positioned references. + +=== 8.4 Cross-Module Bindings + +Imports (`use M::{…}`, `use M::*`) extend the binding environment of +the importer with the public top-level names of the imported module, +under the alias chosen by the import form (§2.1). The order in which +import-introduced names enter the environment is *before* every +local top-level declaration of the importer. + +Status of cross-module flow at v0.1: + +* `fn` and `extern fn` items flow across module boundaries. +* `const` items do not yet flow across module boundaries; a `const` + declared in module `M` and named by `use M::{c}` is a known + restriction, not a language-level prohibition. + +=== 8.5 Conformance Criteria + +A conforming code generator MUST: + +C1:: Process top-level declarations in source order. + +C2:: Register a top-level name in its environment *before* generating +the body of any later declaration that may reference it. + +C3:: For each runtime-bearing declaration (`fn`, `extern fn`, `const`) +emit an artefact whose denotation matches §2 and §3 — functions are +first-class values, constants are immutable cells of their declared +type. + +C4:: For each compile-time declaration (`type`, `extern type`, `effect`, +`trait`, `impl`) record sufficient information in the environment for +subsequent declarations to typecheck and lower correctly, without +necessarily emitting any target artefact. 
+ +C5:: Report a static error (rather than emit a malformed module) when +an identifier escapes its lexical scope or names a binding not yet in +the environment under C1–C2. + +C6:: If a binding kind is unsupported by the target, raise +`UnsupportedFeature` at the declaration site, never silently drop it. + +Implementations MAY use any internal representation for the binding +environment; C1–C6 fix what is observable, not how it is stored. == Appendix: Grammar Reference diff --git a/docs/specs/codegen-environment.adoc b/docs/specs/codegen-environment.adoc index 1fc029f..cd490e0 100644 --- a/docs/specs/codegen-environment.adoc +++ b/docs/specs/codegen-environment.adoc @@ -1,98 +1,377 @@ // SPDX-License-Identifier: PMPL-1.0-or-later -= Codegen Environment Rules — Top-Level Bindings += Codegen Environment — WebAssembly Realisation of SPEC §8 :toc: macro -:toclevels: 2 +:toclevels: 3 :source-highlighter: rouge -This spec clarifies how top-level declarations — `fn`, `const`, `type`, `effect`, `trait`, `impl` — are threaded through the codegen environment, with particular attention to the two declarations that emit runtime artefacts: `fn` and `const`. +This document is the *realisation reference* for SPEC §8 (Top-Level +Binding Environment): how the WebAssembly back-end in `lib/codegen.ml` +satisfies C1–C6, what data it carries to satisfy them, and what every +other back-end shipped in this repository does in the same role. -This is the doc-companion to https://github.com/hyperpolymath/affinescript/issues/89[#89]; the implementation reference for the WASM target is the `gen_decl` cases in `lib/codegen.ml`. The intra-module `const`-referenced-by-`fn` failure mode is tracked in https://github.com/hyperpolymath/affinescript/issues/73[#73]. +Where the spec is target-agnostic, this doc is concrete: it names OCaml +types, files, line-level structure, and the loud-fail discipline. 
Behavioural claims in this document are claims about *current code*; the +spec is the authority on behavioural *requirements*. + +Companion to: +- link:SPEC.adoc#_top_level_binding_environment[`SPEC.adoc` §8] — what every back-end must do. +- https://github.com/hyperpolymath/affinescript/issues/89[issue #89] — original env-rules ticket. toc::[] -== Overview +== 1. Overview -The codegen environment is a `ctx` record threaded through `gen_decl` for each top-level declaration in source order. The two fields that hold name → index bindings are: +The codegen environment is a `ctx` record (`lib/codegen.ml`, `type context`) +threaded through `gen_decl` for each top-level declaration in source order. +Two fields hold the bindings required by SPEC §8: [cols="1,3", options="header"] |=== | Field | Purpose -| `func_indices` | Maps both function names AND constant names to a signed integer. *Positive* indices point into `funcs` (a function); *negative* indices encode `-(global_idx + 1)` to point into `globals` (a constant). -| `globals` | The WASM globals table — one entry per `TopConst`, holding type + mutability + initialiser expression. + +| `func_indices : (string * int) list` +| The single name → integer index map for *all* runtime-bearing top-level + bindings. Non-negative indices point into the combined imports+`funcs` + view (a function); negative indices encode `-(global_idx + 1)` to + point into `globals` (a constant). See §3. + +| `globals : global list` +| The WASM globals vector. One entry per `TopConst`, holding type, + mutability, and the constant-expression initialiser. |=== -The signed-int sentinel trick is **internal**: every `ExprVar` lookup in fn bodies must consult `func_indices` and dispatch on the sign to emit either a `call` or a `global.get`. +Compile-time-only declarations (`type`, `effect`, `trait`, `impl`, +`extern type`) populate auxiliary fields (`struct_layouts`, +`variant_tags`, …) but do not enter `func_indices`. 
+ +The signed-integer trick in `func_indices` is *internal*: every +identifier lookup in expression position consults `func_indices` and +dispatches on the sign of the result to emit either `call` or +`global.get`. + +== 2. The `ctx` Record + +Reproduced from `lib/codegen.ml`: + +[source,ocaml] +---- +type context = { + types : func_type list; (* type section *) + funcs : func list; (* function definitions *) + exports : export list; (* exports *) + imports : import list; (* imports *) + globals : global list; (* global variables *) + locals : (string * int) list; (* local variable name -> index map *) + next_local : int; + loop_depth : int; + func_indices : (string * int) list; + (* Top-level name environment shared by functions and constants: + - k >= 0: Wasm function index (imports + defined functions). + - k < 0: Constant (global); actual global index is -(k+1). + Entries inserted in source declaration order by gen_decl. *) + lambda_funcs : func list; (* lifted lambdas *) + next_lambda_id : int; + heap_ptr : int option; + field_layouts : (string * (string * int) list) list; + struct_layouts : (string * (string * int) list) list; + fn_ret_structs : (string * string) list; + variant_tags : (string * int) list; + string_data : (string * int) list; + next_string_offset : int; + datas : data list; + ownership_annots : (int * ownership_kind list * ownership_kind) list; +} +---- + +Only the fields touched by SPEC §8 conformance are discussed here; +`heap_ptr`, `string_data`, `datas`, and `ownership_annots` are part of +unrelated lowering passes. -== TopFn (top-level function) +== 3. The `func_indices` Encoding -Mechanism in `lib/codegen.ml`: +`func_indices` is a single association list keyed by the AffineScript +identifier. Two binding flavours share it; the sign of the value tells +them apart: -. 
The function body is generated against a fresh context derived from `ctx` (`gen_fn_body`), producing a list of WASM instructions and any lambda-funcs created during generation. -. A new entry is appended to `ctx.funcs` with a fresh index `func_idx = List.length ctx.funcs`. -. `func_indices` gains `(fd.fd_name.name, func_idx)`. -. If `Public` or matches a game-hook export name (`init`, `update`, `draw`, ...), an `ExportFunc` export is added. +[cols="2,2,3", options="header"] +|=== +| Source declaration | Key value | Decode -ExprVar lookup against a TopFn: `List.assoc name func_indices` returns a non-negative `func_idx`; the call site emits `Call func_idx`. +| `fn f(…) { … }` +| `k ≥ 0` +| `funcidx = k` (imports come first, then defined functions) -== TopConst (top-level constant) +| `extern fn f(…);` / `TopFn fd` with `fd_body = FnExtern` +| `k ≥ 0` +| `funcidx = k` (lives in the imports prefix) -Mechanism in `lib/codegen.ml` (see the `| TopConst tc ->` case in `gen_decl`): +| `const c: T = e;` +| `k = -(g + 1)` +| `globalidx = g = -(k + 1)` +|=== -. The initialiser expression is compiled against the current ctx via `gen_expr`. The result must be a constant expression — a single `I32Const` (or analogue). Non-constant initialisers are not yet supported and will fail at WASM validation. -. A new entry is appended to `ctx.globals`: +The encoding satisfies SPEC §8 C1–C2 with a single source-order insertion +and resolves both kinds with a single `List.assoc_opt`. No separate +`const_indices` table is needed. + +== 4. `gen_decl` — Per-Kind Algorithm + +`gen_decl : context -> top_level -> context result` is invoked by +`generate_module` via `List.fold_left` over `prog.prog_decls`. Each +case is summarised below; consult `lib/codegen.ml` for the canonical +implementation. + +=== 4.1 `TopFn` (defined function) + +1. Build the WASM function type from the parameter list (all params are + `i32`; result is `i32`). Append to `ctx.types`; record its index. +2. 
Compute `func_idx = import_func_count ctx + List.length ctx.funcs`. + This is the *future* WASM function index of the function about to + be emitted. +3. Register `(fd.fd_name.name, func_idx)` in `func_indices` **before** + generating the body — satisfies SPEC §8 C2 and admits self-recursion. +4. Also record ownership annotations and, if the return type is a known + struct, the return-struct mapping (`fn_ret_structs`). +5. Generate the body against the augmented context. +6. Append the emitted function to `ctx.funcs`. +7. If `Public` or its name is a reserved game-loop hook + (`main`, `init_state`, `step_state`, `get_state`, `mission_active`), + add an `ExportFunc` export. + +ExprApp call-site lookup (in `gen_expr`): `List.assoc name func_indices` +returns a non-negative `func_idx`; the call site emits `Call func_idx`. + +=== 4.2 `TopFn` with `fd_body = FnExtern` (legacy `extern fn`) + +Emit a WASM import under module `"env"` and name `fd.fd_name.name`, and +register the alias in `func_indices` with the *non-negative* import index. +The body is not generated. Mirrors `gen_imports` (§5). + +=== 4.3 `TopExternFn` (current `extern fn` syntax) + +Behaviourally identical to §4.2. The parser produces a `TopExternFn` +record for the contemporary surface syntax; the legacy +`TopFn _ with FnExtern` branch remains for compatibility with older +front-end paths. + +=== 4.4 `TopConst` + +1. Compile the initialiser against the current context via `gen_expr`. + The result must be a constant expression (a single `I32Const` or + analogous form); non-constant initialisers fail at WASM validation, + per the loud-fail policy. +2. Append a new `global` entry: + -[source] +[source,ocaml] ---- { g_type = I32; g_mutable = false; g_init = init_code } ---- -. The const's name is registered in `func_indices` with a *negative* index: `(tc.tc_name.name, -(global_idx + 1))`. This sentinel encoding lets ExprVar lookup dispatch on the sign without a separate `const_indices` table. +3. 
Register `(tc.tc_name.name, -(global_idx + 1))` in `func_indices`. + +ExprVar lookup (in `gen_expr`, `ExprVar` arm): if `List.assoc_opt name +ctx.func_indices` returns `Some k` with `k < 0`, decode +`global_idx = -(k + 1)` and emit `GlobalGet global_idx`. This is the +path that closed https://github.com/hyperpolymath/affinescript/issues/73[#73]. + +=== 4.5 `TopType` -ExprVar lookup against a TopConst: `List.assoc name func_indices` returns a negative index `i`; the call site decodes `global_idx = -i - 1` and emits `GlobalGet global_idx`. +* `TyEnum` — assign sequential tags to each variant and record them in + `variant_tags`. +* `TyStruct` — compute the field layout (sequential 4-byte offsets, + matching the `ExprRecord` store path) and record it in + `struct_layouts`. +* `TyAlias` — no environment change. -== Order matters +None of these enter `func_indices`; they are compile-time bindings +under SPEC §8 C4. -Top-level declarations are processed in source order. A `fn` body that references a `const` defined *later* in the file currently fails at codegen with `Codegen.UnboundVariable`, because `func_indices` only contains entries for declarations already processed. This forward-reference limitation is a known constraint. +=== 4.6 `TopExternType` -Recommended source ordering: +No WASM artefact. Type is available to the type checker via the +resolver; codegen returns the context unchanged. Compile-time-only +under SPEC §8 C4. -. Top-level types (`type`) -. Top-level effects (`effect`) -. Top-level constants (`const`) -. Top-level traits / impls -. Top-level functions +=== 4.7 `TopEffect`, `TopTrait`, `TopImpl` -This ordering matches how the typechecker resolves dependencies and avoids forward-reference failures in codegen. +No WASM artefact at codegen time. Effect lowering happens earlier +in the pipeline (§7 of the spec); traits and impls are resolved +during typechecking and trait-dictionary insertion. -== Cross-module: imported top-level bindings +== 5. 
Cross-Module Imports — `gen_imports` -Cross-module imports (`use OtherModule::{thing}`) are handled by two passes: +`gen_imports : Module_loader.t -> import_decl list -> context -> context result` +walks `prog.prog_imports` once at the start of `generate_module`, +*before* any local `gen_decl` call. For every imported function: -* **WASM and WasmGC targets** — `Codegen.gen_imports` walks `prog.prog_imports`, loads each referenced module via `Module_loader`, and emits one `(import "" "" (func ...))` entry per imported function. The imported name lands in `func_indices` with a positive index pointing into the imports section. -* **All other codegens** (Rust, JS, Lua, OCaml, Julia, C, WGSL, Faust, ONNX, Bash, Nickel, ReScript, LLVM, Verilog, Gleam, CUDA, Metal, OpenCL, MLIR, Why3, Lean, SPIR-V) — `Module_loader.flatten_imports` prepends imported public `TopFn` declarations into the importer's `prog_decls` (deduplicating against local fn names). The non-WASM codegens then see the imports as if they were locally-defined and emit them inline. +1. Load the referenced module via `Module_loader`. +2. Find the matching `TopFn` (or fail silently if absent — the resolver + would have already errored). +3. Intern the function type into `ctx.types`. +4. Append a WASM import: module name = dotted module path + (`String.concat "." mod_path`), function name = original + declaration name, function type = interned index. +5. Register the local alias (or original name) in `func_indices` with + the non-negative `import_func_idx`. -Imported `TopConst`s are **not** currently threaded by either path. This is a known gap; only `TopFn` flows across module boundaries today. +Glob imports (`use M::*`) expand to one entry per public `TopFn` / +`PubCrate TopFn` in `M`'s `prog_decls`. -== Non-WASM targets +`TopConst` items are *not* threaded across module boundaries by +`gen_imports`. 
This is the cross-module gap acknowledged in +SPEC §8.4 — addressing it requires emitting the constant as a global +in the importing module (or via WASM module-linking) and remains +future work. -Each non-WASM codegen has its own `gen_decl` (or equivalent) and may or may not implement `TopConst`. The expected behaviour, target-by-target: +== 6. Identifier Resolution at Use Sites -[cols="1,2", options="header"] +The two relevant arms in `gen_expr`: + +=== 6.1 `ExprVar id` + +[source,ocaml] +---- +match lookup_local ctx id.name with +| Ok idx -> Ok (ctx, [LocalGet idx]) +| Error _ -> + match List.assoc_opt id.name ctx.variant_tags with + | Some tag -> Ok (ctx, [I32Const tag]) + | None -> + match List.assoc_opt id.name ctx.func_indices with + | Some k when k < 0 -> Ok (ctx, [GlobalGet (-(k + 1))]) + | _ -> Error (UnboundVariable id.name) +---- + +Matches SPEC §8.3: local → enum tag → top-level. A function name +encountered in *expression* position falls through to `UnboundVariable`, +which is correct: bare function references in expression position are +not yet representable in WASM without closure boxing, so they are +rejected at the typechecker. + +=== 6.2 `ExprApp (ExprVar id, args)` + +[source,ocaml] +---- +match List.assoc_opt id.name ctx.func_indices with +| Some func_idx -> ... emit (Call func_idx) +| None -> + match lookup_local ctx id.name with + | Ok local_idx -> ... emit indirect closure call + | Error _ -> Error (UnboundVariable ...) +---- + +A call site finds a non-negative `func_idx` and emits `Call func_idx`. The +typechecker rejects calls whose head is a constant, so a negative `k` +here would be a type-system bug; the codegen does not defensively check +the sign at call sites. + +== 7. Conformance of the WASM Back-End + +Cross-walking SPEC §8.5 against `lib/codegen.ml`: + +[cols="1,4,3", options="header"] +|=== -| Target | TopConst handling -| `js_codegen.ml` | Emits `const = ;` at module top-level. Same-file fn bodies reference it directly. 
-| `rust_codegen.ml` | Emits `pub const : = ;` at module top-level (or `static` for non-`Copy` types). -| `ocaml_codegen.ml` | Emits `let = ` at module top-level. -| Other backends | If unimplemented, `gen_decl` may silently skip `TopConst`, producing a successful build whose runtime then fails with an unbound-name error. Codegens that explicitly do not support `TopConst` should raise `CodegenError UnsupportedFeature` per the loud-fail policy. +| Criterion | How the WASM target satisfies it | Site + +| C1 | `generate_module` folds `gen_decl` over `prog.prog_decls` in source order | `lib/codegen.ml`, `generate_module` +| C2 | TopFn registers `(name, func_idx)` *before* `gen_function`; TopConst registers `(name, -(g+1))` after the initialiser but before any later body sees it | §4.1, §4.4 +| C3 | TopFn → WASM function; TopExternFn → import; TopConst → immutable global with the constant initialiser | §4.1, §4.3, §4.4 +| C4 | TopType populates `variant_tags` / `struct_layouts`; TopEffect/TopTrait/TopImpl are no-ops at codegen (handled upstream) | §4.5–§4.7 +| C5 | `UnboundVariable` raised by `ExprVar` / `ExprApp` when lookup fails | `lib/codegen.ml`, lines 444 and 802 +| C6 | `UnsupportedFeature` raised wherever a structural feature isn't lowered (catch arms, multi-unsafe, non-variable patterns in constructors, etc.) | `lib/codegen.ml`, multiple sites |=== -When auditing or adding a backend, the env-population rule is the same: **register the name and emit a target-appropriate definition before any fn body that might reference it**. +== 8. Other Back-Ends — Per-Target Matrix -== Known issues +Each non-WASM back-end has its own `gen_decl` (or equivalent). Status +of the SPEC §8 contract per target: -* https://github.com/hyperpolymath/affinescript/issues/73[#73] — `Codegen.UnboundVariable` for top-level `const` bindings: typechecker accepts the reference, codegen ExprVar lookup fails. 
Implementation status varies by target; verify against the current `gen_decl` for the target in question. +[cols="1,2,2", options="header"] +|=== +| Target | `TopConst` | `TopFn` + +| `js_codegen.ml` +| `const = ;` at module top +| `function (…) { … }` (with `export` for `pub`) + +| `rust_codegen.ml` +| `pub const : = ;` (or `static` for non-`Copy` types) +| `pub fn (…) -> { … }` + +| `ocaml_codegen.ml` +| `let = ` at module top +| `let (…) = …` + +| `codegen_gc.ml` (WasmGC) +| Same negative-sentinel discipline as `codegen.ml` §3 +| `funcref`-typed entry plus WasmGC struct layout + +| `codegen_node.ml` +| As `js_codegen.ml`, plus CJS shim for `extern fn` +| As `js_codegen.ml` + +| Other (Lua, Julia, C, WGSL, Faust, ONNX, Bash, Nickel, ReScript, LLVM, Verilog, Gleam, CUDA, Metal, OpenCL, MLIR, Why3, Lean, SPIR-V) +| Implementation-specific. If a back-end cannot lower `TopConst`, it + MUST raise `CodegenError UnsupportedFeature` per SPEC §8 C6 rather + than silently skip the declaration. +| Each emits a target-appropriate function definition; cross-module + flow uses `Module_loader.flatten_imports` to inline public `TopFn`s + into the importer's `prog_decls`. +|=== + +When auditing or adding a back-end the env-population rule is the same +as for WASM: **register the name and emit a target-appropriate +definition before any body that might reference it.** + +== 9. Worked Example -== References +[source,affinescript] +---- +const inputSuffix: String = ":in"; + +fn withInput(port: String) -> String { + port ++ inputSuffix +} + +pub fn main() -> () { + let p = withInput("front_left"); + println(p) +} +---- -* `lib/codegen.ml` — WASM `gen_decl`, especially the `TopFn`, `TopConst`, and `gen_imports` cases. -* `lib/module_loader.ml` — `flatten_imports` for non-WASM cross-module threading. -* `bin/main.ml` — pipeline wiring (parse → resolve → typecheck → codegen with loader threading). +WASM realisation, step by step (under `lib/codegen.ml`): + +1. 
`gen_imports` runs (no imports in this module). +2. `gen_decl TopConst inputSuffix` — + `globals` gains `{ I32, immutable, init = }`, + `func_indices = [("inputSuffix", -1)]`. +3. `gen_decl TopFn withInput` — + `func_idx = 1` (imports + funcs so far), registered in + `func_indices` *before* body generation. + The body's `inputSuffix` reference goes through ExprVar (§6.1), + finds `k = -1`, decodes `global_idx = 0`, emits `GlobalGet 0`. +4. `gen_decl TopFn main` — same recipe, body's `withInput(...)` call + resolves to `Call 1`. `pub` triggers `ExportFunc 2`. + +Resulting `func_indices` (most-recent-first by `::` cons except `TopFn` +which appends): `[("main", 2); ("withInput", 1); ("inputSuffix", -1)]`. + +== 10. Closed Issues + +* https://github.com/hyperpolymath/affinescript/issues/73[#73] — + `Codegen.UnboundVariable` for top-level `const` bindings. **Closed.** + Resolved by the `Some k when k < 0` arm in `ExprVar` (`lib/codegen.ml`, + line 442–445). The negative-sentinel encoding is the load-bearing + invariant; new back-ends adopting `func_indices` must preserve it. + +== 11. References + +* `lib/codegen.ml` — `type context`, `gen_decl`, `gen_imports`, + `generate_module`; the WASM `ExprVar`/`ExprApp` arms. +* `lib/codegen_gc.ml` — WasmGC variant; same env discipline. +* `lib/module_loader.ml` — `flatten_imports` for non-WASM cross-module + threading. +* `lib/ast.ml` — `type top_level` constructors enumerated in + link:SPEC.adoc#_top_level_binding_environment[SPEC §8.1]. +* `bin/main.ml` — pipeline wiring (parse → resolve → typecheck → + codegen with loader threading).
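
The sign encoding that `func_indices` relies on (§3 of `codegen-environment.adoc` in the second commit) can be illustrated in isolation. The following is a minimal sketch, not code from `lib/codegen.ml`: the `binding` type and the `encode`, `decode`, and `lookup` helpers are hypothetical names introduced here, and only the arithmetic `k = -(g + 1)` / `g = -(k + 1)` is taken from the patch.

[source,ocaml]
----
(* Hypothetical illustration of the func_indices sign encoding.
   None of these names exist in lib/codegen.ml; only the encoding
   k >= 0 (function) / k = -(g + 1) (global) comes from the patch. *)
type binding =
  | Func of int    (* Wasm function index, stored as-is (k >= 0) *)
  | Global of int  (* Wasm global index g, stored as -(g + 1) *)

(* Map a binding to the single integer kept in the association list. *)
let encode = function
  | Func k -> k
  | Global g -> -(g + 1)

(* Recover the binding kind from the sign of the stored integer. *)
let decode k = if k >= 0 then Func k else Global (-(k + 1))

(* One association-list lookup resolves both kinds of name. *)
let lookup (env : (string * int) list) (name : string) : binding option =
  Option.map decode (List.assoc_opt name env)

let () =
  (* Environment taken from the worked example in section 9. *)
  let env = [ ("main", 2); ("withInput", 1); ("inputSuffix", -1) ] in
  assert (lookup env "inputSuffix" = Some (Global 0));
  assert (lookup env "withInput" = Some (Func 1));
  assert (lookup env "missing" = None)
----

Because function index 0 and global index 0 are both valid, the `+ 1` offset is what keeps the two ranges disjoint: global 0 is stored as `-1`, never as `0`.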