
feat(lua): drop bounded_eval for handle_input (37% better tail latency) #42

Merged: Taure merged 1 commit into `main` from `feat/handle-input-no-spawn` on May 5, 2026

Taure commented May 5, 2026

Summary

  • handle_input/3 in both `asobi_lua_match` and `asobi_lua_world` no longer wraps the Luerl call in `bounded_eval` (spawn + monitor + heap_limit). Direct `luerl:call_function` instead.
  • tick/init/get_state/join/leave/vote_*/phases keep `bounded_eval` — those are the real sandbox boundaries.
  • Trust-model guide updated with a per-callback isolation table and an explicit "handle_input is not a sandbox boundary" section.
  • ADRs 0000 (process), 0001 (retroactive: `asobi_lua_match_shared` from #41), 0002 (this change).
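
For readers unfamiliar with the pattern being removed, here is a minimal sketch of a spawn + monitor + heap-limit wrapper. It is not the actual `bounded_eval` from this repo: the module name `bounded_eval_sketch`, the `run/3` signature, and the return shapes are all assumptions made for illustration. Only standard OTP calls (`spawn_opt/2`, `erlang:demonitor/2`, `exit/2`) are used.

```erlang
-module(bounded_eval_sketch).
-export([run/3]).

%% Hypothetical stand-in for bounded_eval: evaluate Fun in a throwaway
%% process that has a heap cap (in words) and a wall-clock timeout (in ms).
-spec run(fun(() -> T), non_neg_integer(), pos_integer()) ->
          {ok, T} | {error, term()}.
run(Fun, TimeoutMs, MaxHeapWords) ->
    Parent = self(),
    Tag = make_ref(),
    Opts = [monitor,
            {max_heap_size, #{size => MaxHeapWords,
                              kill => true,
                              error_logger => false}}],
    %% With the monitor option, spawn_opt/2 returns {Pid, MonitorRef}.
    {Pid, MRef} = spawn_opt(fun() -> Parent ! {Tag, Fun()} end, Opts),
    receive
        {Tag, Result} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% Crash inside Fun, or heap cap exceeded (Reason =:= killed).
            {error, Reason}
    after TimeoutMs ->
        exit(Pid, kill),
        erlang:demonitor(MRef, [flush]),
        %% Drop a result that raced with the timeout.
        receive {Tag, _} -> ok after 0 -> ok end,
        {error, timeout}
    end.
```

In this sketch every call pays for a process spawn, a monitor, and a message copy of the result back to the caller. The direct path that handle_input now takes corresponds to simply evaluating `Fun()` in the calling process, which removes that cost along with the timeout and heap cap.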

Why

The local 200-bot bench (asobi-bench/results/2026-05-05-post-fix1.md) showed that encode-once (asobi#117) did not move p99, because Luerl eval CPU dominated encode CPU at 2k inputs/sec. The bounded_eval overhead (spawn, monitor, heap cap, message pass) was ~80 µs per call, against ~50-200 µs of real Lua work.
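
To put those per-call figures in perspective, the arithmetic can be sketched as below. The module and function names are hypothetical, and the calculation assumes the ~80 µs overhead applies uniformly to every call, which the bench may not guarantee.

```erlang
-module(spawn_overhead).
-export([cpu_ms_per_sec/2, overhead_share/2]).

%% CPU milliseconds spent on wrapper overhead per wall-clock second.
cpu_ms_per_sec(OverheadUs, CallsPerSec) ->
    OverheadUs * CallsPerSec / 1000.

%% Fraction of each call's total time that is wrapper overhead.
overhead_share(OverheadUs, WorkUs) ->
    OverheadUs / (OverheadUs + WorkUs).
```

Under that assumption, `spawn_overhead:cpu_ms_per_sec(80, 2000)` gives 160.0 ms of overhead per second, and `overhead_share/2` ranges from roughly 0.62 (50 µs of Lua work) down to roughly 0.29 (200 µs of Lua work).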

After this change (asobi-bench/results/2026-05-05-handle-input-no-spawn.md):

| metric | baseline | post-fix-#1 | post-this | delta vs fix-#1 |
|---|---|---|---|---|
| p99 (ms) | 1433 | 1700 | 1530 | -10% |
| p99.9 (ms) | 2429 | 2945 | 1860 | -37% |
| max (ms) | 4155 | 3750 | 2065 | -45% |
| inputs/30s | ~26k | ~26k | ~41k | +56% |

Trade-off

A `while true do end` inside handle_input now hangs the match server until its caller's gen_server timeout (5s default) trips. The match supervisor then restarts the match. Blast radius is one match.

Prior behaviour: bounded_eval killed the runaway in 100ms, the bridge logged and dropped the input, the match continued.

This is documented in ADR 0002 with the explicit framing "handle_input is not a sandbox boundary; tick/1 is the load-bearing isolation point."

Test plan

  • match contract pinned: `match_handle_input_no_wall_clock_timeout_test` asserts an infinite-loop handle_input does not self-terminate within 500ms (parent kills the spawned probe).
  • world contract pinned: `world_handle_input_no_wall_clock_timeout_test` mirrors the above for asobi_lua_world.
  • prop_lua_error_containment split into `tick_crash_mode` (still includes infinite_loop) and `input_crash_mode` (excludes it — would wedge the runner).
  • `rebar3 fmt --check` clean
  • `rebar3 xref` clean
  • `rebar3 dialyzer` clean
  • `~/bin/elp eqwalize-all` zero new errors
  • `~/bin/elp lint` zero new warnings on changed files
  • `rebar3 eunit` 202/202 green
  • Bench validation: 2 runs against local image built from this branch, both consistent with the table above.
  • Follow-up CT suite to pin the supervisor blast-radius claim end-to-end (noted in ADR 0002 consequences).
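
The shape of the two "no wall-clock timeout" contract tests can be sketched roughly as follows. This is not the test code from the PR: `spin/0` is a hypothetical stand-in for a handle_input call whose Lua body is `while true do end`, and the module name is invented for illustration.

```erlang
-module(no_timeout_probe_sketch).
-include_lib("eunit/include/eunit.hrl").

%% Hypothetical stand-in for a handle_input call that never returns.
spin() -> spin().

probe_does_not_self_terminate_test() ->
    {Pid, MRef} = spawn_monitor(fun spin/0),
    receive
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The probe ended by itself, so something enforced a timeout.
            erlang:error({probe_terminated, Reason})
    after 500 ->
        %% Still running after 500 ms: no wall-clock timeout is in effect.
        ?assert(is_process_alive(Pid)),
        %% The parent is responsible for cleaning up the probe.
        exit(Pid, kill),
        receive {'DOWN', MRef, process, Pid, killed} -> ok end
    end.
```

The test passes only if nothing stops the probe within the 500 ms window, which pins the absence of a timeout as intended behaviour rather than an accident.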

Companion PR: asobi#118 (ADR convention + retroactive ADR 0001).

handle_input/3 in both asobi_lua_match and asobi_lua_world bridges no
longer wraps the Luerl call in bounded_eval (spawn + monitor +
heap_limit). At realistic input rates (200 players × 10 Hz = 2k
inputs/sec) the per-call spawn overhead dominated actual Lua work and
caused tail-latency stalls on the BEAM scheduler.

Bench delta (asobi-bench, 200 bots, 30s, 10 Hz):
- p99.9: ~2945ms -> ~1860ms (-37%)
- max:   ~3750ms -> ~2065ms (-45%)
- inputs throughput: ~26k -> ~41k per 30s window (+56%)

Trade-off documented in ADR 0002 and pinned by tests:
- match_handle_input_no_wall_clock_timeout_test (match bridge)
- world_handle_input_no_wall_clock_timeout_test (world bridge)
- prop_lua_error_containment splits crash modes: tick still tests
  infinite_loop containment; input_crash_mode excludes it (would wedge
  the property runner — by design).

Trust model updated in guides/security-trust-model.md with a new
"Per-callback isolation" table and an explicit "handle_input is not a
sandbox boundary" section.

Also includes the project ADR convention (0000) and retroactive ADR
0001 documenting the asobi_lua_match_shared bridge that shipped in
#41.
@Taure Taure merged commit d46e203 into main May 5, 2026
15 checks passed
@Taure Taure deleted the feat/handle-input-no-spawn branch May 5, 2026 16:39