BridgeJS: Optimize string encoding for JS-to-Swift crossings #748

Closed
krodak wants to merge 1 commit into swiftwasm:main from PassiveLogic:kr/string-encoding-optimization

krodak commented May 20, 2026

Overview

String parameters crossing from JS to Swift go through TextEncoder.encode() plus an object-heap retain/release on every call. This overhead is measurable for repeated strings and especially for string arrays (see Benchmarks). Two independent optimizations target the two string-encoding paths in the generated JS glue, without touching BridgeType or the codegen structure.

Related: #677, #700 (different approach - adds JSString as a new BridgeType; this PR avoids that)

What changed

1. LRU encoding cache for parameter and return paths

When JS passes a string to an exported Swift function (or returns a string from an imported JS function), the glue calls textEncoder.encode(string) to get a Uint8Array, then retains it in the object heap. The same string encoded 100k times means 100k allocations.

A 256-entry LRU cache (Map<string, Uint8Array>) now sits in front of textEncoder.encode(). Repeated strings hit the cache and skip encoding. The cache relies on JS Map insertion-order iteration for O(1) LRU updates: on a hit the entry is deleted and reinserted (moving it to the most-recently-used end), and on eviction the first key in iteration order (the least recently used) is deleted.

Affected fragments: stringLowerParameter, stringLowerReturn
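A minimal TypeScript sketch of the cache described above, assuming a module-level encoder and cache. The names ENCODE_CACHE_LIMIT, encodeCache, and cachedEncode are illustrative only; they are not the generated glue's identifiers, and the PR's own helper (referred to as _cachedEncode()) is not reproduced here.

```typescript
// Hypothetical sketch: a 256-entry LRU cache in front of TextEncoder.encode().
const textEncoder = new TextEncoder();
const ENCODE_CACHE_LIMIT = 256;
const encodeCache = new Map<string, Uint8Array>();

function cachedEncode(s: string): Uint8Array {
  const hit = encodeCache.get(s);
  if (hit !== undefined) {
    // Map iterates in insertion order, so delete + set moves this key
    // to the most-recently-used end in O(1).
    encodeCache.delete(s);
    encodeCache.set(s, hit);
    return hit;
  }
  const bytes = textEncoder.encode(s);
  if (encodeCache.size >= ENCODE_CACHE_LIMIT) {
    // The first key in iteration order is the least recently used one.
    const oldest = encodeCache.keys().next().value as string;
    encodeCache.delete(oldest);
  }
  encodeCache.set(s, bytes);
  return bytes;
}
```

One consequence of this design: every cache hit returns the same Uint8Array instance, so callers must treat it as read-only. Copying it into Swift memory is fine; mutating it in place would corrupt later hits.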

2. Direct string retain + encodeInto() for the stack ABI

Arrays, struct fields, enum payloads, and dictionary entries use the stack ABI, which encodes each string element independently. Instead of allocating a Uint8Array per element, the JS glue now retains the JS string itself in the object heap and passes string.length * 3 as the buffer capacity (worst-case UTF-8).

On the Swift side, lifting the value calls _swift_js_init_memory, whose JS handler detects the retained string via typeof and writes its UTF-8 bytes directly into the destination buffer in WASM linear memory using TextEncoder.encodeInto(). The handler returns the number of bytes actually written, which String(unsafeUninitializedCapacity:) uses as the final string length.

Affected fragments: stackLowerFragment for .string / .rawValueEnum(_, .string)
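The stack-ABI flow can be modeled roughly as follows. This is a simplified sketch, not the generated glue. The names heap, retain, lowerStackString, and initMemory are hypothetical; a plain Uint8Array stands in for the Wasm instance's linear memory; and initMemory only mimics the role of the _swift_js_init_memory handler, with parameters assumed for illustration.

```typescript
// Hypothetical object heap and a stand-in for Wasm linear memory.
const encoder = new TextEncoder();
const heap: unknown[] = [];
const memory = new Uint8Array(new ArrayBuffer(1024));

function retain(value: unknown): number {
  heap.push(value);
  return heap.length - 1;
}

// JS lowering for a stack-ABI string: retain the string itself (no
// Uint8Array allocation) and report a worst-case UTF-8 capacity.
function lowerStackString(s: string): { id: number; capacity: number } {
  return { id: retain(s), capacity: s.length * 3 };
}

// Mimics the init-memory handler: fill [ptr, ptr + capacity) and
// return how many bytes were actually written.
function initMemory(id: number, ptr: number, capacity: number): number {
  const value = heap[id];
  if (typeof value === "string") {
    const result = encoder.encodeInto(value, memory.subarray(ptr, ptr + capacity));
    // `?? 0` only guards older lib typings that mark `written` optional.
    return result.written ?? 0;
  }
  // Assumed fallback: already-encoded bytes whose length is exact.
  const bytes = value as Uint8Array;
  memory.set(bytes, ptr);
  return bytes.length;
}
```

The returned count, not capacity, is what the Swift side would use as the initialized length: for "héllo" the capacity is 15, but only 6 bytes are written.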

One ABI change: _swift_js_init_memory returns Int32 instead of Void. The return value is the byte count actually written - needed because the stack ABI passes a worst-case capacity, not the exact byte count.
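The string.length * 3 capacity is a safe upper bound because String.length counts UTF-16 code units. Let n = string.length and p = the number of surrogate pairs. Each of the n - 2p code units outside a pair encodes to at most 3 UTF-8 bytes (TextEncoder replaces a lone surrogate with U+FFFD, which is also 3 bytes), and each surrogate pair encodes to exactly 4 bytes:

```latex
% n = string.length (UTF-16 code units), p = number of surrogate pairs
\begin{aligned}
\text{utf8Bytes}(s) &\le 3\,(n - 2p) + 4p \\
                    &= 3n - 2p \\
                    &\le 3n
\end{aligned}
```

The bound is reached by a string like "€" (n = 1, 3 bytes) but is loose for ASCII (n bytes), so the exact written count has to come back from the handler.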

Benchmarks

100k iterations, Node.js v22, 15-run average:

  Benchmark                            Before     After      Change
  StringRoundtrip/takeString           33.35 ms   26.48 ms   -21%
  ArrayRoundtrip/takeStringArray       162.35 ms  106.18 ms  -35%
  ArrayRoundtrip/roundtripStringArray  223.98 ms  158.87 ms  -29%
  StringRoundtrip/makeString           10.53 ms   10.29 ms   neutral
  ArrayRoundtrip/makeStringArray       59.07 ms   57.64 ms   neutral
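The Change column is the relative difference (After - Before) / Before. A small script that recomputes it from the table's own timings is below. The two make* rows come out at roughly -2%, and these are the rows the table labels neutral.

```typescript
// Before/after timings in ms, copied from the benchmark table.
const results: Array<[string, number, number]> = [
  ["StringRoundtrip/takeString", 33.35, 26.48],
  ["ArrayRoundtrip/takeStringArray", 162.35, 106.18],
  ["ArrayRoundtrip/roundtripStringArray", 223.98, 158.87],
  ["StringRoundtrip/makeString", 10.53, 10.29],
  ["ArrayRoundtrip/makeStringArray", 59.07, 57.64],
];

// Relative change in percent; negative means faster.
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

for (const [name, before, after] of results) {
  console.log(`${name}: ${percentChange(before, after).toFixed(1)}%`);
}
```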

The make* benchmarks (Swift-to-JS direction) are unaffected - those paths already use direct memory reads via decodeString(ptr, len).
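For comparison, the Swift-to-JS read path can be sketched as follows. The sketch assumes that decodeString(ptr, len) decodes UTF-8 bytes in place from linear memory with a shared TextDecoder. The module-level memory view is a hypothetical stand-in for the Wasm instance's memory.

```typescript
// Stand-in for the Wasm instance's linear memory (hypothetical).
const memory = new Uint8Array(new ArrayBuffer(64));
const textDecoder = new TextDecoder("utf-8");

// Swift passes a pointer and byte length; JS decodes the bytes through a
// subarray view, so no intermediate byte copy or object-heap retain occurs.
function decodeString(ptr: number, len: number): string {
  return textDecoder.decode(memory.subarray(ptr, ptr + len));
}
```

A real implementation would need to refresh the view after memory.grow(), because growing Wasm memory detaches the previous ArrayBuffer.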

Independence of the two techniques

The two optimizations are independent. If the _swift_js_init_memory return type change is a concern, technique #2 can be dropped and the stack ABI reverted to use _cachedEncode() (same as the parameter path). That still gives the ~21% improvement on takeString with zero Swift-side changes.

Files changed (excluding snapshots)

  • Sources/JavaScriptKit/BridgeJSIntrinsics.swift - _swift_js_init_memory returns Int32; bridgeJSLiftParameter uses the returned count
  • Plugins/BridgeJS/Sources/BridgeJSLink/BridgeJSLink.swift - LRU cache preamble; _swift_js_init_memory handler with string detection
  • Plugins/BridgeJS/Sources/BridgeJSLink/JSGlueGen.swift - stringLowerParameter and stringLowerReturn use cache; stackLowerFragment uses direct retain; reserved variable names for cache

Two techniques applied to all JS-to-Swift string paths:

1. LRU encoding cache for parameter and return paths - avoids
   re-encoding repeated strings via a Map<string, Uint8Array> with
   256-entry LRU eviction.

2. Direct string retain + encodeInto() for stack ABI paths (arrays,
   structs, enums, dictionaries) - skips the intermediate Uint8Array
   allocation entirely by retaining the JS string and encoding directly
   into the WASM linear memory buffer.

_swift_js_init_memory now returns the actual byte count written, which
the stack ABI path needs since it passes a worst-case buffer size
(string.length * 3) rather than the exact UTF-8 byte count.

Benchmarks (100k iterations, Node.js):
  StringRoundtrip/takeString:          -21%
  ArrayRoundtrip/takeStringArray:      -35%
  ArrayRoundtrip/roundtripStringArray: -29%

krodak commented May 20, 2026

Opened against wrong repo - meant for PassiveLogic fork for now 🙏🏻

krodak closed this May 20, 2026