BridgeJS: Optimize string encoding for JS-to-Swift crossings #10
Add a size-limited encoding cache (Map<string, Uint8Array>) in front of textEncoder.encode() for the ExportSwift parameter path and the ImportTS return path. Repeated strings skip encoding entirely on cache hit. No BridgeType changes. No Swift-side changes.
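A minimal sketch of such a size-limited cache, assuming the least-recently-used behaviour the PR overview describes (delete-and-reinsert on hit, delete-first on eviction). The names CACHE_LIMIT, encodingCache and encodeCached are hypothetical and are not taken from the generated glue; only the Map<string, Uint8Array> shape and the 4096-entry limit come from the PR text.

```javascript
// Illustrative sketch only; identifiers here are hypothetical, not the PR's.
const textEncoder = new TextEncoder();
const CACHE_LIMIT = 4096; // entry count quoted in the PR overview
const encodingCache = new Map(); // string -> Uint8Array

function encodeCached(str) {
  const hit = encodingCache.get(str);
  if (hit !== undefined) {
    // Delete and reinsert so this key becomes the most recently used one.
    encodingCache.delete(str);
    encodingCache.set(str, hit);
    return hit;
  }
  const bytes = textEncoder.encode(str);
  if (encodingCache.size >= CACHE_LIMIT) {
    // Map iterates in insertion order, so the first key is the oldest entry.
    const oldest = encodingCache.keys().next().value;
    encodingCache.delete(oldest);
  }
  encodingCache.set(str, bytes);
  return bytes;
}

// Repeated short strings (e.g. tag names) reuse the same Uint8Array.
const first = encodeCached("div");
const second = encodeCached("div");
```

Returning the cached Uint8Array itself is only safe if callers treat it as read-only; that is an assumption of this sketch.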
For arrays, struct fields, enum payloads, and dictionary entries, the stack ABI now retains the JS string directly (instead of encoding to a Uint8Array first) and passes the worst-case UTF-8 byte length via _maxUTF8Len() as buffer capacity. A new dedicated swift_js_init_memory_from_string import handles the string path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. This avoids modifying the existing swift_js_init_memory contract. This eliminates the intermediate Uint8Array allocation for every string element in arrays and struct fields.
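The encodeInto() path can be sketched as follows. This is an assumption-laden illustration: maxUTF8Len and initMemoryFromString are hypothetical stand-ins for _maxUTF8Len() and the swift_js_init_memory_from_string handler, whose real parameters are not shown here, and the string is passed directly rather than through the object heap.

```javascript
// Illustrative sketch only; function names and parameters are hypothetical.
const utf8Encoder = new TextEncoder();

// Worst case: one UTF-16 code unit encodes to at most 3 UTF-8 bytes.
function maxUTF8Len(str) {
  return str.length * 3;
}

// Encode `str` straight into linear memory at `ptr`; return the bytes written.
function initMemoryFromString(memory, ptr, capacity, str) {
  const dest = new Uint8Array(memory.buffer, ptr, capacity);
  const { written } = utf8Encoder.encodeInto(str, dest);
  return written;
}

// One 64 KiB page standing in for the module's linear memory.
const memory = new WebAssembly.Memory({ initial: 1 });

// "h", e-acute, "llo", space, one emoji: 8 UTF-16 code units, 11 UTF-8 bytes.
const sample = "h\u00e9llo \u{1F600}";
const capacity = maxUTF8Len(sample); // 8 * 3 = 24
const written = initMemoryFromString(memory, 0, capacity, sample); // 11

// Only the first `written` bytes are meaningful; the rest of the capacity is slack.
const roundTripped = new TextDecoder().decode(
  new Uint8Array(memory.buffer, 0, written)
);
```

The capacity is an upper bound, so the receiver has to use the returned byte count, not the capacity, as the string length.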
@krodak thanks for this work and for starting the discussion! for the record: my targeted use case is mostly about passing strings from Swift -> JS. especially in the context of a UI framework, you'll be passing "p" or "div" - and many other very short and "static" strings - to JS a lot. my thinking was that we could maybe use the "isImmortal" and maybe "isSmall" bits (https://github.com/swiftlang/swift/blob/e727c07687a55d628af3f63a988d1b66947f3e0f/stdlib/public/core/StringObject.swift#L20-L69) to auto-manage a "staticStringCache" in JS. this way we will not have to hash and compare every string, but only work with the String's representation. (eg: instead of passing address + length, we could pass high + low and work both cases through the same ABI) if we don't want to add this whole magic to JavaScriptKit, supporting |
Overview
String parameters crossing from JS to Swift go through TextEncoder.encode() + object heap retain/release on every call. This is measurably slow for repeated strings and for string arrays. Two independent optimizations target the two string-encoding paths in the generated JS glue, without touching BridgeType or the codegen structure.
Related: swiftwasm#677, swiftwasm#700 (different approach - adds JSString as a new BridgeType; this PR avoids that)
What changed
1. Encoding cache for parameter and return paths (commit 1 - no Swift changes)
When JS passes a string to an exported Swift function (or returns a string from an imported JS function), the glue calls textEncoder.encode(string) to get a Uint8Array, then retains it in the object heap. The same string encoded 100k times means 100k allocations.
A 4096-entry encoding cache (Map<string, Uint8Array>) now sits in front of textEncoder.encode(). Repeated strings get a cache hit and skip encoding. The cache uses JS Map insertion-order semantics for O(1) eviction - delete-and-reinsert on hit, delete-first on eviction. 4096 entries cover realistic vocabularies without the pathological eviction churn that smaller caches exhibit.
Affected fragments: stringLowerParameter, stringLowerReturn
2. Direct string retain + encodeInto() for the stack ABI (commit 2)
Arrays, struct fields, enum payloads, and dictionary entries use the stack ABI, which encodes each string element independently. Instead of allocating a Uint8Array per element, the JS glue now retains the JS string itself and passes a worst-case buffer capacity via _maxUTF8Len() (str.length * 3).
A new dedicated swift_js_init_memory_from_string WASM import handles this path - it encodes UTF-8 directly into the WASM buffer via encodeInto() and returns the actual byte count written. The existing swift_js_init_memory is unchanged.
On str.length * 3: Each UTF-16 code unit can produce at most 3 UTF-8 bytes. Surrogate pairs (2 code units) produce 4 UTF-8 bytes, so the per-unit ratio stays <= 3. This is the standard worst-case estimate - wasm-bindgen uses str.length * 3 with encodeInto, then realloc to shrink. Emscripten's stringToUTF8Array documents "at most str.length*4+1 bytes" (their *4 accounts for code points rather than UTF-16 units, +1 for null terminator). We don't need wasm-bindgen's shrink step because Swift's String(unsafeUninitializedCapacity:) uses the returned byte count as the string length regardless of the buffer capacity.
Affected fragments: stackLowerFragment for .string / .rawValueEnum(_, .string)
Benchmarks
100k iterations, Node.js v22, 15-run average:
[Results table; rows: StringRoundtrip/takeString, ArrayRoundtrip/takeStringArray, ArrayRoundtrip/roundtripStringArray, StringRoundtrip/makeString, ArrayRoundtrip/makeStringArray. Timings not preserved.]
The make* benchmarks (Swift-to-JS direction) are unaffected - those paths already use direct memory reads via decodeString(ptr, len).
Independence of the two techniques
The two commits are independent. Commit 1 (cache) has zero Swift-side changes and can be cherry-picked alone for ~21% improvement on takeString. Commit 2 adds a new WASM import (swift_js_init_memory_from_string) without modifying the existing swift_js_init_memory.
Comparison with alternative approaches
We benchmarked three other approaches for comparison.
Approach: PR swiftwasm#700 JSString (new BridgeType)
PR swiftwasm#700 adds JSString as a new BridgeType. Users opt in per-parameter by changing String to JSString. It passes a JS object heap reference (single i32) with no encoding.
[Comparison table; benchmarks: takeString, takeJSString, makeString, makeJSString, takeStringArray, takeJSStringArray, makeJSStringArray, roundtripStringArray, roundtripJSStringArray. Timings not preserved.]
PR swiftwasm#700 wins on the take direction (no encoding at all), but has significant regressions on all make/return paths due to object heap management overhead. It also only helps callers who rewrite their API surface from String to JSString. This PR improves all existing String usage transparently.
Approach: switching String to Stack ABI
Per Yuta's suggestion in swiftwasm#677, we tested replacing the inline WASM params (bytes: i32, length: i32) with the stack ABI (push to JS arrays, zero WASM params). Result: no measurable difference.
[Results table; rows: StringRoundtrip/takeString, ArrayRoundtrip/takeStringArray. Timings not preserved.]
The bottleneck is encoding and heap management, not the parameter passing mechanism. Branch string-opt/stack-abi-test has this experiment.
Files changed (excluding snapshots)
Sources/JavaScriptKit/BridgeJSIntrinsics.swift - new _swift_js_init_memory_from_string import; bridgeJSStackPop uses it; existing _swift_js_init_memory unchanged
Plugins/BridgeJS/Sources/BridgeJSLink/BridgeJSLink.swift - encoding cache preamble; new swift_js_init_memory_from_string handler; _maxUTF8Len helper
Plugins/BridgeJS/Sources/BridgeJSLink/JSGlueGen.swift - stringLowerParameter and stringLowerReturn use cache; stackLowerFragment uses direct retain + _maxUTF8Len; reserved variable names
Plugins/PackageToJS/Templates/instantiate.js - stub for new swift_js_init_memory_from_string import