perf: use ASCII bitsets for strip chars#789
Conversation
9004941 to
7f64a1d
Compare
|
Closing as low priority after rebase: current master already has the main strip-char fast path, and this PR shows no stable standout delta on the go_suite strip docs cases or Native hyperfine. |
|
Reopened. This is not negative; keep it as draft as a small GC-friendly strip delimiter cleanup. Current docs strip benchmarks do not show standout gains, so it should not be moved to ready yet. |
7f64a1d to
4addea1
Compare
|
Rebased to latest |
7f0efb1 to
17bee23
Compare
Port the useful part of He-Pin/sjsonnet jit commit dd90b11 onto current master. The new path avoids Set[Int] allocation for ASCII-only stripChars, lstripChars, and rstripChars calls, preserves the existing codepoint-aware fallback for non-ASCII strip sets, and uses platform-specific masks selected from local measurements. Upstream-source: dd90b11
Motivation: The ASCII bitset optimization made stripChars, lstripChars, and rstripChars validate chars before str, contradicting the builtin call path and changing observable error ordering. Modification: Restore str-before-chars extraction for all three strip helpers while keeping the optimized strip implementation. Update the regression test to assert the existing first-argument error behavior. Result: The optimization keeps existing builtin semantics for type and error ordering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
17bee23 to
5e93b03
Compare
|
Rebased to Hyperfine
Master already has the BMP Recommendation: close as obsolete. The optimization this PR aimed to introduce has been overtaken by master's strip path. |
Motivation:
std.stripChars,std.lstripChars, andstd.rstripCharsalready avoid boxedSet[Int]for single/BMP delimiter sets on currentmaster. This draft keeps that Unicode-aware path and narrows the ASCII-only multi-character delimiter case by replacing the per-calljava.util.BitSetallocation with primitive masks.Key Design Decision:
Intmasks; Scala Native uses twoLongmasks.BitSetfast path.charsis forced beforestr, preserving existing error evaluation order.Modification:
stripChars,lstripChars, andrstripCharsthrough the ASCII mask path whencharscontains only ASCII delimiters.Benchmark Results:
Rebased onto current
upstream/masteratca61a7a391e0a6910fa51a8891c69e86d708227e.JMH command:
lstripChars.jsonnetrstripChars.jsonnetstripChars.jsonnetbench.09.jsonnetScala Native hyperfine, original go_suite strip files, 50 runs, compared against source-built jrsonnet
0.5.0-pre98:lstripChars.jsonnetrstripChars.jsonnetstripChars.jsonnetScaled temporary Native strip stress cases, 30 runs,
--shell=none, outputs checked equal across master/PR/jrsonnet:lstrip-scaled.jsonnetrstrip-scaled.jsonnetstrip-scaled.jsonnetAnalysis:
Current
masteralready has the main BMPBitSetstrip fast path, so this PR no longer closes a visible go_suite strip gap on JVM JMH. The remaining value is narrower: a GC-friendly ASCII delimiter cleanup that can be slightly positive in scaled Native strip stress cases, while original go_suite cases are too small/noisy to justify moving this out of draft.References:
Result:
Keep as draft. This is safe after rebase and tests, but it should not be marked ready unless a future benchmark shows a stable user-visible improvement or we explicitly decide the GC/allocation cleanup is worth merging on its own.
Validation:
17bee2380f5b1df02a9f975ccb7807a71241f39d./mill --no-server --ticker false --color false -j 1 __.test->Tests: 441, Passed: 441, Failed: 0