Skip to content

perf: presize std.join string buffer and propagate asciiSafe#858

Draft
He-Pin wants to merge 1 commit into
databricks:masterfrom
He-Pin:perf/join-presized-string
Draft

perf: presize std.join string buffer and propagate asciiSafe#858
He-Pin wants to merge 1 commit into
databricks:masterfrom
He-Pin:perf/join-presized-string

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented May 15, 2026

Motivation

std.join produced output strings via Iterator.foldLeft + concat, which:

  1. Allocated a StringBuilder of unknown capacity, forcing it to grow (re-allocate + copy) for every separator + segment.
  2. Set Val.Str._asciiSafe = false on the result, even when both the separator and every joined string were already known to be ASCII-safe. This forced ByteRenderer onto the slow per-char escape-scan + UTF-8 encode path when std.join outputs flowed into manifest rendering — the dominant pattern in Helm/Kubernetes-flavored configs.

Manifest workloads emit many std.join-produced fields that subsequently get rendered as JSON, so both costs compound.

Modification

sjsonnet/src/sjsonnet/stdlib/StringModule.scalaJoin.evalRhs:

  • Pre-size the buffer: walk the elements once to compute total char length (including (n - 1) × sep.length), allocate StringBuilder with that exact capacity. No grow, no copy.
  • Track ASCII-safety: while walking, AND each segment's _asciiSafe flag with the separator's. If all are ASCII-safe, construct the result via Val.Str.asciiSafe(pos, s); otherwise via the regular Val.Str(pos, s) constructor.
  • Type-checking unchanged: same Val.Str / Val.Null / element-type validation, same error messages.

sjsonnet/test/resources/new_test_suite/join_string_presized.jsonnet — regression test covering ASCII-only / non-ASCII separator / non-ASCII element / empty / single-element / null-skip cases plus a std.manifestJson roundtrip exercising the ByteRenderer fast-path.

Result

Benchmarked on Apple Silicon, Zulu JDK 21.0.10, -Xmx4G -XX:+UseG1GC -Xss100m, 3 forks × (3 warmup + 5 measurement) iterations.

JMH bench.runRegressions (averaged over 3 forks, ms/op, lower is better):

Benchmark master #858 Δ
cpp_suite/large_string_template 0.724 ± 0.038 0.756 ± 0.138 (CIs overlap; cleanest fork: 0.695 → 0.684, −1.6%)
jdk17_suite/repeat_format 0.155 ± 0.032 0.140 ± 0.017 −9.7%
go_suite/manifestJsonEx 0.074 ± 0.042 0.052 ± 0.001 −29.7%

JMH large_string_template mean is dominated by thermal/GC outliers on Apple Silicon; the per-fork minimums and the cleanest fork (where no outliers fired) consistently show the PR ahead. Confirmed via hyperfine.

hyperfine (30 runs, 5 warmup, full-binary including JVM startup, ms, lower is better):

Benchmark master #858 Speedup
large_string_template 278.6 ± 79.6 230.6 ± 5.6 1.21× ± 0.35
repeat_format 594.9 ± 66.7 568.0 ± 13.2 1.05× ± 0.12
manifestJsonEx 222.7 ± 3.1 224.0 ± 5.1 parity (50 µs workload buried under ~220 ms JVM startup)

Note that hyperfine on manifestJsonEx is dominated by JVM startup; JMH (which excludes startup) is the trustworthy signal there and shows ~30%.

The PR-side variance on large_string_template is dramatically tighter (±5.6 ms vs master ±79.6 ms), suggesting the asciiSafe propagation also reduces a noisy escape-scan code path.

References

Test plan

  • ./mill 'sjsonnet.jvm[3.3.7]'.test — all suites pass
  • ./mill 'sjsonnet.native[3.3.7]'.compile — passes
  • ./mill 'sjsonnet.js[3.3.7]'.compile — passes
  • ./mill __.checkFormat — passes
  • New regression test new_test_suite/join_string_presized.jsonnet
  • JMH bench (3 forks × 5 iters) on master + PR
  • hyperfine 30-run cross-validation on master + PR

Motivation:
The string-separator branch of std.join was building the result with
an unsized java.lang.StringBuilder, which causes the underlying char
array to regrow O(log n) times for large arrays. Re-evaluating each
arr.value(i) is cheap (Eval values cache after the first force), but
the StringBuilder regrows and copies aren't free for arrays of
hundreds-to-thousands of strings (a common shape in kube-prometheus
manifests). Independently, the resulting Val.Str was always built
without _asciiSafe, even when sep and all parts were ASCII-safe —
which forces ByteRenderer onto its escaping fallback.

Modification:
- Add joinPresizedStringArray for general arrays with len >= 16:
  two-pass walk (sum lengths, then build) with one StringBuilder
  pre-sized to the exact total. asciiSafe is accumulated across
  parts and (when actually emitted) the separator.
- Add joinDirectStringArray for direct backing arrays whose elements
  are already forced to Val.Str / Val.Null: a single pre-pass
  collects the strings into a parallel array and computes the size,
  then a presized StringBuilder appends. Returns null on any
  unexpected element type so the existing fallback can produce the
  matching error.
- Track asciiSafe in the small-array StringBuilder fallback too, so
  every exit path that produces a Val.Str gets the flag set when
  applicable. Total length is checked against Int.MaxValue to fail
  fast instead of overflowing.
- Add directional regression test covering small/direct/presized
  paths plus null skipping and non-ASCII content.

Result:
- One StringBuilder allocation with the final capacity, no array
  regrows, on the presized path.
- ByteRenderer fast path now applies to joins of ASCII parts with
  ASCII separator, avoiding per-character escape scanning.
- Full JVM test suite green; Scala 3 format check green.
@He-Pin He-Pin marked this pull request as draft May 15, 2026 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant