perf(7pre): enablement — donor outer/inner dfast Level 3 + cross-level FSE cache #146
Conversation
Port `start_matching_fast_loop` to upstream zstd 1.5.7's outer/inner
structure (`zstd_double_fast.c`). The outer `while` runs once per
match-found-and-stored; the inner loop carries `hl0`/`idxl0`/`hl1`/
`idxl1` between iterations with two-position lookahead so the long
hash is computed once per two positions scanned, not per position.
Hash tables are updated at `curr` BEFORE the match check (upstream
parity at `zstd_double_fast.c:187`); the short check is a 4-byte
gate only with forward `common_prefix_len` running only on hit; the
`_search_next_long` retry with precomputed `idxl1` follows the
preferred-longer-match policy. Match-finder probes inline into the
hot loop via raw-pointer arithmetic; `probe_slot_match` is preserved
for the cold callers (`hash_candidate`, dictionary prime,
retry/seed). The inner-loop exit shape is an explicit
`enum InnerExit { Committed(MatchCandidate), Tail(usize) }`; every
`break 'inner` picks a variant so the post-loop match is exhaustive,
removing the implicit pairing the previous draft had between
`Option<MatchCandidate>` and a sibling `tail_seed_anchor`.
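The exit shape described above can be sketched as follows (a toy model; the `MatchCandidate` fields and the scan driver are invented for illustration, only the `InnerExit` variants come from the text):

```rust
// Hypothetical minimal fields; the real candidate carries more state.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MatchCandidate {
    offset: u32,
    len: usize,
}

#[derive(Debug, PartialEq)]
enum InnerExit {
    Committed(MatchCandidate), // a match was found and stored
    Tail(usize),               // anchor position for tail seeding
}

// Toy stand-in for the inner scan: commit on the first hit, otherwise
// fall off the end and report the tail anchor explicitly.
fn inner_scan(hits: &[Option<MatchCandidate>]) -> InnerExit {
    for hit in hits {
        if let Some(c) = hit {
            // Every `break 'inner` in the real loop picks a variant.
            return InnerExit::Committed(*c);
        }
    }
    InnerExit::Tail(hits.len())
}

fn main() {
    let hit = MatchCandidate { offset: 7, len: 6 };
    // Exhaustive post-loop match: no implicit Option/anchor pairing.
    match inner_scan(&[None, Some(hit)]) {
        InnerExit::Committed(c) => assert_eq!(c.len, 6),
        InnerExit::Tail(_) => panic!("expected a commit"),
    }
    assert_eq!(inner_scan(&[None, None]), InnerExit::Tail(2));
}
```

The point of the enum over `Option<MatchCandidate>` plus a sibling anchor is that the compiler forces every exit site to choose a variant, so the pairing cannot silently drift.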
Supporting changes inside the same fast-loop region:
- DFAST_HASH_BITS ceiling lowered from 20 to 17 per upstream
clevels.h:31 Level 3 sizing; slot storage moves to single u32 +
`position_base` rebase scheme (upstream `ZSTD_window_reduce`
parity).
- DFAST_MIN_MATCH_LEN drops from 6 to 5 per upstream `clevels.h:31`
default Level 3 mls=5 — we were silently dropping every 5-byte
match. DFAST_REP_MIN_MATCH_LEN stays at 4 (rep extensions inherit
the 4-byte upstream floor since rep encoding carries no on-wire
offset cost).
- Hash kernel switches to upstream's scalar multiply
(`v.wrapping_mul(prime8bytes)` then shift), dropping the CRC32d
kernel dispatch on the dfast hot path; the CRC32d kernel was 3-4
instructions of dispatch overhead per insert before the inline,
upstream's one-mul-shift is strictly cheaper.
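The kernel can be sketched as below; the constant is upstream zstd's `prime8bytes`, while the function name and the table sizing are illustrative assumptions:

```rust
// Upstream zstd's 8-byte hash prime (zstd_internal / hash constants).
const PRIME8BYTES: u64 = 0xCF1B_BCDC_B7A5_6463;

// One wrapping multiply, one shift: the high `hash_bits` bits of the
// product index the table. No kernel dispatch, no CRC lanes.
fn hash8(v: u64, hash_bits: u32) -> usize {
    (v.wrapping_mul(PRIME8BYTES) >> (64 - hash_bits)) as usize
}

fn main() {
    let bits = 17; // DFAST_HASH_BITS ceiling after this PR
    let h = hash8(0x0123_4567_89AB_CDEF, bits);
    assert!(h < (1usize << bits)); // always fits the 2^17-slot table
    assert_eq!(h, hash8(0x0123_4567_89AB_CDEF, bits)); // deterministic
}
```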
- Repcode peek inlined at ip+1 with a 4-byte gate (upstream
zstd_double_fast.c:190); only `offset_1` is probed inline, the
full rep0/rep1/rep2 walk lives in the lazy/btopt strategies.
Removing the rep2/rep3 walk from the dfast inline saved both
speed (16% on level_3_dfast/decodecorpus-z000033) and ratio
(−494 B; fewer rep2/rep3 false-positive picks displacing better
hash matches).
- Skip-step now grows by scan distance rather than a consecutive-miss
  threshold (upstream zstd_double_fast.c:224-228 parity): the step
  increases every `DFAST_SKIP_STEP_GROWTH_INTERVAL = 64` positions
  traveled (upstream uses 256 — see "Donor-deviation audit" in the
  PR body), so blocks that the strict
  `block_looks_incompressible_strict` gate used to bail out of early
  now scan through the standard ramp. The strict helper itself stays
  for `levels/fastest.rs` and `incompressible.rs`.
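The ramp can be sketched as follows; only the 64-position interval comes from the text above, the exact division shape is an assumption:

```rust
const DFAST_SKIP_STEP_GROWTH_INTERVAL: usize = 64;

// Step 1 for the first 64 positions past the anchor, 2 for the next
// 64, and so on: growth tracks distance traveled, not miss streaks.
fn step_for(distance_from_anchor: usize) -> usize {
    1 + distance_from_anchor / DFAST_SKIP_STEP_GROWTH_INTERVAL
}

fn main() {
    assert_eq!(step_for(0), 1);
    assert_eq!(step_for(63), 1);
    assert_eq!(step_for(64), 2);
    assert_eq!(step_for(256), 5);
}
```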
- Fast-loop boundary tail-position probe (`probe_tail_ip0_only`)
covers the case where `ip0` is still hashable but `ip1` falls
past the end (`pos == current_len - HASH_READ_SIZE`). Mirrors the
inner loop's long+short probe at ip0; the rep peek and
`_search_next_long` retry are skipped (both depend on ip1),
matching upstream's iend-boundary tradeoff.
- Short-hash hit enforces `DFAST_MIN_MATCH_LEN` floor before
committing. The 4-byte equality gate plus forward extension can
produce `match_len < 5`; the `_search_next_long` retry can still
upgrade to a valid long match, but with no upgrade the
below-floor hit is now discarded and the loop continues. Matches
the floor `probe_slot_match` already enforces on the long-hash
main path.
- Fast-loop guards tightened from `+ DFAST_MIN_MATCH_LEN (5)` to
`+ HASH_READ_SIZE (8)`. Previous guard let the unconditional
8-byte `u64::read_unaligned` for the long-hash probe read up to 3
bytes past the live history end; Linux fuzz ASan caught it on a
7-byte input. Regression test added in `tests/fuzz_regressions.rs`.
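A sketch of the tightened bound (names assumed from the text; the real guard sits inline in the loop condition rather than in a helper):

```rust
const HASH_READ_SIZE: usize = 8;

// Highest position at which an unconditional 8-byte unaligned read
// stays inside the live history; None means the fast loop cannot run.
fn fast_loop_end(history_len: usize) -> Option<usize> {
    history_len.checked_sub(HASH_READ_SIZE)
}

fn main() {
    // The 7-byte fuzz input: no position admits an 8-byte read.
    assert_eq!(fast_loop_end(7), None);
    // With the old `+ DFAST_MIN_MATCH_LEN (5)` guard, pos 2 on a
    // 7-byte buffer was allowed, reading bytes 2..10 (3 past the end).
    assert_eq!(fast_loop_end(16), Some(8)); // pos 8 reads bytes 8..16
}
```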
- Fast-loop entry calls `ensure_room_for` to rebase position_base
before any inner-loop slot pack. The inner loop's `packed_curr`
cast `(abs_ip0 - position_base) as u32` could truncate after a
long stream of all-miss / non-hashable blocks (fast-loop bypasses
`insert_position`'s rebase).
- `trim_to_window` now drops the dead prefix via `split_off`
instead of `compact_history`'s `drain` (which leaves the original
Vec's capacity pinned); the trim boundary pays one realloc in
exchange for actually shedding the prefix to the system
allocator. Signature also drops the unused `_reuse_space` closure
(Dfast's history-only storage has no per-block Vec to recycle —
unlike HC / Row which need the callback).
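The `split_off` trade can be sketched as below, assuming a plain `Vec<u8>` history (the real signature differs):

```rust
fn trim_history(history: &mut Vec<u8>, dead_prefix: usize) {
    // `history.drain(..dead_prefix)` would keep the original Vec's
    // capacity pinned. `split_off` moves the live tail into a fresh,
    // right-sized allocation; dropping the old Vec returns the prefix's
    // memory to the system allocator.
    let tail = history.split_off(dead_prefix);
    *history = tail; // old allocation (prefix + pinned capacity) dropped here
}

fn main() {
    let mut history: Vec<u8> = (0u8..=255).cycle().take(1 << 20).collect();
    trim_history(&mut history, (1 << 20) - 4096);
    assert_eq!(history.len(), 4096);
    // The retained tail's capacity is sized to the tail, not to the
    // original 1 MiB buffer.
    assert!(history.capacity() < 1 << 20);
}
```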
- `skip_matching` / `skip_matching_dense` / `start_matching`
  early-return on `current_len == 0`. `add_data` short-circuits empty
input without pushing to `window_blocks`; without the guard these
methods would re-seed the previous block's retained tail on
every empty flush.
- `get_last_space` (trait dispatch entry point) returns `&[]` on
empty `window_blocks` instead of panicking, mirroring the inline
gate pattern used by `skip_matching` / `start_matching` /
`emit_candidate` / `emit_trailing_literals`.
- `probe_slot_match` takes a `gate_len` parameter (8 for long-hash
callers, 4 for short-hash). Replaces the prior 1-byte precheck
which let 8-byte-collision short matches slip past the gate.
`Cargo.toml` adds `[profile.bench]` overrides (`lto = "fat"`,
`codegen-units = 1`) scoped to bench builds only — full cross-crate
inlining for benches without forcing 5-10× release build-time on
`cli` and downstream consumers.
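The override as described, sketched in Cargo.toml form (exact placement in the workspace manifest is assumed):

```toml
# Bench-only: full cross-crate inlining without slowing release builds
# of `cli` and downstream consumers.
[profile.bench]
lto = "fat"
codegen-units = 1
```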
Verification on `compress/level_3_dfast/decodecorpus-z000033`
(1 022 035 B input):
Compress time: 32.07 ms (11.0× FFI) -> 16.11 ms (5.5× FFI)
Compressed size: 556 121 B (+5.5%) -> 489 846 B (−7.07%)
Memory (small ≤4 KiB): 378 KiB -> 249 KiB (−34%)
Compressed-size delta is honestly framed: smaller-than-donor does
NOT necessarily mean we win — extra match-finding work the
upstream fast path skips can produce smaller output AND slower
encode time simultaneously. The sequence-stream comparator (#99,
deferred to Lane D of restructured Phase 7 in #111) is the audit
tooling needed to triage.
`FrameCompressor::new` used to rebuild the three predefined LL/ML/OF FSE tables on every call (~12 µs per `build_table_from_probabilities`, ~38 µs total for the three). On a 1 KiB frame the actual compression work is ~1-5 µs, so the predefined-table build dominated per-frame cost for tiny inputs. Two-stage fix:

1. Each `default_*_table()` helper owns a private `static AtomicPtr<FSETable>` initialized via a lock-free compare-exchange (`core::sync::atomic`, which works in std and no_std alike — only `alloc` is required). The first call leaks a `Box<FSETable>` into the cache; subsequent calls return the same pointer. The slot is selected at compile time by the static identity of each helper, not by pointer comparison on the distribution slice — `LL_DIST`/`ML_DIST`/`OF_DIST` are `const`, not `static`, and rustc does not guarantee a single materialized address for const slice references (a future codegen change that duplicates the const per use site would silently bypass a pointer-keyed cache). The helper now returns `&'static FSETable`.
2. `FseTables` stores the static reference directly (`ll_default: &'static FSETable`) rather than an owned `FSETable`. `FrameCompressor::new`'s FSE setup becomes three pointer-sized copies instead of three deep clones of 256-element `SymbolStates` arrays. `clone_fse_tables` (block-split estimator path) becomes a pointer copy on the three default fields too; only the `*_previous` per-frame tables still need a deep clone.

`get_or_init_cached_table` uses `Acquire` on the load and `AcqRel` on the compare-exchange success path. The success-path ordering is stronger than the minimal `Release` would suggest; matching the failure `Acquire` keeps the success/failure branches symmetric on rustc's atomic surface, and the path runs at most three times per process, so any cost is in the noise.
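Stage 1 can be sketched as follows (a minimal model: the `FseTable` contents and `build_table` are stand-ins; only the per-helper static slot and the `AtomicPtr` + `Box::into_raw` publish shape come from the description above):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Stand-in; the real table holds 256-element SymbolStates arrays.
struct FseTable {
    accuracy_log: u8,
}

fn build_table() -> FseTable {
    FseTable { accuracy_log: 6 } // pretend this is the expensive build
}

// One static slot per helper, selected by static identity, not by
// pointer-comparing the const distribution slices.
fn default_ll_table() -> &'static FseTable {
    static SLOT: AtomicPtr<FseTable> = AtomicPtr::new(std::ptr::null_mut());
    get_or_init(&SLOT, build_table)
}

fn get_or_init(slot: &AtomicPtr<FseTable>, build: fn() -> FseTable) -> &'static FseTable {
    let cached = slot.load(Ordering::Acquire);
    if !cached.is_null() {
        return unsafe { &*cached }; // fast path after the first call
    }
    let fresh = Box::into_raw(Box::new(build()));
    match slot.compare_exchange(std::ptr::null_mut(), fresh, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => unsafe { &*fresh }, // we published; the leak is intentional
        Err(winner) => {
            // Another thread won the race: drop our copy, use theirs.
            unsafe { drop(Box::from_raw(fresh)) };
            unsafe { &*winner }
        }
    }
}

fn main() {
    let a = default_ll_table();
    let b = default_ll_table();
    assert!(std::ptr::eq(a, b)); // same leaked singleton on every call
    assert_eq!(a.accuracy_log, 6);
}
```

Racing first-callers all converge on one pointer through the compare-exchange; the loser frees its own copy, so at most one table per helper is ever leaked.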
Verification on `compress/level_2_dfast/small-1k-random/matrix/pure_rust` (criterion, darwin/aarch64, fresh checkout):

Pre-cache baseline: 44.25 µs (22.07 MiB/s, 13.4× FFI)
After AtomicPtr cache: 12.67 µs (77.11 MiB/s, 3.9× FFI)
After &'static refactor: 6.08 µs (160.75 MiB/s, 1.83× FFI)
FFI 1.5.7: 3.31 µs (298.24 MiB/s)

Cumulative: 7.3× compress-time speedup on the small-input scenario. Applies to every level, since `FseTables::new()` runs inside `FrameCompressor::new` regardless of strategy.

Adjacent encoding tightening in the same commit:
- `compress_to_vec` is split — slice-based work moves into a new public `compress_slice_to_vec` that accepts `&[u8]` directly, while the `Read`-based variant remains a thin wrapper. Both seed the output `Vec` with `min(compress_bound(src), 128 KiB)` instead of the prior `compress_bound(src)` upfront pin — saving ~100 MiB peak on `large-log-stream` (100 MiB of highly compressible input). The `compress_slice_to_vec` rustdoc enumerates the RSS shape change and the `# Panics` modes (encoder error, OOM); the `compress_to_vec` rustdoc explicitly flags that it is NOT a streaming API — the source is materialized fully before any compression runs, with peak memory bounded by `source.len()`. Both surfaces point at [`StreamingEncoder`] for unbounded inputs.
- `FrameCompressor::compress`'s `all_blocks` accumulator pre-reserves capacity scaled by the source-size hint instead of a flat 128 KiB. Saves ~130 KiB peak for sub-4 KiB inputs without regression on medium/large.
- The `start_matching_fast_loop` preamble grows a raw-pointer aliasing-invariant SAFETY block enumerating which fields the inner loop touches through cached `as_mut_ptr` raw pointers (`short_hash`, `long_hash`, `history`) versus which it accesses through `&self` shared reads (`offset_hist`, slice views of `history`). The pattern is sound under Stacked Borrows / Tree Borrows because the fields are physically disjoint allocations; the comment locks the invariant in so a future refactor adding a `&mut self` call inside the loop has to either hoist the call or reload the cached raw pointers.
- needless-borrow clippy autofixes in `encoding/blocks/compressed.rs`, `fse/mod.rs`, and `encoding/match_generator.rs` (consumers of the now-`&'static` default tables no longer need an outer `&`).
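The capacity seeding for `compress_slice_to_vec` can be sketched as below; the `compress_bound` formula here is an assumed approximation of the real helper:

```rust
// Assumed shape of the worst-case bound: src + src/256 + small constant.
fn compress_bound(src_len: usize) -> usize {
    src_len + (src_len >> 8) + 64
}

// Cap the upfront reservation: tiny inputs get a tight bound, while
// huge highly-compressible inputs no longer pin ~compress_bound(src)
// of RSS before a byte is written.
fn initial_capacity(src_len: usize) -> usize {
    compress_bound(src_len).min(128 * 1024)
}

fn main() {
    assert_eq!(initial_capacity(1024), 1024 + 4 + 64); // small: exact bound
    assert_eq!(initial_capacity(100 << 20), 128 * 1024); // 100 MiB: capped
}
```

The `Vec` still grows on demand past the cap; only the speculative upfront reservation is bounded.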
…re scratch dirs

- `tests/fuzz_regressions.rs`: new `interop_7_byte_input_does_not_oob_in_dfast_fast_loop` regression for the Linux-fuzz `interop` artifact that exposed the dfast fast-loop OOB read. The bytes are inline; the test pins to `CompressionLevel::Level(3)` so a future retune of the `Default` alias cannot accidentally route it off the dfast path. Roundtrips through the in-tree decoder. The docstring calls out the sanitizer caveat — a plain `cargo nextest run` may complete successfully against pre-fix code because the OOB `read_unaligned` usually lands inside the live `Vec`'s spare capacity; the regression reliably fires only under ASan (Linux fuzz CI) or `cargo +nightly miri test`. Treat a green local run as a smoke check; the authoritative signal is the CI fuzz job.
- `benches/compare_ffi.rs` + `compare_ffi_memory.rs`: minor alignment with the public-API split (the new `compress_slice_to_vec` entry point); no behaviour changes to the bench harness, just keeping the call sites in sync.
- `.gitignore`: ignore `.local/` (per-branch scratch / helper-doc directory used for `PHASE7C_FINDINGS.md` etc.) and cargo-fuzz artifacts under `zstd/fuzz/artifacts/**` (libfuzzer content-hash filenames change on each run; in-repo regression vectors live inline in `tests/fuzz_regressions.rs` instead).
No actionable comments were generated in the recent review.
📝 Walkthrough

Caches predefined FSE tables with cfg-specific backings, exposes ref-based accessors, pre-allocates compression outputs via a compress_bound heuristic, refactors Dfast to a contiguous history and single-slot hash tables with rewritten matching, and updates benches, tests, Cargo features, and .gitignore.
Pull request overview
Phase 7pre enablement work to bring dfast Level 3 closer to donor behavior/perf and lay groundwork for cross-level improvements, including reducing per-frame setup costs and tightening memory behavior.
Changes:
- Reworked dfast internals (min match length, table layout, fast-loop structure, history storage) to align more closely with donor semantics and reduce memory footprint.
- Added a process-local cache for predefined LL/ML/OF FSE tables and refactored encoder state to store `&'static FSETable` references to avoid per-frame rebuild/clone cost.
- Added a fuzz regression test for the 7-byte interop crash and aligned benches to use the new slice-to-Vec compression entry point.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| zstd/src/tests/fuzz_regressions.rs | Adds 7-byte regression test covering the prior dfast fast-loop OOB/UB scenario. |
| zstd/src/fse/mod.rs | Updates tests/call sites for predefined FSE helpers now returning &'static FSETable. |
| zstd/src/fse/fse_encoder.rs | Implements predefined FSE table caching and changes default-table helpers to return &'static references. |
| zstd/src/encoding/mod.rs | Adds compress_bound() and new compress_slice_to_vec(); routes compress_to_vec() through it and seeds output capacity. |
| zstd/src/encoding/match_generator.rs | Updates dfast constants and driver lifecycle logic to match dfast backend refactors and default-table signature changes. |
| zstd/src/encoding/frame_compressor.rs | Stores default FSE tables by reference; tweaks block accumulation capacity based on source size hint. |
| zstd/src/encoding/dfast/mod.rs | Major dfast matcher refactor: single-slot hash tables, history-only storage, fast-loop safety/perf changes, and guard tightening. |
| zstd/src/encoding/blocks/compressed.rs | Adapts FSE table selection/clone logic to &'static default tables. |
| zstd/benches/compare_ffi.rs | Uses compress_slice_to_vec() to avoid benchmarking input-buffering overhead. |
| zstd/benches/compare_ffi_memory.rs | Uses compress_slice_to_vec() for more representative peak-memory measurement. |
| Cargo.toml | Sets [profile.bench] to fat LTO + codegen-units=1 for more realistic bench results. |
| .gitignore | Ignores new local fuzz artifacts and .local/ scratch output. |
Actionable comments posted: 1
Inline comments:
In `@zstd/src/encoding/dfast/mod.rs`:
- Around line 1438-1445: Add a debug assertion to ensure the structural
precondition that the trailing-block length is positive before slicing: in the
block where you compute last_len from
self.window_blocks.back().copied().unwrap_or(0) (used by emit_candidate), insert
a debug_assert!(last_len > 0) (or equivalent) immediately before creating
current = &self.history[self.history.len() - last_len..] so tests fail at the
source of the invariant violation rather than later when indexing current.
…debug_assert

- `fse_encoder`: gate the `AtomicPtr<FSETable>` lock-free cache behind `#[cfg(target_has_atomic = "ptr")]`. Targets that lack atomic pointer support (e.g. `thumbv6m-none-eabi`, AVR) were failing to compile against the unconditional `core::sync::atomic` import. Added a `#[cfg(not(target_has_atomic = "ptr"))]` fallback that uses a `static mut` raw pointer with single-threaded serial access — targets without atomic pointer support are by construction single-threaded embedded environments, so a non-atomic mutable static is sound. Same `Box::leak` lifetime, same `&'static FSETable` return shape as the atomic path, so consumers don't see a feature-flag-shaped API. Mirrors the pattern the crate already uses for `decoding::dictionary::DictionaryHandle` (`Arc` vs `Rc`). Verified with `cargo check --target thumbv6m-none-eabi --no-default-features --features hash`.
- `fse_encoder`: header comment block rewritten to describe both the atomic and no-atomic paths and to drop the stale "hand callers a clone" wording — the API now returns `&'static FSETable`, so callers no longer clone.
- `dfast::emit_candidate`: added `debug_assert!(last_len > 0)` immediately after the `window_blocks.back().copied().unwrap_or(0)` read. The function's structural precondition (it runs only after a successful match in the active block) was previously documented in a comment but not enforced in debug builds — the assert surfaces a violation at the source rather than via a confusing out-of-range panic on the `current[*literals_start..start]` subslice below.
…uild

Replaces the prior static-mut no-atomic fallback (which assumed non-concurrency on no-atomic targets — wrong: interrupt / preemption reentrancy on bare metal can still cause concurrent first-call initialization, making the static-mut slot a data-race surface and UB) with a three-way split driven by `target_has_atomic = "ptr"` and the new optional `critical-section` feature:

1. `target_has_atomic = "ptr"` (every modern desktop / server / mobile / Cortex-M3+ / RISC-V-A target) — unchanged lock-free `AtomicPtr` cache. Sound under multi-threaded access: rayon worker threads racing on first call all converge through the `compare_exchange`, and subsequent reads of `&'static FSETable` are immutable shared references, which are `Sync` because `FSETable` has no interior mutability.
2. `not(target_has_atomic = "ptr")` + `critical-section` feature — `static CACHE: CsCachedTablePtr` (a newtype wrapping `UnsafeCell<*mut FSETable>` with a manual `unsafe impl Sync`), accessed exclusively under `critical_section::with`. The CS impl is supplied by the downstream embedded runtime (cortex-m-rt, riscv-rt, embassy-executor, esp-hal, ...). The first call takes the critical section, builds the table, `Box::into_raw`s it, and publishes; subsequent calls return the cached pointer. The CS is held for one `build_table_from_probabilities` (~12 µs) on the cold path, with zero overhead afterwards.
3. `not(target_has_atomic = "ptr")` without `critical-section` — per-call build + `Box::leak`. No cache; accepts the memory-accumulation tradeoff against the UB risk of a synchronization-free static-mut slot. Same as the pre-PR status quo on no-atomic targets, with the addition of an intentional leak (which embedded users who repeatedly construct `FrameCompressor` should mitigate by enabling the `critical-section` feature).

`Cargo.toml` gains the optional dep `critical-section = "1.2"` gated by a new feature of the same name. On atomic targets the dep is dead code and never instantiated.

Header comment block rewritten to document all three paths, including the explicit "lack of atomic instructions is not a non-concurrency guarantee" rationale that prompted this revision. The previous draft's safety argument was wrong; the new comment makes the trade-offs explicit per target family. Verified: the host atomic path (cargo clippy / nextest, 481 passed) and both no-atomic sub-paths (`cargo check --target thumbv6m-none-eabi` with and without the `critical-section` feature) build clean.
Actionable comments posted: 1
Inline comments:
In `@zstd/src/fse/fse_encoder.rs`:
- Around line 819-830: The current no-atomic fallback permanently leaks memory
by Box::leak in build_and_leak_table; replace the leaking behavior with an
owned, non-'static return so callers hold responsibility for dropping the table:
change build_and_leak_table to construct the table with
build_table_from_probabilities and return either Box<FSETable> or FSETable
(instead of &'static FSETable), update all call sites to accept and drop the
owned table (or call .into_inner() where appropriate), and remove the Box::leak
to avoid unbounded memory growth on non-atomic targets.
Removes the per-call `Box::leak` from the no-atomic-without-`critical-section` fallback. The leak accumulated memory unboundedly across multiple `FrameCompressor::new` calls on the smallest supported targets (Cortex-M0/M0+, AVR, MSP430) — CodeRabbit and Copilot both flagged the unbounded-growth concern.

Type-level fix via a new `fse_encoder::FseDefaultTable` cfg alias:
- `target_has_atomic = "ptr"` OR feature `critical-section`: `&'static FSETable` (cached singleton, zero-cost subsequent calls)
- otherwise: `alloc::boxed::Box<FSETable>` (owned per-frame allocation, dropped with the owning `FrameCompressor` — the same memory shape as the pre-PR status quo on no-atomic targets)

`FseTables` struct fields switch from a hard-coded `&'static FSETable` to the type alias; both arms `Deref` to `FSETable`. Consumers in `encoding/blocks/compressed.rs` borrow uniformly through three new `FseTables` accessor methods (`ll_default_ref` / `ml_default_ref` / `of_default_ref`) so the per-target divergence stays inside the struct surface and downstream call sites do not need cfg.

`clone_fse_tables` (block-split estimator path, levels 11+) uses an explicit per-arm clone via cfg-conditional struct-literal fields: on atomic / CS the field is `Copy` (zero-cost field-access copy); on no-atomic-no-CS the field is `Box<FSETable>` and needs `Clone::clone` for a deep copy. The naive uniform `.clone()` resolves via auto-deref to `FSETable::clone` (returning an owned `FSETable`), which is the WRONG return type for the atomic arm — hence the explicit split.

Verified: `cargo clippy --lib --tests` clean on host (atomic); `cargo nextest run --lib` passes 481/481. The pre-existing thumbv6m build failure in `tests/roundtrip_integrity.rs` (unconditional `extern crate std` at line 6 vs the cfg-gated mod declaration in `tests/mod.rs:582`) remains — same failure mode on main, not introduced by this commit. Tracked separately.
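The alias and uniform accessor can be sketched as follows (a stand-in `FseTable`; the cfg keys and field/method names come from the commit text):

```rust
struct FseTable {
    accuracy_log: u8,
}

// Cached singleton on targets with atomic pointers or a critical-section
// impl; owned per-frame allocation otherwise.
#[cfg(any(target_has_atomic = "ptr", feature = "critical-section"))]
type FseDefaultTable = &'static FseTable;

#[cfg(not(any(target_has_atomic = "ptr", feature = "critical-section")))]
type FseDefaultTable = Box<FseTable>;

struct FseTables {
    ll_default: FseDefaultTable,
}

impl FseTables {
    // Both arms coerce to a plain shared reference here, so downstream
    // call sites never need cfg.
    fn ll_default_ref(&self) -> &FseTable {
        &self.ll_default
    }
}

fn main() {
    #[cfg(any(target_has_atomic = "ptr", feature = "critical-section"))]
    let tables = {
        static LL: FseTable = FseTable { accuracy_log: 6 };
        FseTables { ll_default: &LL }
    };
    #[cfg(not(any(target_has_atomic = "ptr", feature = "critical-section")))]
    let tables = FseTables { ll_default: Box::new(FseTable { accuracy_log: 6 }) };

    assert_eq!(tables.ll_default_ref().accuracy_log, 6);
}
```

Deref coercion does the unification: `&&'static FseTable` and `&Box<FseTable>` both coerce to `&FseTable` at the accessor's return.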
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'structured-zstd vs C FFI (x86_64-gnu)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.30.
| Benchmark suite | Current: b86f51e | Previous: 1e0f695 | Ratio |
|---|---|---|---|
| compress/level_22_btultra2/small-4k-log-lines/matrix/c_ffi | 0.113 ms | 0.067 ms | 1.69 |
| compress/level_22_btultra2/decodecorpus-z000033/matrix/c_ffi | 248.653 ms | 171.961 ms | 1.45 |
This comment was automatically generated by workflow using github-action-benchmark.
CC: @polaz
The `tests/` directory contains test-only modules with unconditional `extern crate std;` items (e.g. `roundtrip_integrity.rs:6`). With `mod tests;` declared unconditionally in `lib.rs`, those modules were pulled into every lib build — including no_std targets that do not have `std` available — producing E0463 "can't find crate for `std`" on `thumbv6m-none-eabi`, AVR, MSP430, etc.

Gating the module declaration on `cfg(test)` is the canonical Rust pattern: the test code is part of the lib only when the harness is being built, and the production lib stays clean. No other module in the crate references `crate::tests::*` from non-test code, so the gate is behaviour-preserving for the test-harness path.

Verified:
- `cargo check --lib --no-default-features --features hash --target thumbv6m-none-eabi` — passes (previously failed with 3 E0463 / E0432 errors)
- `cargo check --lib --no-default-features --features hash,critical-section --target thumbv6m-none-eabi` — passes
- `cargo nextest run --lib` on host — 481/481 tests pass, unchanged
- `cargo build --lib` on host (default features) — clean
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (4)
zstd/src/encoding/match_generator.rs:4235
- Same `FseDefaultTable` cfg-typing issue as above: `Some(ll)` only works when `default_*_table()` returns `&'static FSETable`. To keep these tests compiling on no-atomic / no-`critical-section` targets (where the alias is `Box<FSETable>`), borrow through `Deref` (e.g., `Some(&*ll)`).
let ll = crate::fse::fse_encoder::default_ll_table();
let ml = crate::fse::fse_encoder::default_ml_table();
let of = crate::fse::fse_encoder::default_of_table();
hc.seed_dictionary_entropy(None, Some(ll), Some(ml), Some(of));
zstd/src/encoding/match_generator.rs:4278
- Same `FseDefaultTable` cfg-typing issue: `Some(ll)` will fail to compile when `default_*_table()` returns `Box<FSETable>` (no-atomic / no-`critical-section`). Borrow through `Deref` (e.g., `Some(&*ll)` / `Some(&*ml)` / `Some(&*of)`) to keep this test cfg-agnostic.
let ll = crate::fse::fse_encoder::default_ll_table();
let ml = crate::fse::fse_encoder::default_ml_table();
let of = crate::fse::fse_encoder::default_of_table();
hc.seed_dictionary_entropy(None, Some(ll), Some(ml), Some(of));
zstd/src/encoding/match_generator.rs:4338
- Same `FseDefaultTable` cfg-typing issue: `Some(ll)` assumes `default_*_table()` returns a reference. On cfgs where it returns `Box<FSETable>`, this call won't type-check against `Option<&FSETable>`. Borrow through `Deref` so the test remains portable.
let ll = crate::fse::fse_encoder::default_ll_table();
let ml = crate::fse::fse_encoder::default_ml_table();
let of = crate::fse::fse_encoder::default_of_table();
hc.seed_dictionary_entropy(None, Some(ll), Some(ml), Some(of));
zstd/src/encoding/match_generator.rs:4484
- Same `FseDefaultTable` cfg-typing issue: `default_*_table()` can return `Box<FSETable>` on no-atomic / no-`critical-section` builds, so `Some(ll)` won't match `Option<&FSETable>`. Borrow through `Deref` (e.g., `Some(&*ll)`) to keep this test compiling across cfgs.
let ll = crate::fse::fse_encoder::default_ll_table();
let ml = crate::fse::fse_encoder::default_ml_table();
let of = crate::fse::fse_encoder::default_of_table();
stats.seed_dictionary_entropy(Some(&huff), Some(ll), Some(ml), Some(of));
`default_ll_table()` / `default_ml_table()` / `default_of_table()` now return `FseDefaultTable`, the cfg-typed alias added in the previous round. On atomic / `critical-section` targets it resolves to `&'static FSETable` (and passes directly anywhere `&FSETable` is expected); on no-atomic / no-CS it resolves to `Box<FSETable>` (which needs a `&*` reborrow to satisfy a `&FSETable` parameter). Six test sites were assuming the reference shape and would not compile on the no-atomic / no-CS arm:

- `fse/mod.rs::table_header_bits_exact` (3 `check(...)` calls)
- `match_generator.rs::dictionary_entropy_seed_initializes_opt_state_from_tables`
- `match_generator.rs::dictionary_fse_seed_applies_without_huffman_seed`
- `match_generator.rs::dictionary_seed_overrides_predef_price_mode_on_tiny_input`
- `match_generator.rs::btultra2_seed_pass_disabled_when_dictionary_entropy_seed_present`
- `match_generator.rs::dictionary_huffman_seed_ignored_when_literals_uncompressed`

Wrapped the 15 call-site occurrences in `&*` so the reborrow is canonical across both cfg arms. On the atomic arm clippy flags `&*` as needless (the value is already `&FSETable`), so each affected test function carries a localized `#[allow(clippy::borrow_deref_ref)]` next to its `#[test]` attribute.

The `Cargo.toml` `critical-section` feature comment is refreshed to match the actual no-atomic / no-CS behaviour: per-call build of an owned `Box<FSETable>` stored in `FseTables` and dropped with the owning `FrameCompressor` — no `Box::leak`, no unbounded growth (the prior draft of the comment described leaky behaviour that has since been fixed).
Actionable comments posted: 1
Inline comments:
In `@zstd/src/encoding/match_generator.rs`:
- Around line 681-692: The Dfast branch in commit_space() is incorrectly
counting the input buffer as evicted; update the Dfast handling (use
dfast_matcher_mut()/Dfast matcher) to compute eviction by capturing pre =
dfast.window_size, calling dfast.add_data(space, |...| { ... }) (the existing
callback), and then incrementing evicted_bytes by pre - dfast.window_size
instead of using the input buffer size; this mirrors the
trim_to_window()/window_size-delta pattern used in trim_after_budget_retire and
prevents premature shrinking of dictionary_retained_budget/max_window_size.
…mit_space The `commit_space` Dfast branch was counting the INPUT buffer length as `evicted_bytes`. Dfast's `add_data` callback receives the input `Vec<u8>` for pool recycling (Dfast stores bytes in the contiguous `history` buffer; there is no per-block `Vec<u8>` to pop and hand back, unlike Row / HashChain which the callback contract is shaped around). Counting `data.len()` therefore conflated "new bytes ingested" with "old bytes evicted from the window". The two happen to coincide when the prior window was exactly saturated and the new input fills it 1:1 — which is the only case the existing `dfast_add_data_callback_reports_evicted_len_not_capacity` fixture exercised (max=8, prev=8, new=8 → both numbers are 8). The bug surfaces on the divergent path: max=10, window=[4,4] (size 8), new input 5 bytes → pop one 4-byte block (evicted=4), push 5 (window=[4,5], size=9). The old accounting reported `evicted=5`, which then drives `retire_dictionary_budget` to reclaim 1 byte too many and shrinks `max_window_size` prematurely — degrading dictionary retention. Fix: derive the actual eviction from `window_size` before/after, plus the input length. The pop loop inside `add_data` decrements `window_size` by each evicted block length and the final `extend_from_slice + push_back` adds `space_len`, so `evicted = pre + space_len - post` is the exact reverse-engineering of what the pop loop did. Same `window_size`-delta pattern already in use by `trim_after_budget_retire` for the Dfast variant. The `pre.saturating_add(space_len).saturating_sub(m.window_size)` form keeps the arithmetic robust against pathological pre+space_len overflow on i686 (where `usize::MAX` is reachable in tests of extreme inputs). Regression test `dfast_commit_space_eviction_uses_window_size_delta` forces the divergent case (4 + 4 + 5 = 13 bytes vs max_window = 10) and asserts the formula yields 4 (the actual eviction), not 5 (the input size). 
The prior `dfast_add_data_callback_reports_evicted_len_not_capacity` fixture is unchanged — its concern (callback data is not the raw backing capacity) is orthogonal and still holds.
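The window-size-delta accounting above can be sketched as follows. This is an illustration only; `DfastWindow` and its fields are hypothetical stand-ins for the matcher's real state, and the real `add_data` appends into a contiguous `history` buffer rather than a deque of lengths.

```rust
use std::collections::VecDeque;

// Hypothetical model of the Dfast window for the eviction-accounting fix.
struct DfastWindow {
    blocks: VecDeque<usize>, // lengths of retained blocks (model only)
    window_size: usize,
    max_window_size: usize,
}

impl DfastWindow {
    /// Ingest `space_len` new bytes, popping old blocks while the window
    /// would overflow. Returns the number of bytes actually evicted,
    /// derived from the window-size delta rather than the input length.
    fn add_data(&mut self, space_len: usize) -> usize {
        let pre = self.window_size;
        while self.window_size + space_len > self.max_window_size {
            let evicted_block = self.blocks.pop_front().expect("window not empty");
            self.window_size -= evicted_block;
        }
        self.blocks.push_back(space_len);
        self.window_size += space_len;
        // evicted = pre + space_len - post, written with saturating ops so a
        // pathological pre + space_len cannot overflow on 32-bit targets.
        pre.saturating_add(space_len).saturating_sub(self.window_size)
    }
}
```

On the divergent fixture (max=10, window=[4,4], input 5) this returns 4, the actual eviction, where the old `data.len()` accounting would have reported 5.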
Actionable comments posted: 1
Inline comments:
In `zstd/src/encoding/match_generator.rs`, around lines 6990-7048: the test `dfast_commit_space_eviction_uses_window_size_delta` currently exercises `DfastMatchGenerator::add_data` directly. Instead, exercise the driver path: construct a `MatchGeneratorDriver` wrapping the `DfastMatchGenerator` and feed the three blocks [4, 4, 5] through the driver's public input API, so that `MatchGeneratorDriver::commit_space` (the driver-backed commit logic) runs the same code path as the production driver. After the third input, assert that the driver's observable `dictionary_retained_budget` (or another window-related observable) equals 96 (start 100 → evicted 4), and/or that the driver's window-size delta matches the `pre + space_len - post` calculation, to ensure `commit_space` was applied by the driver rather than only by `DfastMatchGenerator::add_data`.
… std
- dfast_commit_space_eviction_uses_window_size_delta now drives the
`[4, 4, 5]` fixture through `MatchGeneratorDriver::commit_space`
rather than calling `DfastMatchGenerator::add_data` directly and
re-deriving the formula on the matcher side. The previous draft
would have passed even if `commit_space`'s Dfast branch regressed
back to `evicted_bytes += data.len()` — the matcher-only path
doesn't run `commit_space` at all. Asserting on the driver's
observable `dictionary_retained_budget` and the post-retire
`max_window_size` proves the driver-side accounting is the one
that fires.
- Trace through the assertion targets (budget = 100, max_window = 10,
commits [4, 4, 5]):
1st commit: ws 0 → 4, evicted = 0, no retire. budget 100.
2nd commit: ws 4 → 8, evicted = 0, no retire. budget 100.
3rd commit: 8 + 5 > 10 → pop 4-byte block, push 5. ws 9.
evicted = pre + space_len - post = 8 + 5 - 9 = 4.
retire(4) → budget 96, max_window 6, returns true.
trim_after_budget_retire cascade:
iter1: ws 9 > 6, pop [4] → ws 5, evicted 4.
retire(4) → budget 92, max_window 2.
iter2: ws 5 > 2, pop [5] → ws 0, evicted 5.
retire(5) → budget 87, max_window 0.
iter3: ws 0, exit.
Final budget = 87, max_window = 0.
- If the Dfast branch in commit_space regresses to `data.len()` the
3rd commit would compute evicted = 5; retire(5) would reclaim one
more byte than the fix on each cascade pass. On the 2nd commit
(also evicted = 4 under the bug) the cascade alone shrinks
max_window_size to 0 prematurely, and the 3rd commit then trips
`assert!(data.len() <= self.max_window_size)` in
`DfastMatchGenerator::add_data`. Either failure mode surfaces as
a test failure rather than silently passing.
- `fuzz_regressions.rs::interop_7_byte_input_does_not_oob_in_dfast_fast_loop`
drops a stray `extern crate std;` line. The test body does not
reference `std::` anywhere (only `crate::decoding::*` /
`crate::encoding::*` / `alloc::vec::Vec` via `Vec` shorthand). The
Rust 2024 edition rules around unused `extern crate` are stricter
than 2021, and the clippy `legacy_numeric_constants` and rustc
`unused_extern_crates` lints would flag it on future toolchain
bumps. The test still runs identically.
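The commit/retire cascade traced above reduces to a small arithmetic model. The names below (`Model`, `retire`, `commit`) are hypothetical; the real code couples this logic to `MatchGeneratorDriver` and `DfastMatchGenerator` state.

```rust
use std::collections::VecDeque;

// Hypothetical arithmetic model of commit_space + retire_dictionary_budget
// + trim_after_budget_retire, used to check the traced numbers.
struct Model {
    blocks: VecDeque<usize>,
    window_size: usize,
    max_window_size: usize,
    budget: usize,
}

impl Model {
    fn retire(&mut self, n: usize) {
        self.budget -= n;
        self.max_window_size = self.max_window_size.saturating_sub(n);
    }

    fn commit(&mut self, len: usize) {
        let pre = self.window_size;
        // Ingest: pop old blocks while the window would overflow.
        while self.window_size + len > self.max_window_size {
            let b = self.blocks.pop_front().expect("window not empty");
            self.window_size -= b;
        }
        self.blocks.push_back(len);
        self.window_size += len;
        let evicted = pre + len - self.window_size;
        if evicted > 0 {
            self.retire(evicted);
            // Trim cascade: retiring shrank max_window_size, so the
            // window may now exceed it again.
            while self.window_size > self.max_window_size {
                let b = self.blocks.pop_front().expect("window not empty");
                self.window_size -= b;
                self.retire(b);
            }
        }
    }
}
```

Running the `[4, 4, 5]` fixture through this model reproduces the traced endpoint: budget 87, max_window 0.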
Summary
Phase 7pre — cross-level enablement / groundwork (see #111 for the restructured Phase 7 plan, default-first per-strategy ordering).
This PR replaces #145 (closed) — same final code state, history squashed to 3 logical commits, scope clarified. The original PR was titled "close level 3 parity gap" but grew well beyond Level 3 into cross-level work; under the restructured taxonomy this is Phase 7pre: enablement, prerequisite for the subsequent per-strategy / per-axis sub-phases.
Honest framing on the "Compressed size" column below: our output is 7% smaller than upstream on the decodecorpus-z000033 corpus. That is NOT confirmation that we beat upstream — it is more likely a signal of algorithmic divergence (we are doing extra match-finding work the upstream fast path skips), which is consistent with us still being 5.5× slower. The deviation may turn out to be the cause of the speed gap, not a separate win. To confirm what upstream would have emitted on the same input we still need the sequence-stream comparator (Phase 7-tooling-seq-cmp in the restructured #111).

Cumulative deltas (compress/level_3_dfast/decodecorpus-z000033, 1 022 035 B input)
Small-input speedup (FSE cache, all levels)
`FrameCompressor::new` used to rebuild the three predefined LL/ML/OF FSE tables on every call (~12 µs per table, ~36 µs total). Two-stage fix in commit 65d4cbdb:
- Stage 1: `AtomicPtr<FSETable>`, clone on each ctor — drops the build cost but pays ~4-5 µs per clone of three 256-element `SymbolStates` arrays.
- Stage 2: `&'static FSETable`, storing the static reference inside `FseTables` directly — the clone disappears, the field becomes a pointer-sized copy.

Effect on `small-1k-random/matrix/pure_rust` at level 2 dfast (criterion, darwin/aarch64): `compress_slice_to_vec` sees a 7.3× total compress-time speedup on the small-input scenario. Applies to every level. The residual ~2.8 µs gap to FFI is per-frame setup not covered by this change; covered by Phase 7-compress-default in #111.
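A minimal sketch of the second stage: build each predefined table once and hand out a `'static` reference thereafter. Shown here with `OnceLock` for brevity; the PR's actual mechanism is an `AtomicPtr<FSETable>` swap plus a `&'static FSETable` field in `FseTables`, and `FseTable`/`build_predefined_ll` below are stand-ins for the crate's real types.

```rust
use std::sync::OnceLock;

// Stand-in for the crate's FSE table type.
struct FseTable {
    states: Vec<u16>, // stand-in for the 256-entry SymbolStates array
}

// Stand-in for the expensive one-time table build (~12 µs in the
// PR's measurements).
fn build_predefined_ll() -> FseTable {
    FseTable { states: vec![0; 256] }
}

/// Build the predefined LL table on first use; every later call returns
/// the same &'static reference, so the ctor stores a pointer-sized copy
/// instead of cloning the table.
fn predefined_ll() -> &'static FseTable {
    static TABLE: OnceLock<FseTable> = OnceLock::new();
    TABLE.get_or_init(build_predefined_ll)
}
```

With this shape, `FrameCompressor::new` pays the build cost once per process instead of once per frame, and no clone at all on the hot path.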
Per-scenario sanity at level_3_dfast (no regressions)
Commits
This PR is squashed from 30 commits in #145 into 3 logical commits — same final tree, cleaner review surface:
Phase 7 sub-phase plan (per #111)
This PR is the entry point for the restructured Phase 7 sub-phase taxonomy. After merge, the remaining sub-phases are:
Lane A — Compress (default-first): 7-compress-default (Dfast Level 3 residual) -> 7-compress-better (Lazy Level 7) -> 7-compress-best (BtOpt/BtUltra 16-19) -> 7-compress-fastest -> 7-compress-dfast-2 -> 7-compress-greedy -> 7-compress-lazy-lower -> 7-compress-lazy-upper -> 7-compress-btultra2.
Lane B — Decompress (NEW): 7-decompress-baseline (block_content_buffer 0-fill skip + NEON-overshoot match copy) -> 7-decompress-fse (state machine register-resident + BMI2 pdep/pext + NEON bit-extract) -> 7-decompress-huffman.
Lane C — Memory (NEW): 7-memory-tracker (per-alloc-site backtrace tracker, prerequisite) -> 7-memory-medium -> 7-memory-large.
Lane D — Tooling (NEW): 7-tooling-seq-cmp (sequence-stream comparator, prerequisite for 7-compress-default audit).
Verification gates
Test plan