
[2/n] use a slab-based impl for ItemSet #243

Open
sunshowers wants to merge 4 commits into sunshowers/spr/main.2n-use-a-slab-based-impl-for-itemset from sunshowers/spr/2n-use-a-slab-based-impl-for-itemset


@sunshowers
Collaborator

The motivation is that a vec is significantly more efficient than a hash table
at storing compact and compact-ish integer keys, as the benchmark data below
shows.

We use a slab-based implementation, i.e. Vec<SlabEntry<T>>, where
SlabEntry<T> is an enum with Occupied(T) and Vacant { next: ItemIndex }
variants. The Vacant entries form an embedded free chain, so that
inserts reuse vacant slots without any secondary allocation.
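For readers unfamiliar with the pattern, here is a minimal sketch of a slab with an embedded free chain. It is not the code in this PR: the usize alias for ItemIndex, the NO_NEXT sentinel, and the Slab wrapper with its free_head and len fields are all assumptions made for illustration.

```rust
/// The PR names this type `ItemIndex`; its real definition is not shown
/// here, so a plain `usize` alias is assumed.
type ItemIndex = usize;

/// Assumed sentinel marking the end of the free chain (hypothetical).
const NO_NEXT: ItemIndex = usize::MAX;

enum SlabEntry<T> {
    Occupied(T),
    /// A hole; `next` points at the next hole, or is `NO_NEXT`.
    Vacant { next: ItemIndex },
}

/// Hypothetical wrapper around the `Vec<SlabEntry<T>>` storage.
struct Slab<T> {
    entries: Vec<SlabEntry<T>>,
    /// Index of the most recently freed slot, or `NO_NEXT` if there is none.
    free_head: ItemIndex,
    len: usize,
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { entries: Vec::new(), free_head: NO_NEXT, len: 0 }
    }

    /// Reuses the head of the free chain if there is one, else appends.
    fn insert(&mut self, value: T) -> ItemIndex {
        self.len += 1;
        if self.free_head == NO_NEXT {
            self.entries.push(SlabEntry::Occupied(value));
            return self.entries.len() - 1;
        }
        let index = self.free_head;
        match std::mem::replace(&mut self.entries[index], SlabEntry::Occupied(value)) {
            // Pop the slot off the chain by advancing the head.
            SlabEntry::Vacant { next } => self.free_head = next,
            SlabEntry::Occupied(_) => unreachable!("free chain points at an occupied slot"),
        }
        index
    }

    /// Turns the slot into a hole and pushes it onto the free chain.
    fn remove(&mut self, index: ItemIndex) -> Option<T> {
        let slot = self.entries.get_mut(index)?;
        if let SlabEntry::Vacant { .. } = slot {
            return None;
        }
        let old = std::mem::replace(slot, SlabEntry::Vacant { next: self.free_head });
        self.free_head = index;
        self.len -= 1;
        match old {
            SlabEntry::Occupied(value) => Some(value),
            SlabEntry::Vacant { .. } => unreachable!("checked above"),
        }
    }

    fn get(&self, index: ItemIndex) -> Option<&T> {
        match self.entries.get(index)? {
            SlabEntry::Occupied(value) => Some(value),
            SlabEntry::Vacant { .. } => None,
        }
    }
}
```

The chain lives entirely inside the vacant slots, so remove followed by insert touches only the Vec itself. That is the pattern the churn benchmark exercises.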

We also change shrink_to_fit to compact the slab and rewrite internal
indexes.
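As a rough illustration of what compacting and rewriting involves, the sketch below drops vacant slots, records where each surviving item moved, and then applies that mapping to a list of stored indexes. The function names compact and rewrite_indexes, the usize alias for ItemIndex, and the flat slice of indexes are assumptions for this example; the actual index structures in the PR are not shown here.

```rust
/// Assumed alias; the PR's real `ItemIndex` definition is not shown.
type ItemIndex = usize;

enum SlabEntry<T> {
    Occupied(T),
    Vacant { next: ItemIndex },
}

/// Hypothetical helper: removes every vacant slot, preserving the order of
/// occupied ones, and returns a table where `remap[old]` is `Some(new)` for
/// each item that survived and `None` for each former hole.
fn compact<T>(entries: &mut Vec<SlabEntry<T>>) -> Vec<Option<ItemIndex>> {
    let old = std::mem::take(entries);
    let mut remap = vec![None; old.len()];
    for (old_index, entry) in old.into_iter().enumerate() {
        if let SlabEntry::Occupied(value) = entry {
            remap[old_index] = Some(entries.len());
            entries.push(SlabEntry::Occupied(value));
        }
    }
    // No holes remain, so the caller's free chain is now empty.
    entries.shrink_to_fit();
    remap
}

/// Hypothetical helper: rewrites stored indexes so that they point at the
/// items' new positions after compaction.
fn rewrite_indexes(indexes: &mut [ItemIndex], remap: &[Option<ItemIndex>]) {
    for index in indexes.iter_mut() {
        *index = remap[*index].expect("stored index pointed at a vacant slot");
    }
}
```

This version builds a fresh Vec for simplicity; an in-place variant that slides occupied entries down would avoid the second allocation.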

Delta vs. the HashMap-backed ItemSet from the previous commit
(criterion median percent change; −N% is faster, +N% is slower):

get (point lookup, 32 B records):
    size  id_hash_own  id_hash_brw   id_ord_own   id_ord_brw
       1       −25.3%       −24.4%     +0.8% ⁎      +10.0%
      10       −40.4%       −40.8%       −13.5%       −10.0%
     100       −49.5%       −50.9%       −44.0%       −45.9%
      1k       −57.5%       −56.3%       −68.6%       −63.5%
     10k       −48.9%       −69.7%       −34.1%       −37.6%
     50k       −59.6%       −59.5%       −56.6%       −50.3%
    100k       −46.5%       −40.4%       −65.2%       −67.5%
    500k       +13.3%       +16.4%       −44.8%       −46.0%
      1M        −9.9%       −12.0%        −7.5%       +14.0%

  ⁎ p > 0.05, no statistically significant change.

get_large (point lookup, 1 KiB records):
    size  id_hash_own  id_hash_brw   id_ord_own   id_ord_brw
      1k       −45.7%       −45.3%       −44.4%       −46.1%
     10k       −41.6%       −49.0%       −33.5%       −20.8%
    100k      +7.5% ⁎       −32.2%        −8.0%       +12.9%
      1M       −19.7%       −68.2%       −14.5%        −7.4%

bulk_insert:
  size   id_hash    id_ord
   100    −29.2%    −25.3%
   10k    −30.8%    −29.6%
  100k    −33.3%    −27.4%

bulk_insert_large (1 KiB payload):
  size   id_hash    id_ord
    1k    −37.1%    −24.6%
   10k    −47.5%    −37.4%
  100k    −26.1%    −28.3%
    1M    −52.3%    −52.7%

churn (remove + reinsert at steady state, 1000 ops/iter):
  size   id_hash    id_ord
   100    −22.1%    −16.1%
   10k    −32.7%    −23.1%
  100k    −38.0%    −25.3%

iter (full traversal, 32 B records):
  size   id_hash    id_ord
   100    −36.6%     −9.8%
   10k    −34.9%    −24.0%
  100k    −39.2%    −39.8%

iter_large (full traversal, 1 KiB records):
  size   id_hash    id_ord
    1k    −32.1%    −16.4%
   10k    −38.7%    −52.4%
  100k    −23.2%    −61.9%
    1M    −33.6%    −67.2%

shrink_to_fit (50% holes, then compact):
  size   id_hash    id_ord
   100    −13.3%    −16.6%
   10k     −8.2%    −12.4%
  100k    −18.0%     +5.4%

ref_mut/id_ord_map: +5.4%.

Overall, this is a pretty nice win throughout! There are a few anomalies at large sizes that are best explained by L1 cache overflow, but in general we're up to 3x as fast as before, and significantly more competitive with the std maps, including being faster at a few things like bulk insert and iteration. (As a bonus, we also get to drop the rustc-hash dependency.)
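To relate the percent changes in the tables to the "up to 3x" figure: a change of −N% in time corresponds to a speedup factor of 100 / (100 − N). The helper below is a hypothetical illustration of that arithmetic, not part of the PR or of criterion.

```rust
/// Converts a percent change in time (negative means faster) into a
/// speedup factor relative to the baseline. Hypothetical helper.
fn speedup_factor(percent_change: f64) -> f64 {
    100.0 / (100.0 + percent_change)
}

fn main() {
    // Largest improvement in the tables: get, 10k, id_hash_brw.
    println!("{:.2}x", speedup_factor(-69.7)); // prints "3.30x"
    // A -50% change means the time halved.
    println!("{:.2}x", speedup_factor(-50.0)); // prints "2.00x"
    // A positive change is a regression, so the factor drops below 1.
    println!("{:.2}x", speedup_factor(16.4)); // prints "0.86x"
}
```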

Created using spr 1.3.6-beta.1
@sunshowers sunshowers mentioned this pull request Apr 28, 2026
2 tasks