
perf: cache FunctionId2 handles in BlazeSymbolizerWrapper #549

Open
r1viollet wants to merge 3 commits into main from r1viollet/cache-blaze-function-ids

r1viollet (Collaborator) commented May 12, 2026

Problem

intern_function() (→ 3× ProfilesDictionary_insert_str) was called on every sample for every blaze-symbolized frame, driving ~128 ms/min of CPU in ProfilesDictionary_insert_str even on stable workloads.

The existing SymbolTable / symbol_idx cache covers frames resolved at unwind time (DSO/common/base/runtime lookups). Blaze frames are added with k_symbol_idx_null and a file_info_id; process_symbolization runs blaze at pprof-creation time but never stores the resulting FunctionId2 — so every sample for the same ELF address re-interns from scratch.

Solution: two-level FunctionId2 cache in BlazeSymbolizerWrapper

Level 1 — function identity (shared across all call sites of the same function):

  • function_id_cache[func_start] → FunctionId2 for outer frames, keyed by blaze_sym.addr (function start address). All call sites within the same function share one dict handle, so there are zero intern_function calls after the first visit to any call site.
  • function_id_cache[inlined_key(elf_addr, idx)] → FunctionId2 for inlined frames (per call site, since inlined frames have no function start address).
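A minimal sketch of the Level 1 outer-frame lookup, under stated assumptions: `FunctionId2` is modeled as a plain `uint32_t`, `FakeDictionary::intern_function()` is a hypothetical stand-in that only counts calls (the real `intern_function()` performs three `ProfilesDictionary_insert_str` inserts), and `function_id_cache` is assumed to be a `std::unordered_map`. Inlined frames are left out. It shows why any number of call sites inside one function cost a single intern: the key is the function start address, not the sampled address.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using ElfAddress_t = uint64_t;
using FunctionId2 = uint32_t;  // hypothetical: the real handle type is opaque

// Hypothetical stand-in for the profiles dictionary; it only counts
// intern_function() calls so the effect of the cache is observable.
struct FakeDictionary {
  int intern_calls = 0;
  FunctionId2 next_id = 1;

  FunctionId2 intern_function(const std::string & /*name*/,
                              const std::string & /*sopath*/) {
    ++intern_calls;
    return next_id++;
  }
};

// Level 1: function identity, keyed by the function start address that
// blaze reports for the outer (non-inlined) frame.
class FunctionIdCache {
public:
  explicit FunctionIdCache(FakeDictionary &dict) : _dict(dict) {}

  FunctionId2 get_or_intern(ElfAddress_t func_start, const std::string &name,
                            const std::string &sopath) {
    auto it = _function_id_cache.find(func_start);
    if (it != _function_id_cache.end()) {
      return it->second;  // hit: no dictionary work at all
    }
    FunctionId2 id = _dict.intern_function(name, sopath);
    _function_id_cache.emplace(func_start, id);
    return id;
  }

private:
  FakeDictionary &_dict;
  std::unordered_map<ElfAddress_t, FunctionId2> _function_id_cache;
};
```

With two distinct functions, repeated lookups for either one return the same handle, and the dictionary sees exactly two intern calls no matter how many call sites are sampled.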

Level 2 — call site (full hit for repeated exact addresses):

  • address_cache[elf_addr] → {func_start, lines[]}. When the exact same ELF address recurs, write from the caches without calling blaze or touching the dict.
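The Level 2 fast path can be sketched the same way. Everything here is hypothetical: `fake_blaze()` stands in for the blaze lookup and pretends every 0x100-byte block is one function, only the outer frame is modeled (so `lines` holds a single entry), and `intern_calls` doubles as the handle generator. The point is the lookup order: `address_cache` first, which needs neither blaze nor the dictionary; on a miss, blaze runs once, and the Level 1 cache still absorbs the intern if another call site of the same function was seen before.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ElfAddress_t = uint64_t;
using FunctionId2 = uint32_t;  // hypothetical stand-in for the real handle type

// Hypothetical blaze result, outer frame only.
struct BlazeResult {
  ElfAddress_t func_start;  // function start address (blaze_sym.addr)
  uint32_t line;
};

// Level 2 entry: enough to rebuild the frame without calling blaze.
struct AddressEntry {
  ElfAddress_t func_start;
  std::vector<uint32_t> lines;  // one per frame; a single outer frame here
};

struct Location {
  FunctionId2 function;
  uint32_t line;
};

class TwoLevelCache {
public:
  int blaze_calls = 0;
  int intern_calls = 0;
  int addr_hits = 0;
  int addr_misses = 0;

  Location symbolize(ElfAddress_t elf_addr) {
    auto hit = _address_cache.find(elf_addr);
    if (hit != _address_cache.end()) {
      // Full hit: function handle and line both come from the caches.
      ++addr_hits;
      const AddressEntry &entry = hit->second;
      return {_function_id_cache.at(entry.func_start), entry.lines.front()};
    }
    ++addr_misses;
    BlazeResult r = fake_blaze(elf_addr);
    // Level 1: intern only if no call site of this function was seen yet.
    auto [it, inserted] = _function_id_cache.try_emplace(r.func_start, 0);
    if (inserted) {
      // Stand-in for intern_function(): hand out a fresh handle.
      it->second = static_cast<FunctionId2>(++intern_calls);
    }
    _address_cache.emplace(elf_addr, AddressEntry{r.func_start, {r.line}});
    return {it->second, r.line};
  }

private:
  // Hypothetical blaze: each 0x100-byte block is one function.
  BlazeResult fake_blaze(ElfAddress_t elf_addr) {
    ++blaze_calls;
    ElfAddress_t start = elf_addr & ~ElfAddress_t{0xff};
    return {start, static_cast<uint32_t>(elf_addr - start)};
  }

  std::unordered_map<ElfAddress_t, FunctionId2> _function_id_cache;
  std::unordered_map<ElfAddress_t, AddressEntry> _address_cache;
};
```

Sampling 0x1010, then 0x1024 (same function, new address), then 0x1010 again gives two blaze calls, one intern and one full address hit.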

nosym_cache: MappingId2 → FunctionId2 for no-symbol frames. All frames from the same DSO share one (empty_name, sopath) function handle.
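A sketch of `nosym_cache`, assuming `MappingId2` and `FunctionId2` are small integer handles and using a hypothetical counting `intern_function()`. Since the function identity of an unsymbolized frame is `("", sopath)`, it depends only on the mapping, so `try_emplace` interns once per mapping and every later frame of that DSO reuses the handle.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using MappingId2 = uint32_t;   // hypothetical: the real handle type is opaque
using FunctionId2 = uint32_t;  // hypothetical

class NoSymCache {
public:
  int intern_calls = 0;

  // sopath is only read on the first frame of a mapping; afterwards the
  // cached handle is returned directly.
  FunctionId2 get(MappingId2 mapping, const std::string &sopath) {
    auto [it, inserted] = _nosym_cache.try_emplace(mapping, 0);
    if (inserted) {
      it->second = intern_function("", sopath);
    }
    return it->second;
  }

private:
  // Hypothetical stand-in for the real intern_function().
  FunctionId2 intern_function(const std::string &, const std::string &) {
    return static_cast<FunctionId2>(++intern_calls);
  }

  std::unordered_map<MappingId2, FunctionId2> _nosym_cache;
};
```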

The caches live in BlazeSymbolizerWrapper (one per ELF file, keyed by FileInfoId_t). When a file is evicted by remove_unvisited(), both caches evict with it — no stale handles.
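The lifetime argument can be illustrated with a hypothetical registry. `SymbolizerRegistry`, `SymbolizerWrapper`, the `visited` flag and the per-cycle reset are assumptions about how `remove_unvisited()` might behave, not the actual implementation; the relevant property is that the caches are members of the per-file wrapper, so erasing the wrapper erases them too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using FileInfoId_t = uint32_t;  // hypothetical alias
using ElfAddress_t = uint64_t;
using FunctionId2 = uint32_t;   // hypothetical

// Hypothetical slimmed-down wrapper: the cache is a member, so it shares
// the symbolizer's lifetime.
struct SymbolizerWrapper {
  std::unordered_map<ElfAddress_t, FunctionId2> function_id_cache;
  bool visited = false;
};

class SymbolizerRegistry {
public:
  SymbolizerWrapper &get(FileInfoId_t file) {
    SymbolizerWrapper &w = _by_file[file];  // default-constructs on first use
    w.visited = true;
    return w;
  }

  // Drop every file not touched since the last call. Its caches go with it,
  // so no cached handle can outlive the file it was computed for.
  void remove_unvisited() {
    for (auto it = _by_file.begin(); it != _by_file.end();) {
      if (!it->second.visited) {
        it = _by_file.erase(it);
      } else {
        it->second.visited = false;  // reset for the next cycle
        ++it;
      }
    }
  }

  std::size_t size() const { return _by_file.size(); }

private:
  std::unordered_map<FileInfoId_t, SymbolizerWrapper> _by_file;
};
```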

Observed results (collatz at 999 Hz, ~10 unique functions)

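| Cycle | addr_misses | intern_fn_calls | addr_hits | hit rate |
|-------|-------------|-----------------|-----------|----------|
| 1 (cold) | 2771 | 115 | 5.26 M | 99.9 % |
| 2+ (warm) | ~300 | 0 | ~6 M | 100.0 % |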

Zero actual dict insertions in steady state.
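The hit-rate column is consistent with `addr_hits / (addr_hits + addr_misses)`. The PR does not spell out the formula, so that definition is an assumption here; this is a quick check against the table's (rounded) numbers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed definition (not stated in the PR): the share of frame lookups
// served entirely from address_cache, in percent.
double hit_rate_percent(uint64_t addr_hits, uint64_t addr_misses) {
  const uint64_t total = addr_hits + addr_misses;
  if (total == 0) {
    return 0.0;
  }
  return 100.0 * static_cast<double>(addr_hits) / static_cast<double>(total);
}

// Round to one decimal place, as displayed in the table.
double round_one_decimal(double value) { return std::round(value * 10.0) / 10.0; }
```

With the cold-cycle values this gives about 99.947 %, and with the approximate warm values about 99.995 %, which round to the 99.9 % and 100.0 % shown above.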

Relationship to symbol_idx

symbol_idx is set at unwind time and covers DSO/common/base/runtime symbols. Blaze frames arrive with k_symbol_idx_null; updating symbol_idx retroactively in committed FunLoc entries is unsafe. This cache is a complementary layer at pprof-creation time. Longer term, feeding blaze results back into the unwind-time cache would remove the need for this layer entirely.

Metrics

Three new stats visible via --internal_stats / statsd:

  • symbols.blaze.intern_fn_calls — actual intern_function() calls per cycle
  • symbols.blaze.addr_misses — new ELF addresses not yet in address_cache
  • symbols.blaze.addr_hits — full cache hits (zero dict cost)

intern_function() was called on every sample for every blaze-symbolized
frame, driving ~128 ms/min of CPU in ProfilesDictionary_insert_str even
on stable workloads where the same addresses recur.

Two caches added to BlazeSymbolizerWrapper (per-ELF-file, same lifetime
as the symbolizer instance):

- function_cache: elf_addr → [{FunctionId2, line}]
  On first symbolization of an address, intern strings and populate.
  On cache hit, write the stored handles directly — zero dict calls.

- nosym_cache: MappingId2 → FunctionId2
  intern_function("", sopath) is identical for all frames of the same
  DSO. Cache the result on first call per mapping.

Co-authored-by: nsavoire <nsavoire@users.noreply.github.com>
r1viollet force-pushed the r1viollet/cache-blaze-function-ids branch from ede1254 to e5b9e5f on May 12, 2026 12:08

`intern_function()` (→ 3× `ProfilesDictionary_insert_str`) was called on
every sample for every blaze-symbolized frame, driving ~128 ms/min of CPU
in `ProfilesDictionary_insert_str` even on stable workloads.

The existing `SymbolTable` / `symbol_idx` cache covers frames resolved at
unwind time (DSO/common/base/runtime lookups). Blaze frames are added with
`k_symbol_idx_null` and a `file_info_id`; `process_symbolization` runs
blaze at pprof-creation time but never stores the resulting FunctionId2 —
so every sample for the same ELF address re-interns from scratch.

**Level 1 — function identity** (shared across all call sites):
- `function_id_cache[func_start]` → `FunctionId2` for outer frames,
  keyed by `blaze_sym.addr` (function start address). All call sites
  within the same function share one dict handle — zero `intern_function`
  calls after the first visit to any call site of that function.
- `function_id_cache[inlined_key(elf_addr, idx)]` → `FunctionId2` for
  inlined frames (per call site, since inlined frames have no start addr).

**Level 2 — call site** (fast full hit for repeated addresses):
- `address_cache[elf_addr]` → `{func_start, lines[]}`. When the exact
  same ELF address recurs in a later sample, write from the caches
  without calling blaze or touching the dict at all.

**nosym_cache** — `MappingId2 → FunctionId2` for no-symbol frames; all
frames from the same DSO share one "empty name, sopath" function handle.

| Cycle | addr_misses | **intern_fn_calls** | addr_hits | hit rate |
|-------|-------------|---------------------|-----------|----------|
| 1 (cold) | 2771 | 115 | 5.26 M | 99.9 % |
| 2+ (warm) | ~300 | **0** | ~6 M | 100.0 % |

Zero actual dict insertions in steady state.

Three new stats visible via `--internal_stats`:
- `symbols.blaze.intern_fn_calls` — actual `intern_function()` calls
- `symbols.blaze.addr_misses`    — new ELF addresses not yet in cache
- `symbols.blaze.addr_hits`      — full cache hits (zero dict cost)

`symbol_idx` is set at **unwind time** and covers DSO/common/base/runtime
symbols. Blaze frames arrive with `k_symbol_idx_null`; updating `symbol_idx`
retroactively in committed `FunLoc` entries is unsafe. This cache is a
complementary layer at pprof-creation time. Longer term, feeding blaze
results back into the unwind-time cache would remove the need for this layer.

Co-authored-by: nsavoire <nsavoire@users.noreply.github.com>
r1viollet force-pushed the r1viollet/cache-blaze-function-ids branch 2 times, most recently from ede1254 to b5528cf on May 13, 2026 16:29

Avoids relying on the assumption that ELF virtual addresses fit in 48 bits.
In theory a pair<ElfAddress_t, unsigned> could be packed into a uint64_t
(ELF vaddrs are well under 48 bits in practice on both aarch64 and x86_64),
but std::pair removes any architectural dependency.

Separates the two caches cleanly:
  function_id_cache:  ElfAddress_t (func_start) → FunctionId2  (outer frames)
  inlined_id_cache:   {elf_addr, inlined_idx}  → FunctionId2  (inlined frames)
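For reference, the two keying options this commit weighs might look like this in C++. `std::hash` has no specialization for `std::pair`, so an `std::unordered_map` keyed by `{elf_addr, inlined_idx}` needs a hasher; `PairHash` below uses the well-known `boost::hash_combine` mixing step, picked for illustration, and the container choice is itself an assumption. `packed_inlined_key()` is a hypothetical version of the rejected packing, which only works while the address fits in 48 bits and the inline index in 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using ElfAddress_t = uint64_t;
using FunctionId2 = uint32_t;  // hypothetical

// std::hash<std::pair<...>> does not exist, so supply one.
struct PairHash {
  std::size_t operator()(const std::pair<ElfAddress_t, unsigned> &key) const {
    std::size_t seed = std::hash<ElfAddress_t>{}(key.first);
    // boost::hash_combine-style mix of the second component into the seed.
    seed ^= std::hash<unsigned>{}(key.second) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
    return seed;
  }
};

// Inlined frames: one handle per (call-site address, inline depth).
using InlinedIdCache =
    std::unordered_map<std::pair<ElfAddress_t, unsigned>, FunctionId2, PairHash>;

// The alternative the commit avoids: pack both parts into one uint64_t.
// Only valid while elf_addr < 2^48 and inlined_idx < 2^16.
uint64_t packed_inlined_key(ElfAddress_t elf_addr, unsigned inlined_idx) {
  assert(elf_addr < (uint64_t{1} << 48) && inlined_idx < (1u << 16));
  return (elf_addr << 16) | inlined_idx;
}
```

The pair key costs a custom hasher but no range precondition; the packed key hashes with the default `std::hash<uint64_t>` but silently collides if the 48-bit assumption ever breaks in a build without asserts.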
r1viollet marked this pull request as ready for review May 13, 2026 16:40
r1viollet requested a review from nsavoire as a code owner May 13, 2026 16:40