Snapshot benchmark #8

Closed
kzajac-opera wants to merge 8 commits into main from snapshot-benchmark

Conversation

@kzajac-opera

No description provided.

mugorski and others added 8 commits May 19, 2026 13:06
- compactSnapshot(): drop noise nodes, normalise PascalCase roles,
  convert headings to markdown, rewrite refs to @PAGE.ELEM dot form
  (better BPE tokenisation than uid=X_Y), strip ARIA default attributes
- Collapse consecutive same-indent text siblings into one line; drop
  the merged line when it echoes the parent label
- cleanUrl(): drop javascript:/data: URLs, same-origin → relative paths,
  strip generic cross-site tracking params (utm_*, gclid/fbclid family);
  removed Amazon-specific params (ie, _encoding, ref_, pd_rd_, pf_rd_)
- applyUrlLut(): dedup repeated URLs and hide whale URLs (≥200 chars)
  behind $uN tokens; full values printed in urls: footer trailer
- Compact truncation limit lowered 16k → 12k chars (raw keeps 16k);
  compaction savings recover the headroom
- --raw flag on all snapshot commands to bypass compaction
- `url <$uN|@ref>` command to resolve LUT tokens and element refs
- Test fixture (test/fixtures/elements.html) covering all major element types
- Task benchmark prompts in test/tasks/ for compact vs raw cost comparison
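
For readers without access to the diff, the `cleanUrl()` and `applyUrlLut()` bullets above can be illustrated roughly as follows. This is a minimal sketch and not the code from these commits: the function names `cleanUrlSketch` and `applyUrlLutSketch`, their signatures, the exact tracking-parameter set, token numbering starting at 1, and the footer line format are all assumptions. The LUT sketch also works on a plain list of URLs rather than on snapshot text.

```typescript
// Hypothetical sketch only: names, signatures and formats are assumptions,
// not taken from the PR's implementation.

// Assumed members of the "gclid/fbclid family"; the real list may be longer.
const TRACKING_EXACT = new Set<string>(["gclid", "fbclid"]);

/** Returns a cleaned URL, or null when the URL should be dropped entirely. */
function cleanUrlSketch(raw: string, pageUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(raw, pageUrl); // resolves relative hrefs against the page
  } catch {
    return raw; // unparseable input is left untouched
  }
  // Drop javascript: and data: URLs.
  if (url.protocol === "javascript:" || url.protocol === "data:") return null;

  // Collect keys first, then delete, so we never mutate while iterating.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => keys.push(key));
  for (const key of keys) {
    if (key.startsWith("utm_") || TRACKING_EXACT.has(key)) {
      url.searchParams.delete(key);
    }
  }

  // Same-origin links collapse to a relative path; others stay absolute.
  const base = new URL(pageUrl);
  return url.origin === base.origin
    ? url.pathname + url.search + url.hash
    : url.toString();
}

/** Replaces repeated or very long URLs with $uN tokens and builds a footer. */
function applyUrlLutSketch(
  urls: string[],
  whaleLength = 200, // "whale" threshold from the commit message
): { tokens: string[]; footer: string[] } {
  const counts = new Map<string, number>();
  for (const u of urls) counts.set(u, (counts.get(u) ?? 0) + 1);

  const lut = new Map<string, string>();
  const footer: string[] = [];
  const tokens = urls.map((u) => {
    const repeated = (counts.get(u) ?? 0) > 1;
    if (!repeated && u.length < whaleLength) return u; // keep inline
    let token = lut.get(u);
    if (token === undefined) {
      token = `$u${lut.size + 1}`; // assumed numbering: $u1, $u2, ...
      lut.set(u, token);
      footer.push(`${token} ${u}`); // assumed footer line format
    }
    return token;
  });
  return { tokens, footer };
}
```

Under these assumptions, a same-origin link such as `https://shop.example/item?id=1&utm_source=news#top` on a `https://shop.example/` page would come out as `/item?id=1#top`, and a URL that appears twice in a snapshot would be printed as the same `$uN` token both times with its full value listed once in the footer.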
@kzajac-opera kzajac-opera deleted the snapshot-benchmark branch May 19, 2026 13:14
2 participants