
feat: implement cache tag invalidation #46

Open
jjohnson-hdb wants to merge 15 commits into main from feature-cache-tags

Conversation

@jjohnson-hdb
Contributor

@jjohnson-hdb jjohnson-hdb commented May 7, 2026

Summary

  • The revalidateTag() method on the cache handler was a stub. It now persists invalidations to a new nextjs_cache_invalidation table and propagates across the cluster via a Harper subscription, so a tag invalidated on one node is observed by every node within milliseconds.
  • Soft-invalidation only — no background hard-purge. On get(), the handler returns null if any of an entry's tags has an invalidation timestamp newer than the entry's lastModified, which prompts Next.js to regenerate. The regenerated write restores "fresh" status by bumping lastModified past the invalidation timestamp.
  • Adds tags: [String] to nextjs_isr_cache and a new nextjs_cache_invalidation table (7-day expiration). Schema diff is in schema.graphql.
  • Wires up the previously unused next-16-caching fixture: corrects a broken cacheHandler reference (was .js, dist emits .cjs), uses withHarper(..., { experimentalHarperCache: true }) for path resolution, moves the previously-skipped ISR tests out of next-16.pw.ts into a dedicated next-16-caching.pw.ts, and adds a revalidateTag end-to-end test.
  • Documents the feature in the README — replaces the "Caching (Work In Progress)" stub with usage examples, the propagation model, and current limitations (revalidatePath() and group-based invalidation are not yet implemented).
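
The soft-invalidation rule from the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code in src/CacheHandler.cts: the `CachedEntry` and `InvalidationMap` shapes and the function signature are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real handler's types may differ.
type InvalidationMap = Map<string, number>; // tag -> invalidation timestamp (ms)

interface CachedEntry {
  tags: string[];
  lastModified: number; // ms since epoch, set when the entry was written
}

// An entry is stale if any of its tags was invalidated after it was written,
// or if the tag appears in the per-request revalidatedTags snapshot.
function isInvalidated(
  entry: CachedEntry,
  invalidations: InvalidationMap,
  revalidatedTags: readonly string[] = [],
): boolean {
  return entry.tags.some((tag) => {
    if (revalidatedTags.includes(tag)) return true;
    const invalidatedAt = invalidations.get(tag);
    return invalidatedAt !== undefined && invalidatedAt > entry.lastModified;
  });
}

const invalidations: InvalidationMap = new Map([["products", 2000]]);
const stale = isInvalidated({ tags: ["products"], lastModified: 1000 }, invalidations);
const fresh = isInvalidated({ tags: ["products"], lastModified: 3000 }, invalidations);
console.log(stale, fresh); // true false
```

Because a regenerated entry is written with a newer lastModified, the same check reports it as fresh without deleting the invalidation row.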

Where to focus review

  • src/CacheHandler.cts — main implementation. The interesting bits are the module-level subscription init (idempotent, hydrates from current rows then subscribes with omitCurrent: true) and the isInvalidated check that combines ctx.revalidatedTags (per-request snapshot from Next.js) with the persistent map.
  • extractTags — handles three Next.js cache shapes: FETCH (tags on context), APP_PAGE/APP_ROUTE/PAGES (tags in the x-next-cache-tags header on the cached response), and a fallback to data.tags. Mirrors what FileSystemCache does internally.
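
For reviewers, the three shapes that extractTags handles could look roughly like this. It is a simplified sketch under stated assumptions: the `CacheValue` and `SetContext` interfaces are hypothetical stand-ins for the Next.js types, and the header value is assumed to be a comma-separated list.

```typescript
// Hypothetical, simplified shapes; the real Next.js cache value types carry more fields.
interface CacheValue {
  kind: string;
  headers?: Record<string, string | string[] | undefined>;
  tags?: string[];
}

interface SetContext {
  tags?: string[];
}

const TAGS_HEADER = "x-next-cache-tags";
const HEADER_KINDS = ["APP_PAGE", "APP_ROUTE", "PAGES"];

function extractTags(data: CacheValue | null, ctx: SetContext = {}): string[] {
  if (!data) return [];
  // FETCH entries: tags arrive on the context passed to set().
  if (data.kind === "FETCH" && ctx.tags && ctx.tags.length > 0) {
    return [...ctx.tags];
  }
  // APP_PAGE / APP_ROUTE / PAGES: tags ride on the cached response header
  // (assumed here to be comma-separated).
  if (HEADER_KINDS.includes(data.kind) && data.headers) {
    const raw = data.headers[TAGS_HEADER];
    const joined = Array.isArray(raw) ? raw.join(",") : raw;
    if (joined) {
      return joined.split(",").map((tag) => tag.trim()).filter(Boolean);
    }
  }
  // Fallback: tags stored directly on the value.
  return data.tags ? [...data.tags] : [];
}

const pageTags = extractTags({
  kind: "APP_PAGE",
  headers: { "x-next-cache-tags": "products,home" },
});
const fetchTags = extractTags({ kind: "FETCH" }, { tags: ["products"] });
console.log(pageTags, fetchTags);
```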

Test plan

  • npm run build — TypeScript compiles clean
  • npm run test:integration -- integrationTests/next-16-caching.pw.ts
  • Verify v14, v15, v16 (non-caching) tests still pass — no changes to their code paths but worth confirming.

Files changed

  • schema.graphql — tags: [String] on nextjs_isr_cache, new nextjs_cache_invalidation table
  • src/CacheHandler.cts — full implementation of get/set/revalidateTag
  • README.md — documents the feature
  • integrationTests/next-16.pw.ts — removed dead skipped block
  • integrationTests/next-16-caching.pw.ts — new test file with 4 tests
  • fixtures/next-16-caching/next.config.mjs — fixed broken cacheHandler path
  • fixtures/next-16-caching/app/tagged/page.js — new fixture page using unstable_cache
  • fixtures/next-16-caching/app/api/revalidate/route.js — new fixture route handler calling revalidateTag

jjohnson-hdb and others added 5 commits May 7, 2026 16:31
Replaces the stubbed `revalidateTag` with a soft-invalidation pattern
backed by a new `nextjs_cache_invalidation` table. Each tag → timestamp
entry is hydrated into an in-memory map on first construction and kept
fresh across workers via a Harper subscription, so any worker observes
invalidations from any other.

Adapts the pattern from aeo-page-cache, omitting the background
hard-purge: Next.js naturally overwrites stale rows on regeneration, so
hard-deleting buys little and would compete with rendering for I/O on
the same worker. Cold rows can be reclaimed by the table's expiration.

`get` checks both `ctx.revalidatedTags` (the per-request snapshot Next
passes to the constructor) and the persistent map; if any tag was
invalidated after the record's `lastModified`, it returns null so Next
regenerates.

`set` extracts tags from `ctx.tags` for FETCH entries and from the
`x-next-cache-tags` header for APP_PAGE / APP_ROUTE / PAGES, storing
them as a CSV column.

Schema adds `tags: String` to `nextjs_isr_cache` and a new
`nextjs_cache_invalidation` table (7-day expiration).

Wires up the previously unused next-16-caching fixture: corrects the
`cacheHandler` reference (was looking for `.js`, dist emits `.cjs`),
moves the previously-skipped ISR tests into a dedicated test file
against this fixture, and adds a `revalidateTag` end-to-end test using
`unstable_cache` + a route handler.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
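
The hydrate-then-subscribe initialization described in the commit above might be sketched like this. Harper's table and subscription APIs are deliberately not reproduced: `InvalidationSource` is a hypothetical interface standing in for them, and all names below are illustrative.

```typescript
// Hypothetical shapes: the real handler talks to Harper's table and
// subscription APIs, which are not reproduced here.
interface InvalidationRow {
  tag: string;
  timestamp: number; // ms since epoch
}

interface InvalidationSource {
  currentRows(): Promise<InvalidationRow[]>;
  subscribe(onRow: (row: InvalidationRow) => void): void;
}

const invalidationMap = new Map<string, number>();
let initPromise: Promise<void> | undefined;

// Keep only the newest timestamp per tag so late or replayed events are harmless.
function applyRow(row: InvalidationRow): void {
  const known = invalidationMap.get(row.tag) ?? 0;
  if (row.timestamp > known) invalidationMap.set(row.tag, row.timestamp);
}

// Idempotent: every caller shares the same in-flight or completed init.
function initInvalidations(source: InvalidationSource): Promise<void> {
  if (!initPromise) {
    initPromise = (async () => {
      // Hydrate from the rows that already exist, then listen for new ones.
      for (const row of await source.currentRows()) applyRow(row);
      source.subscribe(applyRow);
    })();
  }
  return initPromise;
}
```

Caching the promise at module level is one way to make repeated constructor calls cheap; whether the actual handler does exactly this is not stated in the commit message.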
The CSV form was a holdover from the original aeo pattern, where the
hard-purge used a `contains` substring search on the column. With the
hard-purge dropped, tags are only ever read in-process, so a real array
is simpler.

`NextISRCache.tags` is now `[String]`. `set()` writes the array
directly; `get()` reads it directly. No `.join(',')` / `.split(',')`
churn, and no risk of comma-in-tag mishandling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
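
The comma-in-tag risk mentioned in the commit above is easy to reproduce. The tag values below are made up for the illustration.

```typescript
// A tag that happens to contain a comma (hypothetical value).
const tags = ["products", "region:us,eu"];

// CSV round trip: the comma inside the second tag becomes a separator.
const viaCsv = tags.join(",").split(",");

// Array storage: tags survive unchanged.
const viaArray = [...tags];

console.log(viaCsv.length, viaArray.length); // 3 2
```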
The Caching section was a stub flagged as WIP. Now that
`revalidateTag()` is fully implemented end-to-end, document:

- how to enable the handler
- how tags propagate across the cluster (subscription on the
  invalidation table)
- the soft-invalidation model — no hard-purge, natural overwrite
  on regeneration
- the two tables added to `harperfast_nextjs`
- current limitations (no `revalidatePath`, no group-based
  invalidation)

Keeps the `experimentalHarperCache` caveat since the contract may
still shift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier I replaced the fixture's hand-rolled cacheHandler config with
`withHarper({}, { experimentalHarperCache: true })`. The motivation was
fixing a real bug — the path pointed at `CacheHandler.js`, but tsc
emits `CacheHandler.cjs` — but switching to withHarper was unnecessary.

The fixture was deliberately written to exercise the cacheHandler
config in isolation from withHarper's other behaviour, and listing
`@harperfast/nextjs` in serverExternalPackages was intentional.
withHarper does not add that entry. The commented turbopack.root note
was also a deliberate placeholder.

Reverts the file to its original shape, with the one-character
correction (`.js` → `.cjs`) that was actually needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jjohnson-hdb jjohnson-hdb marked this pull request as draft May 7, 2026 22:14
The Next.js cacheHandler module is loaded by Next.js itself (via
`require()` on the `cacheHandler` config path), not by Harper. With
turbopack, that load happens inside a build worker thread. Importing
`harper` at the top of CacheHandler.cts ran harper's module
initialization in that worker thread, which tried to register native
worker hooks process-wide — conflicting with the same registration
already done by the Harper main process. The result was a stream of
"Worker creator already registered" uncaught exceptions; the HTTP
worker kept restarting until Harper gave up
(`Thread has been restarted undefined times and will not be restarted`),
which manifested in tests as the fixture timing out before Harper
reached ready.

Switch to the same pattern `plugin.ts` uses: import `harper` for types
only, and read `databases` from `globalThis` at call time. The module
now loads cleanly in any context, and methods short-circuit (return
null / no-op) if `databases` isn't available (i.e. we're in a context
without Harper globals — which means there's nothing useful we could
do anyway).

Verified locally: every integration test file passes individually
(19/19 tests across all six fixtures including the four new caching
tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
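
The call-time lookup described in the commit above might look roughly like this. It is a sketch only: the `TableLike` and `Databases` shapes and the `getTable` and `getEntry` helpers are hypothetical, and Harper's actual `databases` global is richer than shown.

```typescript
// Hypothetical minimal shapes; Harper's real `databases` global is richer.
interface TableLike {
  get(id: string): Promise<unknown>;
}
type Databases = Record<string, Record<string, TableLike>>;

// Read `databases` from globalThis on every call instead of importing
// `harper` at module load, so loading this module has no side effects.
function getTable(database: string, table: string): TableLike | undefined {
  const databases = (globalThis as unknown as { databases?: Databases }).databases;
  return databases?.[database]?.[table];
}

// Short-circuit to null when the Harper globals are absent.
async function getEntry(key: string): Promise<unknown> {
  const table = getTable("harperfast_nextjs", "nextjs_isr_cache");
  if (!table) return null;
  return (await table.get(key)) ?? null;
}
```

Outside a Harper process, for example in a Turbopack build worker, `getEntry` simply resolves to null.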
@jjohnson-hdb jjohnson-hdb marked this pull request as ready for review May 14, 2026 17:11
jjohnson-hdb and others added 6 commits May 14, 2026 12:13
TEMPORARY — do not merge. Lets the customer install @harperfast/nextjs
directly from this branch via github:HarperFast/nextjs#feature-cache-tags
without needing a prepare script or a published npm release. Revert before
merging to main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema declared `data: String`, but the CacheHandler stores the
Next.js IncrementalCacheValue object ({ kind, html, rscData, headers, ... }).
Every put threw "in property data must be a string", leaving the cache
empty and every request a MISS — which broke all next-16-caching
integration tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@socket-security

socket-security Bot commented May 14, 2026

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Updated | @harperfast/integration-testing@0.3.0 ⏵ 0.3.1 | 78 +1 | 100 | 100 +1 | 95 +1 | 100 |

@Ethan-Arrowood
Member

Why cacheHandler can't be set automatically by withHarper()

Turbopack resolves the cacheHandler config value as a raw filesystem path (via its internal FileSystemPath). When a library like withHarper() tries to set this path programmatically, the only options available in CJS are:

  • __dirname — always dereferences symlinks, producing the real path on disk (e.g., /Users/dev/nextjs/dist/CacheHandler.cjs). When the package is installed via npm link or file: protocol, this path is outside the project tree, and Turbopack panics with "leaves the filesystem root".
  • require.resolve — also dereferences symlinks, same problem.

Neither CJS nor ESM provides a way to resolve a module path without following symlinks. The only working approach is string concatenation (join(configDir, 'node_modules', '@harperfast', 'nextjs', 'dist', 'CacheHandler.cjs')), which preserves the logical path through node_modules. But this requires the caller's directory — which a library function doesn't have access to.

The fix: export a cacheHandlerPath() helper that the user calls from their config file, passing import.meta.dirname (ESM) or __dirname (CJS). This anchors the path to the config file's location, keeping it within Turbopack's filesystem root.
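
A minimal sketch of such a helper, assuming it takes the config file's directory as its only argument (the comment names the helper but does not spell out its signature):

```typescript
import { join } from "node:path";

// Hypothetical signature. Plain concatenation keeps the logical path through
// node_modules and never dereferences symlinks.
function cacheHandlerPath(configDir: string): string {
  return join(configDir, "node_modules", "@harperfast", "nextjs", "dist", "CacheHandler.cjs");
}

// In next.config.mjs the caller would pass import.meta.dirname;
// in a CJS config, __dirname. "/app" stands in for either here.
const handlerPath = cacheHandlerPath("/app");
console.log(handlerPath);
```

Because the caller supplies the directory, the result stays under the project root even when the package is installed via npm link or the file: protocol.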
