fix(vectorize): preserve chunks on embed failure + add i18n docs #54

Merged
techiejd merged 4 commits into main from fix/vectorize-safety-and-i18n-docs
May 18, 2026

techiejd (Owner) commented May 18, 2026

Summary

Two independently valuable changes, plus the design docs behind them, split off from the parked scope-aware-chunk-identity brainstorming (see docs/plans/2026-05-13-vectorize-safety-and-localization-docs.md for the full story).

  • fix(vectorize) — reorder the vectorize task so deleteChunks runs after toKnowledgePool, validation, and the external embedding API succeed. Previously a transient rate-limit or network failure during embedding would silently wipe a doc's existing chunks until the next save. Now those failures leave the previous chunks intact for the next retry.
  • docs(readme) — add a "Localization (i18n)" section that surfaces the existing locale-aware embedding/search workflow as a first-class pattern (declare locale as a required extension field, iterate locales inside toKnowledgePool, filter at search time via the existing where filter). Adds a Features bullet, TOC entry, and a Roadmap "Help wanted" line for scope-aware chunk identity that links to the archived design spec.
  • docs(spec) — design docs for the scope-aware-chunk-identity exploration that ultimately got parked as YAGNI, plus the split-spec that produced this PR. Archived design is at docs/plans/archive/2026-05-10-scope-aware-chunk-identity.md.
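
The localization pattern named in the docs(readme) bullet can be sketched roughly as follows. This is an illustration only: the `Doc` and `Chunk` shapes, the `LOCALES` list, the paragraph splitting, and the in-memory `search` helper are hypothetical stand-ins, not the plugin's actual types or API. The only names taken from this PR are `toKnowledgePool`, the `locale` extension field, and the idea of a `where` filter. No vector similarity is computed here; the sketch only shows how locale-tagged chunks are produced and then filtered.

```typescript
// Hypothetical shapes for illustration; the plugin's real types may differ.
type Locale = "en" | "es";

interface Doc {
  id: string;
  // One body per locale, as a localized field might be represented.
  body: Record<Locale, string>;
}

interface Chunk {
  docId: string;
  text: string;
  // `locale` plays the role of the required extension field.
  locale: Locale;
}

const LOCALES: Locale[] = ["en", "es"];

// Iterate locales inside toKnowledgePool so each locale yields its own chunks.
function toKnowledgePool(doc: Doc): Chunk[] {
  const chunks: Chunk[] = [];
  for (const locale of LOCALES) {
    // Naive paragraph chunking, for the sketch only.
    for (const part of doc.body[locale].split(/\n{2,}/)) {
      const text = part.trim();
      if (text.length > 0) chunks.push({ docId: doc.id, text, locale });
    }
  }
  return chunks;
}

// Stand-in for a search-time `where` filter: an equality match on chunk fields.
function search(chunks: Chunk[], where: Partial<Chunk>): Chunk[] {
  const keys = Object.keys(where) as (keyof Chunk)[];
  return chunks.filter((chunk) => keys.every((key) => chunk[key] === where[key]));
}

const doc: Doc = {
  id: "post-1",
  body: { en: "Hello.\n\nSecond paragraph.", es: "Hola." },
};
const pool = toKnowledgePool(doc);
const spanishOnly = search(pool, { locale: "es" });
```

Because every chunk carries its locale, a search scoped with `{ locale: "es" }` never sees the English chunks of the same document.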

No public API change. Patch bump warranted for the safety fix.
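
A minimal sketch of the reordered task, assuming a dependency-injected shape. The `Deps` interface, the `embed` function, and the validation rule are hypothetical; only the step names `toKnowledgePool`, `deleteChunks`, and `storeChunk` come from this PR. The point is the ordering: every step that can fail without side effects runs before the destructive delete.

```typescript
// Hypothetical types and dependencies; only the step names (toKnowledgePool,
// deleteChunks, storeChunk) come from this PR's description.
interface Chunk {
  text: string;
}

interface EmbeddedChunk extends Chunk {
  embedding: number[];
}

interface Deps {
  toKnowledgePool(docId: string): Promise<Chunk[]>;
  embed(texts: string[]): Promise<number[][]>; // external API, may throw
  deleteChunks(docId: string): Promise<void>;
  storeChunk(docId: string, chunk: EmbeddedChunk): Promise<void>;
}

async function vectorize(docId: string, deps: Deps): Promise<void> {
  // 1. Non-destructive steps first: build, validate, embed.
  const chunks = await deps.toKnowledgePool(docId);
  if (chunks.some((chunk) => chunk.text.trim() === "")) {
    throw new Error("validation failed: empty chunk text");
  }
  const embeddings = await deps.embed(chunks.map((chunk) => chunk.text));
  const embedded: EmbeddedChunk[] = chunks.map((chunk, i) => {
    const embedding = embeddings[i];
    if (embedding === undefined) throw new Error("missing embedding for chunk " + i);
    return { ...chunk, embedding };
  });

  // 2. Destructive step runs only once valid embeddings are in hand.
  await deps.deleteChunks(docId);

  // 3. Insert the new chunks.
  await Promise.all(embedded.map((chunk) => deps.storeChunk(docId, chunk)));
}
```

If `embed` rejects (for example on a rate limit), `vectorize` rejects before `deleteChunks` is ever called, so the previously stored chunks remain available for the next retry.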

Test plan

  • New integration test dev/specs/vectorizeReorder.spec.ts — fails on the prior task ordering (asserts existing chunks survive an embed failure); passes after the reorder.
  • Full int suite passes locally (pnpm test:int): 27 files, 68 tests, no regressions.
  • Manual README review in GitHub preview to confirm the new Localization anchor resolves from the TOC and the Roadmap link points at the archived spec.
  • Changeset entry to be added before next release (patch bump for the vectorize safety fix).

techiejd added 4 commits May 18, 2026 18:20
Generalizes the locale-scoping problem so a single source doc can produce
multiple independent chunk-sets along any user-declared scope dimension
(locale, draft/published, tenant, etc.) without re-embedding one slice
wiping the others.
… docs

Park the scope-aware-chunk-identity design after concluding the locale case
(the dominant motivator) is already solvable with the existing extension
field + `where` pattern. The reorder benefits described in that spec have
independent value and are extracted into a new spec, alongside README
additions that surface the existing localization capability and add a
Roadmap signal for scope-aware identity.

Previously the vectorize task ran `deleteChunks` first, then
`toKnowledgePool`, validation, and embedding. Any failure in the
external embedding API (rate limit, network blip, malformed input)
would silently wipe a doc's chunks until the next save.

The destructive step now runs only after we have valid embeddings
ready to insert. Transient errors leave the previous chunks intact
for the next retry.

A residual gap remains between `deleteChunks` and the end of the
`storeChunk` `Promise.all` (partial-failure window); closing that
fully needs an adapter-level transaction and is out of scope here.
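
The residual partial-failure window can be illustrated with a toy example. Everything here is a hypothetical in-memory stand-in (the `stored` array, this `storeChunk`, and `replaceChunks` are not the plugin's code); it only relies on the standard behavior that `Promise.all` rejects when any input rejects but does not cancel or roll back the other operations.

```typescript
// Hypothetical in-memory stand-in for an adapter; not the plugin's real storage.
const stored: string[] = [];

async function storeChunk(text: string): Promise<void> {
  if (text === "bad") throw new Error("insert failed");
  stored.push(text);
}

async function replaceChunks(texts: string[]): Promise<void> {
  stored.length = 0; // plays the role of deleteChunks
  // Rejects if any store fails, but the successful stores are not undone.
  await Promise.all(texts.map((text) => storeChunk(text)));
}
```

Calling `replaceChunks(["a", "bad", "c"])` rejects, yet `stored` ends up holding `"a"` and `"c"`: the old chunks are gone and the new set is incomplete. Closing that window would require the delete and the inserts to commit or fail together, which is the adapter-level transaction the message refers to.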
…roadmap line

Surfaces the existing locale-aware embedding/search capability as a
first-class workflow: declare locale as a required extension field,
iterate locales inside toKnowledgePool, filter at search time via the
existing where filter. Neutralizes competitor positioning that markets
locale-scoped search as a differentiator.

Roadmap "Help wanted" gains a scope-aware-chunk-identity entry pointing
at the archived design, framed as a market-research signal — issues
citing it surface real demand for the deferred feature.
techiejd merged commit 3a8cc02 into main on May 18, 2026
8 checks passed