fix(vectorize): preserve chunks on embed failure + add i18n docs #54
Merged
Generalizes the locale-scoping problem so a single source doc can produce multiple independent chunk-sets along any user-declared scope dimension (locale, draft/published, tenant, etc.) without re-embedding one slice wiping the others.
… docs Park the scope-aware-chunk-identity design after concluding the locale case (the dominant motivator) is already solvable with the existing extension field + `where` pattern. The reorder benefits described in that spec have independent value and are extracted into a new spec, alongside README additions that surface the existing localization capability and add a Roadmap signal for scope-aware identity.
Previously the vectorize task ran `deleteChunks` first, then `toKnowledgePool`, validation, and embedding. Any failure in the external embedding API (rate limit, network blip, malformed input) would silently wipe a doc's chunks until the next save. The destructive step now runs only after we have valid embeddings ready to insert. Transient errors leave the previous chunks intact for the next retry. A residual gap remains between `deleteChunks` and the end of the `storeChunk` `Promise.all` (partial-failure window); closing that fully needs an adapter-level transaction and is out of scope here.
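The reordering described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual task code: the `Chunk`, `ChunkStore`, `Embedder`, `MemoryStore`, and `vectorizeDoc` names are hypothetical, and only `deleteChunks` and `storeChunk` are taken from the commit message.

```typescript
// Hypothetical shapes for illustration only; the plugin's real types may differ.
type Chunk = { docId: string; text: string; embedding: number[] };

interface ChunkStore {
  deleteChunks(docId: string): Promise<void>;
  storeChunk(chunk: Chunk): Promise<void>;
}

// Stand-in for the external embedding API call.
type Embedder = (texts: string[]) => Promise<number[][]>;

// In-memory store, present only so the sketch can run on its own.
class MemoryStore implements ChunkStore {
  chunks: Chunk[] = [];

  async deleteChunks(docId: string): Promise<void> {
    this.chunks = this.chunks.filter((c) => c.docId !== docId);
  }

  async storeChunk(chunk: Chunk): Promise<void> {
    this.chunks.push(chunk);
  }
}

async function vectorizeDoc(
  docId: string,
  texts: string[],
  embed: Embedder,
  store: ChunkStore,
): Promise<void> {
  // 1. Non-destructive work first: validate and embed.
  //    Any throw here leaves the previously stored chunks untouched.
  if (texts.some((t) => t.trim() === "")) {
    throw new Error("empty chunk text");
  }
  const embeddings = await embed(texts);
  if (embeddings.length !== texts.length) {
    throw new Error("embedding count mismatch");
  }

  // 2. Destructive step runs only once valid replacements are in hand.
  await store.deleteChunks(docId);

  // 3. Insert the new chunks. A failure partway through this Promise.all
  //    is the residual partial-failure window the commit message mentions.
  await Promise.all(
    texts.map((text, i) =>
      store.storeChunk({ docId, text, embedding: embeddings[i] }),
    ),
  );
}
```

With this ordering, an embedder that rejects (for example on a rate limit) causes `vectorizeDoc` to throw before `deleteChunks` is ever called, so a later retry still finds the old chunks in place.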
…roadmap line Surfaces the existing locale-aware embedding/search capability as a first-class workflow: declare `locale` as a required extension field, iterate locales inside `toKnowledgePool`, filter at search time via the existing `where` filter. Neutralizes competitor positioning that markets locale-scoped search as a differentiator. Roadmap "Help wanted" gains a scope-aware-chunk-identity entry pointing at the archived design, framed as a market-research signal: issues citing it surface real demand for the deferred feature.
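The three-step localization workflow above might look something like the sketch below. Everything here is an assumption made for illustration: the `SourceDoc` and `PoolEntry` shapes, the two-argument `toKnowledgePool` signature, the `filterByWhere` helper, and the `{ locale: { equals } }` filter shape are hypothetical and are not taken from the plugin's API.

```typescript
// Hypothetical shapes; the plugin's real config and types may differ.
type Locale = "en" | "de";

type SourceDoc = { id: string; title: Record<Locale, string> };

// `locale` plays the role of the required extension field on each entry.
type PoolEntry = { docId: string; chunk: string; locale: Locale };

// Step 2: iterate locales and emit one entry per locale for the same source doc.
function toKnowledgePool(doc: SourceDoc, locales: Locale[]): PoolEntry[] {
  return locales.map((locale) => ({
    docId: doc.id,
    chunk: doc.title[locale],
    locale,
  }));
}

// Step 3: an assumed `where` shape that narrows results to one locale.
type Where = { locale: { equals: Locale } };

// Stand-in for the search-time filter; a real search would also rank by similarity.
function filterByWhere(entries: PoolEntry[], where: Where): PoolEntry[] {
  return entries.filter((e) => e.locale === where.locale.equals);
}

const doc: SourceDoc = { id: "post-1", title: { en: "Hello", de: "Hallo" } };
const pool = toKnowledgePool(doc, ["en", "de"]);
const germanOnly = filterByWhere(pool, { locale: { equals: "de" } });
```

Because each entry carries its own `locale` value, a single source document yields one independent set of chunks per locale, and the filter selects between them without any change to chunk identity.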
2 tasks
Summary
Two independently-valuable changes split off from the parked scope-aware-chunk-identity brainstorming (see docs/plans/2026-05-13-vectorize-safety-and-localization-docs.md for the full story).
- `fix(vectorize)`: reorder the vectorize task so `deleteChunks` runs after `toKnowledgePool`, validation, and the external embedding API succeed. Previously a transient rate-limit or network failure during embedding would silently wipe a doc's existing chunks until the next save. Now those failures leave the previous chunks intact for the next retry.
- `docs(readme)`: add a "Localization (i18n)" section that surfaces the existing locale-aware embedding/search workflow as a first-class pattern (declare `locale` as a required extension field, iterate locales inside `toKnowledgePool`, filter at search time via the existing `where` filter). Adds a Features bullet, TOC entry, and a Roadmap "Help wanted" line for scope-aware chunk identity that links to the archived design spec.
- `docs(spec)`: design docs for the scope-aware-chunk-identity exploration that ultimately got parked as YAGNI, plus the split-spec that produced this PR. Archived design is at docs/plans/archive/2026-05-10-scope-aware-chunk-identity.md.

No public API change. Patch bump warranted for the safety fix.
Test plan
`pnpm test:int`: 27 files, 68 tests, no regressions.