ENG-1699 - Implement batch upsert for concepts in Supabase #1000
Conversation
- Introduced a new function `upsertConceptBatches` to handle the upsert of concepts in batches, improving efficiency and error handling.
- Added utility functions for chunking arrays and summarizing failed upsert IDs.
- Updated the `convertDgToSupabaseConcepts` function to utilize the new batching logic, ensuring better management of concept dependencies during synchronization.
- Removed redundant chunking logic from `upsertNodesToSupabaseAsContentWithEmbeddings` for cleaner code.

This change enhances the synchronization process with Supabase by allowing for more manageable data handling and clearer error reporting.
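The chunking utility mentioned above is not shown in this thread; a minimal sketch of what such a helper typically looks like (the name `chunk` matches the PR description, but the exact signature is an assumption):

```typescript
// Hypothetical sketch of the array-chunking helper described in the PR.
// Splits an array into fixed-size batches; the last batch may be smaller.
const chunk = <T>(items: T[], size: number): T[][] => {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};
```

A helper like this keeps the batching logic in one place, which is what lets the PR remove the redundant chunking from `upsertNodesToSupabaseAsContentWithEmbeddings`.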
This pull request has been ignored for the connected project Preview Branches by Supabase.
@coderabbitai full review

✅ Actions performed: Full review triggered.
📝 Walkthrough

This PR introduces batching support for concept and content upserts in the sync utility. It replaces single RPC calls with fixed-size batch operations via new helper functions, adds per-batch error reporting, and refactors the concept upsertion flow to include dependency ordering and batch-aware error handling.

Changes: Concept and Content Upsert Batching
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Sync Utility
    participant Batcher as Batch Processor
    participant RPC as Supabase RPC

    rect rgba(100, 150, 255, 0.5)
        Note over Client,RPC: Concept Upsertion (Batched)
        Client->>Batcher: convertDgToSupabaseConcepts(nodes)
        Batcher->>Batcher: orderConcepts (dependency resolution)
        Batcher->>Batcher: chunk into CONCEPT_BATCH_SIZE
        loop For each batch
            Batcher->>RPC: upsert_concepts RPC call
            RPC-->>Batcher: {success: [...], failed_ids: [...]}
            Batcher->>Batcher: Collect per-batch failures
        end
        Batcher->>Batcher: summarizeFailedConceptUpsertIds
        Batcher-->>Client: Aggregated failure report
    end

    rect rgba(100, 255, 150, 0.5)
        Note over Client,RPC: Content Upsertion (Batched)
        Client->>Batcher: uploadBatches(embeddings)
        Batcher->>Batcher: chunk into BATCH_SIZE
        loop For each batch
            Batcher->>RPC: upsert_content RPC call
            RPC-->>Batcher: Response
        end
        Batcher-->>Client: Batched upload complete
    end
```
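The concept-upsertion lane of the diagram can be sketched as a loop that fails fast on RPC errors and aggregates row-level sentinel codes across batches. This is an illustration, not the PR's actual code: `callUpsertConcepts` stands in for the real Supabase `rpc("upsert_concepts", ...)` call, and the shapes are assumptions.

```typescript
// Result shape assumed from the review discussion: the RPC returns one status
// per row, with negative sentinel values (e.g. -1, -2) marking row failures.
type RpcResult = { data: number[] | null; error: { message: string } | null };

const upsertConceptBatchesSketch = async (
  concepts: { id: string }[],
  batchSize: number,
  callUpsertConcepts: (batch: { id: string }[]) => Promise<RpcResult>,
): Promise<number[]> => {
  const batches: { id: string }[][] = [];
  for (let i = 0; i < concepts.length; i += batchSize) {
    batches.push(concepts.slice(i, i + batchSize));
  }
  const failedCodes: number[] = [];
  for (let idx = 0; idx < batches.length; idx++) {
    const { data, error } = await callUpsertConcepts(batches[idx]);
    if (error) {
      // Any transport/RPC error fails the whole sync, with batch context.
      throw new Error(
        `upsert_concepts failed for batch ${idx + 1}/${batches.length}: ${error.message}`,
      );
    }
    // Collect per-batch row failures for an aggregated report.
    failedCodes.push(...(data ?? []).filter((code) => code < 0));
  }
  return failedCodes;
};
```

Failing globally on any batch error matches the reviewer's observation below that a global retry is the desired behaviour.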
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 5 passed checks
🧹 Nitpick comments (2)
apps/roam/src/utils/syncDgNodesToSupabase.ts (2)
320-327: ⚡ Quick win: This warning will likely fire on healthy incremental syncs.

`missing` here only means "not in the current batch," and `nodeTypeSince(...)` intentionally excludes unchanged schema concepts. That makes warning-level logging noisy for blocks whose dependencies were already synced earlier. Consider downgrading this to `debug` or only warning after confirming the dependency is absent remotely.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/roam/src/utils/syncDgNodesToSupabase.ts` around lines 320 - 327, The current console.warn that logs when orderConceptsByDependency(conceptsToUpsert) returns missing is too noisy because missing only means "not in the current batch" (nodeTypeSince(...) excludes unchanged schema concepts); change the behavior to either call a debug-level logger instead of console.warn for the missing array, or conditionally promote to a warning only after verifying the dependency is actually absent in Supabase (e.g., query the remote store for the missing concept IDs before logging). Update the block that destructures const { ordered, missing } = orderConceptsByDependency(conceptsToUpsert) to use the chosen approach and keep the descriptive message about missing dependencies when logging.
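One way the "verify remotely before warning" option could be shaped is to classify missing dependency UIDs into those that already exist remotely and those that are truly absent, and only warn about the latter. This is a hypothetical sketch: `fetchRemoteConceptUids` is an assumed lookup, not part of the PR.

```typescript
// Classify dependency UIDs missing from the current batch. `fetchRemoteConceptUids`
// is a hypothetical helper that returns which of the given UIDs exist in Supabase.
const classifyMissingDependencies = async (
  missing: string[],
  fetchRemoteConceptUids: (uids: string[]) => Promise<Set<string>>,
): Promise<{ syncedRemotely: string[]; trulyAbsent: string[] }> => {
  if (missing.length === 0) return { syncedRemotely: [], trulyAbsent: [] };
  const remote = await fetchRemoteConceptUids(missing);
  return {
    // Already synced earlier: safe to log at debug level only.
    syncedRemotely: missing.filter((uid) => remote.has(uid)),
    // Absent both locally and remotely: worth a real warning.
    trulyAbsent: missing.filter((uid) => !remote.has(uid)),
  };
};
```

The caller would then `console.debug` the first bucket and `console.warn` only the second, eliminating the noise on healthy incremental syncs.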
56-65: ⚡ Quick win: Rename the sentinel-summary helper to match what the RPC returns.

These are failure codes, not concept IDs, so `summarizeFailedConceptUpsertIds`/`failedIds` reads misleadingly in the path you most need to debug quickly. As per coding guidelines, "Function names should describe their purpose clearly" and "Choose descriptive function names that make comments unnecessary".

Proposed rename:
```diff
-const summarizeFailedConceptUpsertIds = (failedIds: number[]): string => {
-  const counts = failedIds.reduce<Record<string, number>>((acc, id) => {
-    acc[id] = (acc[id] ?? 0) + 1;
+const summarizeFailedConceptUpsertStatuses = (
+  failedStatuses: number[],
+): string => {
+  const counts = failedStatuses.reduce<Record<string, number>>((acc, status) => {
+    acc[status] = (acc[status] ?? 0) + 1;
     return acc;
   }, {});
```

```diff
-  const failedIds = (data || []).filter((id) => id < 0);
-  if (failedIds.length > 0) {
+  const failedStatuses = (data || []).filter((status) => status < 0);
+  if (failedStatuses.length > 0) {
     throw new Error(
-      `upsert_concepts returned row failures for batch ${idx + 1}/${batches.length}: ${summarizeFailedConceptUpsertIds(failedIds)}`,
+      `upsert_concepts returned row failures for batch ${idx + 1}/${batches.length}: ${summarizeFailedConceptUpsertStatuses(failedStatuses)}`,
     );
   }
```

Also applies to: 92-95
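For illustration, a self-contained version of the summary helper under the suggested naming (the body mirrors the `reduce` shown in the diff; the output format is an assumption, since the diff does not show the return expression):

```typescript
// Counts occurrences of each sentinel failure code (e.g. -1, -2) returned by
// the RPC and renders them as a short diagnostic string.
const summarizeFailureCodes = (failureCodes: number[]): string => {
  const counts = failureCodes.reduce<Record<string, number>>((acc, code) => {
    acc[code] = (acc[code] ?? 0) + 1;
    return acc;
  }, {});
  return Object.entries(counts)
    .map(([code, count]) => `code ${code}: ${count} row(s)`)
    .join(", ");
};
```

Embedding this summary in the thrown error gives each failed batch a compact, greppable failure breakdown.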
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/roam/src/utils/syncDgNodesToSupabase.ts` around lines 56 - 65, The helper summarizeFailedConceptUpsertIds is misnamed because it summarizes failure codes (sentinels) not concept IDs; rename the function to something like summarizeFailureCodes (and its parameter failedIds to failureCodes) and update all references (including the other occurrence around the code that handles lines 92-95) so the name and parameter clearly indicate they are failure/sentinel codes returned by the RPC; keep the same implementation and type (number[] -> number[]) but change identifiers and any inline comments to match the new names to avoid confusion when debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 29382a68-9bf8-44f9-8e46-1d7df9e88c0d
📒 Files selected for processing (1)
apps/roam/src/utils/syncDgNodesToSupabase.ts
maparent left a comment:
Code LGTM. It fails globally if any batch fails, which I think is the correct behaviour so we can retry globally.
Tested and the sync completed.
*Written by codex 5.5 xhigh after long debugging session.
Summary
Fixes large first-sync failures by batching `upsert_concepts` calls during Roam Supabase sync.

Previously, concept sync sent the entire concept payload in one `upsert_concepts` RPC. On larger graphs, this can exceed the Supabase/Postgres statement timeout, causing the whole concept transaction to roll back and leaving the graph with synced Content/embeddings but zero Concept rows.

Changes

- Batch `upsert_concepts` calls with a batch size of 200.
- Added a `chunk` helper for sync batching.
- Handle `upsert_concepts` sentinel return values (-1, -2) so row-level failures fail the sync with batch context.

Why
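The dependency ordering the walkthrough attributes to `orderConceptsByDependency` can be illustrated with a small depth-first sketch. The function and field names here (`dependsOn`, the `{ ordered, missing }` result shape) follow the review discussion, but the implementation is an assumption, not the PR's code:

```typescript
type Concept = { uid: string; dependsOn: string[] };

// Emits each concept after the concepts it depends on, so parents are upserted
// first. Dependencies not present in the input are reported as `missing`
// (they may simply have been synced in an earlier run). The visited set also
// guards against cycles, at the cost of an imperfect order in that case.
const orderConceptsByDependencySketch = (
  concepts: Concept[],
): { ordered: Concept[]; missing: string[] } => {
  const byUid = new Map(concepts.map((c) => [c.uid, c] as [string, Concept]));
  const ordered: Concept[] = [];
  const missing = new Set<string>();
  const visited = new Set<string>();
  const visit = (c: Concept): void => {
    if (visited.has(c.uid)) return;
    visited.add(c.uid);
    for (const dep of c.dependsOn) {
      const parent = byUid.get(dep);
      if (parent) visit(parent);
      else missing.add(dep); // not in this batch; may already exist remotely
    }
    ordered.push(c);
  };
  concepts.forEach(visit);
  return { ordered, missing: [...missing] };
};
```

Ordering before chunking matters: once the list is dependency-sorted, each fixed-size batch can rely on its parents having been upserted in the same or an earlier batch.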
A customer graph had ~6,700 synced content rows and embeddings, but `upsert_concepts` failed with:

Summary by CodeRabbit
New Features
Bug Fixes