feat(ontology): seed NamespaceRegistry with bO-* upstream vocabs (PR #407 follow-up) #408
… vocabs

Companion to PR #407 (merged). Expands `NamespaceRegistry::seed_defaults` from 16 to 29 entries, registering the 13 external vocabularies that PR #407 added hydrators for. This is the O(1) IRI ↔ context_id matching table backed by the Lance dataset in `lance_cache.rs`; consumers such as smb-office-rs and woa-rs look up entries by namespace shortname instead of hand-rolling slot constants.

Why this lives in lance-graph-ontology, not in OGIT:

- Public OWL/RDF source files stay pristine in data/ontologies/ (DOLCE+DUL, FIBO-FND/BE, OWL-Time, PROV-O, QUDT, schema.org, SKOS, ZUGFeRD CII XSDs + Schematron). Modifying them taints downstream use.
- The OGIT repo is authoritative for namespace registrations, but adding new TTL files there with hand-picked contextIds would be drift.
- The matching table belongs in the CLIENT (lance-graph-ontology), keyed by namespace shortname, persisted via the existing lance_cache layer.
- Per user direction 2026-05-21: "expand always but drift is probably bad" + "deinterlace them locally and keep that matching table in a lance table for O(1) and check what lance-graph-ontology has in regards" → expansion lives here, OGIT untouched.
Allocation:

  ID     Namespace                          PR / Hydrator
  ─────────────────────────────────────────────────────────────────
  0      SMB                                (pre-existing)
  1      WorkOrder                          (pre-existing)
  2      Healthcare                         (pre-existing)
  3      Network                            (pre-existing)
  4      EmailCorrespondance                (pre-existing)
  5      SharePoint                         (pre-existing)
  10-19  Medical/<sub>                      (pre-existing, dense)
  20     Foundation/DOLCE-DUL               bO-1   hydrate_dolce
  21     Foundation/OWL-Time                bO-2   hydrate_owltime
  22     Foundation/PROV-O                  bO-3   hydrate_provo
  23     Foundation/QUDT                    bO-4   hydrate_qudt
  24     Foundation/schema-org              bO-8   hydrate_schemaorg
  25     Foundation/SKOS                    bO-5   hydrate_skos
  30     FinancialAccounting/FIBO-FND       bO-6   hydrate_fibo_fnd
  31     FinancialAccounting/FIBO-BE        bO-7   hydrate_fibo_be
  32     FinancialAccounting/ZUGFeRD        bO-16  hydrate_zugferd
  33     FinancialAccounting/ZUGFeRD-Rules  bO-15  hydrate_zugferd_rules
  34     FinancialAccounting/SKR03          bO-13  hydrate_skr03
  35     FinancialAccounting/SKR04          bO-13  hydrate_skr04
  36     FinancialAccounting/SKR03-Bau      bO-13  hydrate_skr03_bau

Allocation policy matches the existing Medical/<sub> pattern: dense within each family range, with the gaps between ranges left as expansion room. `allocate()` continues to fill gaps 6..=9 and 26..=29 first, then 37+.

Notes:

- `next_free_id` doc-comment updated to reflect the new seed layout. The first dynamic id is 6 (it was already 6 in practice; the prior comment said "20", which was off by 14).
- Three regression tests updated:
  * `seed_defaults_has_sixteen_entries` → `_has_twenty_nine_entries`
  * `seed_defaults_assigns_canonical_ids` adds spot-checks at 20/25/30/34/35/36
  * `allocate_skips_to_first_unused_id` len assertion 16 → 29
- One integration test (`tests/context_id_test.rs`) updated to match.

All 116 lance-graph-ontology tests pass; clippy clean (5 pre-existing oxrdf deprecation warnings, no new ones); downstream consumers (callcenter, consumer-conformance, cognitive-shader-driver) build clean.
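The gap-filling behaviour described above can be sketched as follows. This is a minimal illustration, not the crate's code: `SketchRegistry`, its `BTreeMap` storage, and the `Medical/sub-N` placeholder names are assumptions made for the example; only the ids and the non-Medical shortnames come from the allocation table.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the registry, not the crate's real type.
#[derive(Default)]
pub struct SketchRegistry {
    /// context_id -> namespace shortname, kept sorted so gaps are easy to find.
    by_id: BTreeMap<u16, String>,
}

impl SketchRegistry {
    /// Seed the 29 fixed entries from the allocation table.
    pub fn seed_defaults() -> Self {
        let mut reg = Self::default();
        let fixed: [(u16, &str); 19] = [
            (0, "SMB"),
            (1, "WorkOrder"),
            (2, "Healthcare"),
            (3, "Network"),
            (4, "EmailCorrespondance"),
            (5, "SharePoint"),
            (20, "Foundation/DOLCE-DUL"),
            (21, "Foundation/OWL-Time"),
            (22, "Foundation/PROV-O"),
            (23, "Foundation/QUDT"),
            (24, "Foundation/schema-org"),
            (25, "Foundation/SKOS"),
            (30, "FinancialAccounting/FIBO-FND"),
            (31, "FinancialAccounting/FIBO-BE"),
            (32, "FinancialAccounting/ZUGFeRD"),
            (33, "FinancialAccounting/ZUGFeRD-Rules"),
            (34, "FinancialAccounting/SKR03"),
            (35, "FinancialAccounting/SKR04"),
            (36, "FinancialAccounting/SKR03-Bau"),
        ];
        for (id, name) in fixed {
            reg.by_id.insert(id, name.to_string());
        }
        // Placeholder names: the real Medical/<sub> shortnames are not listed above.
        for id in 10..=19u16 {
            reg.by_id.insert(id, format!("Medical/sub-{}", id));
        }
        reg
    }

    /// Number of registered namespaces.
    pub fn len(&self) -> usize {
        self.by_id.len()
    }

    /// Return the existing id for `name`, or claim the lowest unused id.
    pub fn allocate(&mut self, name: &str) -> u16 {
        if let Some((&id, _)) = self.by_id.iter().find(|(_, n)| n.as_str() == name) {
            return id;
        }
        // Keys iterate in ascending order, so the first mismatch is the first gap.
        let mut candidate: u16 = 0;
        for &id in self.by_id.keys() {
            if id != candidate {
                break;
            }
            candidate += 1;
        }
        self.by_id.insert(candidate, name.to_string());
        candidate
    }
}
```

With this sketch, the first four dynamic allocations land on 6 through 9, the next four on 26 through 29, and only then does allocation continue at 37. Repeating a name returns the id it already holds.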
…O(1) inheritance from family buckets

Two user clarifications, folded into §4:

1. The OWL/DOLCE cross-walk table is not "interop crutches"; it is the SOURCE MATERIAL from which lance-graph-ontology constructs the OGIT/OWL/DOLCE mapping. The hydrators are the construction tool; the cross-walk standards (DOLCE+DUL, OWL-Time, PROV-O, QUDT, schema.org, FIBO, SKOS, Schematron, XSD, SKR DATEV, ZUGFeRD) are the bricks; the OGIT canonical surface (per-family codebook + inherits-from DAG + edge whitelists at OGIT::*_V1 slots) is the synthesis.
2. The hydrators + inherits_from + per-family codebook + family-bucket dense array TOGETHER form the spine that makes schema / label / codebook inheritance O(1) at lookup time.

§4.1 rewritten as "the source material for the OGIT mapping": a direction-of-build diagram was added showing how each external standard flows through its hydrator into its OGIT::*_V1 G-slot. The MetaAnchors fields (foundry_object_type, owl_upper_class, dolce_marker, wikidata_qid) are reframed as the runtime READ SURFACE over the constructed mapping, not as the populated target. The open question on DolceMarker enum naming (Endurant/Perdurant vs Object/Event per the canonical DUL rename) is called out explicitly as a decision needed before D-UB-3.

§4.3 lead-in reframed from "the cross-walk surface is now concrete" to "the construction tool that builds the OGIT mapping is now concrete in lance_graph_ontology::hydrators::*". PR #408 reference added (NamespaceRegistry::seed_defaults wires the corresponding G-slots at boot).

§4.4 (new) locks the O(1) inheritance property:

- Schema inheritance: the inherits_from chain is flattened into FamilyEntry.axiom_blob at hydration; query-time cost is one masked u16 + one array index = O(1). Zero chain-walks at query time.
- Label inheritance: rdfs:label per locale is collapsed during the subClassOf walk at hydration; FamilyEntry.label_* reads are an O(1) array index. Zero parent lookups at query time.
- Codebook inheritance: a per-family centroid references the parent codebook by u8 offset when content distributions overlap (with a Jirak-aware Berry-Esseen bound per I-NOISE-FLOOR-JIRAK). One indirection max.
- Why family buckets vs a flat dict: ~5ns for two L1-cache-resident array indices vs ~50-100ns for hash + collision + cache miss, a 10-20× cost gap. This is co-design between the construction tool (hydrators) and the runtime substrate (family buckets); neither earns the property alone.

Concrete consumer-side payoffs are spelled out for woa-bridge, medcare-bridge, and smb-bridge: with schema / label / codebook inheritance pre-baked, route handlers read one FamilyEntry per row identity, with no OWL reasoning, no rdfs:label walk, and no Schematron re-parse at query time.
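The split between hydration-time flattening and query-time lookup can be illustrated with a small sketch. Everything here is an assumption made for the example: the `RawClass` input type, the `hydrate_family` and `lookup` functions, and the id layout (high byte selects the family bucket, low byte the slot). `FamilyEntry` is reduced to one label and a list of strings standing in for `axiom_blob`; the crate's real layout is not shown in this thread.

```rust
/// Hypothetical id layout: high byte = family bucket, low byte = slot in bucket.
const SLOT_MASK: u16 = 0x00FF;

/// A class as parsed from a source vocabulary, before flattening.
pub struct RawClass {
    pub label_en: Option<String>,
    pub own_axioms: Vec<String>,
    /// Index of the parent class within the same family, if any.
    pub inherits_from: Option<usize>,
}

/// Flattened entry read at query time; `axioms` stands in for `axiom_blob`.
#[derive(Debug, Clone, PartialEq)]
pub struct FamilyEntry {
    pub label_en: String,
    pub axioms: Vec<String>,
}

/// Hydration time: walk each inherits_from chain once and bake the result in.
pub fn hydrate_family(raw: &[RawClass]) -> Vec<FamilyEntry> {
    (0..raw.len())
        .map(|start| {
            let mut axioms = Vec::new();
            let mut label: Option<String> = None;
            let mut cur = Some(start);
            // Bounded by raw.len() steps so a malformed cycle cannot loop forever.
            for _ in 0..raw.len() {
                let class = match cur.and_then(|i| raw.get(i)) {
                    Some(c) => c,
                    None => break,
                };
                axioms.extend(class.own_axioms.iter().cloned());
                // The nearest class in the chain that has a label wins.
                if label.is_none() {
                    label = class.label_en.clone();
                }
                cur = class.inherits_from;
            }
            FamilyEntry {
                label_en: label.unwrap_or_default(),
                axioms,
            }
        })
        .collect()
}

/// Query time: one shift/mask and two array indices, no chain walk.
pub fn lookup(buckets: &[Vec<FamilyEntry>], id: u16) -> Option<&FamilyEntry> {
    let family = (id >> 8) as usize;
    let slot = (id & SLOT_MASK) as usize;
    buckets.get(family)?.get(slot)
}
```

In this sketch a child class with no label of its own reads back its parent's label and the concatenation of both axiom lists from a single entry, so the cost of the chain walk is paid once per hydration rather than once per query.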
Summary
Follow-up to merged PR #407. Adds the 13 new external vocabularies as
seeded entries in `NamespaceRegistry::seed_defaults()`, the canonical
IRI ↔ context_id matching table backed by `lance_cache.rs`'s Lance
dataset for O(1) lookup. Closes the gap between "the hydrator exists"
(PR #407) and "consumers can look up the namespace by shortname".
Why this lives in lance-graph-ontology, not in OGIT
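User direction 2026-05-21: "expand always but drift is probably bad" and "deinterlace them locally and keep that matching table in a lance table for O(1) and check what lance-graph-ontology has in regards".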
Per those constraints:
- Public OWL/RDF source files in `data/ontologies/` stay pristine (DOLCE+DUL, FIBO, OWL-Time, PROV-O, QUDT, schema.org, SKOS, ZUGFeRD CII). Never modified, only parsed at hydrate time.
- The OGIT repo is authoritative for namespace registrations, but adding new TTL files there with hand-picked contextIds would drift OGIT's existing dense Medical 10-19 allocation pattern.
- The matching table belongs in the client (lance-graph-ontology), keyed by namespace shortname, persisted via the existing `lance_cache` layer (already O(1)).
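As a rough picture of what a shortname-keyed matching table gives consumers, here is a minimal in-memory sketch. `MatchTable` and its methods are hypothetical names for illustration; the real table is persisted through the `lance_cache` layer, which this sketch does not model.

```rust
use std::collections::HashMap;

/// Hypothetical two-way table between namespace shortnames and context ids.
#[derive(Default)]
pub struct MatchTable {
    /// shortname -> context_id (hash lookup).
    id_by_name: HashMap<String, u16>,
    /// context_id -> shortname (dense array index; gaps are `None`).
    name_by_id: Vec<Option<String>>,
}

impl MatchTable {
    /// Register `name` under `id`, growing the dense side as needed.
    pub fn insert(&mut self, id: u16, name: &str) {
        let idx = id as usize;
        if self.name_by_id.len() <= idx {
            self.name_by_id.resize(idx + 1, None);
        }
        self.name_by_id[idx] = Some(name.to_string());
        self.id_by_name.insert(name.to_string(), id);
    }

    /// Shortname -> id, as a consumer would call it instead of a hard-coded constant.
    pub fn id_of(&self, name: &str) -> Option<u16> {
        self.id_by_name.get(name).copied()
    }

    /// Id -> shortname; unallocated gaps (for example 26..=29) return `None`.
    pub fn name_of(&self, id: u16) -> Option<&str> {
        self.name_by_id.get(id as usize)?.as_deref()
    }
}
```

A consumer would then resolve, say, `"FinancialAccounting/SKR03"` to 34 through `id_of` rather than carrying the constant 34 in its own code, which is the hand-rolling this PR is meant to remove.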
Allocation
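Slots picked to extend the existing dense-within-family pattern
(Medical/<sub> 10-19) without colliding:

| ID | Namespace | PR | Hydrator |
|----|-----------|----|----------|
| 20 | Foundation/DOLCE-DUL | bO-1 | `hydrate_dolce` |
| 21 | Foundation/OWL-Time | bO-2 | `hydrate_owltime` |
| 22 | Foundation/PROV-O | bO-3 | `hydrate_provo` |
| 23 | Foundation/QUDT | bO-4 | `hydrate_qudt` |
| 24 | Foundation/schema-org | bO-8 | `hydrate_schemaorg` |
| 25 | Foundation/SKOS | bO-5 | `hydrate_skos` |
| 30 | FinancialAccounting/FIBO-FND | bO-6 | `hydrate_fibo_fnd` |
| 31 | FinancialAccounting/FIBO-BE | bO-7 | `hydrate_fibo_be` |
| 32 | FinancialAccounting/ZUGFeRD | bO-16 | `hydrate_zugferd` |
| 33 | FinancialAccounting/ZUGFeRD-Rules | bO-15 | `hydrate_zugferd_rules` |
| 34 | FinancialAccounting/SKR03 | bO-13 | `hydrate_skr03` |
| 35 | FinancialAccounting/SKR04 | bO-13 | `hydrate_skr04` |
| 36 | FinancialAccounting/SKR03-Bau | bO-13 | `hydrate_skr03_bau` |

`allocate()` continues to fill gaps `6..=9` and `26..=29` first, then `37+`, matching the existing Medical-as-dense-family pattern.

Diff scope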
`crates/lance-graph-ontology/src/namespace_registry.rs`:
- `seed_defaults` adds 13 entries
- `next_free_id` doc-comment updated (was claiming "20" as first dynamic id; actual was 6 even before this PR)
- Regression tests updated (`_has_twenty_nine_entries`, `_assigns_canonical_ids`) + `allocate_skips_to_first_unused_id` len assertion

`crates/lance-graph-ontology/tests/context_id_test.rs`:
- `_seed_defaults_assigns_canonical_v1_ids` adds spot-checks at the new IDs (20, 25, 30, 36)
- `_allocate_is_idempotent_and_dense` len assertion 18 → 31

Net: +62 / -11 lines, 2 files. No behavior change for existing
consumers (the 16 pre-existing entries keep their canonical IDs).
Test plan
- All 116 lance-graph-ontology tests pass (test count unchanged: 2 integration tests updated in-place rather than added)
- clippy clean (5 pre-existing oxrdf deprecation warnings, no new ones)
- Downstream consumers build clean: `lance-graph-callcenter`, `lance-graph-consumer-conformance`, `cognitive-shader-driver`
- `data/ontologies/` source files NOT modified (pristine upstream)

Cross-references
- `crates/lance-graph-ontology/src/namespace_registry.rs` module docs + `.claude/plans/ogit-cascade-supabase-callcenter-v1.md` § Pillar 1
- `lance_cache.rs` for O(1) persistence: `crates/lance-graph-ontology/src/lance_cache.rs`

Generated by Claude Code