fix(install): align tier name to registry canon + pass TIER to model-init + fail loud on unknown tier #1085

Open
joelteply wants to merge 26 commits into canary from fix/install-tier-name-divergence


Conversation

@joelteply
Contributor

Lane A PR-1 — addresses RTX 5090 silent-no-replies install root cause

Per @continuum-8e97's 2026-05-11 RTX VDD finding: the install seeded only voice models and no Qwen GGUF, so personas had no local provider to invoke, and the fail-hard rule (#1077) didn't fire because the silent skip happened pre-resolver in download-models.sh.

Root cause — two compounding bugs

(1) Tier-name divergence: install.sh sets CONTINUUM_TIER='primary' for 32GB+ Macs but src/shared/models.json + src/scripts/download-models.sh canon is 'full'. If 'primary' ever leaks to model-init's TIER env, the jq lookup auto_download.by_tier[primary] returns [] (silent), leaving install with always[] (voice/embedding/whisper/piper/kokoro/silero) only — no Qwen.
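The silent-empty jq behavior is easy to reproduce against a minimal registry-shaped fixture. Everything below is illustrative: the model names are made up, and the `// []` guard is an assumption about the script's exact jq expression (the real registry is src/shared/models.json).

```shell
#!/usr/bin/env bash
# Hypothetical minimal fixture mimicking the models.json auto_download shape.
registry='{"auto_download":{"by_tier":{"mba":["qwen-0.5b"],"mid":["qwen-7b"],"full":["qwen-14b"]}}}'

# Canonical tier: the lookup yields the model list.
full_models=$(echo "$registry" | jq -r '.auto_download.by_tier["full"] // [] | .[]')

# Divergent tier: by_tier["primary"] is null, the '// []' guard turns it into
# an empty list, and jq still exits 0 -- nothing downloaded, nothing reported.
primary_models=$(echo "$registry" | jq -r '.auto_download.by_tier["primary"] // [] | .[]')

echo "full:    ${full_models}"
echo "primary: '${primary_models}'"   # empty string, silently
```

The lookup never distinguishes "tier exists but is empty" from "tier name is wrong", which is exactly why change 3 validates the name up front.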

(2) Container /proc/meminfo blindspot: docker-compose.yml has mem_limit: ${MODEL_INIT_MEM:-2g} on model-init. download-models.sh:30 reads RAM from /proc/meminfo INSIDE the container. With cgroups-aware /proc/meminfo, that's the 2GB limit, NOT host RAM. Result: TIER auto-detects to mba regardless of host (RTX 5090 / 32GB+ Mac / 8GB MBA all see 2GB). Even when CONTINUUM_TIER isn't set externally, the in-container detection silently bottoms-out at the smallest tier.
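A sketch of why the in-container detection bottoms out. The `detect_tier` function and its GB thresholds are hypothetical, mirroring the MemTotal read at download-models.sh:30, not the script's actual cutoffs.

```shell
#!/usr/bin/env bash
# Hypothetical tier detection based on /proc/meminfo's MemTotal line.
detect_tier() {
  local kb gb
  kb=$(awk '/^MemTotal:/ {print $2}' "$1")
  gb=$((kb / 1024 / 1024))
  if   [ "$gb" -ge 24 ]; then echo full
  elif [ "$gb" -ge 12 ]; then echo mid
  else                        echo mba
  fi
}

# Host view: 64GB. Container view under mem_limit: 2g with a cgroups-aware
# /proc/meminfo: ~2GB -- so every host detects as the smallest tier.
printf 'MemTotal:       67108864 kB\n' > /tmp/meminfo.host
printf 'MemTotal:        2097152 kB\n' > /tmp/meminfo.container
detect_tier /tmp/meminfo.host        # full
detect_tier /tmp/meminfo.container   # mba, even on an RTX 5090 / 64GB host
```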

Changes — 3 files, single-purpose, additive

  1. install.sh: rename CONTINUUM_TIER='primary' → 'full' (single source of truth = src/shared/models.json tiers keys). Updates inline comment + case-stmt fallback default. Three textual occurrences of the legacy name converted; new comment block explaining the canon.

  2. docker-compose.yml: pass TIER=${CONTINUUM_TIER:-full} to model-init's env. Makes install.sh's hardware-tier choice flow through to the downloader. The :-full default guarantees headed installs (no install.sh) still pull the full multimodal Qwen set rather than bottoming-out at mba.

  3. src/scripts/download-models.sh: validate $TIER against {mba|mid|full} BEFORE the jq lookup. Unknown tier (e.g. residual 'primary' or any future divergence) errors loudly with the registry's actual valid set + the most likely cause. Per Joel's "no silent fallback to placeholder models" rule.
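The guard in change 3 can be sketched as follows; the function name and message wording are hypothetical, but the shape (validate before the jq lookup, fail loud with the valid set and likely cause) is the change described above.

```shell
#!/usr/bin/env bash
# Sketch of the change-3 guard: reject any tier outside the registry canon
# BEFORE the jq lookup, so an unknown tier fails loud instead of silently
# resolving to an empty model list.
validate_tier() {
  case "$1" in
    mba|mid|full) return 0 ;;
    *)
      echo "ERROR: unknown TIER='$1'. Valid tiers (src/shared/models.json): mba mid full" >&2
      echo "Likely cause: a legacy CONTINUUM_TIER value (e.g. 'primary') leaked into model-init's env" >&2
      return 1 ;;
  esac
}

validate_tier "${TIER:-full}"   # the real script exits non-zero here on divergence
```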

Validation

  • bash -n install.sh: syntax OK
  • bash -n src/scripts/download-models.sh: syntax OK
  • docker compose config --quiet: parses OK
  • Precommit hook (TypeScript build + browser ping): PASS

Out of scope (separate followups)

  • Lane B Docker volume/profile mechanics (@continuum-8e97 owns)
  • Verify #1077 ("refactor(persona): fail hard on missing model selection") fail-hard fires when NO local model present at runtime (currently fires only when SOME model fails to resolve; the silent skip pre-resolver may be a separate regression — Lane A PR-2 territory)
  • Linux install path doesn't set CONTINUUM_TIER; now defaults to full on non-Mac, which is right for Linux+RTX. MBA on Linux would need explicit env override — acceptable since the in-container /proc/meminfo bottom-out is now fail-loud rather than silent-mba

Cross-platform validation: continuum-8e97's RTX rerun should now either pull Qwen models (success) or fail loud at install with a tier-validation error (correct loud-fail). Either result confirms the silent-skip is gone.

🤖 Generated with Claude Code

joelteply added a commit that referenced this pull request May 11, 2026
…oad-avatar-models.sh (#1090)

Per the issue: third-party CDN failures (RTX install hit OpenGameArt curl
exit 11 = CURLE_FTP_WEIRD_PASS_REPLY on vroid-female-base.vrm) propagated
through `set -e` and exited the entire script, which made the model-init
container exit non-zero. Compounded with #1085 (tier-name canon) for the
"RTX install ships with no Qwen" symptom.

Fix shape per #1087's recommended Option A:
- Wrap each per-VRM curl/wget call in `set +e ... set -e` so a single
  download failure increments a FAILED counter instead of killing the
  script. The script-level `set -e` invariant is preserved everywhere
  else (jq, mkdir, mv, etc. still hard-fail on real bugs).
- Capture and log the actual curl exit code on each failure (Joel's
  "never swallow errors — evidence is for the debugger" rule). The
  warning includes the exit code, the failed name, and the source URL
  so the next debugger has everything they need.
- Run summary at the end emits a "DEGRADED" structured warning naming
  exactly which VRMs failed + the upstream cause (third-party CDN, not
  a Continuum bug) + the re-run command. Operator visibility, not
  silent suppression.
- Script unconditionally exits 0 — partial avatar set is acceptable
  (Bevy live mode degrades to whatever VRMs are present), and a
  third-party CDN blip should NOT block install. The summary above
  carries the diagnostic; downstream consumers see clean exit + warning.
- Bonus: replace hardcoded `8` with EXPECTED constant; quote tmpzip /
  tmpdir / vrm_file mktemp captures (shellcheck SC2155).
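The per-download wrapping described above can be sketched as below. `fetch_vrm` and the dead URL are hypothetical; the real loop lives in download-avatar-models.sh, and the real script unconditionally exits 0 after the summary.

```shell
#!/usr/bin/env bash
set -e   # script-level invariant preserved; only the fetch is exempted

FAILED=0
FAILED_NAMES=""

# Hypothetical per-VRM fetch mirroring Option A: a single CDN failure
# increments FAILED and logs the evidence instead of killing the script.
fetch_vrm() {  # fetch_vrm <name> <url> <dest>
  local rc
  set +e
  curl -fsSL -o "$3" "$2"
  rc=$?
  set -e
  if [ "$rc" -ne 0 ]; then
    # Never swallow the evidence: exit code + name + source URL.
    echo "WARN: download failed (curl exit $rc): $1 <- $2" >&2
    FAILED=$((FAILED + 1))
    FAILED_NAMES="$FAILED_NAMES $1"
  fi
}

# A dead URL degrades the run instead of aborting it:
fetch_vrm vroid-female-base 'file:///nonexistent/base.vrm' /tmp/base.vrm
if [ "$FAILED" -gt 0 ]; then
  echo "DEGRADED: $FAILED VRM(s) failed:$FAILED_NAMES (third-party CDN, not a Continuum bug)" >&2
fi
```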

Smoke-tested locally: MODELS_DIR=/tmp/avatar-smoke-test bash -x
download-avatar-models.sh → all 8 VRMs downloaded successfully on host
with working CDN + exit 0. Failure path code is symmetric (set +e capture
exit, log, increment FAILED, continue) — same shape proven by the
existing per-file failure handling in download-models.sh:115-124.

Closes #1087.

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 11, 2026
…typed error (#1089)

Lane A PR-2 — surfaces install-time-no-Qwen as observable runtime health
rather than process panic. Pairs with #1085 (install fix for the SOURCE
of the no-Qwen state) by making the runtime VISIBILITY of "no local
model loadable" testable + integrable.

Background: continuum-8e97 RTX 5090 install (2026-05-11) had cuda stack
ready, VRAM available, zero personas replying — root cause was no Qwen
GGUF seeded. The existing `LlamaCppAdapter::new()` would have panicked
with the right message, but it is constructed LAZILY (on the first
generate_text call). Personas silently skipped pre-resolver, so the panic
was never reached; the adapter never tried to load.

Changes:

- New typed error `NoLocalModelLoadable { provider_id, rows_in_registry,
  rows_with_gguf_local_path }` with thiserror Display naming the
  actionable remediation ("Install seeded no local Qwen GGUF — run
  model-init downloader or seed manually").

- New `LlamaCppAdapter::try_new() -> Result<Self, NoLocalModelLoadable>`:
  Result-returning variant. Boot-time health checks (continuum status,
  ai/status, install-time validators) MUST use this so an install with
  no Qwen seeded reports the typed error cleanly instead of crash-looping
  later when a persona attempts to invoke.

- New `LlamaCppAdapter::try_new_from<'a, I>(models: I)` pure variant
  taking a model iterator directly, mirroring my model_resolver.rs
  pattern. Lets tests assemble synthetic registries without going
  through the global() singleton. `try_new()` calls
  `try_new_from(global().models_for_provider("llamacpp-local"))`.

- Legacy `LlamaCppAdapter::new()` preserved (panics on err) — same
  observable behavior as before for callers that haven't migrated.

3 tests covering the contract:

- try_new_from_errors_when_no_llamacpp_local_rows: empty iterator →
  NoLocalModelLoadable with rows_in_registry=0, error message contains
  "model-init" remediation hint
- try_new_from_errors_when_llamacpp_rows_exist_but_none_have_gguf_path:
  registry has llamacpp-local rows but artifact resolver couldn't find
  any GGUF on disk → NoLocalModelLoadable with rows_in_registry=2,
  rows_with_gguf_local_path=0 (the RTX 5090 case Codex's #1085 +
  upstream model-init bug produces)
- try_new_from_succeeds_with_at_least_one_resolved_path: mixed registry
  (one resolved, one not) → adapter picks resolved row, model_path +
  default_model match

Validation:
- cargo test --features metal,accelerate -p continuum-core --lib
  inference::llamacpp_adapter: 3/3 pass

Out of scope (separate followups):
- Wire `try_new()` into a runtime boot health check (Lane A PR-3 or
  ai/status integration), surfaces the typed error to operators via
  jtag command output. PR-2 ships the primitive; integration is next.
- The artifact resolver behavior when explicit gguf path doesn't exist
  on disk — silently falls through to other resolvers (artifacts.rs:73).
  Worth a separate audit but doesn't change PR-2's contract.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size: XL and removed size: M labels May 11, 2026
@joelteply joelteply force-pushed the fix/install-tier-name-divergence branch 7 times, most recently from 7331be6 to faa5827 Compare May 12, 2026 05:36
joelteply and others added 13 commits May 13, 2026 14:31
… fail loud on unknown tier

ROOT CAUSE for 2026-05-11 RTX 5090 silent-no-replies finding (continuum-8e97
VDD report): install seeded only voice-models, no Qwen GGUF, personas had
no local provider to invoke, fail-hard rule (#1077) didn't fire because the
silent skip happened pre-resolver in download-models.sh.

Two compounding bugs:

(1) Tier-name divergence — install.sh sets CONTINUUM_TIER='primary' for
32GB+ Macs but src/shared/models.json + src/scripts/download-models.sh
canon is 'full'. If 'primary' ever leaks to model-init's TIER env, the jq
lookup `auto_download.by_tier[primary]` returns [] (silent), leaving install
with always[] (voice/embedding/whisper/piper/kokoro/silero) only — no Qwen.

(2) Container /proc/meminfo blindspot — docker-compose has
`mem_limit: ${MODEL_INIT_MEM:-2g}` on model-init. download-models.sh:30 reads
RAM from /proc/meminfo INSIDE the container. With cgroups-aware /proc/meminfo,
that's the 2GB limit, NOT host RAM. Result: TIER auto-detects to `mba`
regardless of host (RTX 5090 / 32GB+ Mac / 8GB MBA all see 2GB). Even when
CONTINUUM_TIER isn't set externally, in-container detection silently
bottoms-out at the smallest tier.

Three changes — all single-purpose, additive (no semantic shifts elsewhere):

1. install.sh: rename CONTINUUM_TIER='primary' → 'full' (single source of
   truth = src/shared/models.json `tiers` keys). Updates inline comment +
   case-stmt fallback default. Three textual occurrences of the legacy
   name converted to the canonical name plus a note in the comment block
   explaining why.

2. docker-compose.yml: pass `TIER=${CONTINUUM_TIER:-full}` to model-init's
   env. Makes install.sh's hardware-tier choice flow through to the
   downloader instead of having the container guess from its own
   /proc/meminfo. The `:-full` default guarantees headed installs (no
   install.sh) still pull the full multimodal Qwen set rather than
   bottoming-out at mba.

3. src/scripts/download-models.sh: validate $TIER against {mba|mid|full}
   BEFORE the jq lookup. Unknown tier (e.g. residual 'primary' or any
   future divergence) errors loudly with the registry's actual valid
   set + the most likely cause. Per Joel's "no silent fallback to
   placeholder models" rule.

Validation:
- bash -n install.sh: syntax OK
- bash -n src/scripts/download-models.sh: syntax OK
- docker compose config --quiet: parses OK

Out of scope (separate followups):
- Lane B Docker volume/profile mechanics (continuum-8e97 owns)
- Verify #1077 fail-hard fires when NO local model present at runtime
- Linux install path doesn't set CONTINUUM_TIER; now defaults to `full`
  on non-Mac, which is right for Linux+RTX. MBA on Linux would need
  explicit env override — acceptable since the in-container
  /proc/meminfo bottom-out is now fail-loud rather than silent-mba

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply force-pushed the fix/install-tier-name-divergence branch from 61bdeb4 to 7ac34d3 Compare May 13, 2026 19:39
joelteply pushed a commit that referenced this pull request May 14, 2026
…tion (#1120)

Symptom from #1120 (claude-tab-2 reported, validating PR #1085):
  npm test -- --runTestsByPath src/tests/unit/seed-install-tier.test.ts
fails before any test runs because src/scripts/test-with-server.ts
imports a non-existent './system-startup' module.

The canonical entry for npm-test mode lives at
src/system/core/SystemOrchestrator.ts as
SystemOrchestration.forTesting() — same factory used by the rest of
the testing path. Update the import + replace startSystem('npm-test')
with the canonical call. Loud-throw on failure so test runs surface
startup errors rather than silently mis-behaving.

Validation: npm run build:ts passes clean. Hooks ran without --no-verify.

Card: continuum#1120.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>