
fix(router): exclude hard-cooldowned providers + DeepSeek V4 + Codex plan-tier#226

Merged
github-actions[bot] merged 3 commits into main from fix/codex-plan-tier-and-deepseek-v4 on Apr 27, 2026

Conversation

@typelicious
Collaborator

Target release

v2.5.0 — three layered router/provider fixes plus the DeepSeek V3 → V4 catalog migration that was already in flight.

Why this PR exists

Operator incident on 2026-04-26: an OpenCode/Codenomad chat session got stuck in compaction with Error: "{\"detail\":\"The 'gpt-5-codex' model is not supported when using Codex with a ChatGPT account.\"}". Diagnosing it surfaced three independent layered failures, all of which would have hit other operators eventually:

  1. The Codex effective-model mapping unconditionally translated gpt-5.4 → gpt-5-codex, which chatgpt.com/backend-api/codex/responses accepts only for ChatGPT Pro accounts. Plus subscribers (the majority of operators) got a hard 400 on every Codex request.
  2. The router's _select_policy_provider and _validate_health only checked provider_health.healthy. That flag flips back to True after a single successful health probe, so a structurally broken provider sitting at the top of prefer_providers got re-selected on every user request — the cooldown windows the adaptive RoutePressure tracker was setting were observability-only, never read at routing time. In the live incident openai-codex was re-selected 5+ times in a row across user requests despite all of them 400ing.
  3. Once Codex was fully out of the way, the next provider (DeepSeek V4) introduced a new mandatory reasoning_content round-trip on assistant messages whenever thinking mode is active. OpenCode/Codenomad/generic openai-compat clients don't track that field, so every multi-turn request 400'd with "The reasoning_content in the thinking mode must be passed back to the API."

What changed

faigate/providers.py

  • _codex_effective_model is now plan-tier-aware. New _detect_codex_chatgpt_plan_tier() parses chatgpt_plan_type from the cached ~/.codex/auth.json id_token JWT once per process lifetime (first sketch after this list). gpt-5.4 maps to gpt-5-codex for Pro, gpt-5-codex-mini for Plus, and raw pass-through for unknown so the upstream rejects explicitly rather than us guessing wrong.
  • DeepSeek V4 thinking-mode safeguard in the openai-compat send-path (second sketch after this list). When any prior assistant message lacks reasoning_content, the request body sets thinking: {"type": "disabled"} (V4 expects an Anthropic-style ThinkingOptions struct — enable_thinking: false is silently ignored upstream, which I confirmed via direct probe). Single-turn reasoning still flows through unchanged. Provider/request extra_body overrides win because they merge in after the auto-set.
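
A minimal sketch of the plan-tier detection, for reviewers: the auth.json field layout and where the claim sits inside the id_token are assumptions; only _detect_codex_chatgpt_plan_tier, chatgpt_plan_type, and the tier-to-model mapping come from this PR.

```python
# Hedged sketch; auth.json structure and claim nesting are assumptions.
import base64
import json
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=1)  # "once per process lifetime"
def _detect_codex_chatgpt_plan_tier() -> str:
    try:
        auth = json.loads((Path.home() / ".codex" / "auth.json").read_text())
        payload_b64 = auth["tokens"]["id_token"].split(".")[1]
        # JWT payload segments are unpadded base64url; restore the padding.
        payload_b64 += "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        return str(claims.get("chatgpt_plan_type", "unknown")).lower()
    except Exception:
        return "unknown"  # any parse failure falls back to raw pass-through
```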
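And the thinking-mode safeguard, again hedged: the helper name and request-body shape are illustrative, while the reasoning_content check and the thinking: {"type": "disabled"} struct come straight from the change.

```python
# Illustrative sketch; helper name and body shape are assumptions.
def _maybe_disable_deepseek_thinking(body: dict) -> None:
    assistant_msgs = [m for m in body.get("messages", [])
                      if m.get("role") == "assistant"]
    if assistant_msgs and any("reasoning_content" not in m
                              for m in assistant_msgs):
        # V4 wants an Anthropic-style ThinkingOptions struct; the boolean
        # enable_thinking: false form is silently ignored upstream.
        body["thinking"] = {"type": "disabled"}
    # Single-turn requests (no assistant history) pass through unchanged, and
    # extra_body merges in after this call, so explicit overrides still win.
```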

faigate/router.py

  • New _provider_in_hard_cooldown(name, ctx) helper (sketched after this list), consulted in:
    • _provider_matches_policy — excludes cooldown'd providers from the policy candidate set so prefer_providers ordering can no longer mask them.
    • _validate_health — adds a third reason ("primary in cooldown") that triggers the existing fallback-chain logic, and filters the fallback chain itself so we don't just route to a different broken provider.
  • Soft-degrade windows (transport-error, timeout) intentionally stay routable and continue to be handled via the additive adaptation_penalty in ranking.
  • Direct/explicit-model routing (model: "openai-codex" etc.) is unaffected by design — explicit caller intent overrides the adaptive demotion.
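
A sketch of the gate, assuming a hypothetical cooldown-window record on the RoutePressure tracker; the persistent-failure classes listed are the ones the tracker already distinguishes.

```python
# Sketch only; ctx/tracker attribute names are assumptions, not faigate's API.
import time

_HARD_REASONS = {"auth-invalid", "quota-exhausted", "model-unavailable",
                 "endpoint-mismatch", "rate-limited"}

def _provider_in_hard_cooldown(name: str, ctx) -> bool:
    window = ctx.pressure.cooldown_for(name)  # hypothetical accessor
    if window is None:
        return False
    # Soft-degrade reasons (transport-error, timeout) never block here; they
    # stay routable and are handled by the additive adaptation_penalty instead.
    return window.reason in _HARD_REASONS and window.expires_at > time.time()
```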

config.yaml

  • deepseek-chat → deepseek-v4-flash (workhorse, 1M ctx, $0.028/$0.14/$0.28 cache-hit/miss/output per 1M tokens) and deepseek-reasoner → deepseek-v4-pro (premium reasoning, 1M ctx, $0.145/$1.74/$3.48). Lane canonical models become deepseek/v4-flash and deepseek/v4-pro. All inbound references in prefer_providers, model_shortcuts, static_rules, and degrade chains have been swept.
  • opencode and coding profiles get a deny_providers list for the full openai-codex* family (illustrated below) as defence-in-depth against future ChatGPT-side gating changes. Explicit invocation still works.
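
Roughly the shape of the profile change (key names beyond deny_providers/prefer_providers are assumptions about the config schema):

```yaml
profiles:
  opencode:
    deny_providers:
      - "openai-codex*"       # whole family, defence-in-depth
    prefer_providers:
      - deepseek-v4-flash     # was deepseek-chat before the V4 sweep
```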

Tests (+24, 0 regressions)

  • New tests/test_router_cooldown.py — 5 tests covering policy exclusion, recovery after window expiry, and primary→fallback transition.
  • New tests/test_provider_safeguards.py — 14 tests covering plan-tier mapping (Pro/Plus/unknown), JWT parsing edge cases, and the DeepSeek thinking safeguard (single-turn, multi-turn with/without reasoning_content, non-deepseek lanes, extra_body override).
  • tests/test_providers.py — autouse fixture pinning the plan cache to pro so legacy gpt-5-codex assertions stay valid (sketched after this list); plan-tier detection itself moved into the new dedicated suite.
  • tests/test_routing.py — fixture provider names swept V3 → V4.
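
The pinning fixture is roughly the following; the exact patch target is an assumption:

```python
# Sketch of the autouse fixture in tests/test_providers.py.
import pytest

@pytest.fixture(autouse=True)
def _pin_plan_tier_to_pro(monkeypatch):
    # Keep legacy gpt-5-codex assertions valid regardless of the host's
    # ~/.codex/auth.json contents.
    monkeypatch.setattr(
        "faigate.providers._detect_codex_chatgpt_plan_tier", lambda: "pro"
    )
```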

Full pytest tests/ --ignore=tests/test_wizard.py -k "not benchmark" shows 446 passed, 16 failed, 6 deselected. All 16 failures pre-exist on main (11× ruamel.yaml missing in py3.14 env, 3× bundled assets/metadata/catalog.v1.json already V4-aware vs. older test fixtures). Tracked separately — see follow-up issues linked below.

Live verification

A Codenomad session that was producing HTTP 400 reasoning_content errors now routes via Route: deepseek-v4-flash [static/subagent] → HTTP 200 with no manual intervention.

Test plan

  • Unit: pytest tests/test_router_cooldown.py tests/test_provider_safeguards.py tests/test_routing.py tests/test_providers.py tests/test_adaptation.py — 64 passed.
  • Full suite (excluding test_wizard.py separately tracked): 446 passed, 16 pre-existing failures unchanged.
  • Live integration: Codenomad multi-turn session with Phase-1 v2.4.0 user config + Phase-2 router patch loaded → HTTP 200.
  • DeepSeek API probe to confirm thinking: {type: disabled} is the param V4 actually accepts (boolean form silently ignored).
  • Reviewer to spot-check the JWT parsing in _detect_codex_chatgpt_plan_tier() — base64 padding handling, defensive try/except returning "unknown" on any error.

Follow-ups (not in this PR)

Filed as separate issues:

  • Smarter health-driven catalog refresh — make the metadata loop emit deprecation/replacement edges, detect drift via release-note polling, and surface it in the Gate Bar.
  • Restore tests/test_wizard.py (V3→V4 fixture sweep, ~79 references).
  • Fix the 16 pre-existing test failures on main (ruamel.yaml dependency for py3.14 + bundled catalog vs. fixture drift).

🤖 Generated with Claude Code

…plan-tier

Three layered router/provider fixes traced back to one operator incident
on 2026-04-26 where a Codex 400 loop, then a DeepSeek V4 multi-turn 400,
broke OpenCode/Codenomad sessions:

- providers.py: `_codex_effective_model` now reads `chatgpt_plan_type`
  from the cached ~/.codex/auth.json id_token JWT once per process and
  routes `gpt-5.4` to the variant the account can actually use
  (`gpt-5-codex` for Pro, `gpt-5-codex-mini` for Plus, raw pass-through
  for unknown so the upstream rejects explicitly). Previously it always
  returned `gpt-5-codex`, which the chatgpt.com Codex backend rejects
  for ChatGPT Plus subscribers.

- providers.py: openai-compat send-path adds a DeepSeek-V4-specific
  thinking-mode safeguard. V4 made `reasoning_content` round-trip on
  assistant messages mandatory whenever thinking is active; clients that
  don't track it (OpenCode, Codenomad, generic openai-compat SDKs) 400
  on every multi-turn follow-up. When any prior assistant message lacks
  reasoning_content, the request body now sets `thinking: {"type":
  "disabled"}` (V4 expects an Anthropic-style ThinkingOptions struct,
  not a boolean — the legacy `enable_thinking: false` form is silently
  ignored upstream). Single-turn reasoning still flows through.
  Provider/request `extra_body` overrides win because they merge in
  after the auto-set.

- router.py: new helper `_provider_in_hard_cooldown(name, ctx)` is
  consulted in both `_provider_matches_policy` (excludes from policy
  candidate set) and `_validate_health` (forces fallback when the
  primary is in cooldown, also filters fallback chain). The adaptive
  RoutePressure tracker already classified persistent failures
  (auth-invalid, quota-exhausted, model-unavailable, endpoint-mismatch,
  rate-limited) and surfaced them in /health, but the routing decision
  layer never read `request_blocked` — so a structurally broken provider
  sitting at the top of `prefer_providers` would be re-selected on every
  request for the entire cooldown window. Soft-degrade windows
  (transport-error, timeout) intentionally stay routable and continue
  to be handled via the additive `adaptation_penalty` in ranking.
  Direct/explicit-model routing (`model: "openai-codex"` etc.) is
  unaffected by design — explicit caller intent overrides the adaptive
  demotion.

Plus the DeepSeek V3 → V4 catalog migration that was already in flight:

- config.yaml: `deepseek-chat` → `deepseek-v4-flash` (workhorse, 1M ctx,
  $0.028/$0.14/$0.28 cache-hit/miss/output per 1M tokens) and
  `deepseek-reasoner` → `deepseek-v4-pro` (premium reasoning, 1M ctx,
  $0.145/$1.74/$3.48). Lane canonical models become `deepseek/v4-flash`
  and `deepseek/v4-pro`. All inbound references in `prefer_providers`,
  `model_shortcuts`, `static_rules`, and degrade chains have been swept.
  `opencode`/`coding` profiles get a `deny_providers` list for the full
  `openai-codex*` family as defence-in-depth against future ChatGPT-side
  gating changes.

Tests
- tests/test_router_cooldown.py covers policy exclusion + recovery +
  primary→fallback transition for hard-cooldown'd providers.
- tests/test_provider_safeguards.py covers the plan-tier mapping
  (Pro/Plus/unknown), JWT parsing edge cases, and the DeepSeek thinking
  safeguard (single-turn, multi-turn with/without reasoning_content,
  non-deepseek lanes, extra_body override).
- tests/test_providers.py gets an autouse fixture pinning the plan
  cache to `pro` so the legacy gpt-5-codex assertions stay valid.
- tests/test_routing.py sweeps fixture provider names V3 → V4.

Verified live against the running gateway: a Codenomad session that was
producing HTTP 400 reasoning_content errors now routes via
`deepseek-v4-flash [static/subagent]` with HTTP 200.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
André Lange and others added 2 commits April 27, 2026 22:11
- pyproject.toml: 2.4.0 → 2.5.0
- CHANGELOG: rename `v2.4.1 - 2026-04-27` section to `v2.5.0 - 2026-04-27`
  (the changes warrant a minor bump, not a patch — DeepSeek V3→V4 catalog
  rename is a config-breaking change for downstream operators with custom
  prefer_provider lists referring to the legacy IDs)
- providers.py: ruff format collapsed a multi-line `any()` predicate into
  one line. No semantic change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`scripts/faigate-release --dry-run` enforces version alignment between
`pyproject.toml` and `faigate/__init__.py` to prevent the package
metadata from drifting at release time. The previous commit only bumped
pyproject.toml — this completes the alignment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
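
(For context, the alignment check amounts to something like the sketch below; the real scripts/faigate-release implementation may differ.)

```python
import tomllib  # stdlib on py3.11+

import faigate

def check_version_alignment() -> None:
    with open("pyproject.toml", "rb") as f:
        pinned = tomllib.load(f)["project"]["version"]
    if pinned != faigate.__version__:
        raise SystemExit(f"version drift: pyproject.toml has {pinned}, "
                         f"faigate/__init__.py has {faigate.__version__}")
```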
github-actions[bot] merged commit 1f2ae89 into main on Apr 27, 2026
18 checks passed