fix(router): exclude hard-cooldowned providers + DeepSeek V4 + Codex plan-tier #226
Merged
github-actions[bot] merged 3 commits into main on Apr 27, 2026
Conversation
Three layered router/provider fixes, all traced back to a single operator
incident on 2026-04-26 in which a Codex 400 loop and then a DeepSeek V4
multi-turn 400 broke OpenCode/Codenomad sessions:
- providers.py: `_codex_effective_model` now reads `chatgpt_plan_type`
from the cached ~/.codex/auth.json id_token JWT once per process and
routes `gpt-5.4` to the variant the account can actually use
(`gpt-5-codex` for Pro, `gpt-5-codex-mini` for Plus, raw pass-through
for unknown so the upstream rejects explicitly). Previously it always
returned `gpt-5-codex`, which the chatgpt.com Codex backend rejects
for ChatGPT Plus subscribers. (A sketch of the detection follows this list.)
- providers.py: openai-compat send-path adds a DeepSeek-V4-specific
thinking-mode safeguard. V4 made `reasoning_content` round-trip on
assistant messages mandatory whenever thinking is active; clients that
don't track it (OpenCode, Codenomad, generic openai-compat SDKs) 400
on every multi-turn follow-up. When any prior assistant message lacks
reasoning_content, the request body now sets `thinking: {"type":
"disabled"}` (V4 expects an Anthropic-style ThinkingOptions struct,
not a boolean — the legacy `enable_thinking: false` form is silently
ignored upstream). Single-turn reasoning still flows through.
Provider/request `extra_body` overrides win because they merge in
after the auto-set. (Sketch below.)
- router.py: new helper `_provider_in_hard_cooldown(name, ctx)` is
consulted in both `_provider_matches_policy` (excludes from policy
candidate set) and `_validate_health` (forces fallback when the
primary is in cooldown, also filters fallback chain). The adaptive
RoutePressure tracker already classified persistent failures
(auth-invalid, quota-exhausted, model-unavailable, endpoint-mismatch,
rate-limited) and surfaced them in /health, but the routing decision
layer never read `request_blocked` — so a structurally broken provider
sitting at the top of `prefer_providers` would be re-selected on every
request for the entire cooldown window. Soft-degrade windows
(transport-error, timeout) intentionally stay routable and continue
to be handled via the additive `adaptation_penalty` in ranking.
Direct/explicit-model routing (`model: "openai-codex"` etc.) is
unaffected by design — explicit caller intent overrides the adaptive
demotion. (A sketch of the cooldown check follows this list.)
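
A minimal sketch of the plan-tier detection, assuming the auth.json nests the id_token under a `tokens` key and the plan claim under OpenAI's auth namespace (both assumptions; the PR itself only specifies the `chatgpt_plan_type` field, the base64-padding handling, and the defensive fallback to `"unknown"`):

```python
import base64
import functools
import json
from pathlib import Path

@functools.lru_cache(maxsize=1)  # read the JWT once per process lifetime
def _detect_codex_chatgpt_plan_tier() -> str:
    """Pull chatgpt_plan_type out of the cached ~/.codex/auth.json id_token."""
    try:
        auth = json.loads((Path.home() / ".codex" / "auth.json").read_text())
        payload_b64 = auth["tokens"]["id_token"].split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)  # JWTs strip base64 padding
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        return claims.get("https://api.openai.com/auth", {}).get("chatgpt_plan_type", "unknown")
    except Exception:
        return "unknown"  # plan detection must never break routing

def _codex_effective_model(requested: str) -> str:
    if requested != "gpt-5.4":
        return requested
    tier = _detect_codex_chatgpt_plan_tier()
    if tier == "pro":
        return "gpt-5-codex"
    if tier == "plus":
        return "gpt-5-codex-mini"
    return requested  # unknown tier: pass through so upstream rejects explicitly
```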
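
The thinking safeguard reduces to one predicate over the message history. A sketch of its shape, with the helper name and the model-prefix gate as illustrative assumptions:

```python
def _apply_deepseek_thinking_safeguard(body: dict) -> dict:
    """Disable V4 thinking when a prior assistant turn lost reasoning_content."""
    # Gate is hypothetical: the real send path knows the provider lane
    # directly, so non-deepseek lanes are never touched.
    if not str(body.get("model", "")).startswith("deepseek"):
        return body
    dropped = any(
        msg.get("role") == "assistant" and not msg.get("reasoning_content")
        for msg in body.get("messages", [])
    )
    if dropped:
        # V4 expects an Anthropic-style ThinkingOptions struct here; the
        # legacy `enable_thinking: False` boolean is silently ignored.
        body["thinking"] = {"type": "disabled"}
    return body

# extra_body merges in after the auto-set, so explicit overrides win:
#     body = _apply_deepseek_thinking_safeguard(body)
#     body.update(provider_extra_body)
```

With no assistant messages in the history, `any()` is false and the body passes through untouched, which is how single-turn reasoning keeps flowing.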
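
The routing-side check is equally small; the field names on the pressure entry are assumptions, since the PR only names `request_blocked` and the persistent-failure classifications:

```python
import time

def _provider_in_hard_cooldown(name: str, ctx) -> bool:
    """True while the adaptive tracker holds the provider in a hard cooldown."""
    pressure = ctx.route_pressure.get(name)
    return bool(
        pressure
        and pressure.request_blocked               # auth-invalid, quota-exhausted, ...
        and pressure.cooldown_until > time.time()  # cooldown window still open
    )

def _provider_matches_policy(name: str, policy, ctx) -> bool:
    # Hard-cooldown'd providers drop out of the candidate set entirely,
    # so prefer_providers ordering can no longer mask them.
    if _provider_in_hard_cooldown(name, ctx):
        return False
    ...  # existing policy checks continue below
```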
Plus the DeepSeek V3 → V4 catalog migration that was already in flight:
- config.yaml: `deepseek-chat` → `deepseek-v4-flash` (workhorse, 1M ctx,
$0.028/$0.14/$0.28 cache-hit/miss/output per 1M tokens) and
`deepseek-reasoner` → `deepseek-v4-pro` (premium reasoning, 1M ctx,
$0.145/$1.74/$3.48). Lane canonical models become `deepseek/v4-flash`
and `deepseek/v4-pro`. All inbound references in `prefer_providers`,
`model_shortcuts`, `static_rules`, and degrade chains have been swept.
`opencode`/`coding` profiles get a `deny_providers` list for the full
`openai-codex*` family as defence-in-depth against future ChatGPT-side
gating changes.
Tests
- tests/test_router_cooldown.py covers policy exclusion + recovery +
primary→fallback transition for hard-cooldown'd providers.
- tests/test_provider_safeguards.py covers the plan-tier mapping
(Pro/Plus/unknown), JWT parsing edge cases, and the DeepSeek thinking
safeguard (single-turn, multi-turn with/without reasoning_content,
non-deepseek lanes, extra_body override).
- tests/test_providers.py gets an autouse fixture pinning the plan
cache to `pro` so the legacy gpt-5-codex assertions stay valid.
- tests/test_routing.py sweeps fixture provider names V3 → V4.
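
To make the safeguard tests concrete, here is the rough shape of the two core cases, reusing the hypothetical `_apply_deepseek_thinking_safeguard` helper sketched above (the PR's actual test names and fixtures will differ):

```python
def test_multi_turn_missing_reasoning_content_disables_thinking():
    body = {
        "model": "deepseek-v4-pro",
        "messages": [
            {"role": "user", "content": "first question"},
            {"role": "assistant", "content": "answer"},  # reasoning_content dropped
            {"role": "user", "content": "follow-up"},
        ],
    }
    out = _apply_deepseek_thinking_safeguard(body)
    assert out["thinking"] == {"type": "disabled"}

def test_single_turn_keeps_thinking_available():
    body = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": "one-shot"}],
    }
    out = _apply_deepseek_thinking_safeguard(body)
    assert "thinking" not in out  # single-turn reasoning still flows through
```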
Verified live against the running gateway: a Codenomad session that was
producing HTTP 400 reasoning_content errors now routes via
`deepseek-v4-flash [static/subagent]` with HTTP 200.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
- pyproject.toml: 2.4.0 → 2.5.0
- CHANGELOG: rename the `v2.4.1 - 2026-04-27` section to `v2.5.0 - 2026-04-27` (the changes warrant a minor bump, not a patch: the DeepSeek V3 → V4 catalog rename is a config-breaking change for downstream operators whose custom `prefer_providers` lists refer to the legacy IDs)
- providers.py: ruff-format collapse of a multi-line `any()` predicate into one line. No semantic change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`scripts/faigate-release --dry-run` enforces version alignment between `pyproject.toml` and `faigate/__init__.py` to prevent the package metadata from drifting at release time. The previous commit only bumped pyproject.toml; this completes the alignment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
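
For context, a version-alignment check of this kind fits in a few lines; this is an illustration, not the actual `scripts/faigate-release` implementation, which the PR doesn't show:

```python
import re
import sys
import tomllib  # stdlib since Python 3.11
from pathlib import Path

def check_version_alignment() -> None:
    """Fail loudly if pyproject.toml and faigate/__init__.py disagree."""
    pyproject_version = tomllib.loads(Path("pyproject.toml").read_text())["project"]["version"]
    init_src = Path("faigate/__init__.py").read_text()
    match = re.search(r'__version__\s*=\s*"([^"]+)"', init_src)
    init_version = match.group(1) if match else None
    if pyproject_version != init_version:
        sys.exit(f"version drift: pyproject.toml={pyproject_version}, __init__.py={init_version}")

if __name__ == "__main__":
    check_version_alignment()
```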
Target release
v2.5.0 — three layered router/provider fixes plus the DeepSeek V3 → V4 catalog migration that was already in flight.
Why this PR exists
Operator incident on 2026-04-26: an OpenCode/Codenomad chat session got stuck in compaction with
Error: "{\"detail\":\"The 'gpt-5-codex' model is not supported when using Codex with a ChatGPT account.\"}". Diagnosing it surfaced three independent layered failures, all of which would have hit other operators eventually:gpt-5.4→gpt-5-codex, whichchatgpt.com/backend-api/codex/responsesaccepts only for ChatGPT Pro accounts. Plus subscribers (the majority of operators) got a hard 400 on every Codex request._select_policy_providerand_validate_healthonly checkedprovider_health.healthy. That flag flips back toTrueafter a single successful health probe, so a structurally broken provider sitting at the top ofprefer_providersgot re-selected on every user request — the cooldown windows the adaptiveRoutePressuretracker was setting were observability-only, never read at routing time. In the live incidentopenai-codexwas re-selected 5+ times in a row across user requests despite all of them 400ing.reasoning_contentround-trip on assistant messages whenever thinking mode is active. OpenCode/Codenomad/generic openai-compat clients don't track that field, so every multi-turn request 400'd with"The reasoning_content in the thinking mode must be passed back to the API."What changed
`faigate/providers.py`

- `_codex_effective_model` is now plan-tier-aware. New `_detect_codex_chatgpt_plan_tier()` parses `chatgpt_plan_type` from the cached `~/.codex/auth.json` id_token JWT once per process lifetime. `gpt-5.4` maps to `gpt-5-codex` for Pro, `gpt-5-codex-mini` for Plus, and raw pass-through for unknown so the upstream rejects explicitly rather than us guessing wrong.
- New DeepSeek-V4 thinking-mode safeguard on the openai-compat send path: when any prior assistant message lacks `reasoning_content`, the request body sets `thinking: {"type": "disabled"}` (V4 expects an Anthropic-style `ThinkingOptions` struct; `enable_thinking: false` is silently ignored upstream, which I confirmed via direct probe). Single-turn reasoning still flows through unchanged. Provider/request `extra_body` overrides win because they merge in after the auto-set.

`faigate/router.py`

- New `_provider_in_hard_cooldown(name, ctx)` helper consulted in:
  - `_provider_matches_policy`: excludes cooldown'd providers from the policy candidate set so `prefer_providers` ordering can no longer mask them.
  - `_validate_health`: adds a third reason ("primary in cooldown") that triggers the existing fallback-chain logic, and filters the fallback chain itself so we don't just route to a different broken provider.
- Soft-degrade windows (`transport-error`, `timeout`) intentionally stay routable and continue to be handled via the additive `adaptation_penalty` in ranking.
- Direct/explicit-model routing (`model: "openai-codex"` etc.) is unaffected by design; explicit caller intent overrides the adaptive demotion.

`config.yaml`

- `deepseek-chat` → `deepseek-v4-flash` (workhorse, 1M ctx, $0.028/$0.14/$0.28 cache-hit/miss/output per 1M tokens) and `deepseek-reasoner` → `deepseek-v4-pro` (premium reasoning, 1M ctx, $0.145/$1.74/$3.48). Lane canonical models become `deepseek/v4-flash` and `deepseek/v4-pro`. All inbound references in `prefer_providers`, `model_shortcuts`, `static_rules`, and degrade chains have been swept.
- `opencode` and `coding` profiles get a `deny_providers` list for the full `openai-codex*` family as defence-in-depth against future ChatGPT-side gating changes. Explicit invocation still works.

Tests (+24, 0 regressions)
- `tests/test_router_cooldown.py`: 5 tests covering policy exclusion, recovery after window expiry, and primary→fallback transition.
- `tests/test_provider_safeguards.py`: 14 tests covering plan-tier mapping (Pro/Plus/unknown), JWT parsing edge cases, and the DeepSeek thinking safeguard (single-turn, multi-turn with/without reasoning_content, non-deepseek lanes, extra_body override).
- `tests/test_providers.py`: autouse fixture pinning the plan cache to `pro` so legacy `gpt-5-codex` assertions stay valid; plan-tier detection itself moved into the new dedicated suite.
- `tests/test_routing.py`: fixture provider names swept V3 → V4.

Full `pytest tests/ --ignore=tests/test_wizard.py -k "not benchmark"` shows 446 passed, 16 failed, 6 deselected. All 16 failures pre-exist on `main` (11× `ruamel.yaml` missing in the py3.14 env, 3× bundled `assets/metadata/catalog.v1.json` already V4-aware vs. older test fixtures). Tracked separately; see follow-up issues linked below.

Live verification
A Codenomad session that was producing HTTP 400 reasoning_content errors now routes via
`Route: deepseek-v4-flash [static/subagent]` → HTTP 200, with no manual intervention.

Test plan
- `pytest tests/test_router_cooldown.py tests/test_provider_safeguards.py tests/test_routing.py tests/test_providers.py tests/test_adaptation.py`: 64 passed.
- Full suite (`test_wizard.py` separately tracked): 446 passed, 16 pre-existing failures unchanged.
- Direct probe against the DeepSeek V4 endpoint confirming `thinking: {type: disabled}` is the param V4 actually accepts (the boolean form is silently ignored).
- Review of the JWT parsing in `_detect_codex_chatgpt_plan_tier()`: base64 padding handling, defensive `try/except` returning `"unknown"` on any error.

Follow-ups (not in this PR)
Filed as separate issues:
- `tests/test_wizard.py` V3 → V4 fixture sweep (~79 references).
- Pre-existing failures on `main` (ruamel.yaml dependency for py3.14 + bundled catalog vs. fixture drift).

🤖 Generated with Claude Code