
fix(router): exclude hard-cooldowned providers + DeepSeek V4 + Codex plan-tier#226

Merged
github-actions[bot] merged 3 commits into main from fix/codex-plan-tier-and-deepseek-v4 on Apr 27, 2026

Conversation

@typelicious
Collaborator

Target release

v2.5.0 — three layered router/provider fixes plus the DeepSeek V3 → V4 catalog migration that was already in flight.

Why this PR exists

Operator incident on 2026-04-26: an OpenCode/Codenomad chat session got stuck in compaction with Error: "{\"detail\":\"The 'gpt-5-codex' model is not supported when using Codex with a ChatGPT account.\"}". Diagnosing it surfaced three independent layered failures, all of which would have hit other operators eventually:

  1. The Codex effective-model mapping unconditionally translated gpt-5.4 → gpt-5-codex, which chatgpt.com/backend-api/codex/responses accepts only for ChatGPT Pro accounts. Plus subscribers (the majority of operators) got a hard 400 on every Codex request.
  2. The router's _select_policy_provider and _validate_health only checked provider_health.healthy. That flag flips back to True after a single successful health probe, so a structurally broken provider sitting at the top of prefer_providers got re-selected on every user request — the cooldown windows the adaptive RoutePressure tracker was setting were observability-only, never read at routing time. In the live incident openai-codex was re-selected 5+ times in a row across user requests despite all of them 400ing.
  3. Once Codex was fully out of the way, the next provider (DeepSeek V4) introduced a new mandatory reasoning_content round-trip on assistant messages whenever thinking mode is active. OpenCode/Codenomad/generic openai-compat clients don't track that field, so every multi-turn request 400'd with "The reasoning_content in the thinking mode must be passed back to the API."

What changed

faigate/providers.py

  • _codex_effective_model is now plan-tier-aware. New _detect_codex_chatgpt_plan_tier() parses chatgpt_plan_type from the cached ~/.codex/auth.json id_token JWT once per process lifetime (first sketch after this list). gpt-5.4 maps to gpt-5-codex for Pro, gpt-5-codex-mini for Plus, and raw pass-through for unknown so the upstream rejects explicitly rather than us guessing wrong.
  • DeepSeek V4 thinking-mode safeguard in the openai-compat send-path (second sketch after this list). When any prior assistant message lacks reasoning_content, the request body sets thinking: {"type": "disabled"} (V4 expects an Anthropic-style ThinkingOptions struct — enable_thinking: false is silently ignored upstream, which I confirmed via direct probe). Single-turn reasoning still flows through unchanged. Provider/request extra_body overrides win because they merge in after the auto-set.
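
A minimal sketch of the plan-tier detection, for reviewers: the auth.json field layout and where the claim sits inside the id_token are assumptions; only _detect_codex_chatgpt_plan_tier, chatgpt_plan_type, and the tier-to-model mapping come from this PR.

```python
# Hedged sketch; auth.json structure and claim nesting are assumptions.
import base64
import json
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=1)  # "once per process lifetime"
def _detect_codex_chatgpt_plan_tier() -> str:
    try:
        auth = json.loads((Path.home() / ".codex" / "auth.json").read_text())
        payload_b64 = auth["tokens"]["id_token"].split(".")[1]
        # JWT payload segments are unpadded base64url; restore the padding.
        payload_b64 += "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        return str(claims.get("chatgpt_plan_type", "unknown")).lower()
    except Exception:
        return "unknown"  # any parse failure falls back to raw pass-through
```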
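And the thinking-mode safeguard, again hedged: the helper name and request-body shape are illustrative, while the reasoning_content check and the thinking: {"type": "disabled"} struct come straight from the change.

```python
# Illustrative sketch; helper name and body shape are assumptions.
def _maybe_disable_deepseek_thinking(body: dict) -> None:
    assistant_msgs = [m for m in body.get("messages", [])
                      if m.get("role") == "assistant"]
    if assistant_msgs and any("reasoning_content" not in m
                              for m in assistant_msgs):
        # V4 wants an Anthropic-style ThinkingOptions struct; the boolean
        # enable_thinking: false form is silently ignored upstream.
        body["thinking"] = {"type": "disabled"}
    # Single-turn requests (no assistant history) pass through unchanged, and
    # extra_body merges in after this call, so explicit overrides still win.
```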

faigate/router.py

  • New _provider_in_hard_cooldown(name, ctx) helper (sketched after this list), consulted in:
    • _provider_matches_policy — excludes cooldown'd providers from the policy candidate set so prefer_providers ordering can no longer mask them.
    • _validate_health — adds a third reason ("primary in cooldown") that triggers the existing fallback-chain logic, and filters the fallback chain itself so we don't just route to a different broken provider.
  • Soft-degrade windows (transport-error, timeout) intentionally stay routable and continue to be handled via the additive adaptation_penalty in ranking.
  • Direct/explicit-model routing (model: "openai-codex" etc.) is unaffected by design — explicit caller intent overrides the adaptive demotion.
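
A sketch of the gate, assuming a hypothetical cooldown-window record on the RoutePressure tracker; the persistent-failure classes listed are the ones the tracker already distinguishes.

```python
# Sketch only; ctx/tracker attribute names are assumptions, not faigate's API.
import time

_HARD_REASONS = {"auth-invalid", "quota-exhausted", "model-unavailable",
                 "endpoint-mismatch", "rate-limited"}

def _provider_in_hard_cooldown(name: str, ctx) -> bool:
    window = ctx.pressure.cooldown_for(name)  # hypothetical accessor
    if window is None:
        return False
    # Soft-degrade reasons (transport-error, timeout) never block here; they
    # stay routable and are handled by the additive adaptation_penalty instead.
    return window.reason in _HARD_REASONS and window.expires_at > time.time()
```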

config.yaml

  • deepseek-chat → deepseek-v4-flash (workhorse, 1M ctx, $0.028/$0.14/$0.28 cache-hit/miss/output per 1M tokens) and deepseek-reasoner → deepseek-v4-pro (premium reasoning, 1M ctx, $0.145/$1.74/$3.48). Lane canonical models become deepseek/v4-flash and deepseek/v4-pro. All inbound references in prefer_providers, model_shortcuts, static_rules, and degrade chains have been swept.
  • opencode and coding profiles get a deny_providers list for the full openai-codex* family (illustrated below) as defence-in-depth against future ChatGPT-side gating changes. Explicit invocation still works.
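
Roughly the shape of the profile change (key names beyond deny_providers/prefer_providers are assumptions about the config schema):

```yaml
profiles:
  opencode:
    deny_providers:
      - "openai-codex*"       # whole family, defence-in-depth
    prefer_providers:
      - deepseek-v4-flash     # was deepseek-chat before the V4 sweep
```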

Tests (+24, 0 regressions)

  • New tests/test_router_cooldown.py — 5 tests covering policy exclusion, recovery after window expiry, and primary→fallback transition.
  • New tests/test_provider_safeguards.py — 14 tests covering plan-tier mapping (Pro/Plus/unknown), JWT parsing edge cases, and the DeepSeek thinking safeguard (single-turn, multi-turn with/without reasoning_content, non-deepseek lanes, extra_body override).
  • tests/test_providers.py — autouse fixture pinning the plan cache to pro so legacy gpt-5-codex assertions stay valid (sketched after this list); plan-tier detection itself moved into the new dedicated suite.
  • tests/test_routing.py — fixture provider names swept V3 → V4.
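
The pinning fixture is roughly the following; the exact patch target is an assumption:

```python
# Sketch of the autouse fixture in tests/test_providers.py.
import pytest

@pytest.fixture(autouse=True)
def _pin_plan_tier_to_pro(monkeypatch):
    # Keep legacy gpt-5-codex assertions valid regardless of the host's
    # ~/.codex/auth.json contents.
    monkeypatch.setattr(
        "faigate.providers._detect_codex_chatgpt_plan_tier", lambda: "pro"
    )
```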

Full pytest tests/ --ignore=tests/test_wizard.py -k "not benchmark" shows 446 passed, 16 failed, 6 deselected. All 16 failures pre-exist on main (11× ruamel.yaml missing in py3.14 env, 3× bundled assets/metadata/catalog.v1.json already V4-aware vs. older test fixtures). Tracked separately — see follow-up issues linked below.

Live verification

A Codenomad session that was producing HTTP 400 reasoning_content errors now routes via Route: deepseek-v4-flash [static/subagent] → HTTP 200 with no manual intervention.

Test plan

  • Unit: pytest tests/test_router_cooldown.py tests/test_provider_safeguards.py tests/test_routing.py tests/test_providers.py tests/test_adaptation.py — 64 passed.
  • Full suite (excluding test_wizard.py separately tracked): 446 passed, 16 pre-existing failures unchanged.
  • Live integration: Codenomad multi-turn session with Phase-1 v2.4.0 user config + Phase-2 router patch loaded → HTTP 200.
  • DeepSeek API probe to confirm thinking: {type: disabled} is the param V4 actually accepts (boolean form silently ignored).
  • Reviewer to spot-check the JWT parsing in _detect_codex_chatgpt_plan_tier() — base64 padding handling, defensive try/except returning "unknown" on any error.

Follow-ups (not in this PR)

Filed as separate issues:

  • Smarter health-driven catalog refresh — make the metadata loop emit deprecation/replacement edges, detect drift via release-note polling, and surface it in the Gate Bar.
  • Restore tests/test_wizard.py (V3→V4 fixture sweep, ~79 references).
  • Fix the 16 pre-existing test failures on main (ruamel.yaml dependency for py3.14 + bundled catalog vs. fixture drift).

🤖 Generated with Claude Code

…plan-tier

Three layered router/provider fixes traced back to one operator incident
on 2026-04-26 where a Codex 400 loop, then a DeepSeek V4 multi-turn 400,
broke OpenCode/Codenomad sessions:

- providers.py: `_codex_effective_model` now reads `chatgpt_plan_type`
  from the cached ~/.codex/auth.json id_token JWT once per process and
  routes `gpt-5.4` to the variant the account can actually use
  (`gpt-5-codex` for Pro, `gpt-5-codex-mini` for Plus, raw pass-through
  for unknown so the upstream rejects explicitly). Previously it always
  returned `gpt-5-codex`, which the chatgpt.com Codex backend rejects
  for ChatGPT Plus subscribers.

- providers.py: openai-compat send-path adds a DeepSeek-V4-specific
  thinking-mode safeguard. V4 made `reasoning_content` round-trip on
  assistant messages mandatory whenever thinking is active; clients that
  don't track it (OpenCode, Codenomad, generic openai-compat SDKs) 400
  on every multi-turn follow-up. When any prior assistant message lacks
  reasoning_content, the request body now sets `thinking: {"type":
  "disabled"}` (V4 expects an Anthropic-style ThinkingOptions struct,
  not a boolean — the legacy `enable_thinking: false` form is silently
  ignored upstream). Single-turn reasoning still flows through.
  Provider/request `extra_body` overrides win because they merge in
  after the auto-set.

- router.py: new helper `_provider_in_hard_cooldown(name, ctx)` is
  consulted in both `_provider_matches_policy` (excludes from policy
  candidate set) and `_validate_health` (forces fallback when the
  primary is in cooldown, also filters fallback chain). The adaptive
  RoutePressure tracker already classified persistent failures
  (auth-invalid, quota-exhausted, model-unavailable, endpoint-mismatch,
  rate-limited) and surfaced them in /health, but the routing decision
  layer never read `request_blocked` — so a structurally broken provider
  sitting at the top of `prefer_providers` would be re-selected on every
  request for the entire cooldown window. Soft-degrade windows
  (transport-error, timeout) intentionally stay routable and continue
  to be handled via the additive `adaptation_penalty` in ranking.
  Direct/explicit-model routing (`model: "openai-codex"` etc.) is
  unaffected by design — explicit caller intent overrides the adaptive
  demotion.

Plus the DeepSeek V3 → V4 catalog migration that was already in flight:

- config.yaml: `deepseek-chat` → `deepseek-v4-flash` (workhorse, 1M ctx,
  $0.028/$0.14/$0.28 cache-hit/miss/output per 1M tokens) and
  `deepseek-reasoner` → `deepseek-v4-pro` (premium reasoning, 1M ctx,
  $0.145/$1.74/$3.48). Lane canonical models become `deepseek/v4-flash`
  and `deepseek/v4-pro`. All inbound references in `prefer_providers`,
  `model_shortcuts`, `static_rules`, and degrade chains have been swept.
  `opencode`/`coding` profiles get a `deny_providers` list for the full
  `openai-codex*` family as defence-in-depth against future ChatGPT-side
  gating changes.

Tests
- tests/test_router_cooldown.py covers policy exclusion + recovery +
  primary→fallback transition for hard-cooldown'd providers.
- tests/test_provider_safeguards.py covers the plan-tier mapping
  (Pro/Plus/unknown), JWT parsing edge cases, and the DeepSeek thinking
  safeguard (single-turn, multi-turn with/without reasoning_content,
  non-deepseek lanes, extra_body override).
- tests/test_providers.py gets an autouse fixture pinning the plan
  cache to `pro` so the legacy gpt-5-codex assertions stay valid.
- tests/test_routing.py sweeps fixture provider names V3 → V4.

Verified live against the running gateway: a Codenomad session that was
producing HTTP 400 reasoning_content errors now routes via
`deepseek-v4-flash [static/subagent]` with HTTP 200.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
André Lange and others added 2 commits April 27, 2026 22:11
- pyproject.toml: 2.4.0 → 2.5.0
- CHANGELOG: rename `v2.4.1 - 2026-04-27` section to `v2.5.0 - 2026-04-27`
  (the changes warrant a minor bump, not a patch — DeepSeek V3→V4 catalog
  rename is a config-breaking change for downstream operators with custom
  prefer_provider lists referring to the legacy IDs)
- providers.py: ruff format collapsed a multi-line `any()` predicate into
  one line. No semantic change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`scripts/faigate-release --dry-run` enforces version alignment between
`pyproject.toml` and `faigate/__init__.py` to prevent the package
metadata from drifting at release time. The previous commit only bumped
pyproject.toml — this completes the alignment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
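
(For context, the alignment check amounts to something like the sketch below; the real scripts/faigate-release implementation may differ.)

```python
import tomllib  # stdlib on py3.11+

import faigate

def check_version_alignment() -> None:
    with open("pyproject.toml", "rb") as f:
        pinned = tomllib.load(f)["project"]["version"]
    if pinned != faigate.__version__:
        raise SystemExit(f"version drift: pyproject.toml has {pinned}, "
                         f"faigate/__init__.py has {faigate.__version__}")
```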
github-actions[bot] merged commit 1f2ae89 into main on Apr 27, 2026
18 checks passed