Skip to content

[v1.x] Drop responses/notifications when write stream is already closed#2502

Open
bobbyo wants to merge 2 commits intomodelcontextprotocol:v1.xfrom
bobbyo:fix/closed-stream-send-drop-v1x
Open

[v1.x] Drop responses/notifications when write stream is already closed#2502
bobbyo wants to merge 2 commits intomodelcontextprotocol:v1.xfrom
bobbyo:fix/closed-stream-send-drop-v1x

Conversation

@bobbyo
Copy link
Copy Markdown

@bobbyo bobbyo commented Apr 24, 2026

Audience / Scope / Status

If you are… Read this section Why
reviewing the runtime fix Summary, Topology, Invariants, Changes shows why the catch belongs in BaseSession, not only in outer server layers
reviewing the test/CI follow-up Validation explains the coverage and pyright failures on the first push
looking for broader transport redesign Non-goals this PR intentionally hardens closed-write semantics only

Last verified against: v1.x on 2026-04-24

Relevant upstream issues

Summary

This PR makes BaseSession.send_notification() and BaseSession._send_response() treat writes to an already-closed transport as expected no-ops rather than exceptions.

The goal is narrow: if the peer has already gone away, a late response or notification should be dropped cleanly at the session send boundary.

Failure topology

flowchart LR
  H["Request handler\n`mcp.server.lowlevel.server`"]:::server
  R["RequestResponder.respond()\n`src/mcp/shared/session.py:121`"]:::fyr
  S["BaseSession._send_response() / send_notification()\n`src/mcp/shared/session.py:318-357`"]:::fyr
  W["write_stream.send(SessionMessage)\nanyio memory/object stream"]:::transport
  C["Client transport\nstream already closed"]:::client

  H -- "handler result / late progress" --> R
  R -- "JSON-RPC response" --> S
  S -- "SessionMessage write" --> W
  W -. "peer disconnected earlier" .-> C

  classDef client fill:#bfdbfe,stroke:#1e3a8a,stroke-width:2px,color:#0f172a
  classDef transport fill:#e5e7eb,stroke:#6b7280,stroke-width:1px,color:#374151
  classDef server fill:#dcfce7,stroke:#166534,stroke-width:2px,color:#0f172a
  classDef fyr fill:#ddd6fe,stroke:#5b21b6,stroke-width:2px,color:#0f172a
Loading

Why the catch belongs here

Layer Existing behavior Gap
mcp.server.lowlevel.server already has an outer closed-transport catch around await message.respond(response) at src/mcp/server/lowlevel/server.py:799-809 only covers the RequestResponder path that reaches this exact outer frame
BaseSession.send_notification() previously wrote directly to _write_stream.send(...) late notifications could still bubble ClosedResourceError / BrokenResourceError
BaseSession._send_response() previously wrote directly to _write_stream.send(...) any caller below the outer server frame still relied on higher layers to catch transport-close races

Invariants

  • INV-1: A closed peer transport is not a recoverable write target, but it is also not a server bug. Late writes should be dropped, not escalated.
  • INV-2: The session send boundary is the narrowest shared seam that sees both responses and notifications. If violated: one call site gets hardened while sibling send paths still leak the same failure.
  • INV-3: Higher-layer transport-close catches may remain for defense in depth. If violated: removing all outer guards in the same PR would expand scope from behavior hardening into control-flow refactoring.

Changes

  • Catch anyio.BrokenResourceError / anyio.ClosedResourceError in BaseSession.send_notification() at src/mcp/shared/session.py:318-339.
  • Catch anyio.BrokenResourceError / anyio.ClosedResourceError in BaseSession._send_response() at src/mcp/shared/session.py:343-357.
  • Add session-layer regression tests for closed and open write streams at tests/server/test_session.py:223-282.
  • Add a low-level regression test that exercises the existing outer server catch at tests/server/test_lowlevel_exception_handling.py:78-96 and src/mcp/server/lowlevel/server.py:799-809.

Validation

Functional

PYTHONPATH=src python3 -m pytest -o addopts='' \
  tests/server/test_session.py \
  tests/server/test_lowlevel_exception_handling.py -q

Local result after the CI-fix follow-up:

  • 17 passed

CI follow-up

The first push exposed two CI-only problems, not a runtime regression:

Check Initial failure Follow-up in this PR
test matrix coverage dropped because the new session-layer catch made the old outer catch in lowlevel/server.py unexercised added test_handle_request_drops_response_when_transport_is_closed() to cover the existing outer branch
pre-commit strict pyright rejected loosely typed fake session/write-stream helpers tightened annotations and cast sites in tests/server/test_session.py

Targeted local follow-up verification:

/tmp/python-sdk-upstream/.venv-ci/bin/pyright \
  tests/server/test_session.py \
  tests/server/test_lowlevel_exception_handling.py

Local result:

  • 0 errors, 0 warnings, 0 informations

Non-goals

  • This PR does not redesign cancellation or stateless request lifecycle.
  • This PR does not remove the outer guard in lowlevel/server.py; it keeps defense in depth.
  • This PR does not claim to solve every disconnect race in the stack. It narrows one shared failure seam so downstream servers stop surfacing expected closed-write conditions as crashes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant