
Azure: Improve retrying mechanism#194

Open
JAVGan wants to merge 2 commits into main from SPSTRAT-725

Conversation

@JAVGan
Collaborator

@JAVGan JAVGan commented Apr 27, 2026

Changes

1 - Increase the HTTPError retry range

Start retrying HTTPError exceptions from code 408 instead of 500 as before.

With that we hope to catch additional relevant statuses that are worth retrying.

2 - Ensure retrying mechanism works

Asked Cursor to implement unit tests that ensure the current retrying mechanism is effective.
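For reference, the forcelist change in item 1 amounts to the following (a minimal sketch; the `total` and `backoff_factor` values match the defaults asserted by the new tests):

```python
from urllib3.util.retry import Retry

# Before this change: only 5xx responses were retried.
old_retry = Retry(total=5, backoff_factor=1, status_forcelist=tuple(range(500, 512)))

# After: transient 4xx codes such as 408 (Request Timeout) and
# 429 (Too Many Requests) are retried as well.
new_retry = Retry(total=5, backoff_factor=1, status_forcelist=tuple(range(408, 512)))

assert 408 not in old_retry.status_forcelist
assert 408 in new_retry.status_forcelist and 511 in new_retry.status_forcelist
```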

JIRA

Refers to SPSTRAT-725

Summary by Sourcery

Broaden Azure PartnerPortalSession HTTP retry status codes and add tests to validate the retry behavior end to end.

Bug Fixes:

  • Retry logic now covers additional transient HTTP error codes starting from 408 instead of only 5xx responses.

Enhancements:

  • Updated PartnerPortalSession to configure urllib3 Retry with a wider status_forcelist range (408–511).

Tests:

  • Added end-to-end tests using a local HTTP server to verify which status codes trigger retries for GET/PUT and that POST is not retried by default.
  • Added tests to assert the configured Retry policy (total retries, backoff factor, and status_forcelist) matches expectations and that non-forcelist statuses are not retried.

Start retrying HTTPError exceptions from code `408` instead of `500` as
it was before.

With that we hope to catch additional relevant statuses that are worth
retrying.

Refers to SPSTRAT-725

Signed-off-by: Jonathan Gangi <jgangi@redhat.com>
Asked Cursor to implement the unit tests.

Description:

Here is what was added and how it behaves.

requests_mock replaces requests.Session.send, so requests never go through urllib3’s normal path and HTTPAdapter retries do not run. The new tests use a real local HTTP server running in a background thread and mount the same HTTPS retry adapter on http://, so urllib3 retry behaves as in production.
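The mounting trick can be sketched as follows (hypothetical handler and names; the real helpers live in tests/ms_azure/test_session.py):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class FlakyHandler(BaseHTTPRequestHandler):
    """Fails the first request with 503, then succeeds, counting hits."""
    hits = 0

    def do_GET(self):
        type(self).hits += 1
        status = 503 if type(self).hits == 1 else 200
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), FlakyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
retry = Retry(total=5, backoff_factor=0, status_forcelist=tuple(range(408, 512)))
# Mount the retry adapter on http:// so the plain-HTTP test server goes
# through urllib3's real retry machinery, just like https:// in production.
session.mount("http://", HTTPAdapter(max_retries=retry))

resp = session.get(f"http://127.0.0.1:{server.server_port}/resource")
server.shutdown()
assert resp.status_code == 200 and FlakyHandler.hits == 2
```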

1. test_default_status_forcelist_is_408_through_511 – Checks the mounted adapter’s Retry: total == 5, backoff_factor == 1, and status_forcelist == tuple(range(408, 512)) (i.e. 408–511).

2. GET/PUT success after retry – Parametrized GET (408, 429, 500, 511) and one PUT (503): first response error from the forcelist, second 200, asserts two server hits.

3. No retry outside forcelist – GET returns 403 or 407 then would-be 200: only one hit; client keeps the 403/407 response.

4. 512 – range(408, 512) stops at 511, so 512 is not retried (one hit, status 512).

5. test_urllib3_retry_forcelist_covers_408_through_511 – Uses urllib3’s Retry.is_retry(method="GET", ...) so every code 408–511 is treated as retryable and 407 / 512 are not.

6. test_post_does_not_retry_by_default_urllib3 – Documents urllib3’s default: POST is not in allowed_methods, so 500 then 200 still yields a single attempt with status 500.

_get_token is patched so tests never call Microsoft login; backoff_factor=0 keeps waits negligible.
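Items 5 and 6 above boil down to urllib3’s own method/status checks, which can be reproduced directly:

```python
from urllib3.util.retry import Retry

retry = Retry(total=5, backoff_factor=1, status_forcelist=tuple(range(408, 512)))

# Every status in the forcelist is retryable for GET...
assert retry.is_retry("GET", 408) and retry.is_retry("GET", 511)
# ...but neighbors just outside the range are not.
assert not retry.is_retry("GET", 407)
assert not retry.is_retry("GET", 512)
# POST is absent from urllib3's default allowed_methods, so even a 500
# response is not retried.
assert not retry.is_retry("POST", 500)
```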

Refers to SPSTRAT-725

Signed-off-by: Jonathan Gangi <jgangi@redhat.com>

Assisted-by: Cursor/Gemini
@JAVGan
Collaborator Author

JAVGan commented Apr 27, 2026

@lslebodn @ashwgit PTAL

@sourcery-ai

sourcery-ai Bot commented Apr 27, 2026

Reviewer's Guide

Extends PartnerPortalSession’s HTTP retry policy to cover HTTP 408–511 responses and adds end-to-end tests using a local HTTP server to verify urllib3 Retry behavior across GET, PUT, and POST methods, including which status codes are retried or not.

Sequence diagram for PartnerPortalSession HTTP request with expanded retry range

sequenceDiagram
    actor Tester
    participant TestCode
    participant PartnerPortalSession
    participant RequestsSession
    participant AzureEndpoint

    Tester->>TestCode: trigger operation()
    TestCode->>PartnerPortalSession: get_resource()
    PartnerPortalSession->>RequestsSession: request(GET /resource)
    RequestsSession->>AzureEndpoint: HTTP GET /resource
    AzureEndpoint-->>RequestsSession: 408 Request Timeout
    RequestsSession-->>PartnerPortalSession: HTTPError(408)
    PartnerPortalSession->>PartnerPortalSession: Retry(status_forcelist=408-511)
    PartnerPortalSession->>RequestsSession: retry request(GET /resource)
    RequestsSession->>AzureEndpoint: HTTP GET /resource
    AzureEndpoint-->>RequestsSession: 200 OK
    RequestsSession-->>PartnerPortalSession: Response(200)
    PartnerPortalSession-->>TestCode: Response(200)
    TestCode-->>Tester: success

Updated class diagram for PartnerPortalSession retry configuration

classDiagram
    class PartnerPortalSession {
        - requests_Session session
        - int total_retries
        - float backoff_factor
        - tuple status_forcelist
        + __init__(total_retries, backoff_factor, status_forcelist)
        + request(method, url, headers, data, params)
    }

    class Retry {
        - int total
        - float backoff_factor
        - tuple status_forcelist
        + __init__(total, backoff_factor, status_forcelist)
        + increment(method, url, response, error)
    }

    PartnerPortalSession --> Retry : configures

    %% Default behavior after this PR
    PartnerPortalSession : status_forcelist = range(408,512)
    Retry : status_forcelist = range(408,512)

Flow diagram for retry decision on HTTP status codes

flowchart LR
    A[Start HTTP request] --> B[Receive HTTP response]
    B --> C{Is exception HTTPError?}
    C -- No --> Z[Return response]
    C -- Yes --> D[Get status code]
    D --> E{status code in 408-511?}
    E -- No --> Z
    E -- Yes --> F[Check remaining retries]
    F --> G{retries left > 0?}
    G -- No --> Z
    G -- Yes --> H[Apply backoff]
    H --> I[Retry HTTP request]
    I --> B

File-Level Changes

Change Details Files
Broaden PartnerPortalSession retry status codes from 500–511 to 408–511 while keeping retry count and backoff behavior the same.
  • Change default status_forcelist to tuple(range(408, 512)) in PartnerPortalSession initialization.
  • Preserve existing defaults for total retries and backoff factor.
  • Allow override of status_forcelist via constructor kwargs as before.
cloudpub/ms_azure/session.py
Introduce local HTTP server–backed tests to verify real urllib3 Retry behavior for PartnerPortalSession.
  • Add helpers to start a local HTTPServer and to build a BaseHTTPRequestHandler that returns a predefined sequence of status/body responses across calls.
  • Add helper to construct a PartnerPortalSession targeting an HTTP server and mount the same Retry adapter for http:// as for https://.
  • Add tests that assert the default Retry configuration (total, backoff_factor, and status_forcelist range 408–511).
  • Add tests that GET/PUT retry once on forcelist statuses then succeed, and do not retry for non-forcelist statuses (403, 407, 512).
  • Add a direct urllib3 Retry test confirming that status_forcelist=tuple(range(408, 512)) retries all codes 408–511 but not 407 or 512.
  • Add a test showing POST does not retry by default even for 500, documenting urllib3’s allowed_methods behavior.
tests/ms_azure/test_session.py


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In PartnerPortalSession's Retry configuration, consider explicitly setting allowed_methods instead of relying on urllib3 defaults, so POST non-retry behavior is stable even if urllib3 changes its defaults in the future.
  • The test helpers that start a local HTTP server repeat try/finally shutdown logic in multiple tests; you could factor this into a context manager or pytest fixture to centralize server lifecycle management and reduce duplication.
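For the first point, pinning the method allowlist explicitly could look like this (a sketch; the frozenset below copies urllib3’s current `Retry.DEFAULT_ALLOWED_METHODS`, available under the `allowed_methods` name since urllib3 1.26):

```python
from urllib3.util.retry import Retry

retry = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=tuple(range(408, 512)),
    # Pin the idempotent-method allowlist so POST stays non-retried even
    # if urllib3 changes its defaults in the future.
    allowed_methods=frozenset({"HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"}),
)

assert retry.is_retry("GET", 500)
assert not retry.is_retry("POST", 500)
```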
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `PartnerPortalSession`'s `Retry` configuration, consider explicitly setting `allowed_methods` instead of relying on urllib3 defaults, so POST non-retry behavior is stable even if urllib3 changes its defaults in the future.
- The test helpers that start a local HTTP server repeat try/finally shutdown logic in multiple tests; you could factor this into a context manager or pytest fixture to centralize server lifecycle management and reduce duplication.

## Individual Comments

### Comment 1
<location path="tests/ms_azure/test_session.py" line_range="317-321" />
<code_context>
+        assert retry.backoff_factor == 1
+        assert retry.status_forcelist == tuple(range(408, 512))
+
+    @pytest.mark.parametrize(
+        "first_status",
+        [408, 429, 500, 511],
+    )
+    def test_get_retries_on_forcelist_status_then_succeeds(
+        self,
+        auth_dict: Dict[str, str],
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where retries are exhausted without a successful response to verify the retry limit behavior.

Currently we only cover the case where a retry eventually succeeds. Please add a test where the server keeps returning a forcelist status (e.g., 500) for more than `total_retries`, and assert both the final status and the number of calls to confirm we honor the retry limit exactly (no infinite or under-retrying).

Suggested implementation:

```python
        retry = adapter.max_retries
        assert retry.total == 5
        assert retry.backoff_factor == 1
        assert retry.status_forcelist == tuple(range(408, 512))


@pytest.mark.parametrize(
    "status_code",
    [408, 429, 500, 511],
)
def test_get_retries_exhausted_on_forcelist_status(
    auth_dict: Dict[str, str],
    port: int,
    status_code: int,
) -> None:
    """Verify that we honor the retry limit exactly when the server never recovers.

    The server keeps returning a forcelist status so all retries are exhausted.
    We assert both the final status and the exact number of calls performed.
    """
    # Build a session that targets a local HTTP server with the retry adapter.
    session = _build_local_partner_portal_session(auth_dict=auth_dict, port=port)

    # The adapter is configured with a total of 5 retries (see test above),
    # which means we should see 1 initial request + 5 retries = 6 calls.
    expected_total_retries = session.get_adapter("http://").max_retries.total
    expected_calls = expected_total_retries + 1

    call_count = 0

    def _failing_handler(request):
        nonlocal call_count
        call_count += 1
        # Always return a status code from the forcelist so that retries are triggered.
        from requests import Response

        response = Response()
        response.status_code = status_code
        response._content = b"forced failure"  # type: ignore[attr-defined]
        response.url = request.url  # type: ignore[attr-defined]
        return response

    # Configure the local server to always respond with the given status_code.
    # This assumes the existing test infrastructure already exposes a way
    # to register a handler for the local HTTP server that listens on `port`.
    _register_handler_for_local_server(port=port, handler=_failing_handler)

    # Perform a GET to trigger the retry logic. The request path and params
    # should match whatever is used in the success-case test so we exercise
    # the same code path here.
    response = session.get("")

    # We never succeed, so the final status should be the forcelist status.
    assert response.status_code == status_code

    # Ensure we honored the retry limit exactly: one initial call plus `total` retries.
    assert call_count == expected_calls

```

To make this compile and integrate cleanly, adjust the following to match your existing test helpers and fixtures:

1. Replace `_build_local_partner_portal_session` with the actual helper function you are defining just below the shown snippet (the one that builds a `PartnerPortalSession` for a local HTTP server). The signature I assumed is:
   ```python
   def _build_local_partner_portal_session(
       auth_dict: Dict[str, str],
       port: int,
       **session_kwargs: Any,
   ) -> PartnerPortalSession:
       ...
   ```
   If the name or arguments differ, update the call in `test_get_retries_exhausted_on_forcelist_status` accordingly.

2. Implement `_register_handler_for_local_server(port: int, handler: Callable)` or replace this with the existing mechanism you already use in `test_get_retries_on_forcelist_status_then_succeeds` to control responses from the local HTTP server. The intent is:
   - For every request received on the test server listening on `port`, invoke `handler(request)` and return its `Response`.
   - Ensure all responses during this test use the forcelist status so retries never succeed.

3. Ensure the request path and query parameters in `session.get("")` (or equivalent) match what your production code uses and what the success-case test uses, so that the retry adapter is actually exercised.

4. If you already have a more direct way to count calls (e.g., a counter on the server fixture), replace the `call_count` logic with that, but keep the assertion `calls == total_retries + 1`.

5. If your project already imports `Response` or uses a different way to construct responses for the local server, adjust the `_failing_handler` implementation accordingly to stay consistent with the rest of the test file.
</issue_to_address>
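For the second overall comment, the server lifecycle could be centralized in a context manager along these lines (hypothetical helper name; a pytest fixture would work equally well):

```python
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer

@contextmanager
def local_http_server(handler_cls):
    """Run handler_cls on an ephemeral port; always shut the server down."""
    server = HTTPServer(("127.0.0.1", 0), handler_cls)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}"
    finally:
        server.shutdown()
        thread.join()
        server.server_close()

class OkHandler(BaseHTTPRequestHandler):
    """Trivial handler used to exercise the context manager."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass

import urllib.request
with local_http_server(OkHandler) as base_url:
    status = urllib.request.urlopen(base_url).status
assert status == 200
```

Tests would then wrap their request logic in `with local_http_server(Handler) as base_url:` instead of repeating try/finally shutdown blocks.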


