Skip to content

Azure: Rework the PartnerPortalSession Retries#193

Closed
JAVGan wants to merge 1 commit intomainfrom
SPSTRAT-725
Closed

Azure: Rework the PartnerPortalSession Retries#193
JAVGan wants to merge 1 commit intomainfrom
SPSTRAT-725

Conversation

@JAVGan
Copy link
Copy Markdown
Collaborator

@JAVGan JAVGan commented Apr 27, 2026

This commit changes the PartnerPortalSession to rely on tenacity for retries on HTTPError instead of mounting an adapter with the retries from urllib3.

It also increases the HTTPError range from 500-512 to 408-512 in order to caught more exceptions in the retry logic.

This aims to fix some issues with the retrying mechanism we found.

Refers to SPSTRAT-725

Signed-off-by: Jonathan Gangi jgangi@redhat.com

Assisted-by: Cursor/Gemini

Summary by Sourcery

Rework Azure PartnerPortalSession HTTP retry handling to use a tenacity-based policy with explicit status-code-based retry rules and raise-for-status semantics.

Enhancements:

  • Introduce configurable Azure retry settings (total retries, backoff factor, and retriable HTTP status range 408–512) for PartnerPortalSession requests.
  • Replace urllib3 adapter-based retries with a tenacity retry decorator around the internal request method, including raise_for_status integration.

Tests:

  • Add unit tests covering HTTPError retry behavior, including success after transient failures, stopping after the configured maximum attempts, non-retriable status codes, errors without responses, and non-HTTP exceptions.
  • Add tests for the helper that determines whether an HTTPError should be retried based on the attached response and status code.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Apr 27, 2026

Reviewer's Guide

Refactors Azure PartnerPortalSession HTTP retry handling to use a tenacity-decorated request wrapper with configurable retry/backoff settings and a broadened, centralized HTTPError-based retry policy, along with comprehensive tests for the new behavior.

Sequence diagram for tenacity-based PartnerPortalSession request retries

sequenceDiagram
    actor Caller
    participant PartnerPortalSession
    participant Tenacity as Tenacity_retry_wrapper
    participant Session as requests_Session
    participant AzureAPI

    Caller->>PartnerPortalSession: get(path, **kwargs)
    PartnerPortalSession->>PartnerPortalSession: _request(method=GET, path, params, **kwargs)

    activate PartnerPortalSession
    PartnerPortalSession->>Tenacity: execute _request with retry_if_exception(_should_retry_request_http_error)

    loop up to AZURE_TOTAL_RETRIES attempts
        Tenacity->>PartnerPortalSession: call inner _request body
        PartnerPortalSession->>Session: request(method, url, params, headers, timeout, **kwargs)
        Session->>AzureAPI: HTTP request
        AzureAPI-->>Session: HTTP response
        Session-->>PartnerPortalSession: Response
        PartnerPortalSession->>PartnerPortalSession: response.raise_for_status()

        alt status_code between AZURE_RETRY_HTTP_STATUS_MIN and AZURE_RETRY_HTTP_STATUS_MAX
            PartnerPortalSession-->>Tenacity: raise HTTPError
            Tenacity->>Tenacity: wait_exponential(multiplier=AZURE_BACKOFF_FACTOR)
            Tenacity-->>PartnerPortalSession: retry call
        else status_code outside retry range
            PartnerPortalSession-->>Caller: return Response
            Tenacity-->>Caller: propagate success
        end
    end

    alt all attempts raise retryable HTTPError
        Tenacity-->>Caller: raise last HTTPError
    end

    deactivate PartnerPortalSession
Loading

Class diagram for updated Azure PartnerPortalSession retry handling

classDiagram
    class AccessToken {
        - access_token: str
        - expires_in: int
        - expires_on: int
        + __init__(access_token: str, expires_in: int, expires_on: int)
    }

    class PartnerPortalSession {
        - auth_keys: Dict[str, Any]
        - _prefix_url: str
        - session: requests.Session
        - _token: AccessToken
        - _additional_args: Dict[str, Any]
        + __init__(auth_keys: Dict[str, Any], prefix_url: str, **kwargs)
        + make_graph_api_session(auth_keys: Dict[str, Any]) PartnerPortalSession
        - _get_token() str
        - _request(method: str, path: str, params: Optional[Dict[str, Any]], **kwargs) requests.Response
        + get(path: str, **kwargs) requests.Response
        + post(path: str, **kwargs) requests.Response
        + put(path: str, **kwargs) requests.Response
        + delete(path: str, **kwargs) requests.Response
    }

    class AzureSessionModule {
        <<module>>
        + AZURE_SESSION_TIMEOUT: float
        + AZURE_TOTAL_RETRIES: int
        + AZURE_BACKOFF_FACTOR: float
        + AZURE_RETRY_HTTP_STATUS_MIN: int
        + AZURE_RETRY_HTTP_STATUS_MAX: int
        + _should_retry_request_http_error(exc: BaseException) bool
    }

    PartnerPortalSession --> AccessToken : uses
    PartnerPortalSession --> AzureSessionModule : reads_config_and_retry_predicate
Loading

File-Level Changes

Change Details Files
Replace urllib3/HTTPAdapter-based retries with a tenacity retry wrapper around PartnerPortalSession._request using HTTPError-based predicates and exponential backoff.
  • Remove mounting of HTTPAdapter with urllib3 Retry on the underlying requests.Session.
  • Introduce environment-driven configuration constants for total retries and backoff factor (AZURE_TOTAL_RETRIES, AZURE_BACKOFF_FACTOR).
  • Decorate PartnerPortalSession._request with tenacity.retry, stop_after_attempt, wait_exponential, and retry_if_exception tied to a custom predicate.
  • Ensure _request now calls response.raise_for_status() before returning, so HTTP errors surface as HTTPError exceptions for the retry logic.
cloudpub/ms_azure/session.py
Introduce a centralized predicate and configuration for retriable HTTPError status codes, expanding the range to 408–512.
  • Add AZURE_RETRY_HTTP_STATUS_MIN and AZURE_RETRY_HTTP_STATUS_MAX constants to define the inclusive retriable status window.
  • Implement _should_retry_request_http_error to gate retries on HTTPError with a non-null response whose status code falls within 408–512.
  • Wire _should_retry_request_http_error into the tenacity retry_if_exception configuration for _request.
cloudpub/ms_azure/session.py
Add tests that validate the new tenacity-based retry behavior for HTTPError and non-HTTPError cases, including edge conditions of the retriable status range.
  • Add unit tests for _should_retry_request_http_error covering non-HTTPError inputs and HTTPError without a response.
  • Add tests for successful transient HTTPError retries, exhaustion of retries on persistent HTTPError, and no-retry behavior outside the configured status range.
  • Add tests ensuring no retries occur for non-HTTPError exceptions and for HTTPError lacking a response object.
  • Add tests confirming retries are triggered by raise_for_status() on retriable status codes, including the upper bound (512).
tests/ms_azure/test_session.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@JAVGan
Copy link
Copy Markdown
Collaborator Author

JAVGan commented Apr 27, 2026

@lslebodn @ashwgit PTAL

I've opted to change the retry mechanism from urllib3 to tenacity since it's easier to mock it in tests and ensure it will work as we intended.

I also increased the minimal range of HTTPErrors from 500 to 408 (timeout) to hopefully get more exceptions in the retry mechanism. With that we also capture the 40X errors which may be relevant

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In PartnerPortalSession.__init__, the removal of the kwargs.pop calls for total_retries, backoff_factor, and status_forcelist means any callers still passing these will now forward unexpected kwargs into requests.Session or downstream; consider either explicitly rejecting these keys or continuing to strip them to avoid surprising behavior.
  • The retry configuration (AZURE_TOTAL_RETRIES, AZURE_BACKOFF_FACTOR, and HTTP status bounds) is now read once at import time; if you expect these to be changed dynamically (e.g., per-session or per-request), consider threading them through the constructor instead of relying solely on module-level constants.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `PartnerPortalSession.__init__`, the removal of the `kwargs.pop` calls for `total_retries`, `backoff_factor`, and `status_forcelist` means any callers still passing these will now forward unexpected kwargs into `requests.Session` or downstream; consider either explicitly rejecting these keys or continuing to strip them to avoid surprising behavior.
- The retry configuration (`AZURE_TOTAL_RETRIES`, `AZURE_BACKOFF_FACTOR`, and HTTP status bounds) is now read once at import time; if you expect these to be changed dynamically (e.g., per-session or per-request), consider threading them through the constructor instead of relying solely on module-level constants.

## Individual Comments

### Comment 1
<location path="tests/ms_azure/test_session.py" line_range="295" />
<code_context>
+    @pytest.mark.parametrize('status', [400, 407, 513])
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for HTTPError at the upper bound of the retryable status range (512).

The implementation retries for status codes in the inclusive range 408–512, but the parametrized test only covers values outside that range (400, 407, 513). Please add coverage for the upper bound (512)—either by extending this parametrized list or adding a separate test—so both edges of the [408, 512] interval are explicitly tested.

```suggestion
    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_retries_http_error_at_upper_status_bound(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        err = requests.exceptions.HTTPError(response=mock.Mock(status_code=512))
        mock_session.return_value.request.side_effect = [err] * AZURE_TOTAL_RETRIES

        session = PartnerPortalSession.make_graph_api_session(
            auth_dict, schema_version=auth_dict['AZURE_SCHEMA_VERSION']
        )

        with pytest.raises(requests.exceptions.HTTPError) as exc_info:
            session.get('/foo')

        assert exc_info.value is err
        assert mock_session.return_value.request.call_count == AZURE_TOTAL_RETRIES

    @pytest.mark.parametrize('status', [400, 407, 513])
```
</issue_to_address>

### Comment 2
<location path="tests/ms_azure/test_session.py" line_range="243-247" />
<code_context>
                 timeout=explicit_timeout,
             )
+
+    @staticmethod
+    def _http_error(status_code: int = 408) -> requests.exceptions.HTTPError:
+        resp = requests.Response()
+        resp.status_code = status_code
+        return requests.exceptions.HTTPError(response=resp)
+
+    @mock.patch('time.sleep')
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for HTTPError without a response to confirm it is not retried.

The retry predicate `_should_retry_request_http_error` returns `False` when `exc.response` is `None`, but `_http_error` always attaches a `Response`, so that branch isn’t exercised. Please add a test that raises `requests.exceptions.HTTPError()` without a response and asserts the request is only attempted once, to lock in this behavior and prevent regressions.

Suggested implementation:

```python
    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_retries_on_http_error_then_succeeds(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        """Transient HTTPError from the underlying request is retried; success returns normally."""
        mock_session.return_value.post.return_value = response(200, token)

    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_http_error_without_response_is_not_retried(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        """
        HTTPError without a response should not be treated as transient: it must not be retried.
        """
        # Simulate a non-transient HTTPError: `response` is omitted, so it defaults to None.
        mock_session.return_value.post.side_effect = requests.exceptions.HTTPError()

        # Exercise the same code path as the retrying test, but with a bare HTTPError.
        session = PartnerPortalSession(**auth_dict)

        with pytest.raises(requests.exceptions.HTTPError):
            session.request_access_token()

        # The request should only have been attempted once and no backoff sleeps should occur.
        assert mock_session.return_value.post.call_count == 1
        _sleep.assert_not_called()

```

To fully integrate this test, you may need to:

1. Ensure `pytest` is imported at the top of `tests/ms_azure/test_session.py` if it is not already:

   ```python
   import pytest
   ```

2. Align the session construction and method call with your existing tests:
   * Replace `session = PartnerPortalSession(**auth_dict)` with whatever you already do in `test_request_retries_on_http_error_then_succeeds` to create/configure the `PartnerPortalSession` (for example, passing `auth_dict` positionally, injecting `AccessToken`, or wiring the mocked `Session` onto the instance).
   * Replace `session.request_access_token()` with the same method or function call that is used in `test_request_retries_on_http_error_then_succeeds` to trigger the HTTP request and retry logic under test.

3. If the code under test uses a different HTTP verb or method than `post`, adjust `mock_session.return_value.post` and the associated assertions (`post.call_count`) to match the actual method used (e.g., `get`, `put`, etc.).
</issue_to_address>

### Comment 3
<location path="tests/ms_azure/test_session.py" line_range="249-258" />
<code_context>
+    @mock.patch('time.sleep')
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests to verify that non-HTTPError exceptions are not retried.

Since retries only occur when `_should_retry_request_http_error` returns `True` (i.e., for `HTTPError`), add tests where `mock_session.return_value.request.side_effect` is a non-`HTTPError` such as `requests.exceptions.ConnectionError()` or `requests.exceptions.Timeout()`. Assert that the call happens exactly once and the exception is propagated, confirming retries are strictly limited to `HTTPError` cases.

Suggested implementation:

```python
    @staticmethod
    def _http_error(status_code: int = 408) -> requests.exceptions.HTTPError:
        resp = requests.Response()
        resp.status_code = status_code
        return requests.exceptions.HTTPError(response=resp)

    @mock.patch("time.sleep")
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_does_not_retry_on_connection_error(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
    ) -> None:
        """Non-HTTPError ConnectionError from the underlying request is not retried."""
        # Arrange: the underlying Session.request raises a ConnectionError immediately
        mock_session.return_value.request.side_effect = requests.exceptions.ConnectionError()

        # Act / Assert: the exception is propagated and no retries occur
        with pytest.raises(requests.exceptions.ConnectionError):
            # NOTE: use the same entry point as the HTTPError retry test (e.g., AzureSession(...).request(...))
            self.session.request("GET", "https://example.test", auth=auth_dict)

        # Verify only a single attempt and no sleeps between retries
        assert mock_session.return_value.request.call_count == 1
        _sleep.assert_not_called()

    @mock.patch("time.sleep")
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_does_not_retry_on_timeout_error(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
    ) -> None:
        """Non-HTTPError Timeout from the underlying request is not retried."""
        # Arrange: the underlying Session.request raises a Timeout immediately
        mock_session.return_value.request.side_effect = requests.exceptions.Timeout()

        # Act / Assert: the exception is propagated and no retries occur
        with pytest.raises(requests.exceptions.Timeout):
            # NOTE: use the same entry point as the HTTPError retry test (e.g., AzureSession(...).request(...))
            self.session.request("GET", "https://example.test", auth=auth_dict)

        # Verify only a single attempt and no sleeps between retries
        assert mock_session.return_value.request.call_count == 1
        _sleep.assert_not_called()

    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_retries_on_http_error_then_succeeds(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        """Transient HTTPError from the underlying request is retried; success returns normally."""
        mock_session.return_value.post.return_value = response(200, token)

```

1. In the new tests, `self.session.request(...)` is a placeholder for whatever entry point your existing `test_request_retries_on_http_error_then_succeeds` uses to exercise the retry logic. Replace `self.session.request("GET", "https://example.test", auth=auth_dict)` with the appropriate call (for example, constructing your Azure session class and calling its `request` method, or using a helper).
2. Ensure `pytest` and `requests` are already imported at the top of the file; they likely are, but if not, add `import pytest` and `import requests`.
3. If your production code uses a different method than `Session.request` (e.g., `post` specifically) for the retry logic, adjust `mock_session.return_value.request.side_effect` to target the correct method while still configuring a non-HTTPError exception and verifying only one invocation.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

assert exc_info.value is err
assert mock_session.return_value.request.call_count == AZURE_TOTAL_RETRIES

@pytest.mark.parametrize('status', [400, 407, 513])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add a test for HTTPError at the upper bound of the retryable status range (512).

The implementation retries for status codes in the inclusive range 408–512, but the parametrized test only covers values outside that range (400, 407, 513). Please add coverage for the upper bound (512)—either by extending this parametrized list or adding a separate test—so both edges of the [408, 512] interval are explicitly tested.

Suggested change
@pytest.mark.parametrize('status', [400, 407, 513])
@mock.patch('time.sleep')
@mock.patch("cloudpub.ms_azure.session.requests.Session")
def test_request_retries_http_error_at_upper_status_bound(
self,
mock_session: mock.MagicMock,
_sleep: mock.MagicMock,
auth_dict: Dict[str, str],
token: Dict[str, str],
) -> None:
err = requests.exceptions.HTTPError(response=mock.Mock(status_code=512))
mock_session.return_value.request.side_effect = [err] * AZURE_TOTAL_RETRIES
session = PartnerPortalSession.make_graph_api_session(
auth_dict, schema_version=auth_dict['AZURE_SCHEMA_VERSION']
)
with pytest.raises(requests.exceptions.HTTPError) as exc_info:
session.get('/foo')
assert exc_info.value is err
assert mock_session.return_value.request.call_count == AZURE_TOTAL_RETRIES
@pytest.mark.parametrize('status', [400, 407, 513])

Comment on lines +243 to +247
@staticmethod
def _http_error(status_code: int = 408) -> requests.exceptions.HTTPError:
resp = requests.Response()
resp.status_code = status_code
return requests.exceptions.HTTPError(response=resp)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding a test for HTTPError without a response to confirm it is not retried.

The retry predicate _should_retry_request_http_error returns False when exc.response is None, but _http_error always attaches a Response, so that branch isn’t exercised. Please add a test that raises requests.exceptions.HTTPError() without a response and asserts the request is only attempted once, to lock in this behavior and prevent regressions.

Suggested implementation:

    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_retries_on_http_error_then_succeeds(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        """Transient HTTPError from the underlying request is retried; success returns normally."""
        mock_session.return_value.post.return_value = response(200, token)

    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_http_error_without_response_is_not_retried(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        """
        HTTPError without a response should not be treated as transient: it must not be retried.
        """
        # Simulate a non-transient HTTPError: `response` is omitted, so it defaults to None.
        mock_session.return_value.post.side_effect = requests.exceptions.HTTPError()

        # Exercise the same code path as the retrying test, but with a bare HTTPError.
        session = PartnerPortalSession(**auth_dict)

        with pytest.raises(requests.exceptions.HTTPError):
            session.request_access_token()

        # The request should only have been attempted once and no backoff sleeps should occur.
        assert mock_session.return_value.post.call_count == 1
        _sleep.assert_not_called()

To fully integrate this test, you may need to:

  1. Ensure pytest is imported at the top of tests/ms_azure/test_session.py if it is not already:

    import pytest
  2. Align the session construction and method call with your existing tests:

    • Replace session = PartnerPortalSession(**auth_dict) with whatever you already do in test_request_retries_on_http_error_then_succeeds to create/configure the PartnerPortalSession (for example, passing auth_dict positionally, injecting AccessToken, or wiring the mocked Session onto the instance).
    • Replace session.request_access_token() with the same method or function call that is used in test_request_retries_on_http_error_then_succeeds to trigger the HTTP request and retry logic under test.
  3. If the code under test uses a different HTTP verb or method than post, adjust mock_session.return_value.post and the associated assertions (post.call_count) to match the actual method used (e.g., get, put, etc.).

Comment on lines +249 to +258
@mock.patch('time.sleep')
@mock.patch("cloudpub.ms_azure.session.requests.Session")
def test_request_retries_on_http_error_then_succeeds(
self,
mock_session: mock.MagicMock,
_sleep: mock.MagicMock,
auth_dict: Dict[str, str],
token: Dict[str, str],
) -> None:
"""Transient HTTPError from the underlying request is retried; success returns normally."""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add tests to verify that non-HTTPError exceptions are not retried.

Since retries only occur when _should_retry_request_http_error returns True (i.e., for HTTPError), add tests where mock_session.return_value.request.side_effect is a non-HTTPError such as requests.exceptions.ConnectionError() or requests.exceptions.Timeout(). Assert that the call happens exactly once and the exception is propagated, confirming retries are strictly limited to HTTPError cases.

Suggested implementation:

    @staticmethod
    def _http_error(status_code: int = 408) -> requests.exceptions.HTTPError:
        resp = requests.Response()
        resp.status_code = status_code
        return requests.exceptions.HTTPError(response=resp)

    @mock.patch("time.sleep")
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_does_not_retry_on_connection_error(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
    ) -> None:
        """Non-HTTPError ConnectionError from the underlying request is not retried."""
        # Arrange: the underlying Session.request raises a ConnectionError immediately
        mock_session.return_value.request.side_effect = requests.exceptions.ConnectionError()

        # Act / Assert: the exception is propagated and no retries occur
        with pytest.raises(requests.exceptions.ConnectionError):
            # NOTE: use the same entry point as the HTTPError retry test (e.g., AzureSession(...).request(...))
            self.session.request("GET", "https://example.test", auth=auth_dict)

        # Verify only a single attempt and no sleeps between retries
        assert mock_session.return_value.request.call_count == 1
        _sleep.assert_not_called()

    @mock.patch("time.sleep")
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_does_not_retry_on_timeout_error(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
    ) -> None:
        """Non-HTTPError Timeout from the underlying request is not retried."""
        # Arrange: the underlying Session.request raises a Timeout immediately
        mock_session.return_value.request.side_effect = requests.exceptions.Timeout()

        # Act / Assert: the exception is propagated and no retries occur
        with pytest.raises(requests.exceptions.Timeout):
            # NOTE: use the same entry point as the HTTPError retry test (e.g., AzureSession(...).request(...))
            self.session.request("GET", "https://example.test", auth=auth_dict)

        # Verify only a single attempt and no sleeps between retries
        assert mock_session.return_value.request.call_count == 1
        _sleep.assert_not_called()

    @mock.patch('time.sleep')
    @mock.patch("cloudpub.ms_azure.session.requests.Session")
    def test_request_retries_on_http_error_then_succeeds(
        self,
        mock_session: mock.MagicMock,
        _sleep: mock.MagicMock,
        auth_dict: Dict[str, str],
        token: Dict[str, str],
    ) -> None:
        """Transient HTTPError from the underlying request is retried; success returns normally."""
        mock_session.return_value.post.return_value = response(200, token)
  1. In the new tests, self.session.request(...) is a placeholder for whatever entry point your existing test_request_retries_on_http_error_then_succeeds uses to exercise the retry logic. Replace self.session.request("GET", "https://example.test", auth=auth_dict) with the appropriate call (for example, constructing your Azure session class and calling its request method, or using a helper).
  2. Ensure pytest and requests are already imported at the top of the file; they likely are, but if not, add import pytest and import requests.
  3. If your production code uses a different method than Session.request (e.g., post specifically) for the retry logic, adjust mock_session.return_value.request.side_effect to target the correct method while still configuring a non-HTTPError exception and verifying only one invocation.

@JAVGan JAVGan force-pushed the SPSTRAT-725 branch 2 times, most recently from ae50485 to e33dbc2 Compare April 27, 2026 18:27
This commit changes the `PartnerPortalSession` to rely on `tenacity` for
retries on HTTPError instead of mounting an adapter with the retries
from `urllib3`.

It also increases the HTTPError range from `500-512` to `408-512` in
order to caught more exceptions in the retry logic.

This aims to fix some issues with the retrying mechanism we found.

Refers to SPSTRAT-725

Signed-off-by: Jonathan Gangi <jgangi@redhat.com>

Assisted-by: Cursor/Gemini
@JAVGan
Copy link
Copy Markdown
Collaborator Author

JAVGan commented Apr 27, 2026

@sourcery-ai review

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new tenacity-based retry policy only retries HTTPError and no longer covers transient ConnectionError/ReadTimeout-style failures that urllib3.Retry previously handled, which may reduce robustness against network flakiness; consider explicitly including those exception types if that behavior is still desired.
  • Adding response.raise_for_status() inside _request changes the behavior of all callers to now raise on non-2xx responses; if any existing callers relied on inspecting non-successful responses, consider either making this opt-in or documenting and validating that this behavioral change is acceptable.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new tenacity-based retry policy only retries `HTTPError` and no longer covers transient `ConnectionError`/`ReadTimeout`-style failures that `urllib3.Retry` previously handled, which may reduce robustness against network flakiness; consider explicitly including those exception types if that behavior is still desired.
- Adding `response.raise_for_status()` inside `_request` changes the behavior of all callers to now raise on non-2xx responses; if any existing callers relied on inspecting non-successful responses, consider either making this opt-in or documenting and validating that this behavioral change is acceptable.

## Individual Comments

### Comment 1
<location path="cloudpub/ms_azure/session.py" line_range="159-163" />
<code_context>
         log.debug("Serving the bearer token")
         return self._token.access_token

+    @retry(
+        retry=retry_if_exception(_should_retry_request_http_error),
+        stop=stop_after_attempt(AZURE_TOTAL_RETRIES),
+        wait=wait_exponential(multiplier=AZURE_BACKOFF_FACTOR),
+        reraise=True,
+    )
     def _request(
</code_context>
<issue_to_address>
**issue (bug_risk):** New retry logic only handles HTTPError and no longer retries connection/timeouts at the transport layer.

Switching to tenacity means we now only retry `HTTPError` raised by `raise_for_status`, while transport-layer failures (e.g., `ConnectionError`, DNS issues, timeouts) return immediately with no retry. To align with the previous `HTTPAdapter(max_retries=...)` behavior, please extend the retry predicate to include these exceptions (e.g., via `retry_if_exception_type` or combined predicates) so both HTTP- and network-level transient errors are retried.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +159 to +163
@retry(
retry=retry_if_exception(_should_retry_request_http_error),
stop=stop_after_attempt(AZURE_TOTAL_RETRIES),
wait=wait_exponential(multiplier=AZURE_BACKOFF_FACTOR),
reraise=True,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): New retry logic only handles HTTPError and no longer retries connection/timeouts at the transport layer.

Switching to tenacity means we now only retry HTTPError raised by raise_for_status, while transport-layer failures (e.g., ConnectionError, DNS issues, timeouts) return immediately with no retry. To align with the previous HTTPAdapter(max_retries=...) behavior, please extend the retry predicate to include these exceptions (e.g., via retry_if_exception_type or combined predicates) so both HTTP- and network-level transient errors are retried.

@JAVGan JAVGan closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant