fix: improve error messages for waiter timeouts #5785

Merged
mollyheamazon merged 5 commits into aws:master from joshuatowner:fix-error-message
Apr 23, 2026

@joshuatowner joshuatowner commented Apr 22, 2026

Improve timeout error messages for TrainingJob and EvaluationJob waits

When wait() times out, the current error says "Increase the timeout and try again" — which implies the job stopped and needs to be re-created. In reality, the job is still running server-side; only the client-side polling stopped.

Changes:

  • Add message parameter to TimeoutExceededError so callers can provide context-specific guidance (defaults to existing message for backward compat)
  • TrainingJob.wait(): timeout now says "Your training job is still running. Call .refresh() to check its current status."
  • EvaluationPipelineExecution.wait(): timeout now says "Your evaluation job is still running. Use .refresh() to check its current status." Also changes resource_type from PipelineExecution to EvaluationJob for clarity.
  • Fix pre-existing typo resouce_type → resource_type in TrainingJob.wait() (would have caused TypeError at runtime)
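
The behavior described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the SDK's actual code: the `TimeoutExceededError` class here is a stand-in written for this example, its message format and the `wait_for_job` helper are assumptions, and only the `message` parameter, the `resource_type`/`status` keywords, and the quoted message strings come from the description.

```python
import time


class TimeoutExceededError(Exception):
    """Stand-in for the SDK exception; the real class may be defined differently."""

    def __init__(self, resource_type="(Unknown)", status="(Unknown)", message=None):
        # Fall back to the generic hint when the caller gives no specific guidance,
        # which keeps existing call sites working unchanged.
        if message is None:
            message = "Increase the timeout and try again."
        self.resource_type = resource_type
        self.status = status
        # The overall wording of this string is an assumption for illustration.
        super().__init__(
            f"Timeout exceeded while waiting for {resource_type}. "
            f"Final Resource State: {status}. {message}"
        )


def wait_for_job(get_status, timeout, poll=0.01):
    """Hypothetical helper: poll get_status() until a terminal state or timeout."""
    start = time.monotonic()
    while True:
        current_status = get_status()
        if current_status in ("Completed", "Failed", "Stopped"):
            return current_status
        if time.monotonic() - start >= timeout:
            # Only the client-side polling stops here; the job itself keeps running.
            raise TimeoutExceededError(
                resource_type="TrainingJob",
                status=current_status,
                message="Your training job is still running. "
                "Call .refresh() to check its current status.",
            )
        time.sleep(poll)
```

With an explicit keyword signature like the one assumed here, the misspelled keyword `resouce_type` is rejected with a `TypeError` before any timeout message is built, which matches the failure mode described for the typo.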

@joshuatowner joshuatowner marked this pull request as ready for review April 22, 2026 07:35
- raise TimeoutExceededError(resouce_type="TrainingJob", status=current_status)
+ raise TimeoutExceededError(
+ resource_type="TrainingJob",
+ status=current_status,

You've caught a typo in this autogenerated script: it should be resource_type instead of resouce_type, thanks! Can you help update the source for this script?

  1. Fix the template: https://github.com/aws/sagemaker-python-sdk/blob/master/sagemaker-core/src/sagemaker/core/tools/templates.py#L335, and also lines 388 and 439.
  2. Regenerate resources.py:
   cd sagemaker-core
   python -m sagemaker.core.tools.codegen

That single template fix will correct all 85+ resources in the generated resources.py at once.
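
To illustrate why fixing the template is enough, here is a small sketch of template-driven generation. The template text, the `render_timeout_raise` helper, and the resource names other than `TrainingJob` are hypothetical; the real contents of templates.py and the codegen module are not shown in this thread.

```python
from string import Template

# Hypothetical template fragment; a typo in this one string would be copied
# into every generated wait() method.
WAIT_TIMEOUT_TEMPLATE = Template(
    'raise TimeoutExceededError(resource_type="$resource_name", status=current_status)'
)


def render_timeout_raise(resource_names):
    """Render the timeout `raise` line once per resource from the shared template."""
    return {
        name: WAIT_TIMEOUT_TEMPLATE.substitute(resource_name=name)
        for name in resource_names
    }


# Example resource names chosen for illustration only.
generated = render_timeout_raise(["TrainingJob", "ProcessingJob", "Endpoint"])
for line in generated.values():
    print(line)
```

Because each generated line comes from the same template string, correcting the keyword there and rerunning generation fixes every resource, whereas hand-editing the generated file would be overwritten on the next regeneration.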

@gauravmadarkal gauravmadarkal left a comment

LGTM

@mollyheamazon mollyheamazon merged commit 815953e into aws:master Apr 23, 2026
27 of 38 checks passed
3 participants