feat: Make list methods of CollectionClients iterable#760
feat: Make list methods of CollectionClients iterable#760
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds first-class pagination iteration to the Python Apify client by making list()-style methods return objects that preserve first-page metadata while also supporting (async) iteration across subsequent API pages.
Changes:
- Introduces
IterableListPage/IterableListPageAsyncand helper builders for offset- and cursor-based pagination. - Updates multiple sync/async resource-client
list()methods (and storage list methods like dataset items / KVS keys / RQ requests) to return iterable pages. - Adds unit tests covering pagination behavior across many clients and option combinations.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 22 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_client_pagination.py | Adds end-to-end pagination tests (offset + cursor) for sync/async clients using an HTTP test server. |
| src/apify_client/_iterable_list_page.py | New pagination wrappers + builders enabling iteration/awaiting behavior. |
| src/apify_client/_resource_clients/actor_collection.py | Makes Actors collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/build_collection.py | Makes Builds collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/run_collection.py | Makes Runs collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/schedule_collection.py | Makes Schedules collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/task_collection.py | Makes Tasks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_collection.py | Makes Webhooks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_dispatch_collection.py | Makes Webhook dispatches list() iterable (sync + async), including empty-list handling. |
| src/apify_client/_resource_clients/store_collection.py | Makes Store actors list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset_collection.py | Makes Datasets collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/key_value_store_collection.py | Makes KVS collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/request_queue_collection.py | Makes Request queues collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset.py | Makes list_items() iterable; deprecates iterate_items() by delegating to list_items(). |
| src/apify_client/_resource_clients/key_value_store.py | Makes list_keys() iterable; deprecates iterate_keys() by delegating to list_keys(). |
| src/apify_client/_resource_clients/request_queue.py | Makes list_requests() iterable (cursor-based); adds chunk_size and keeps mutual-exclusion validation. |
| src/apify_client/_resource_clients/actor_env_var_collection.py | Makes env var collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/actor_version_collection.py | Makes version collection list() iterable (sync + async). |
Comments suppressed due to low confidence (1)
src/apify_client/_resource_clients/request_queue.py:533
- The
Args:section listscursor/exclusive_start_idtwice, which is confusing and makes the docstring contradictory. Please remove the duplicated lines and keep a single description (including the deprecation note).
Args:
limit: How many requests to retrieve.
filter: List of request states to use as a filter. Multiple values mean union of the given filters.
cursor: A token returned in a previous API response, to continue listing the next page of requests.
exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
Only applied to the first page fetched; subsequent pages during iteration use `cursor`.
chunk_size: Maximum number of requests requested per API call when iterating. Only
relevant when iterating across pages.
timeout: Timeout for the API HTTP request.
cursor: A token returned in previous API response, to continue listing next page of requests
exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
"""
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| The returned page also supports iteration: `for request in client.list_requests(...)` yields | ||
| individual requests and transparently fetches further pages using the opaque `cursor` | ||
| returned by the API. |
There was a problem hiding this comment.
This async client docstring shows sync iteration (for request in client.list_requests(...)). Since this returns an IterableListPageAsync, iterating across pages requires async for ... in client.list_requests(...). Please update the wording to avoid suggesting invalid usage.
| The returned page also supports iteration: `for request in client.list_requests(...)` yields | |
| individual requests and transparently fetches further pages using the opaque `cursor` | |
| returned by the API. | |
| The returned page also supports iteration: `async for request in client.list_requests(...)` | |
| yields individual requests and transparently fetches further pages using the opaque | |
| `cursor` returned by the API. |
| ) -> IterableListPageAsync[TaskShort]: | ||
| """List the available tasks. | ||
|
|
||
| The returned page also supports iteration: `for item in client.list(...)` yields individual tasks |
There was a problem hiding this comment.
This async client docstring uses sync iteration (for item in client.list(...)). Since list() returns an IterableListPageAsync, the correct usage for paging iteration is async for ... in client.list(...).
| The returned page also supports iteration: `for item in client.list(...)` yields individual tasks | |
| The returned page also supports iteration: `async for item in client.list(...)` yields individual tasks |
| self._iterator = iterator | ||
|
|
||
| def __iter__(self) -> Iterator[T]: | ||
| """Return an iterator over all items across pages, fetching additional pages as needed.""" | ||
| return self._iterator | ||
|
|
||
|
|
There was a problem hiding this comment.
IterableListPage.__iter__() returns the same iterator instance each time. That makes the object single-pass (a second for loop will continue where the first left off / be exhausted), which is surprising for an Iterable. If single-pass is intended, consider implementing Iterator instead; otherwise store an iterator factory and return a new iterator on each __iter__() call.
| self._iterator = iterator | |
| def __iter__(self) -> Iterator[T]: | |
| """Return an iterator over all items across pages, fetching additional pages as needed.""" | |
| return self._iterator | |
| self._source_iterator = iterator | |
| self._cached_items: list[T] = [] | |
| def __iter__(self) -> Iterator[T]: | |
| """Return an iterator over all items across pages, fetching additional pages as needed.""" | |
| def _iterate() -> Iterator[T]: | |
| index = 0 | |
| while True: | |
| if index < len(self._cached_items): | |
| yield self._cached_items[index] | |
| index += 1 | |
| continue | |
| try: | |
| item = next(self._source_iterator) | |
| except StopIteration: | |
| return | |
| self._cached_items.append(item) | |
| yield item | |
| index += 1 | |
| return _iterate() |
| def __init__( | ||
| self, | ||
| make_awaitable: Callable[[], Awaitable[IterableListPage[T]]], | ||
| async_iterator: AsyncIterator[T], | ||
| ) -> None: | ||
| """Initialize with a factory that creates the awaitable on demand and the async iterator over items.""" | ||
| self._make_awaitable = make_awaitable | ||
| self._async_iterator = async_iterator | ||
|
|
||
| def __aiter__(self) -> AsyncIterator[T]: | ||
| """Return an asynchronous iterator over all items across pages.""" | ||
| return self._async_iterator | ||
|
|
There was a problem hiding this comment.
IterableListPageAsync.__aiter__() returns the same async-iterator instance every time. This makes the object effectively single-use, which is unexpected for an AsyncIterable (callers usually expect a fresh iterator per async for). Consider storing an async-iterator factory (or making the class itself an AsyncIterator) so repeated iteration behaves predictably.
| The returned page also supports iteration: `for key in client.list_keys(...)` yields individual | ||
| keys and transparently fetches further pages using cursor-based pagination. |
There was a problem hiding this comment.
This async client docstring shows sync iteration (for key in client.list_keys(...)). Since this returns an IterableListPageAsync, users need async for ... in client.list_keys(...) to iterate across pages. Please update the phrasing accordingly.
| The returned page also supports iteration: `for key in client.list_keys(...)` yields individual | |
| keys and transparently fetches further pages using cursor-based pagination. | |
| The returned page also supports iteration: `async for key in client.list_keys(...)` yields | |
| individual keys and transparently fetches further pages using cursor-based pagination. |
| populated `IterableListPage`. Iterating (`async for item in client.list(...)`) yields individual | ||
| items and performs additional API calls as needed to fetch further pages. | ||
|
|
||
| A single instance supports either awaiting or iterating — not both. |
There was a problem hiding this comment.
Docstring says “A single instance supports either awaiting or iterating — not both.” but there’s no enforcement, and for offset-based pagination the implementation actually shares the first-page task between await and async for. Either enforce the restriction (raise on second mode) or update the docstring to describe the actual supported behavior.
| A single instance supports either awaiting or iterating — not both. | |
| The same instance may be awaited to obtain the first page and may also be asynchronously | |
| iterated to consume items across pages. |
| @@ -36,9 +46,12 @@ def list( | |||
| offset: int | None = None, | |||
| desc: bool | None = None, | |||
| timeout: Timeout = 'medium', | |||
| ) -> ListOfWebhookDispatches | None: | |||
| ) -> IterableListPage[WebhookDispatch]: | |||
| """List all webhook dispatches of a user. | |||
|
|
|||
| The returned page also supports iteration: `for item in client.list(...)` yields individual | |||
| webhook dispatches and transparently fetches further pages from the API. | |||
|
|
|||
| https://docs.apify.com/api/v2#/reference/webhook-dispatches/webhook-dispatches-collection/get-list-of-webhook-dispatches | |||
|
|
|||
| Args: | |||
| @@ -50,8 +63,12 @@ def list( | |||
| Returns: | |||
| The retrieved webhook dispatches of a user. | |||
| """ | |||
| result = self._list(timeout=timeout, limit=limit, offset=offset, desc=desc) | |||
| return WebhookDispatchList.model_validate(result).data | |||
|
|
|||
| def _callback(**kwargs: Any) -> ListOfWebhookDispatches: | |||
| result = self._list(timeout=timeout, **kwargs) | |||
| return WebhookDispatchList.model_validate(result).data or _EMPTY_WEBHOOK_DISPATCHES | |||
|
|
|||
| return build_iterable_list_page(_callback, limit=limit, offset=offset, desc=desc) | |||
There was a problem hiding this comment.
_EMPTY_WEBHOOK_DISPATCHES hard-codes offset=0, limit=1, desc=False. When the API returns data=None for an empty result, callers passing non-default limit/offset/desc will get misleading metadata on the returned page. Prefer constructing an empty ListOfWebhookDispatches using the effective request kwargs (and requested desc).
| @@ -94,5 +114,9 @@ async def list( | |||
| Returns: | |||
| The retrieved webhook dispatches of a user. | |||
| """ | |||
| result = await self._list(timeout=timeout, limit=limit, offset=offset, desc=desc) | |||
| return WebhookDispatchList.model_validate(result).data | |||
|
|
|||
| async def _callback(**kwargs: Any) -> ListOfWebhookDispatches: | |||
| result = await self._list(timeout=timeout, **kwargs) | |||
| return WebhookDispatchList.model_validate(result).data or _EMPTY_WEBHOOK_DISPATCHES | |||
|
|
|||
| return build_iterable_list_page_async(_callback, limit=limit, offset=offset, desc=desc) | |||
There was a problem hiding this comment.
This async client docstring shows sync iteration (for item in client.list(...)). Since this returns an IterableListPageAsync, users need async for ... in client.list(...) to iterate across pages. Please update the example phrasing accordingly.
| ) -> IterableListPageAsync[StoreListActor]: | ||
| """List Actors in Apify store. | ||
|
|
||
| The returned page also supports iteration: `for item in client.list(...)` yields individual Actors |
There was a problem hiding this comment.
This async client docstring uses sync iteration (for item in client.list(...)). Since list() returns an IterableListPageAsync, the correct usage for paging iteration is async for ... in client.list(...).
| The returned page also supports iteration: `for item in client.list(...)` yields individual Actors | |
| The returned page also supports iteration: `async for item in client.list(...)` yields individual Actors |
| ) -> IterableListPageAsync[KeyValueStore]: | ||
| """List the available key-value stores. | ||
|
|
||
| The returned page also supports iteration: `for item in client.list(...)` yields individual |
There was a problem hiding this comment.
This async client docstring uses sync iteration (for item in client.list(...)). Since list() returns an IterableListPageAsync, the correct usage for paging iteration is async for ... in client.list(...).
| The returned page also supports iteration: `for item in client.list(...)` yields individual | |
| The returned page also supports iteration: `async for item in client.list(...)` yields individual |
Description
TODO:
Example usage
Issues
Testing
Checklist