feat: Make list methods of CollectionClients iterable#760

Draft
Pijukatel wants to merge 6 commits into master from iterable-list-methods-2

Conversation

@Pijukatel
Contributor

@Pijukatel Pijukatel commented Apr 23, 2026

Description

  • The list method of every collection client now returns a page object that can also be iterated.
  • The list method of every async collection client now returns a page object that can also be iterated asynchronously.
  • List of modified clients (same for async clients):
    • ActorCollectionClient
    • BuildCollectionClient
    • RunCollectionClient
    • ScheduleCollectionClient
    • TaskCollectionClient
    • WebhookCollectionClient
    • WebhookDispatchCollectionClient
    • DatasetCollectionClient
    • KeyValueStoreCollectionClient
    • RequestQueueCollectionClient
    • StoreCollectionClient
    • ActorEnvVarCollectionClient
    • ActorVersionCollectionClient
  • Additionally, the following storage-related list methods were modified to support iteration as well:
    • DatasetClient.list_items (DatasetClient.iterate_items is marked as deprecated)
    • KeyValueStoreClient.list_keys (KeyValueStoreClient.iterate_keys is marked as deprecated)
    • RequestQueueClient.list_requests
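
The deprecation mentioned above (an old iterate method delegating to the new iterable list method) could look roughly like the sketch below. `DatasetClientSketch` and its in-memory data are hypothetical stand-ins, not the real client; only the pattern of warning and then delegating is the point.

```python
from __future__ import annotations

import warnings
from typing import Any, Iterator


class DatasetClientSketch:
    """Hypothetical stand-in for a dataset client; not the real DatasetClient."""

    def __init__(self, items: list[dict[str, Any]]) -> None:
        self._items = items

    def list_items(self, *, offset: int = 0, limit: int | None = None) -> Iterator[dict[str, Any]]:
        # In this sketch, listing simply slices an in-memory list.
        end = None if limit is None else offset + limit
        return iter(self._items[offset:end])

    def iterate_items(self, *, offset: int = 0, limit: int | None = None) -> Iterator[dict[str, Any]]:
        # Deprecated wrapper: emit a warning, then delegate to list_items.
        warnings.warn(
            'iterate_items() is deprecated; iterate over list_items() instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.list_items(offset=offset, limit=limit)
```

Using `stacklevel=2` makes the warning point at the caller's line rather than at the wrapper itself.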

TODO:

  • Update docs

Example usage

...
# Sync
datasets_client = ApifyClient(token='...').datasets()

# Same as before
list_page = datasets_client.list(...)

# New functionality
individual_items = [item for item in datasets_client.list(...)]

...
# Async
datasets_client = ApifyClientAsync(token='...').datasets()

# Same as before
list_page = await datasets_client.list(...)

# New functionality
individual_items = [item async for item in datasets_client.list(...)]

Issues

Testing

  • Unit tests
  • Manual API tests

Checklist

  • CI passed

Working tests and implementation.
TODO:
-Check KVS and RQ special cases
-Figure out model mocking in some elegant way
@github-actions github-actions Bot added this to the 139th sprint - Tooling team milestone Apr 24, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 24, 2026
@Pijukatel Pijukatel requested a review from Copilot April 24, 2026 06:02

Copilot AI left a comment

Pull request overview

This PR adds first-class pagination iteration to the Python Apify client by making list()-style methods return objects that preserve first-page metadata while also supporting (async) iteration across subsequent API pages.

Changes:

  • Introduces IterableListPage / IterableListPageAsync and helper builders for offset- and cursor-based pagination.
  • Updates multiple sync/async resource-client list() methods (and storage list methods like dataset items / KVS keys / RQ requests) to return iterable pages.
  • Adds unit tests covering pagination behavior across many clients and option combinations.
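
To make the idea in the overview concrete, here is a minimal sketch of an offset-based page object that keeps first-page metadata and also iterates across later pages. It is not the PR's `IterableListPage`; the names `Page`, `IterablePageSketch`, `fake_fetch` and the `(offset, limit)` fetch signature are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar('T')


@dataclass
class Page(Generic[T]):
    """Hypothetical page model: items plus pagination metadata."""

    items: list[T]
    offset: int
    total: int


class IterablePageSketch(Generic[T]):
    """Exposes first-page metadata and iterates over items from all pages."""

    def __init__(self, fetch: Callable[[int, int], Page[T]], offset: int = 0, page_size: int = 2) -> None:
        self._fetch = fetch
        self._page_size = page_size
        # The first page is fetched eagerly so its metadata is available right away.
        self._first = fetch(offset, page_size)
        self.items = self._first.items
        self.offset = self._first.offset
        self.total = self._first.total

    def __iter__(self) -> Iterator[T]:
        page = self._first
        while page.items:
            yield from page.items
            next_offset = page.offset + len(page.items)
            if next_offset >= page.total:
                return
            # Fetch the next page only when the consumer asks for more items.
            page = self._fetch(next_offset, self._page_size)


# Fake backend standing in for the API: five items, served in slices.
DATA = list(range(5))
calls: list[int] = []


def fake_fetch(offset: int, limit: int) -> Page[int]:
    calls.append(offset)
    return Page(items=DATA[offset:offset + limit], offset=offset, total=len(DATA))


page = IterablePageSketch(fake_fetch)
first_page_items = page.items   # metadata access, same as before the change
all_items = list(page)          # iteration fetches the remaining pages lazily
```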

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 22 comments.

| File | Description |
| --- | --- |
| tests/unit/test_client_pagination.py | Adds end-to-end pagination tests (offset + cursor) for sync/async clients using an HTTP test server. |
| src/apify_client/_iterable_list_page.py | New pagination wrappers + builders enabling iteration/awaiting behavior. |
| src/apify_client/_resource_clients/actor_collection.py | Makes Actors collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/build_collection.py | Makes Builds collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/run_collection.py | Makes Runs collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/schedule_collection.py | Makes Schedules collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/task_collection.py | Makes Tasks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_collection.py | Makes Webhooks collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/webhook_dispatch_collection.py | Makes Webhook dispatches list() iterable (sync + async), including empty-list handling. |
| src/apify_client/_resource_clients/store_collection.py | Makes Store actors list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset_collection.py | Makes Datasets collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/key_value_store_collection.py | Makes KVS collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/request_queue_collection.py | Makes Request queues collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/dataset.py | Makes list_items() iterable; deprecates iterate_items() by delegating to list_items(). |
| src/apify_client/_resource_clients/key_value_store.py | Makes list_keys() iterable; deprecates iterate_keys() by delegating to list_keys(). |
| src/apify_client/_resource_clients/request_queue.py | Makes list_requests() iterable (cursor-based); adds chunk_size and keeps mutual-exclusion validation. |
| src/apify_client/_resource_clients/actor_env_var_collection.py | Makes env var collection list() iterable (sync + async). |
| src/apify_client/_resource_clients/actor_version_collection.py | Makes version collection list() iterable (sync + async). |
Comments suppressed due to low confidence (1)

src/apify_client/_resource_clients/request_queue.py:533

  • The Args: section lists cursor / exclusive_start_id twice, which is confusing and makes the docstring contradictory. Please remove the duplicated lines and keep a single description (including the deprecation note).
        Args:
            limit: How many requests to retrieve.
            filter: List of request states to use as a filter. Multiple values mean union of the given filters.
            cursor: A token returned in a previous API response, to continue listing the next page of requests.
            exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
                Only applied to the first page fetched; subsequent pages during iteration use `cursor`.
            chunk_size: Maximum number of requests requested per API call when iterating. Only
                relevant when iterating across pages.
            timeout: Timeout for the API HTTP request.
            cursor: A token returned in previous API response, to continue listing next page of requests
            exclusive_start_id: (deprecated) All requests up to this one (including) are skipped from the result.
        """

Comment on lines +1095 to +1097
The returned page also supports iteration: `for request in client.list_requests(...)` yields
individual requests and transparently fetches further pages using the opaque `cursor`
returned by the API.
Copilot AI Apr 24, 2026

This async client docstring shows sync iteration (for request in client.list_requests(...)). Since this returns an IterableListPageAsync, iterating across pages requires async for ... in client.list_requests(...). Please update the wording to avoid suggesting invalid usage.

Suggested change
The returned page also supports iteration: `for request in client.list_requests(...)` yields
individual requests and transparently fetches further pages using the opaque `cursor`
returned by the API.
The returned page also supports iteration: `async for request in client.list_requests(...)`
yields individual requests and transparently fetches further pages using the opaque
`cursor` returned by the API.
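
A small sketch of why the wording matters: an object that only defines `__aiter__` works with `async for`, while a plain `for` fails with `TypeError`. `AsyncOnlyPage` is a hypothetical class used for illustration; whether the real `IterableListPageAsync` also lacks `__iter__` is an assumption here.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator


class AsyncOnlyPage:
    """Hypothetical object that defines __aiter__ but not __iter__."""

    def __init__(self, items: list[str]) -> None:
        self._items = items

    def __aiter__(self) -> AsyncIterator[str]:
        return self._generate()

    async def _generate(self) -> AsyncIterator[str]:
        for item in self._items:
            await asyncio.sleep(0)  # stands in for an awaited API call
            yield item


async def collect(page: AsyncOnlyPage) -> list[str]:
    # Correct usage: asynchronous iteration.
    return [item async for item in page]


def try_sync_iteration(page: AsyncOnlyPage) -> str:
    # Incorrect usage: a plain `for` needs __iter__, which this class lacks.
    try:
        for _ in page:  # type: ignore[attr-defined]
            pass
    except TypeError as exc:
        return type(exc).__name__
    return 'no error'
```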

Copilot uses AI. Check for mistakes.
) -> IterableListPageAsync[TaskShort]:
"""List the available tasks.

The returned page also supports iteration: `for item in client.list(...)` yields individual tasks
Copilot AI Apr 24, 2026

This async client docstring uses sync iteration (for item in client.list(...)). Since list() returns an IterableListPageAsync, the correct usage for paging iteration is async for ... in client.list(...).

Suggested change
The returned page also supports iteration: `for item in client.list(...)` yields individual tasks
The returned page also supports iteration: `async for item in client.list(...)` yields individual tasks

Comment on lines +86 to +92
self._iterator = iterator

def __iter__(self) -> Iterator[T]:
"""Return an iterator over all items across pages, fetching additional pages as needed."""
return self._iterator


Copilot AI Apr 24, 2026

IterableListPage.__iter__() returns the same iterator instance each time. That makes the object single-pass (a second for loop will continue where the first left off / be exhausted), which is surprising for an Iterable. If single-pass is intended, consider implementing Iterator instead; otherwise store an iterator factory and return a new iterator on each __iter__() call.

Suggested change
-        self._iterator = iterator
-
-    def __iter__(self) -> Iterator[T]:
-        """Return an iterator over all items across pages, fetching additional pages as needed."""
-        return self._iterator
+        self._source_iterator = iterator
+        self._cached_items: list[T] = []
+
+    def __iter__(self) -> Iterator[T]:
+        """Return an iterator over all items across pages, fetching additional pages as needed."""
+        def _iterate() -> Iterator[T]:
+            index = 0
+            while True:
+                if index < len(self._cached_items):
+                    yield self._cached_items[index]
+                    index += 1
+                    continue
+                try:
+                    item = next(self._source_iterator)
+                except StopIteration:
+                    return
+                self._cached_items.append(item)
+                yield item
+                index += 1
+        return _iterate()
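
The other option named in this comment, storing an iterator factory instead of caching items, can be sketched as below. `SinglePass` mirrors the behaviour being criticised and `ReIterable` is the factory variant; both class names are hypothetical and neither is the PR's code.

```python
from __future__ import annotations

from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar('T')


class SinglePass(Generic[T]):
    """Returns the same iterator every time, so a second loop sees nothing."""

    def __init__(self, iterator: Iterator[T]) -> None:
        self._iterator = iterator

    def __iter__(self) -> Iterator[T]:
        return self._iterator


class ReIterable(Generic[T]):
    """Stores a factory and builds a fresh iterator for each loop."""

    def __init__(self, make_iterator: Callable[[], Iterator[T]]) -> None:
        self._make_iterator = make_iterator

    def __iter__(self) -> Iterator[T]:
        return self._make_iterator()


single = SinglePass(iter([1, 2, 3]))
first_pass = list(single)
second_pass = list(single)  # the shared iterator is already exhausted

fresh = ReIterable(lambda: iter([1, 2, 3]))
fresh_first = list(fresh)
fresh_second = list(fresh)  # a new iterator is created, so items repeat
```

The trade-off: a factory re-runs the underlying work (for a paginated client, presumably the API calls) on every pass, whereas the caching variant in the suggestion keeps all items seen so far in memory.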

Comment on lines +104 to +116
def __init__(
self,
make_awaitable: Callable[[], Awaitable[IterableListPage[T]]],
async_iterator: AsyncIterator[T],
) -> None:
"""Initialize with a factory that creates the awaitable on demand and the async iterator over items."""
self._make_awaitable = make_awaitable
self._async_iterator = async_iterator

def __aiter__(self) -> AsyncIterator[T]:
"""Return an asynchronous iterator over all items across pages."""
return self._async_iterator

Copilot AI Apr 24, 2026

IterableListPageAsync.__aiter__() returns the same async-iterator instance every time. This makes the object effectively single-use, which is unexpected for an AsyncIterable (callers usually expect a fresh iterator per async for). Consider storing an async-iterator factory (or making the class itself an AsyncIterator) so repeated iteration behaves predictably.
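
A minimal sketch of the async-iterator-factory idea from this comment, assuming a hypothetical `ReIterableAsync` class and a toy `numbers()` async generator in place of real paginated API calls:

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator, Callable


class ReIterableAsync:
    """Stores a factory so that each `async for` gets a fresh async iterator."""

    def __init__(self, make_async_iterator: Callable[[], AsyncIterator[int]]) -> None:
        self._make_async_iterator = make_async_iterator

    def __aiter__(self) -> AsyncIterator[int]:
        return self._make_async_iterator()


async def numbers() -> AsyncIterator[int]:
    # Toy async generator; each call to numbers() starts from the beginning.
    for n in (1, 2, 3):
        await asyncio.sleep(0)
        yield n


async def consume_twice() -> tuple[list[int], list[int]]:
    page = ReIterableAsync(numbers)
    first = [n async for n in page]
    second = [n async for n in page]
    return first, second
```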

Comment on lines +597 to +598
The returned page also supports iteration: `for key in client.list_keys(...)` yields individual
keys and transparently fetches further pages using cursor-based pagination.
Copilot AI Apr 24, 2026

This async client docstring shows sync iteration (for key in client.list_keys(...)). Since this returns an IterableListPageAsync, users need async for ... in client.list_keys(...) to iterate across pages. Please update the phrasing accordingly.

Suggested change
The returned page also supports iteration: `for key in client.list_keys(...)` yields individual
keys and transparently fetches further pages using cursor-based pagination.
The returned page also supports iteration: `async for key in client.list_keys(...)` yields
individual keys and transparently fetches further pages using cursor-based pagination.

Comment thread src/apify_client/_iterable_list_page.py Outdated
populated `IterableListPage`. Iterating (`async for item in client.list(...)`) yields individual
items and performs additional API calls as needed to fetch further pages.

A single instance supports either awaiting or iterating — not both.
Copilot AI Apr 24, 2026

Docstring says “A single instance supports either awaiting or iterating — not both.” but there’s no enforcement, and for offset-based pagination the implementation actually shares the first-page task between await and async for. Either enforce the restriction (raise on second mode) or update the docstring to describe the actual supported behavior.

Suggested change
A single instance supports either awaiting or iterating — not both.
The same instance may be awaited to obtain the first page and may also be asynchronously
iterated to consume items across pages.
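
One way an object can be both awaited (for the first page) and asynchronously iterated (for all items) while sharing a single first-page request is sketched below. `AwaitableAndIterable`, the page-index fetch signature and `fake_fetch` are hypothetical; this illustrates the behaviour the comment describes, not the PR's actual implementation.

```python
from __future__ import annotations

import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Generator


class AwaitableAndIterable:
    """Awaiting yields the first page; async iteration yields items from every page."""

    def __init__(self, fetch_page: Callable[[int], Awaitable[list[int]]]) -> None:
        self._fetch_page = fetch_page
        self._first_page_task: asyncio.Task[list[int]] | None = None

    def _first_page(self) -> asyncio.Task[list[int]]:
        # Create the task once; both `await` and `async for` reuse it.
        if self._first_page_task is None:
            self._first_page_task = asyncio.ensure_future(self._fetch_page(0))
        return self._first_page_task

    def __await__(self) -> Generator[Any, None, list[int]]:
        return self._first_page().__await__()

    async def __aiter__(self) -> AsyncIterator[int]:
        page_index = 0
        page = await self._first_page()
        while page:
            for item in page:
                yield item
            page_index += 1
            page = await self._fetch_page(page_index)


# Fake backend: three pages, the last one empty to signal the end.
PAGES = [[1, 2], [3], []]
calls: list[int] = []


async def fake_fetch(index: int) -> list[int]:
    calls.append(index)
    await asyncio.sleep(0)
    return PAGES[index]


async def demo() -> tuple[list[int], list[int]]:
    result = AwaitableAndIterable(fake_fetch)
    first_page = await result
    everything = [item async for item in result]
    return first_page, everything
```

In this sketch, awaiting and then iterating the same instance fetches page 0 only once, because the completed task is simply awaited again.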

Comment on lines 20 to +71
@@ -36,9 +46,12 @@ def list(
         offset: int | None = None,
         desc: bool | None = None,
         timeout: Timeout = 'medium',
-    ) -> ListOfWebhookDispatches | None:
+    ) -> IterableListPage[WebhookDispatch]:
         """List all webhook dispatches of a user.

+        The returned page also supports iteration: `for item in client.list(...)` yields individual
+        webhook dispatches and transparently fetches further pages from the API.
+
         https://docs.apify.com/api/v2#/reference/webhook-dispatches/webhook-dispatches-collection/get-list-of-webhook-dispatches

         Args:
@@ -50,8 +63,12 @@ def list(
         Returns:
             The retrieved webhook dispatches of a user.
         """
-        result = self._list(timeout=timeout, limit=limit, offset=offset, desc=desc)
-        return WebhookDispatchList.model_validate(result).data
+
+        def _callback(**kwargs: Any) -> ListOfWebhookDispatches:
+            result = self._list(timeout=timeout, **kwargs)
+            return WebhookDispatchList.model_validate(result).data or _EMPTY_WEBHOOK_DISPATCHES
+
+        return build_iterable_list_page(_callback, limit=limit, offset=offset, desc=desc)
Copilot AI Apr 24, 2026

_EMPTY_WEBHOOK_DISPATCHES hard-codes offset=0, limit=1, desc=False. When the API returns data=None for an empty result, callers passing non-default limit/offset/desc will get misleading metadata on the returned page. Prefer constructing an empty ListOfWebhookDispatches using the effective request kwargs (and requested desc).
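
The suggestion to build the empty page from the effective request arguments could be sketched like this. `EmptyablePage` and `empty_page_for` are hypothetical, and the field names are assumptions rather than the real `ListOfWebhookDispatches` model; the fallback values mirror the hard-coded ones mentioned in the comment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class EmptyablePage:
    """Hypothetical page model; field names are assumptions for illustration."""

    items: list[Any] = field(default_factory=list)
    total: int = 0
    count: int = 0
    offset: int = 0
    limit: int = 1
    desc: bool = False


def empty_page_for(
    *,
    offset: int | None = None,
    limit: int | None = None,
    desc: bool | None = None,
) -> EmptyablePage:
    """Build an empty page whose metadata reflects what the caller actually requested."""
    return EmptyablePage(
        offset=offset if offset is not None else 0,
        limit=limit if limit is not None else 1,
        desc=bool(desc),
    )


# Metadata now matches the request instead of a shared module-level constant.
requested = empty_page_for(offset=20, limit=50, desc=True)
```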

Comment on lines 100 to +122
@@ -94,5 +114,9 @@ async def list(
         Returns:
             The retrieved webhook dispatches of a user.
         """
-        result = await self._list(timeout=timeout, limit=limit, offset=offset, desc=desc)
-        return WebhookDispatchList.model_validate(result).data
+
+        async def _callback(**kwargs: Any) -> ListOfWebhookDispatches:
+            result = await self._list(timeout=timeout, **kwargs)
+            return WebhookDispatchList.model_validate(result).data or _EMPTY_WEBHOOK_DISPATCHES
+
+        return build_iterable_list_page_async(_callback, limit=limit, offset=offset, desc=desc)
Copilot AI Apr 24, 2026

This async client docstring shows sync iteration (for item in client.list(...)). Since this returns an IterableListPageAsync, users need async for ... in client.list(...) to iterate across pages. Please update the example phrasing accordingly.

) -> IterableListPageAsync[StoreListActor]:
"""List Actors in Apify store.

The returned page also supports iteration: `for item in client.list(...)` yields individual Actors
Copilot AI Apr 24, 2026

This async client docstring uses sync iteration (for item in client.list(...)). Since list() returns an IterableListPageAsync, the correct usage for paging iteration is async for ... in client.list(...).

Suggested change
The returned page also supports iteration: `for item in client.list(...)` yields individual Actors
The returned page also supports iteration: `async for item in client.list(...)` yields individual Actors

) -> IterableListPageAsync[KeyValueStore]:
"""List the available key-value stores.

The returned page also supports iteration: `for item in client.list(...)` yields individual
Copilot AI Apr 24, 2026

This async client docstring uses sync iteration (for item in client.list(...)). Since list() returns an IterableListPageAsync, the correct usage for paging iteration is async for ... in client.list(...).

Suggested change
The returned page also supports iteration: `for item in client.list(...)` yields individual
The returned page also supports iteration: `async for item in client.list(...)` yields individual

Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

3 participants