docs: enrich module docstrings and add doctest examples#1498

Merged
timsaucer merged 4 commits into apache:main from timsaucer:feat/module-docstrings
Apr 24, 2026

Conversation

@timsaucer
Member

Which issue does this PR close?

Part of #1394. This is "PR 1b" from the implementation plan in
#1394 (comment).

Rationale for this change

The per-module docstrings for functions.py, dataframe.py, expr.py,
and context.py were one-line summaries that pointed at the online
docs without explaining the module's role or giving any example. That
makes the repo harder to navigate both for humans skimming the source
and for AI coding assistants that can only see what ships with the
package. Several of the most commonly used DataFrame methods also
lacked runnable examples, even though peer methods (intersect,
except_all, distinct_on, union_by_name, join_on, ...) had
already been brought up to the project's example-in-docstring
convention.

What changes are included in this PR?

  • Enriched module docstrings for functions.py, dataframe.py,
    expr.py, and context.py. Each now opens with a one-line summary
    of the module's role, followed by a paragraph of concept/usage
    guidance with :py:class: / :py:meth: cross-references, a compact
    doctest, and a :ref: pointer into the docs site.
  • Added doctest examples to six high-traffic DataFrame methods:
    select, aggregate, sort, limit, join, and union.
    Optional parameters are passed with keyword syntax, and examples
    reuse the same input data across variants so the effect of each
    option is easy to see.
  • pytest --doctest-modules is clean (266 → 276 passing doctests);
    full suite passes locally.
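
As a rough illustration of the docstring shape described above (one-line summary, short guidance, and a compact doctest whose variants reuse the same input and pass options by keyword), here is a stdlib-only sketch. `top_scores`, its `limit` and `descending` parameters, and the `run_examples` helper are hypothetical and not part of DataFusion; the snippet only shows how the `doctest` module collects and runs such examples.

```python
import doctest


def top_scores(scores, limit=None, *, descending=True):
    """Return scores in sorted order, optionally truncated.

    Hypothetical helper used only to illustrate the docstring layout;
    it is not part of the DataFusion API.

    Examples:
        >>> data = [1, 2, 5]
        >>> top_scores(data)
        [5, 2, 1]
        >>> top_scores(data, limit=2)
        [5, 2]
        >>> top_scores(data, descending=False)
        [1, 2, 5]
    """
    ordered = sorted(scores, reverse=descending)
    return ordered if limit is None else ordered[:limit]


def run_examples(func):
    """Run the doctest examples found in ``func`` and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Supply the function under its own name so the examples can call it.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

Calling `run_examples(top_scores)` executes each `>>>` line in the docstring, which is roughly what `pytest --doctest-modules` does for every docstring it discovers in a package.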

Are there any user-facing changes?

Documentation only — no API changes.

Expands the module docstrings for `functions.py`, `dataframe.py`,
`expr.py`, and `context.py` so each module opens with a concept summary,
cross-references to related APIs, and a small executable example.

Adds doctest examples to the high-traffic `DataFrame` methods that
previously lacked them: `select`, `aggregate`, `sort`, `limit`, `join`,
and `union`. Optional parameters are demonstrated with keyword syntax,
and examples reuse the same input data across variants so the effect of
each option is easy to see.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# under the License.

-"""Session Context and it's associated configuration."""
+""":py:class:`SessionContext` — entry point for running DataFusion queries.
Contributor

If we expect to be changing a bunch of the website stuff it feels like it would be nice to generate a preview in CI if not exceedingly expensive.

Member Author

CI does already build the docs. I suppose we could zip the site up and make it a downloadable artifact.

Comment thread python/datafusion/dataframe.py
Comment thread python/datafusion/dataframe.py Outdated
timsaucer and others added 2 commits April 23, 2026 19:38
Change the score data from [1, 2, 3] to [1, 2, 5] so the grouped
result produces [3, 5] instead of [3, 3], removing ambiguity about
which total belongs to which team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
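
To see why the original data was ambiguous, here is a plain-Python sketch of the grouped sum. The team labels and the two-rows-then-one-row layout are assumptions chosen to reproduce the totals quoted in the commit message; the actual doctest uses DataFusion's `aggregate`, not the hypothetical `total_by_team` helper shown here.

```python
from collections import defaultdict


def total_by_team(teams, scores):
    """Sum scores per team, preserving first-seen team order."""
    totals = defaultdict(int)
    for team, score in zip(teams, scores):
        totals[team] += score
    return dict(totals)


# Assumed layout: team "x" owns the first two rows, team "y" the last.
teams = ["x", "x", "y"]

before = total_by_team(teams, [1, 2, 3])  # both teams total 3
after = total_by_team(teams, [1, 2, 5])   # totals differ: 3 and 5
```

With the old scores both groups sum to 3, so a reader cannot tell from the output which row belongs to which team. With the new scores the totals are distinct.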
Drop the redundant lit() in the dataframe.py module-docstring filter
example and use a plain string group key in the aggregate() doctest, so
both examples model the style SKILL.md recommends. Also document the
sort("a") string form and sort_by() shortcut in SKILL.md's sorting
section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer merged commit 8741d30 into apache:main Apr 24, 2026
21 checks passed
@timsaucer timsaucer deleted the feat/module-docstrings branch April 24, 2026 02:01

2 participants