Skip to content

Enable doc tests in local and CI testing#1409

Merged
alamb merged 3 commits intoapache:mainfrom
rerun-io:nick/doctest_infra
Mar 5, 2026
Merged

Enable doc tests in local and CI testing#1409
alamb merged 3 commits intoapache:mainfrom
rerun-io:nick/doctest_infra

Conversation

@ntjohnson1
Copy link
Contributor

@ntjohnson1 ntjohnson1 commented Mar 4, 2026

Which issue does this PR close?

Rationale for this change

This enables doctests so if more examples are added to doc strings they can be validated locally and in CI.

What changes are included in this PR?

  • Turns on doctest so pytest verifies docstring examples
  • Minor update to the test.yml to fix test discovery with doctest turned on, and fixed a github lint that triggered for me since I touched that file
  • Fixes a few existing examples to be compliant

Are there any user-facing changes?

No

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables doctests in the DataFusion Python library to ensure docstring examples are validated both locally and in CI. It is Part 1 of a larger effort (#1400) to improve developer and agent usability through better documentation and tested examples.

Changes:

  • Enables --doctest-modules in pytest configuration (pyproject.toml) and sets testpaths to scope collection to python/tests and python/datafusion
  • Updates the GitHub Actions test.yml to fix a broken cache key reference and removes the explicit . path from the pytest command (deferring to testpaths)
  • Fixes two existing docstring examples in dataframe.py (into_view and fill_null) that were pseudo-code or would fail as doctests

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
pyproject.toml Enables --doctest-modules, adds doctest_optionflags, and sets testpaths to scope doctest collection
.github/workflows/test.yml Fixes broken cargo cache key (steps.rust-toolchain.outputs.cachekeymatrix.toolchain) and removes explicit . path from pytest command
python/datafusion/dataframe.py Fixes into_view and fill_null docstring examples to be valid, self-contained, and executable doctests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ntjohnson1 --

[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
addopts = "--doctest-modules"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> df = df.fill_null(0) # Fill all nulls with 0 where possible
>>> # Fill nulls in specific string columns
>>> df = df.fill_null("missing", subset=["name", "category"])
>>> from datafusion import SessionContext, col
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ntjohnson1

Did you verify from the CI run logs that these tests are now run? I took a brief look and couldn't find anything that was new https://github.com/apache/datafusion-python/actions/runs/22687531822/job/65776433980?pr=1409

🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doctests run as a part of pytest so if you click on Run Tests you can see the new doctest entries.

There aren't that many examples in the code at the moment.
If you scroll all the way to the bottom you can now see source code showing up in addition to test code

python/datafusion/dataframe.py::datafusion.dataframe.DataFrame.fill_null PASSED [ 99%]
python/datafusion/dataframe.py::datafusion.dataframe.DataFrame.into_view PASSED [ 99%]
python/datafusion/dataframe_formatter.py::datafusion.dataframe_formatter.configure_formatter PASSED [ 99%]
python/datafusion/dataframe_formatter.py::datafusion.dataframe_formatter.get_formatter PASSED [ 99%]
python/datafusion/dataframe_formatter.py::datafusion.dataframe_formatter.reset_formatter PASSED [ 99%]
python/datafusion/dataframe_formatter.py::datafusion.dataframe_formatter.set_formatter PASSED [100%]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice -- I checked https://github.com/apache/datafusion-python/actions/runs/22676065624/job/65741468321 (a recent run from main) and indeed those tests are not there


python/tests/test_view.py::test_register_filtered_dataframe PASSED       [ 99%]
python/tests/test_wrapper_coverage.py::test_datafusion_missing_exports PASSED [100%]

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thanks @ntjohnson1

@timsaucer any concern with merging this one?

@alamb
Copy link
Contributor

alamb commented Mar 5, 2026

I am going to say let's get this party started!

@alamb alamb merged commit 231ed2b into apache:main Mar 5, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants