Skip to content

CI: Make pep8 validation work with multiple files simultaneously #57914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dontgoto
Copy link
Contributor

@dontgoto dontgoto commented Mar 19, 2024

The validation is bottlenecked due to the repeated sys calls, subprocess spawning, and pep8 calls that only validate a single file each. Running the validation on a list of files is approximately as fast as running it on a single file. This PR makes the first step, enabling the pep8 validation of a list of files, but for now still running it only on a single file. It will be followed by a few more PRs. The end result will be a complete refactoring of validate_docstrings.py, that will then take about 10 seconds to validate all docstrings.

@datapythonista

The validation is bottlenecked due to the repeated sys calls, subprocess spawning, and pep8 calls that only validate a single file each. Running the validation on a list of files is approximately as fast as running it on a single file.
@datapythonista datapythonista added Docs CI Continuous Integration labels Mar 19, 2024
@datapythonista
Copy link
Member

I'm travelling and I could only revew with the phone, but looks good.

Quick question, would it make sense to use ruff instead of flake8 for this?

@dontgoto
Copy link
Contributor Author

Thanks for the quick review!

Quick question, would it make sense to use ruff instead of flake8 for this?

I think it would be slightly faster, but someone would have to configure ruff to behave the same way as we intend flake8 to behave here. On my machine, the whole code_check.sh docstrings validation finishes in about 10 seconds, 7s of those are separate calls to numpydoc.validate (I put in a PR there that brings this down to 5.5s), and the remaining 3s are split between our validate.pyand pep8.

So I think even with ruff we would at best get to 5.5s instead of 8.5s if we keep numpydoc.validate. Might still be worth an issue to discuss the (future) role of ruff here and in other places.

@datapythonista
Copy link
Member

Yep, trying to improve 10s doesn't make sense. I know we are using ruff in the CI nowz bit I didn't check the exact state. The question was more about consistency, I didn't check if we already replaced flake8 with ruff.

@dontgoto
Copy link
Contributor Author

I see ruff being used as part of the pre-commit hooks. And there it let a docstring error that I introduced in a non-replaced docstring through. It's also replaced flake8 for the docstring formatting, but back when I introduced my error, only the code_check.sh showed up as failed.

Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 25, 2024
@mroeschke
Copy link
Member

Thanks for the PR but appears there not much appetite for this change so closing

@mroeschke mroeschke closed this May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants