Skip to content

Lazy load legacy imports in skimage top module #6892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Dec 10, 2024

Conversation

lagru
Copy link
Member

@lagru lagru commented Apr 13, 2023

Description

Closes #6891
Closes #6904

Let me know if I should enable lazy loading for other submodules as well.

Checklist

  • Docstrings for all functions
  • Gallery example in ./doc/examples (new features only)
  • Benchmark in ./benchmarks, if your changes aren't covered by an
    existing benchmark
  • Unit tests
  • Clean style in the spirit of PEP8
  • Descriptive commit messages (see below)

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • There is a bot to help automate backporting a PR to an older branch. For
    example, to backport to v0.19.x after merging, add the following in a PR
    comment: @meeseeksdev backport to v0.19.x
  • To run benchmarks on a PR, add the run-benchmark label. To rerun, the label
    can be removed and then added again. The benchmark output can be checked in
    the "Actions" tab.

@lagru lagru added 🔧 type: Maintenance Refactoring and maintenance of internals 📈 type: Performance labels Apr 13, 2023
These functions are advertised in skimage's docstring. I don't really
see a reason why `data_dir` must be public in the root namespace. So
keep it hidden.
@lagru lagru force-pushed the expand-lazy-loading branch from d1c8afe to 1987eee Compare April 14, 2023 15:55
@stefanv
Copy link
Member

stefanv commented Apr 14, 2023

Hrm, I think this is the opposite of what I thought it would be like; we don't want those functions visible in __all__.

@lagru
Copy link
Member Author

lagru commented Apr 15, 2023

Oh, then I misunderstood you? To avoid further confusion: Do you want to skimage.img_as_float64 and others to still work but not be advertised by code completion, or included in from skimage import *?

That might be more tricky to achieve. I think most code completion tools pick up whats imported in __init__.pyi and use that. I guess the option would be to wrap patch __getattr__ in skimage/__init__.py...

Curiously, this still doesn't include __version__ in skimage.__all__.
Maybe, this is a bug or behavior in lazy_loader that should be updated?
@lagru
Copy link
Member Author

lagru commented Apr 25, 2023

I'm not sure whether I discovered a bug here in lazy_loader: even though "__version__" is included in the __all__ attribute of the PYI file, skimage.__all__ doesn't add runtime.

Edit: E.g. with the current state __version__ doesn't show up in import skimage; skimage.__all__.

This comment was marked as outdated.

@github-actions github-actions bot added the 😴 Dormant No recent activity label Nov 13, 2023
@stefanv
Copy link
Member

stefanv commented Sep 4, 2024

I'm not sure whether I discovered a bug here in lazy_loader: even though "__version__" is included in the __all__ attribute of the PYI file, skimage.__all__ doesn't add runtime.

Edit: E.g. with the current state __version__ doesn't show up in import skimage; skimage.__all__.

Probably because this is defined in the pyi file, instead of in the __init__.py.

@stefanv
Copy link
Member

stefanv commented Nov 19, 2024

This is pretty close to being ready; @lagru want to push it over the line?

@lagru
Copy link
Member Author

lagru commented Nov 20, 2024

Okay

import skimage
import sys
for x in sys.modules.keys():
    if "skimage" in x:
        print(x)

only returns

skimage._shared
skimage._shared.tester
skimage

now. And skimage.__all__ is available as well. Should be good to go in.

@lagru
Copy link
Member Author

lagru commented Nov 20, 2024

Hmm, one of the changes doesn't seem to play well with ski.util.lookfor. Locally, I get a segmentation fault when trying to test lookfor though pytest.

@jarrodmillman jarrodmillman added this to the 0.25 milestone Nov 20, 2024
I'm not sure why the order changed. Likely changing the import behavior
to use lazy loading fully at the top-level of `skimage`, changes the
order in which candidates are found by `lookfor` as well.

Try if this addresses all test failures.
@lagru lagru force-pushed the expand-lazy-loading branch from 6239648 to b4068da Compare November 22, 2024 12:23
@lagru
Copy link
Member Author

lagru commented Nov 24, 2024

Hmm, this actually reveals new aspects. Currently, functions such as img_as_float are available in the top namespace skimage as well as skimage.util. My intention was to make them available via lazy loading but not advertise them by not including them in __all__. But it turns out, that lazy_loader.attach_stub simply ignores __all__ defined in the stub and includes everything that is imported.

I'm still considering how to address this. So let's remove this from the 0.25 for now.

@lagru lagru modified the milestones: 0.25, 0.26 Nov 24, 2024
attach_stub will include every import it finds in the PYI file. However,
we no longer want to advertise utility functions in the top-level
namespace. This seems to be a workaround, that keeps things like
`skimage.img_as_float` usable at runtime, but doesn't expose them via
`__all__` or `__dir__`.

Additionally, this avoids listing these utility functions twice in
different places of our HTML documentation with Sphinx.
If .py and .pyi file were to define different __all__ (by using the one
returned by `attach_stub`), different tools might behave inconsistent,
depending on which file they prefer. Let's avoid that by being explicit.

Let's raise this issue upstream with lazy_loader too.
Comment on lines 17 to 18
# `attach_stub` currently ignores __all__ inside the stub file and simply
# returns every lazy-imported object, so we need to define `__all__` again.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See scientific-python/lazy-loader#133 for a proposal to address this duplication.

@lagru lagru modified the milestones: 0.26, 0.25 Nov 27, 2024
@lagru
Copy link
Member Author

lagru commented Nov 27, 2024

@jarrodmillman, I added it to the 0.25 milestone because it should be ready to merge now. But feel very welcome to bump it back to 0.26 again, if you'd rather just prioritize the release. 😊

@lagru lagru requested a review from mkcor November 30, 2024 19:01
'segmentation',
'transform',
'util',
'__version__',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is __doc__ going away? I recently read something about __doc__ changing soon, but maybe it was just its parsing...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where __doc__ was previously included? Am I misunderstanding?

Not sure, but I don't think we need to advertise __doc__ in __all__. It's an implementation detail of Python and it's available at runtime for all intents and purposes, e.g., help(skimage).

I just put __version__ in here because it might be useful to advertise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't referring to anything 'previously included' or not and, in the context of this PR, I should have!

I was thinking 'in the absolute,' i.e., when I

import skimage as ski

and I run

ski.[tab]

which objects do I expect to be available?

lagru and others added 4 commits December 4, 2024 16:43
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
The objects available via both functions should be consistent.
Approach inspired by https://discuss.python.org/t/4202/.
Not sure if the previous approach would ever terminate the opened
process. I feel like using a dedicated function and `check_output`
simplifies things a lot and also keeps the global namespace cleaner.
It seems easier to import PytestTester as a private symbol from the
start instead of deleting it manually.
@lagru
Copy link
Member Author

lagru commented Dec 4, 2024

Okay, I fixed the inconsistency between __dir__ and __getattr__ (see 7452969) and took the opportunity to make a few, in my opinion, simplifying changes. I can revert those if you disagree.

Comment on lines 19 to 21
"""Add lazy-loaded attributes to keep consistent with `__getattr__`."""
patched_dir = {*globals().keys(), *__lazy_dir__()}
return sorted(patched_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is fixed by this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy loading expands the module attributes that are available via __getattr__. These new attributes are missing from the default __dir__() / dir(skimage). If we use __dir__ returned from attach_stub, then a whole bunch of pre-existing attributes like __file__, __doc__, etc. are missing. Both are inconsistent and it seemed like a good opportunity to preempt weird or confusing behavior by code inspection tools, etc. that don't expect messing with __dir__.

Spelling this out, I realize that this is also a problem with our subpackages that just use __dir__ from attach_stub as is. Not sure if it's worth it to go through all these. Neither am I sure how to conveniently address this from lazy_loader's site. globals() returns the scope from where the function is defined, so lazy_loader would need a different way to access the correct global scope...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to expose things like __file__ etc.? __dir__ looks exactly like what I would expect atm:

In [1]: dir(ski)
Out[1]:
['__version__',
 'color',
 'data',
 'draw',
 'exposure',
 'feature',
 'filters',
 'future',
 'graph',
 'io',
 'measure',
 'metrics',
 'morphology',
 'registration',
 'restoration',
 'segmentation',
 'transform',
 'util']

You can use various hacks to get hold of other attributes, but these are the ones we want our users to see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really need to expose more (and I don't think we should), then you could:

def __dir__():
    return [some stuff] + __lazy_dir__()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to expose things like file etc.?

For the reasons stated above. Hiding builtin attributes feels very magical and unexpected to me, and like something that might lead confusing bugs. I'd like tab-completion in ipython to show me everything that's available and not a curated list.

Everyone is aware that _ means private and is used to __builtin_stuff__. Why would we need to hide these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I do recognize all your arguments, I just don't find them as convincing as you do.

This is a common problem I run into 🙃

Since you'd like to see a change in behavior, I think you should make a convincing case for (a) why this is useful (and not confusing) to the user and (b) that it does not introduce unnecessary complexity (we could improve the code to make that the case, so I'm not too worried about this part).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more importantly, we are way overdue on the release, so a friendly request that we merge this PR without the change to __dir__ and then do additional follow-up once we come to an agreement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it as a talking point to today's meeting notes.

Anyway, I'm happy to iterate on this particular issue in another PR. And I'm not seeing this issue as a blocker for the release in the first place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted to status quo in 0a275cd with regards to hiding stuff in __dir__.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's docs seem to be compatible with your stance @stefanv:

The resulting list is not necessarily complete and may be inaccurate when the object has a custom __getattr__().

[...] it attempts to produce the most relevant, rather than complete, information: [...]

Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases.

lagru and others added 4 commits December 6, 2024 19:19
Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
Co-authored-by: Marianne Corvellec <marianne.corvellec@ens-lyon.org>
This flag used to be set in setup.py, which is no longer the case. So
it should be safe to remove.

Co-authored-by: Stefan van der Walt <stefanv@berkeley.edu>
There are no other good ways to get a list of subpackags and their
descriptions locally.

Also make sure that the newly documented attribute `__version__` isn't
confused up by the documentation tooling with the actual line where
`__version__` is defined.
@lagru lagru force-pushed the expand-lazy-loading branch from 8ea157d to 6d7a770 Compare December 6, 2024 20:12
lagru added 3 commits December 7, 2024 12:40
There's no consensus on whether builtin private attributes like
`__file__`, etc. should be returned by __dir__. So keep the status quo
in that regard but still hide former utility functions we no longer want
to advertise.
@stefanv stefanv merged commit a2a80b7 into scikit-image:main Dec 10, 2024
26 checks passed
@stefanv
Copy link
Member

stefanv commented Dec 10, 2024

Thanks Lars!

@lagru lagru deleted the expand-lazy-loading branch December 10, 2024 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 type: Maintenance Refactoring and maintenance of internals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Module skimage has no __all__ attribute at runtime Lazy loading of skimage.data is rendered ineffective by import of data_dir
4 participants