Skip to content

Review docstring and possibly raise a userwarning for pearsonr based on preserve_dims=all #703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tennlee opened this issue Sep 27, 2024 · 2 comments · May be fixed by #776
Open

Review docstring and possibly raise a userwarning for pearsonr based on preserve_dims=all #703

tennlee opened this issue Sep 27, 2024 · 2 comments · May be fixed by #776
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers

Comments

@tennlee
Copy link
Collaborator

tennlee commented Sep 27, 2024

Issue: pearsonr may be mathematically undefined when all dimensions are preserved, resulting in NaNs being returned. This isn't exactly wrong, but is perhaps not what a user would be expecting and could be made clearer

Solution: This should at least be documented, and possibly a user warning should also be raised in the code

See also: KGE uses pearsonr in its calculation so is also affected - it has this described in the docstring

@nicholasloveday
Copy link
Collaborator

I think a user warning is a good idea

@tennlee tennlee added documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers labels Nov 22, 2024
@nikeethr
Copy link
Collaborator

nikeethr commented Mar 14, 2025

NSE raises an error. Its not whether or not the computation is wrong - a generic tool like numpy can return NaNs when interacting with NaNs (i.e. when the input itself is invalid), but a score is not a generic tool - here the input argument itself is invalid for a score. (for this particular scenario, scores that need to have at least one dimension reduced, must have at least one dimension reduced - this is the case in NSE, and allowing for this computation - where this criteria (>=1 reduced dim) fails - makes the entire score meaningless and may obfuscate genuine computational errors).

NSE does raise a warning however for partially correct data - in the event that the parts that are correct can be used and the rest can be double checked.

In NSE in particular if I'm allowed to preserve all dimensions, I can't compute obs variance - it means I fundamentally don't understand how NSE works or something in my data pipeline is busted - this isn't a warning its a fundamental user input error. A warning is more acceptable if I did specify reduce dims, but I still had insufficient data to do the computation.

I think warnings should be used sparingly - e.g. for partially valid inputs like described earlier where the user may still be interested in proceeding with a computation. For data that is clearly or completely invalid, raising an error should be the default - they aren't road blocks, they just mean the result is incorrect or invalid and needs to be handled appropriately either by handling the exception itself or fixing the source of the error, because any further operation with this result without appropriate handling, is meaningless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants