
Is Array-In -> Scalar-Out OK? #38

@mdhaber

The Array API Standard states:

Apart from array object attributes, such as ndim, device, and dtype, all operations in this standard return arrays (or tuples of arrays).

NEP 56 argued that this was not high enough priority to consider during the NumPy 2.0 transition:

Given that array scalars implement a largely array-compatible interface, this doesn’t seem like the highest-prio item regarding array API standard compatibility (or in general).

Consequently, NumPy 2.x is not compatible with this aspect of the standard. If there are other libraries in the ecosystem that run into problems because of this, I'd like to discuss it at the summit. My main goal would be to take a poll of whether the NEP 56 assessment is accurate. If not, I'd like to understand what resolution(s) projects would support. For instance:


For background, NumPy scalars (e.g. np.float64(1.)) are not instances of the fundamental NumPy array type (ndarray). Although NumPy scalars have many attributes of arrays and are accepted like 0-d arrays by many NumPy functions, they do not have all attributes required of arrays (e.g. mT), and they do not support boolean index assignment like true NumPy arrays¹.
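The differences above can be seen directly. The following is a minimal sketch (the mT attribute check assumes NumPy ≥ 2.0, where ndarray gained mT):

```python
import numpy as np

s = np.float64(1.0)   # NumPy scalar
a = np.asarray(1.0)   # true 0-d ndarray

print(isinstance(s, np.ndarray))  # False: scalars are not ndarray instances
print(hasattr(s, "mT"))           # False: array attributes like mT are missing

a[np.asarray(True)] = 2.0         # boolean index assignment: fine on the 0-d array
try:
    s[np.asarray(True)] = 2.0     # ...but not on the scalar
except TypeError as err:
    print(err)                    # scalars do not support item assignment
```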

Many NumPy functions accept NumPy arrays and return NumPy scalars where the standard would require the return value to be a 0-d array. For instance:

```python
import numpy as np

x = np.arange(10.)
x[0]                # np.float64 scalar, not a 0-d array
np.mean(x)          # np.float64 scalar
y = np.asarray(1.)  # 0-d array
1. * y              # np.float64 scalar, even though y is an array
```

SciPy frequently runs into two kinds of problems because of this:

  • We try to follow an "array type in = array type out" rule. If SciPy functions forget to explicitly cast the result to an array immediately before returning, they are likely to inadvertently return a NumPy scalar.
  • We sometimes rely on arrays being mutable during calculations². If the result of an intermediate calculation is a NumPy scalar instead of a proper array, it must be explicitly cast to an array to make it mutable.
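The second failure mode looks like this in practice. A sketch, using a reduction as the hypothetical intermediate step:

```python
import numpy as np

total = np.sum(np.arange(3.0))  # np.float64 scalar, not a 0-d array
try:
    total[()] = 0.0             # item assignment fails on a scalar
except TypeError as err:
    print(err)

total = np.asarray(total)       # explicit cast restores mutability
total[()] = 0.0                 # now fine
```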

These problems can be worked around by sprinkling xp.asarray liberally throughout the codebase, but this doesn't seem like an ideal long-term solution. Rather than having all NumPy-dependent libraries work around this standard incompatibility on a line-by-line basis, I'd like to address it at the source.
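To illustrate the line-by-line workaround, here is a sketch of a hypothetical library function (normalized_max is not a real SciPy function) written the way the "array type in = array type out" rule currently forces:

```python
import numpy as np

def normalized_max(x):
    # Hypothetical function: scale x by the maximum of its absolute values.
    xp = np  # stand-in for the array-API namespace associated with x
    m = xp.max(xp.abs(x))     # NumPy returns a scalar here, not a 0-d array
    m = xp.asarray(m)         # defensive cast back to an array
    return xp.asarray(x / m)  # cast again: x / m is a scalar when x is 0-d

out = normalized_max(np.asarray(3.0))
print(type(out))  # ndarray, not np.float64
```

Without the final xp.asarray, the 0-d input case would return np.float64 and violate the rule, which is exactly the kind of per-line defensive casting the issue argues should not be necessary.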

Footnotes

  1. It might be argued that they don't need to support boolean index assignment to be considered "arrays", since the standard seems to make accommodations for immutable array types. In that case, note that the NumPy scalar types do not support dimensionality other than 0 or size other than 1. So while it is true that NumPy scalars are almost interchangeable with NumPy 0-d arrays (aside from the missing attribute(s) and behaviors), I doubt that they can be considered standard-compatible according to the current wording of the standard.

  2. These days, we can't really rely on mutability of array types, and we are starting to use array_api_extra.at for all in-place-like operations. But this is actually a whole other can of worms that needs to be discussed at some point.
