Backend-Agnostic Refactor Using Array API Compatibility #2019
base: main
Conversation
Three things to simplify this:
- Ignore the benchmarks for now, including the `gen_adata` there. There is another `gen_adata` in https://github.com/scverse/anndata/blob/main/src/anndata/tests/helpers.py that you can add a jax type to much more easily (i.e., `X_type` or `DEFAULT_KEY_TYPES`), and then you get the entire test suite running against jax for free. Specifically, I would add it as a type for `obsm`, `varm`, `X`, and `layers` while ignoring the others; it doesn't make sense to use it in `obs` or `var`, for example (see the sketch after this list).
- No need to use the array API where it's not needed (like for sparse, as you point out). I would remove it from all the other subsetting functions and only have it in the default `single_dispatch` case.
- The test file you have is great! I would start there if you feel overwhelmed by the full test suite.
src/anndata/tests/helpers.py (outdated)
@@ -103,6 +142,54 @@ def gen_vstr_recarray(m, n, dtype=None):
    )

# def gen_vstr_recarray(
No need to worry about jax for this! I don't even think strings are in the array API: https://data-apis.org/array-api/latest/API_specification/data_types.html#data-types
Codecov Report
Attention: Patch coverage is …
❌ Your project check has failed because the head coverage (78.09%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files

@@ Coverage Diff @@
## main #2019 +/- ##
==========================================
- Coverage 87.47% 78.09% -9.38%
==========================================
Files 46 47 +1
Lines 7057 7332 +275
==========================================
- Hits 6173 5726 -447
- Misses 884 1606 +722
if all(
    isinstance(x, Iterable) and not isinstance(x, (str, bytes)) for x in subset_idx
):
    subset_idx = xp.ix_(*subset_idx)
Check the type of `subset_idx` against what is required by https://data-apis.org/array-api/latest/API_specification/indexing.html#indexing
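For illustration, a check/normalization step could look roughly like the sketch below. The helper name is hypothetical and this is not the PR's actual code; note also that `ix_` is a NumPy extension rather than part of the array API standard, so coercing each per-axis index to an integer array of the inferred namespace is one way to stay within the linked spec.

```python
from collections.abc import Iterable


def normalize_subset_idx(subset_idx, xp):
    # Illustrative only: coerce each per-axis index that is a generic iterable
    # (but not a string/bytes) into a 1-D integer array of namespace xp, the
    # form discussed in the array API indexing specification linked above.
    normalized = []
    for idx in subset_idx:
        if isinstance(idx, Iterable) and not isinstance(idx, (str, bytes)):
            idx = xp.asarray(list(idx), dtype=xp.int64)
        normalized.append(idx)
    return tuple(normalized)
```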
This PR refactors the gen_adata function in benchmarks/utils.py to support backend-agnostic generation of AnnData objects, using the array-api-compat interface to infer and operate with the correct array namespace (e.g., NumPy or JAX). The goal is to allow tests and benchmarking utilities to run seamlessly across supported array backends without manual rewrites.

The function now detects the namespace from a real array object instead of relying on a global fallback, and uses backend-specific RNG to generate observation and variable metadata. I also updated the related test to parameterize over NumPy and JAX backends.

However, I'm currently running into an issue where passing a scipy.sparse.csr_matrix into get_namespace() raises a TypeError, since sparse matrices are not supported by array-api-compat. As a workaround, I'm generating a dense array first to infer the namespace, and only falling back to sparse when explicitly requested in the attribute set. This works fine for NumPy but breaks for JAX, as sparse support isn't available there. I'm still trying to figure out whether to raise, skip, or handle this more gracefully depending on the backend. Would appreciate any thoughts on best practices here.
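To make the workaround concrete, here is a minimal sketch of the approach described above. The function and parameter names are mine, not the PR's; it assumes `get_namespace` is `array_api_compat.get_namespace` and that sparse output only makes sense on the NumPy backend.

```python
import numpy as np
from array_api_compat import get_namespace
from scipy import sparse


def make_X(shape, backend_array, want_sparse=False):
    # Infer the namespace from a dense reference array; get_namespace raises
    # TypeError for scipy.sparse inputs, hence the dense-first detour.
    xp = get_namespace(backend_array)
    dense = xp.asarray(np.random.default_rng().standard_normal(shape))
    if not want_sparse:
        return dense
    if isinstance(backend_array, np.ndarray):
        # Sparse only makes sense on the NumPy backend here.
        return sparse.csr_matrix(np.asarray(dense))
    # e.g. JAX: no scipy.sparse equivalent, so raise (or skip) for now.
    raise NotImplementedError("sparse X is only supported on the NumPy backend")
```

Whether the non-NumPy branch should raise, skip, or silently stay dense is exactly the open question above.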