Skip to content

Backend-Agnostic Refactor Using Array API Compatibility #2019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

amalia-k510
Copy link

This PR refactors the gen_adata function in benchmarks/utils.py to support backend-agnostic generation of AnnData objects, using the array-api-compat interface to infer and operate with the correct array namespace (e.g., NumPy or JAX). The goal is to allow tests and benchmarking utilities to seamlessly run across supported array backends without manual rewrites. The function now detects the namespace from a real array object instead of relying on a global fallback and uses backend-specific RNG to generate observation and variable metadata. I also updated the related test to parameterize over NumPy and JAX backends. However, I'm currently running into an issue where passing a scipy.sparse.csr_matrix into get_namespace() raises a TypeError, since sparse matrices are not supported by array-api-compat. As a workaround, I’m generating a dense array first to infer the namespace, and only falling back to sparse when explicitly requested in the attribute set. This works fine for NumPy but breaks for JAX, as sparse support isn’t available there. Still trying to figure out whether to raise, skip, or handle this more gracefully depending on the backend. Would appreciate any thoughts on best practices here.

Copy link
Contributor

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 things to simplify things:

  1. Ignore the benchmarks for now, including the gen_adata there. There is another gen_adata in https://github.com/scverse/anndata/blob/main/src/anndata/tests/helpers.py that you can add a jax type to much more easily (i.e., X_type or or DEFAULT_KEY_TYPES), and then you will get the entire test suit for free running against jax. Specifically, I would add it as a type for obsm, varm, X, and layers while ignoring the otehrs. It doesn't make sense to use it in obs or var, for example
  2. No need to use the array_api where it's not needed (like for sparse as you point out). I would remove it from all the other subsetting functions and only have it in the default single_dispatch case
  3. The test file you have is great! I would start there if you feel overwhelmed by the full test suite

@@ -103,6 +142,54 @@ def gen_vstr_recarray(m, n, dtype=None):
)


# def gen_vstr_recarray(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to worry bout jax for this! I don't even think strings are in the array api: https://data-apis.org/array-api/latest/API_specification/data_types.html#data-types

Copy link

codecov bot commented Jul 7, 2025

Codecov Report

Attention: Patch coverage is 38.07829% with 174 lines in your changes missing coverage. Please review.

Project coverage is 78.09%. Comparing base (a51582d) to head (f5c387c).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/anndata/_core/index_modified.py 19.28% 159 Missing ⚠️
src/anndata/tests/helpers.py 82.14% 15 Missing ⚠️

❌ Your project check has failed because the head coverage (78.09%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (a51582d) and HEAD (f5c387c). Click for more details.

HEAD has 3 uploads less than BASE
Flag BASE (a51582d) HEAD (f5c387c)
4 1
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2019      +/-   ##
==========================================
- Coverage   87.47%   78.09%   -9.38%     
==========================================
  Files          46       47       +1     
  Lines        7057     7332     +275     
==========================================
- Hits         6173     5726     -447     
- Misses        884     1606     +722     
Files with missing lines Coverage Δ
src/anndata/tests/helpers.py 79.42% <82.14%> (-13.65%) ⬇️
src/anndata/_core/index_modified.py 19.28% <19.28%> (ø)

... and 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

if all(
isinstance(x, Iterable) and not isinstance(x, (str, bytes)) for x in subset_idx
):
subset_idx = xp.ix_(*subset_idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chekc type of subset_idx against what is required from https://data-apis.org/array-api/latest/API_specification/indexing.html#indexing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants