Skip to content

Report ingestSummary even if AnnData extraction fails (SCP-5973) #2243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 22, 2025

Conversation

bistline
Copy link
Contributor

BACKGROUND & CHANGES

This fixes a corner case in Mixpanel reporting where AnnData files that fail during the initial extraction phase do not report an ingestSummary event. These events are the main indication SCP has in Mixpanel for AnnData ingest as only one event is ever sent per AnnData file. The lack of these extraction events means we are underreporting on AnnData ingest attempts. Now, if the initial extraction fails and is not going to be retried due to an OOM exception, the summary is sent.

Also, this fixes a regression introduced in #2242 where metadata properties in AnnData files that have names ending in values that match ontology-based properties (e.g. author_cell_type matching to cell_type) incorrectly attempt to validate that column. Now, exact name matches are enforced rather than relying on endsWith().

MANUAL TESTS

  1. Boot all services, sign in, and go to the upload wizard for a new/empty study and select the AnnData UX
  2. Upload the compliant_liver.h5ad AnnData file
    • Specify raw_location: does_not_exist in the Expression tab to ensure the extraction fails
  3. Complete the upload and wait for the job to fail
  4. Go the the Dev Mixpanel events queue and search for an ingestSummary event
  5. Confirm you see your upload with jobStatus: failed and numFilesExtracted: 0

@bistline bistline requested a review from eweitz April 17, 2025 17:25
Copy link

codecov bot commented Apr 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.83%. Comparing base (9381539) to head (a464021).
Report is 7 commits behind head on development.

Additional details and impacted files

Impacted file tree graph

@@               Coverage Diff               @@
##           development    #2243      +/-   ##
===============================================
+ Coverage        70.77%   70.83%   +0.06%     
===============================================
  Files              331      331              
  Lines            28306    28311       +5     
  Branches          2476     2476              
===============================================
+ Hits             20033    20055      +22     
+ Misses            8130     8113      -17     
  Partials           143      143              
Files with missing lines Coverage Δ
app/javascript/lib/validation/validate-anndata.js 84.67% <100.00%> (+0.22%) ⬆️
app/models/ingest_job.rb 55.97% <100.00%> (+0.20%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bistline bistline added the build failure: false positive Build error confirmed as false positive. E.g. upstream service has a problem. label Apr 17, 2025
Copy link
Member

@eweitz eweitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good! Looking forward to more accurate AnnData ingest metrics.

@@ -1115,10 +1115,20 @@ def extracted_raw_counts?(job)
extract_params.include?('raw_counts')
end

# determine if this job qualifies for sending an ingestSummary event
# will return false if summary exists, this is a DE job, or
# a successful AnnData extract (meaning downstream jobs are running)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpful note! Looking only at the clause in the code, I was confused.

@bistline bistline merged commit d74fe21 into development Apr 22, 2025
4 of 5 checks passed
@github-actions github-actions bot deleted the jb-anndata-summary-fail branch April 22, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build failure: false positive Build error confirmed as false positive. E.g. upstream service has a problem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants