Added op3loader #921

Paulos2411 · 2025-02-26T08:20:47Z

Describe your changes

Created a new data loader that takes in single-cell (sc) data from the following GEO series:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE279945

Issue ticket number and link

Focuses on improving the perturbation prediction task.

openproblems-bio/task_perturbation_prediction#86

Checklist before requesting a review

- I have performed a self-review of my code
- Check the correct box. Does this PR contain:
  - Breaking changes
  - New functionality
  - Major changes
  - Minor changes
  - Bug fixes
  - Documentation
- Proposed changes are described in the CHANGELOG.md
- CI tests succeed and look good!

Link: https://openproblems.bio/documentation/advanced_topics/create_a_dataset_loader

rcannood · 2025-02-26T18:43:59Z

Thanks for contributing this, @Paulos2411 ! Let me know when you think this is ready for a review ☺️

szalata · 2025-03-18T14:37:12Z

src/datasets/loaders/scrnaseq/op3_loader/script.py

+
+    return filtered_adata
+
+def filter_by_counts(adata, par):


What does this filtering correspond to in OP3, @Paulos2411?

Should be OK nevertheless. In the OP3 benchmark, this is covered later by edgeR's `filterByExpr`.
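For reference, a count-based filter of the kind `filter_by_counts(adata, par)` appears to perform could look roughly like the sketch below. It is hypothetical: it works on a dense NumPy counts matrix (cells × genes) instead of an AnnData object, and the threshold names `min_counts_per_cell` and `min_cells_per_gene` are assumptions, not parameters taken from the PR.

```python
import numpy as np


def filter_by_counts(counts, min_counts_per_cell=1, min_cells_per_gene=1):
    """Hypothetical count-based filter on a dense cells x genes matrix.

    Returns the filtered matrix plus boolean masks of the kept cells and genes.
    """
    counts = np.asarray(counts)
    # Keep cells (rows) whose total count reaches the threshold.
    cell_mask = counts.sum(axis=1) >= min_counts_per_cell
    counts = counts[cell_mask]
    # Among the kept cells, keep genes (columns) detected in enough cells.
    gene_mask = (counts > 0).sum(axis=0) >= min_cells_per_gene
    return counts[:, gene_mask], cell_mask, gene_mask


X = np.array([[0, 5, 1],
              [0, 0, 0],
              [2, 3, 0]])
filtered, cells, genes = filter_by_counts(X, min_counts_per_cell=1, min_cells_per_gene=2)
print(filtered.tolist())  # [[5], [3]]
```

On an actual AnnData object, scanpy's `sc.pp.filter_cells(adata, min_counts=...)` and `sc.pp.filter_genes(adata, min_cells=...)` cover the same idea; as noted above, the OP3 benchmark applies `filterByExpr` later anyway, so a filter like this would only be a coarse first pass.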

Paulos2411 · 2025-03-18T17:06:33Z

Hi @rcannood, I think the PR is now ready for review :)

szalata · 2025-03-18T17:54:44Z

@Paulos2411, it seems to me that this is still missing the fields we discussed, like `hvg`, `feature_name`, etc., i.e. the obligatory fields listed here: https://openproblems.bio/documentation/fundamentals/datasets
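A quick way to catch missing obligatory fields before requesting another review is a small check like the following. The key lists are an illustrative subset assumed for this sketch (`hvg` and `feature_name` come from the comment above, `dataset_description` from the PR's test); the linked documentation page is the authoritative schema. The function takes plain lists of keys so it runs without AnnData installed.

```python
# Illustrative subset only: the authoritative list lives on the Open Problems
# "datasets" documentation page; these names are assumptions for this sketch.
REQUIRED = {
    "var": ["feature_id", "feature_name", "hvg", "hvg_score"],
    "uns": ["dataset_id", "dataset_name", "dataset_description"],
}


def missing_fields(present):
    """Return "slot/key" strings for required keys absent from `present`.

    `present` maps a slot name ("var", "uns", ...) to the keys it contains,
    e.g. {"var": list(adata.var.columns), "uns": list(adata.uns.keys())}.
    """
    missing = []
    for slot, keys in REQUIRED.items():
        available = set(present.get(slot, []))
        missing.extend(f"{slot}/{key}" for key in keys if key not in available)
    return missing


report = missing_fields({
    "var": ["feature_id"],
    "uns": ["dataset_id", "dataset_name", "dataset_description"],
})
print(report)  # ['var/feature_name', 'var/hvg', 'var/hvg_score']
```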

szalata · 2025-03-18T18:10:59Z

@Paulos2411, this is missing a workflow, like https://github.com/openproblems-bio/openproblems/blob/main/src/datasets/workflows/scrnaseq/process_openproblems_v1/main.nf

szalata · 2025-03-18T18:14:37Z

The loader is mixed with filtering here. I wonder if the filtering wouldn't fit better in "processors". The filtering here is specific to OP3, but I suppose it's OK to create a dataset-specific processor.
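The separation suggested here could look roughly like the sketch below: the loader stays a thin read step, and OP3-specific filtering becomes its own processor that can be composed (or skipped) downstream. All function names are hypothetical, and NumPy arrays stand in for AnnData objects; in the repository these would presumably be separate Viash components chained by a workflow rather than Python functions.

```python
from typing import Callable, List

import numpy as np

# Hypothetical split: the loader only returns raw counts, and each
# dataset-specific filtering step is a separate processor function.
Processor = Callable[[np.ndarray], np.ndarray]


def load_raw_counts() -> np.ndarray:
    """Stand-in loader; the real component would read the downloaded GEO files."""
    return np.array([[0, 4], [0, 0], [3, 1]])


def drop_empty_cells(counts: np.ndarray) -> np.ndarray:
    """Example processor: remove cells (rows) with zero total counts."""
    return counts[counts.sum(axis=1) > 0]


def run_pipeline(load: Callable[[], np.ndarray], processors: List[Processor]) -> np.ndarray:
    """Run the loader once, then apply each processor in order."""
    counts = load()
    for step in processors:
        counts = step(counts)
    return counts


result = run_pipeline(load_raw_counts, [drop_empty_cells])
print(result.tolist())  # [[0, 4], [3, 1]]
```

Keeping the loader free of filtering means its output stays reusable by tasks that need a different (or no) filtering step.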

szalata · 2025-03-22T19:45:01Z

.gitignore

You shouldn't add unnecessary local files to `.gitignore`.

adding this to a review

szalata

Thank you for the contribution! I made a few comments

szalata · 2025-03-22T19:47:12Z

src/datasets/loaders/scrnaseq/op3_loader/config.vsh.yaml

+  name: op3_loader
+  namespace: datasets/loaders/scrnaseq
+  description: |
+    Loads and preprocesses the OP3 dataset from GEO accession GSE279945.


What level of preprocessing can be left in the "loader", @rcannood?

szalata · 2025-03-22T19:47:45Z

src/datasets/loaders/scrnaseq/op3_loader/script.py

+import pandas as pd
+import numpy as np
+import requests
+from tqdm import tqdm # liberary for displaying progress bars


Fix the spelling in the comment ("liberary"); this import comment can also be dropped.
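Since the script imports `requests` and `tqdm`, it presumably streams files from GEO with a progress bar. A minimal sketch of that pattern follows, assuming a streamed `requests.get` download; the actual GSE279945 supplementary file URL is not shown in the PR, so none is hard-coded. `tqdm` is treated as optional here, and `requests` is imported only when a download actually happens.

```python
import os
import tempfile
from typing import Iterable, Optional

try:
    from tqdm import tqdm  # optional progress bar
except ImportError:
    tqdm = None  # fall back to no progress bar


def write_chunks(chunks: Iterable[bytes], dest: str, total: Optional[int] = None) -> int:
    """Write byte chunks to `dest`, showing progress if tqdm is available."""
    bar = tqdm(total=total, unit="B", unit_scale=True) if tqdm is not None else None
    written = 0
    with open(dest, "wb") as fh:
        for chunk in chunks:
            if not chunk:  # skip empty keep-alive chunks
                continue
            fh.write(chunk)
            written += len(chunk)
            if bar is not None:
                bar.update(len(chunk))
    if bar is not None:
        bar.close()
    return written


def download_file(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` (hypothetical helper; requests imported lazily)."""
    import requests

    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        total = int(resp.headers.get("content-length", 0)) or None
        return write_chunks(resp.iter_content(chunk_size=chunk_size), dest, total)


# Offline demo of the chunk writer; no network access needed.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    n = write_chunks([b"abc", b"", b"de"], path, total=5)
    with open(path, "rb") as fh:
        content = fh.read()
print(n)  # 5
```

Streaming in chunks keeps memory use flat for large count matrices, and splitting out `write_chunks` keeps the progress-bar logic testable without hitting GEO.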

szalata · 2025-03-22T19:51:03Z

src/datasets/loaders/scrnaseq/op3_loader/script.py

+
+    logger.info("Done")
+
+if __name__ == "__main__":


There's no need to define `main` as a separate function and call it here; the main code should be written directly in the script.

Can stay the way it is, though.
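For comparison, the flat layout described above might look like this. Viash-based scripts commonly declare development defaults between `## VIASH START` and `## VIASH END`, a block Viash replaces with the real `par` and `meta` values when building the component; the keys below are hypothetical, and the actual read/filter/write steps are left out.

```python
import logging

## VIASH START
# Development defaults; Viash substitutes this block with the real `par` and
# `meta` values when the component is built. Keys below are hypothetical.
par = {
    "input": "raw_counts.h5ad",
    "output": "dataset.h5ad",
    "min_counts_per_cell": 1,
}
meta = {"name": "op3_loader"}
## VIASH END

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(meta["name"])

# The body runs at module level: no main() function and no
# `if __name__ == "__main__":` guard, since the file is never imported.
logger.info("Reading %s", par["input"])
min_counts = int(par["min_counts_per_cell"])
logger.info("Filtering cells with fewer than %d counts", min_counts)
# ... actual reading, filtering, and writing would go here ...
logger.info("Writing %s", par["output"])
logger.info("Done")
```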

szalata · 2025-03-22T19:53:00Z

src/datasets/loaders/scrnaseq/op3_loader/test.py

+    assert adata.uns["dataset_description"] == "Test description for OP3 dataset", "Incorrect .uns['dataset_description']"
+
+
+if __name__ == '__main__':


Once again, just write the script directly in the file; these files are never expected to be imported elsewhere.

You can have a look at other similar scripts in the repo.

szalata · 2025-03-22T19:53:53Z

src/datasets/workflows/scrnaseq/process_op3/config.vsh.yaml

+name: process_op3
+namespace: datasets/workflows/scrnaseq
+description: |
+  Fetch and process datasets from the Open Problems Perturbation Prediction (OP3) dataset.


"Process" can encompass a lot; it would be great to make the description more specific.

szalata · 2025-03-22T19:54:55Z

src/datasets/workflows/scrnaseq/process_op3/main.nf

Have you managed to run this Nextflow pipeline and check the output, @Paulos2411?

rcannood · 2025-05-23T13:21:26Z

Superseded by #926

Paulos2411 added 2 commits February 25, 2025 12:58

Added op3 loader

c03c07f

Minor Changes

09a290d

Updated Code with Output Data

5ca41c8

Paulos2411 force-pushed the main branch from 408a4dc to 5ca41c8 on March 11, 2025 18:18

Paulos2411 added 2 commits March 18, 2025 10:34

Minor Updates

88c4b97

Minor Updates

40662e4

szalata reviewed Mar 18, 2025

Paulos2411 and others added 2 commits March 18, 2025 18:00

Minor Update

79cbf60

Update script.py

e8b5b74

Paulos2411 marked this pull request as ready for review March 18, 2025 17:03

Paulos2411 requested a review from szalata March 18, 2025 17:07

Paulos2411 added 3 commits March 19, 2025 14:59

Added Workflow

aaf5018

Merge changes

febc600

workflow for op3_loader

b6576cd

szalata reviewed Mar 22, 2025

szalata suggested changes Mar 22, 2025

Paulos2411 and others added 5 commits March 25, 2025 16:06

Implemented Suggestions

c0c0089

removed space

53b7f0c

No changes

fa99889

move input file to a separate argument

24d66c9

add initial script

eb4f330

szalata mentioned this pull request May 19, 2025

About sc_counts.h5ad theislab/task-dge-perturbation-prediction-analysis#11

Open

Olga013 mentioned this pull request May 22, 2025

Added op3loader #926

Open

rcannood closed this May 23, 2025
