Skip to content

Added op3loader #926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Added op3loader #926

wants to merge 27 commits into from

Conversation

Olga013
Copy link

@Olga013 Olga013 commented May 22, 2025

The continuation of Paulos' PR:
#921

Copy link

@szalata szalata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work!
I have added comments about a few things that need to be fixed, primarily op3_loader.sh

params_file="/tmp/datasets_openproblems_neurips2022_params.yaml"

## TODO: fill in the parameters below
echo "TODO: fill in the parameters below" && exit 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this had to be fixed/removed


set -e

params_file="/tmp/datasets_openproblems_neurips2022_params.yaml"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is about another dataset

echo "TODO: fill in the parameters below" && exit 1

cat > "$params_file" << 'HERE'
param_list:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this concerns another dataset, needs to be updated with information from our dataset, e.g. from the perturbation prediction repo

@rcannood rcannood mentioned this pull request May 23, 2025
9 tasks
Copy link

@szalata szalata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Thank you for adding it!
@rcannood unless you see more issues, we could test it and merge

@Olga013
Copy link
Author

Olga013 commented May 29, 2025

@rcannood, I also have some questions. Would it be possible for you to clarify them?

  1. I had a problem with normalization_methods, I did not manage to choose one when I specify them in the command line. Every time I got a default set of them containing 4 or 5 methods:
    https://github.com/Olga013/openproblems/blob/e9618766a374f1e744fd9292d4ef4a6f96f46d96/src/datasets/resource_scripts/op3_loader_test.sh#L23

    However, I managed to choose one method by creating and using a params file like this: https://github.com/Olga013/openproblems/blob/e9618766a374f1e744fd9292d4ef4a6f96f46d96/src/datasets/resource_scripts/op3_loader.sh#L16

    Do you have some suggestions, why this may happen?

  2. I found that the Nextflow script related to process_op3 workflow contained the function extract_metadata. But when I ran this workflow it did not work until I changed the function name from extract_metadata to extract_uns_metadata. In addition, I found that other alternative scripts contain the extract_metadata function (e.g. extract_metadata in the process_openproblems_v1 workflow).

    Do we need to change the rest of them to extract_uns_metadata?

  3. Do we need to add the sequence of commands for running the loader and processor workflows to the README.md file?

  4. Which the Nextflow config file is better to use (in which cases we refer to nextflow_helpers and in which cases we create our own file)? common/nextflow_helpers/labels_ci.config or /tmp/nextflow.config.

  5. Do we need to specify the requirements on the site that the version of the Nextflow should not be higher than 25.02.0-edge because of this issue? (I used 25.01.0-edge)

@rcannood
Copy link
Member

rcannood commented Jun 6, 2025

I had a problem with normalization_methods, I did not manage to choose one when I specify them in the command line. Every time I got a default set of them containing 4 or 5 methods:
https://github.com/Olga013/openproblems/blob/e9618766a374f1e744fd9292d4ef4a6f96f46d96/src/datasets/resource_scripts/op3_loader_test.sh#L23

Not sure what went wrong here, but bash commands shouldn't include colons.

So this:

  --dataset_reference: GSE279945 \
  --normalization_methods: log_cp10k \

Should be:

  --dataset_reference GSE279945 \
  --normalization_methods log_cp10k \
  1. I found that the Nextflow script related to process_op3 workflow contained the function extract_metadata. But when I ran this workflow it did not work until I changed the function name from extract_metadata to extract_uns_metadata. In addition, I found that other alternative scripts contain the extract_metadata function (e.g. extract_metadata in the process_openproblems_v1 workflow).

Do we need to change the rest of them to extract_uns_metadata?

Indeed, this should be extract_uns_metadata!

  1. Do we need to add the sequence of commands for running the loader and processor workflows to the README.md file?

Sure, that would help!

  1. Which the Nextflow config file is better to use (in which cases we refer to nextflow_helpers and in which cases we create our own file)? common/nextflow_helpers/labels_ci.config or /tmp/nextflow.config.

Depends on the needs of the dataset loader. If labels_ci.config works, then use that :)

  1. Do we need to specify the requirements on the site that the version of the Nextflow should not be higher than 25.02.0-edge because of this issue? (I used 25.01.0-edge)

This can be solved by updating viash_version in _viash.yaml to 0.9.4 !

@Olga013
Copy link
Author

Olga013 commented Jun 13, 2025

@rcannood, @szalata I think that PR is ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants