Harnessing the Data Revolution (HDR) Scientific Mood (Modeling out of Distribution) ML Challenge

Repository for Imageomics' CodaBench challenge as part of the broader HDR Scientific Mood ML Challenge.

We use the same base container as the challenges from the three other HDR Institutes; it is available Coming Soon. There are options to use packages not included in this container as long as they are in the pre-approved list; if your solution requires a package not listed there, please open an issue to request it (provided it is not already requested by someone else).

For more details on participating in the challenge, please see the pages in the CodaBench challenge itself.

To-Do

This repo is still under development as we construct the Y2 SMood Challenge: Beetles as Sentinel Taxa.

Update pages/ to describe Beetle Challenge.
Add metadata and image baselines.
Update ingestion and scoring programs for the format of this challenge (multiple images used for prediction).
Files may have the training CSV or other training data information--not to be added until start of challenge.
Input and Reference data will not be added until after challenge ends (maintained in private repo until then).

Full Bundle Structure

baselines/
    XX_code_submission/
        # Zip the contents of this folder to submit this baseline to the challenge on CodaBench
        clf.pkl
        metadata
        model.py
        requirements.txt
competition.yaml
files/
    ????
Imageomics_logo_butterfly.png
ingestion_program/
    ingestion.py
    metadata.yaml
    whitelist.txt
input_data/
    # Images used for testing submitted models (images must be zipped together directly under input_data, not within a subfolder (`val.zip`, `test.zip`)
pages/
    # These will be all the optional tabs under "Get Started" on the challenge page. They are used to describe the challenge for participants
    data.md
    evaluation.md
    overview.md
    starting_kit.md
    terms.md
reference_data/
    # Labels for the images in input_data/. Can be .txt, .csv, etc.
scoring_program/
    metadata.yaml
    score_combined.py

Notes on Structure

The base container specified in the competition.yaml must have the requirements for the ingestion and scoring programs. These can be manually imported through a requirements.txt, but then each of these files will need to do so at the beginning. It is better to install these requirements in the base container and allow for participants to include a requirements file with their model's needed programs.
- Example: For this challenge, our container must have (at minimum) pillow, pandas, and tqdm. The base container used for this competition will be provided.
- Any requirements used by participants must be on the approved whitelist (or participants must reach out to request their addition) for security purposes.
Scores must be saved to a score.json file where the keys detailed in the Leaderboard section of the competition.yaml are give as the keys for the scores.
This full collection of files and folders is zipped as-is to upload the bundle to CodaBench.
run.sh is a bash script to simulate the scoring process used for the Leaderboard on your local machine. This works by first building the docker container, and then running the bash script within the virtual environment. The script will
- Create a folder /ref for the CSV ground truth file and a folder /res for the generated prediction txt file.
- Run ingestion_program/ingestion.py to get the predictions from your model and output the predictions to the txt file.
- Run scoring_program/score_combined.py to evaluate the predictions by comparing to the ground truth. The final scores are then written to a JSON file.
To test with run.sh, you should provide your own curated validation dataset (e.g. subsampling from train split) including images and a simulated ground truth file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Harnessing the Data Revolution (HDR) Scientific Mood (Modeling out of Distribution) ML Challenge

To-Do

Full Bundle Structure

Notes on Structure

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
ingestion_program		ingestion_program
input_data		input_data
pages		pages
reference_data		reference_data
scoring_program		scoring_program
.gitignore		.gitignore
Imageomics_logo_butterfly.png		Imageomics_logo_butterfly.png
LICENSE		LICENSE
README.md		README.md
competition.yaml		competition.yaml
run.sh		run.sh

License

Imageomics/HDR-SMood-Challenge

Folders and files

Latest commit

History

Repository files navigation

Harnessing the Data Revolution (HDR) Scientific Mood (Modeling out of Distribution) ML Challenge

To-Do

Full Bundle Structure

Notes on Structure

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages