4 changes: 3 additions & 1 deletion docs/disciplines/Data_Science/Data_Curation/.pages
title: Data Curation

nav:
- index.md
- snakemake.md
- orcestra.md
- IO_Clinical_Trial_Curation
- Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
8 changes: 8 additions & 0 deletions docs/disciplines/Data_Science/Data_Curation/orcestra.md
# ORCESTRA

ORCESTRA is a platform that enables the creation of reproducible data objects for biomedical research. It integrates data from various sources, processes it using standardized pipelines, and packages it into versioned data objects that can be easily shared and cited.

## See Also

- [Snakemake](snakemake.md)
- [ORCESTRA Version Controlling](../../../software_development/Version_Control/orcestra_vc.md)
15 changes: 15 additions & 0 deletions docs/disciplines/Data_Science/Data_Curation/snakemake.md
# Snakemake

[Snakemake](https://snakemake.github.io) is a workflow management system that allows you to create reproducible and scalable data analysis pipelines. It is particularly useful for bioinformatics and data science projects, where complex workflows often involve multiple steps and dependencies. For more general information and tutorials, visit their [official documentation](https://snakemake.readthedocs.io/en/stable/).

## Usage in the Lab

Many of our internal data processing pipelines, such as the [RNA-seq Kallisto pipeline](../../Bioinformatics/Tools/RNAseq_Pipelines/kallisto.md#usage), are built with Snakemake. We also use it to run the pipelines behind the [ORCESTRA](orcestra.md) platform.

Running Snakemake with the SLURM executor plugin lets us manage and execute workflows efficiently on high-performance computing (HPC) clusters, in particular H4H. With this plugin, Snakemake submits the workflow's jobs to the SLURM scheduler, which is especially helpful for large-scale data processing tasks that require significant computational resources and time.
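
As a rough illustration, the SLURM settings are usually kept in a Snakemake profile rather than typed on every command line. The sketch below assumes Snakemake 8 or later with the `snakemake-executor-plugin-slurm` package installed; the file path `profiles/slurm/config.yaml` and every resource value are hypothetical placeholders, so check the H4H documentation for the partitions and limits that actually apply.

```yaml
# profiles/slurm/config.yaml (hypothetical path)
# Assumes Snakemake >= 8 with snakemake-executor-plugin-slurm installed.
executor: slurm        # submit jobs through the SLURM executor plugin
jobs: 50               # maximum number of jobs submitted to SLURM at the same time
latency-wait: 60       # seconds to wait for output files to appear on the shared filesystem
default-resources:
  slurm_partition: "<partition>"  # placeholder: replace with an H4H partition name
  slurm_account: "<account>"      # placeholder: only needed if your allocation requires it
  runtime: 60                     # per-job wall-clock limit, in minutes
  mem_mb: 4000                    # per-job memory, in MB
```

With a profile like this in place, a workflow could be launched with `snakemake --profile profiles/slurm`; individual rules can still request more memory or time through their own `resources:` directive.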

We host many of our Snakemake workflows, such as the ORCESTRA PSet processing pipelines, in our [BHKLAB-DataProcessing GitHub organization](https://github.com/BHKLAB-DataProcessing).

## See Also

- [BHKLAB H4H Website](https://bhklab.github.io/HPC4Health/)