diff --git a/docs/disciplines/Data_Science/Data_Curation/.pages b/docs/disciplines/Data_Science/Data_Curation/.pages
index 6b20de06..25b10580 100644
--- a/docs/disciplines/Data_Science/Data_Curation/.pages
+++ b/docs/disciplines/Data_Science/Data_Curation/.pages
@@ -2,5 +2,7 @@ title: Data Curation
 
 nav:
   - index.md
+  - snakemake.md
+  - orcestra.md
   - IO_Clinical_Trial_Curation
-  - Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
\ No newline at end of file
+  - Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
diff --git a/docs/disciplines/Data_Science/Data_Curation/orcestra.md b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
new file mode 100644
index 00000000..5dfcfe4f
--- /dev/null
+++ b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
@@ -0,0 +1,8 @@
+# ORCESTRA
+
+ORCESTRA is a platform that enables the creation of reproducible data objects for biomedical research. It integrates data from various sources, processes it using standardized pipelines, and packages it into versioned data objects that can be easily shared and cited.
+
+## See Also
+
+- [Snakemake](snakemake.md)
+- [ORCESTRA Version Controlling](../../../software_development/Version_Control/orcestra_vc.md)
diff --git a/docs/disciplines/Data_Science/Data_Curation/snakemake.md b/docs/disciplines/Data_Science/Data_Curation/snakemake.md
new file mode 100644
index 00000000..0387e40e
--- /dev/null
+++ b/docs/disciplines/Data_Science/Data_Curation/snakemake.md
@@ -0,0 +1,15 @@
+# Snakemake
+
+[Snakemake](https://snakemake.github.io) is a workflow management system that allows you to create reproducible and scalable data analysis pipelines. It is particularly useful for bioinformatics and data science projects, where complex workflows often involve multiple steps and dependencies. For more general information and tutorials, visit their [official documentation](https://snakemake.readthedocs.io/en/stable/).
+
+## Usage in the Lab
+
+Many of our internal data processing pipelines, such as the [RNA-seq Kallisto pipeline](../../Bioinformatics/Tools/RNAseq_Pipelines/kallisto.md#usage), are built with Snakemake. We also use it to run the pipelines behind the [ORCESTRA](orcestra.md) platform.
+
+Using Snakemake with the SLURM executor plugin allows us to efficiently manage and execute workflows on the H4H high-performance computing cluster. This is especially helpful for large-scale data processing tasks that require significant computational resources and time.
+
+We host many of our Snakemake workflows, such as the ORCESTRA PSet processing pipelines, in our [BHKLAB-DataProcessing GitHub organization](https://github.com/BHKLAB-DataProcessing).
+
+## See Also
+
+- [BHKLAB H4H Website](https://bhklab.github.io/HPC4Health/)
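As a rough illustration of running a workflow through the SLURM executor plugin described in the Snakemake page above, the following Python sketch assembles a `snakemake` command line that submits each rule as a SLURM job. It assumes Snakemake 8 or newer with the `snakemake-executor-plugin-slurm` package installed. The function names (`build_slurm_command`, `run_workflow`), the partition name, the optional account, the job limit, and the default resource values are all hypothetical placeholders, not actual H4H settings.

```python
import shlex
import subprocess
from typing import Optional


def build_slurm_command(
    snakefile: str = "Snakefile",
    jobs: int = 50,
    partition: str = "example_partition",  # hypothetical partition name
    account: Optional[str] = None,  # hypothetical account; omitted when None
    runtime_min: int = 60,
    mem_mb: int = 4000,
) -> list[str]:
    """Return argv for running a Snakemake workflow via the SLURM executor.

    Sketch only: assumes Snakemake >= 8 and snakemake-executor-plugin-slurm.
    """
    # Resources applied to every rule unless a rule overrides them.
    default_resources = [
        f"slurm_partition={partition}",
        f"runtime={runtime_min}",  # minutes
        f"mem_mb={mem_mb}",
    ]
    if account is not None:
        default_resources.append(f"slurm_account={account}")
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--executor", "slurm",  # provided by the SLURM executor plugin
        "--jobs", str(jobs),  # max number of jobs submitted at once
        "--default-resources", *default_resources,
    ]


def run_workflow(cmd: list[str]) -> None:
    """Launch the workflow; intended to be run from a cluster login node."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_slurm_command()))
```

Building the argument list separately from launching it makes it easy to review or log the exact command before any jobs reach the scheduler; per-rule `resources:` directives in the Snakefile can still override these defaults.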