bhklab · mtran-code · Aug 26, 2025
diff --git a/docs/disciplines/Data_Science/Data_Curation/.pages b/docs/disciplines/Data_Science/Data_Curation/.pages
@@ -2,5 +2,7 @@ title: Data Curation
 
 nav:
     - index.md
+    - snakemake.md
+    - orcestra.md
     - IO_Clinical_Trial_Curation
-    - Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
+    - Non-IO Clinical Trial Curation: Non_IO_Clinical_Trial_Curation
diff --git a/docs/disciplines/Data_Science/Data_Curation/orcestra.md b/docs/disciplines/Data_Science/Data_Curation/orcestra.md
@@ -0,0 +1,8 @@
+# ORCESTRA
+
+ORCESTRA is a platform that enables the creation of reproducible data objects for biomedical research. It integrates data from various sources, processes it using standardized pipelines, and packages it into versioned data objects that can be easily shared and cited.
+
+## See Also
+
+- [Snakemake](snakemake.md)
+- [ORCESTRA Version Controlling](../../../software_development/Version_Control/orcestra_vc.md)
diff --git a/docs/disciplines/Data_Science/Data_Curation/snakemake.md b/docs/disciplines/Data_Science/Data_Curation/snakemake.md
@@ -0,0 +1,15 @@
+# Snakemake
+
+[Snakemake](https://snakemake.github.io) is a workflow management system for building reproducible and scalable data analysis pipelines. It is particularly useful for bioinformatics and data science projects, where workflows often involve many steps with dependencies between them. For general information and tutorials, see the [official documentation](https://snakemake.readthedocs.io/en/stable/).
+
+## Usage in the Lab
+
+Many of our internal data processing pipelines, such as the [RNA-seq Kallisto pipeline](../../Bioinformatics/Tools/RNAseq_Pipelines/kallisto.md#usage), are built with Snakemake. We also use it to run pipelines for the [ORCESTRA](orcestra.md) platform.
+
+Running Snakemake with the SLURM executor plugin lets us submit and manage workflow jobs on high-performance computing clusters, namely H4H. This is especially helpful for large-scale data processing tasks that need significant computational resources and time.
+
+We host many of our Snakemake workflows, such as the ORCESTRA PSet processing pipelines, in our [BHKLAB-DataProcessing GitHub organization](https://github.com/BHKLAB-DataProcessing).
+
+## See Also
+
+- [BHKLAB H4H Website](https://bhklab.github.io/HPC4Health/)