From 7efe657d326d0aa179649c588732d1a46f10f724 Mon Sep 17 00:00:00 2001
From: hadiparsianNIH
Date: Tue, 14 Jan 2025 13:43:06 -0800
Subject: [PATCH 01/16] Created AWS-GCP
---
GoogleCloud/README.md | 143 ++++++++++++++++++
.../Submodule_00_Glossary.md | 0
.../Submodule_00_background.ipynb | 0
.../Submodule_01_prog_setup.ipynb | 0
.../Submodule_02_basic_assembly.ipynb | 0
.../Submodule_03_annotation_only.ipynb | 0
.../Submodule_04_google_batch_assembly.ipynb | 0
.../Submodule_05_Bonus_Notebook.ipynb | 0
.../images}/AnnotationProcess.png | Bin
.../images}/MDI-course-card-2.png | Bin
.../images}/RNA-Seq_Notebook_Homepage.png | Bin
{images => GoogleCloud/images}/Setup10.png | Bin
{images => GoogleCloud/images}/Setup11.png | Bin
{images => GoogleCloud/images}/Setup12.png | Bin
{images => GoogleCloud/images}/Setup13.png | Bin
{images => GoogleCloud/images}/Setup14.png | Bin
{images => GoogleCloud/images}/Setup15.png | Bin
{images => GoogleCloud/images}/Setup16.png | Bin
{images => GoogleCloud/images}/Setup17.png | Bin
{images => GoogleCloud/images}/Setup18.png | Bin
{images => GoogleCloud/images}/Setup19.png | Bin
{images => GoogleCloud/images}/Setup2.png | Bin
{images => GoogleCloud/images}/Setup20.png | Bin
{images => GoogleCloud/images}/Setup21.png | Bin
{images => GoogleCloud/images}/Setup22.png | Bin
{images => GoogleCloud/images}/Setup23.png | Bin
{images => GoogleCloud/images}/Setup24.png | Bin
{images => GoogleCloud/images}/Setup25.png | Bin
{images => GoogleCloud/images}/Setup3.png | Bin
{images => GoogleCloud/images}/Setup4.png | Bin
{images => GoogleCloud/images}/Setup5.png | Bin
{images => GoogleCloud/images}/Setup6.png | Bin
{images => GoogleCloud/images}/Setup7.png | Bin
{images => GoogleCloud/images}/Setup8.png | Bin
{images => GoogleCloud/images}/Setup9.png | Bin
.../images}/TransPiWorkflow.png | Bin
{images => GoogleCloud/images}/VMdownsize.jpg | Bin
.../images}/architecture_diagram.png | Bin
.../images}/basic_assembly.png | Bin
{images => GoogleCloud/images}/cellMenu.png | Bin
.../images}/deBruijnGraph.png | Bin
{images => GoogleCloud/images}/fileDemo.png | Bin
{images => GoogleCloud/images}/gcbDiagram.jpg | Bin
{images => GoogleCloud/images}/glsDiagram.png | Bin
.../images}/jupyterRuntime.png | Bin
.../images}/jupyterRuntimeCircle.png | Bin
.../mdibl-compbio-core-logo-eurostyle.jpg | Bin
.../mdibl-compbio-core-logo-square.jpg | Bin
.../images}/module_concept.png | Bin
{images => GoogleCloud/images}/perl-logo.png | Bin
.../images}/rainbowTrout.jpeg | Bin
.../images}/transpi_workflow.png | Bin
.../images}/workflow_concept.png | Bin
.../quiz-material}/00-cp1.json | 0
.../quiz-material}/00-cp2.json | 0
.../quiz-material}/00-pc1.json | 0
.../quiz-material}/01-cp1.json | 0
.../quiz-material}/02-cp1-1.json | 0
.../quiz-material}/02-cp1-2.json | 0
.../quiz-material}/03-cp1-1.json | 0
.../quiz-material}/03-cp1-2.json | 0
.../quiz-material}/04-cp1-1.json | 0
.../quiz-material}/04-cp1-2.json | 0
.../quiz-material}/04-cp1-3.json | 0
.../quiz-material}/04-cp1-4.json | 0
65 files changed, 143 insertions(+)
create mode 100644 GoogleCloud/README.md
rename Submodule_00_Glossary.md => GoogleCloud/Submodule_00_Glossary.md (100%)
rename Submodule_00_background.ipynb => GoogleCloud/Submodule_00_background.ipynb (100%)
rename Submodule_01_prog_setup.ipynb => GoogleCloud/Submodule_01_prog_setup.ipynb (100%)
rename Submodule_02_basic_assembly.ipynb => GoogleCloud/Submodule_02_basic_assembly.ipynb (100%)
rename Submodule_03_annotation_only.ipynb => GoogleCloud/Submodule_03_annotation_only.ipynb (100%)
rename Submodule_04_google_batch_assembly.ipynb => GoogleCloud/Submodule_04_google_batch_assembly.ipynb (100%)
rename Submodule_05_Bonus_Notebook.ipynb => GoogleCloud/Submodule_05_Bonus_Notebook.ipynb (100%)
rename {images => GoogleCloud/images}/AnnotationProcess.png (100%)
rename {images => GoogleCloud/images}/MDI-course-card-2.png (100%)
rename {images => GoogleCloud/images}/RNA-Seq_Notebook_Homepage.png (100%)
rename {images => GoogleCloud/images}/Setup10.png (100%)
rename {images => GoogleCloud/images}/Setup11.png (100%)
rename {images => GoogleCloud/images}/Setup12.png (100%)
rename {images => GoogleCloud/images}/Setup13.png (100%)
rename {images => GoogleCloud/images}/Setup14.png (100%)
rename {images => GoogleCloud/images}/Setup15.png (100%)
rename {images => GoogleCloud/images}/Setup16.png (100%)
rename {images => GoogleCloud/images}/Setup17.png (100%)
rename {images => GoogleCloud/images}/Setup18.png (100%)
rename {images => GoogleCloud/images}/Setup19.png (100%)
rename {images => GoogleCloud/images}/Setup2.png (100%)
rename {images => GoogleCloud/images}/Setup20.png (100%)
rename {images => GoogleCloud/images}/Setup21.png (100%)
rename {images => GoogleCloud/images}/Setup22.png (100%)
rename {images => GoogleCloud/images}/Setup23.png (100%)
rename {images => GoogleCloud/images}/Setup24.png (100%)
rename {images => GoogleCloud/images}/Setup25.png (100%)
rename {images => GoogleCloud/images}/Setup3.png (100%)
rename {images => GoogleCloud/images}/Setup4.png (100%)
rename {images => GoogleCloud/images}/Setup5.png (100%)
rename {images => GoogleCloud/images}/Setup6.png (100%)
rename {images => GoogleCloud/images}/Setup7.png (100%)
rename {images => GoogleCloud/images}/Setup8.png (100%)
rename {images => GoogleCloud/images}/Setup9.png (100%)
rename {images => GoogleCloud/images}/TransPiWorkflow.png (100%)
rename {images => GoogleCloud/images}/VMdownsize.jpg (100%)
rename {images => GoogleCloud/images}/architecture_diagram.png (100%)
rename {images => GoogleCloud/images}/basic_assembly.png (100%)
rename {images => GoogleCloud/images}/cellMenu.png (100%)
rename {images => GoogleCloud/images}/deBruijnGraph.png (100%)
rename {images => GoogleCloud/images}/fileDemo.png (100%)
rename {images => GoogleCloud/images}/gcbDiagram.jpg (100%)
rename {images => GoogleCloud/images}/glsDiagram.png (100%)
rename {images => GoogleCloud/images}/jupyterRuntime.png (100%)
rename {images => GoogleCloud/images}/jupyterRuntimeCircle.png (100%)
rename {images => GoogleCloud/images}/mdibl-compbio-core-logo-eurostyle.jpg (100%)
rename {images => GoogleCloud/images}/mdibl-compbio-core-logo-square.jpg (100%)
rename {images => GoogleCloud/images}/module_concept.png (100%)
rename {images => GoogleCloud/images}/perl-logo.png (100%)
rename {images => GoogleCloud/images}/rainbowTrout.jpeg (100%)
rename {images => GoogleCloud/images}/transpi_workflow.png (100%)
rename {images => GoogleCloud/images}/workflow_concept.png (100%)
rename {quiz-material => GoogleCloud/quiz-material}/00-cp1.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/00-cp2.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/00-pc1.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/01-cp1.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/02-cp1-1.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/02-cp1-2.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/03-cp1-1.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/03-cp1-2.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/04-cp1-1.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/04-cp1-2.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/04-cp1-3.json (100%)
rename {quiz-material => GoogleCloud/quiz-material}/04-cp1-4.json (100%)
diff --git a/GoogleCloud/README.md b/GoogleCloud/README.md
new file mode 100644
index 0000000..1b2fd02
--- /dev/null
+++ b/GoogleCloud/README.md
@@ -0,0 +1,143 @@
+
+
+# MDI Biological Laboratory RNA-seq Transcriptome Assembly Module
+---------------------------------
+
+
+## Three primary and interlinked learning goals:
+1. From a *biological perspective*, demonstrating the **process of transcriptome assembly** from raw RNA-seq data.
+2. From a *computational perspective*, demonstrating **computing using workflow management and container systems**.
+3. From an *infrastructure perspective*, demonstrating how to **carry out these analyses efficiently in a cloud environment**.
+
+
+
+# Quick Overview
+This module teaches you how to perform short-read RNA-seq transcriptome assembly on Google Cloud Platform using a Nextflow pipeline, and eventually using the Google Batch API. In addition to the overview given in this README, you will find a series of Jupyter notebooks that teach you different components of RNA-seq analysis in the cloud.
+
+This module will cost you about $7.00 to run end to end, assuming you shut down and delete all resources upon completion.
+
+
+## Contents
+
++ [Getting Started](#getting-started)
++ [Biological Problem](#biological-problem)
++ [Set Up](#set-up)
++ [Software Requirements](#software-requirements)
++ [Workflow Diagrams](#workflow-diagrams)
++ [Data](#data)
++ [Troubleshooting](#troubleshooting)
++ [Funding](#funding)
++ [License for Data](#license-for-data)
+
+## **Getting Started**
+This learning module includes tutorials and execution scripts in the form of Jupyter notebooks. The purpose of these tutorials is to help users familiarize themselves with cloud computing in the specific context of running bioinformatics workflows to prep for and to carry out a transcriptome assembly, refinement, and annotation. These tutorials do this by utilizing a recently published Nextflow workflow (TransPi [manuscript](https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13593), [repository](https://github.com/palmuc/TransPi), and [user guide](https://palmuc.github.io/TransPi/)), which manages and passes data between several state-of-the-art programs, carrying out the processes from initial quality control and normalization, through assembly with several tools, refinement and assessment, and finally annotation of the final putative transcriptome.
+
+Since the work is managed by this pipeline, the notebooks will focus on setting up and running the pipeline, followed by an examination of some of the wide range of outputs produced. We will also demonstrate how to retrieve the complete results directory so that users can examine the outputs more extensively on their own computing systems, going step-by-step through specific workflows. These workflows cover a basic bioinformatics analysis from start to finish: beginning with raw sequence data and carrying out the steps needed to generate a final assembled and annotated transcriptome.
+
+We also put an emphasis on understanding how workflows execute, using the specific example of the Nextflow (https://www.nextflow.io) workflow engine, and on using workflow engines as supported by cloud infrastructure, using the specific example of the Google Batch API (https://cloud.google.com/batch).
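+
+As a sketch of how Nextflow hands work to Google Batch, a minimal `nextflow.config` fragment might look like the following. This is illustrative only; the project ID, region, and bucket are placeholders, not values used in this module:
+
+```
+// Hypothetical nextflow.config fragment for running on Google Batch.
+process.executor = 'google-batch'            // send each process to Google Batch
+google.project   = 'my-gcp-project'          // placeholder: your GCP project ID
+google.location  = 'us-central1'             // placeholder: your Batch region
+workDir          = 'gs://my-bucket/nf-work'  // placeholder: scratch bucket
+```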
+
+
+
+**Figure 1:** The technical infrastructure diagram for this project.
+
+## **Biological Problem**
+The combination of increased availability and reduced expense in obtaining high-throughput sequencing has made transcriptome profiling analysis (primarily with RNA-seq) a standard tool for the molecular characterization of widely disparate biological systems. Researchers working in common model organisms, such as mouse or zebrafish, have relatively easy access to the necessary resources (e.g., well-assembled genomes and large collections of predicted/verified transcripts), for the analysis and interpretation of their data. In contrast, researchers working on less commonly studied organisms and systems often must develop these resources for themselves.
+
+Transcriptome assembly is the broad term used to describe the process of reconstructing many (or ideally all) of an organism’s transcripts from the large-scale but fragmentary data provided by high-throughput sequencing. A "typical" RNA-seq dataset will consist of tens of millions of reads or read-pairs, with each read representing up to 150 contiguous nucleotides of sequence. Complete transcripts, in contrast, typically range from hundreds to tens of thousands of nucleotides in length. In short, and leaving out the technical details, the process of assembling a transcriptome from raw reads (Figure 2) is to first make a "best guess" segregation of the reads into subsets that are most likely derived from one gene (or a small set of related/similar genes), and then, for each subset, build a most-likely set of transcripts and genes.
+
+
+
+**Figure 2:** The process from raw reads to first transcriptome assembly.
+
+Once a new transcriptome is generated, assessed, and refined, it must be annotated with putative functional assignments to be of use in subsequent functional studies. Functional annotation is accomplished through a combination of homology-based and ab initio methods. The most well-established homology-based process is protein-coding sequence prediction followed by protein sequence alignment to databases of known proteins, especially those from human or common model organisms. Ab initio methods use computational models of various features (e.g., known protein domains, signal peptides, or peptide modification sites) to characterize either the transcript or its predicted protein product. This training module will cover multiple approaches to the annotation of assembled transcriptomes.
+
+## **Set Up**
+
+#### Part 1: Setting up Environment
+
+**Enable APIs and create a Nextflow Service Account**
+
+If you are using Nextflow outside of NIH CloudLab, you must enable the required APIs, set up a service account, and add your service account to your notebook permissions before creating the notebook. Follow sections 1 and 2 of the accompanying [how to document](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateNextflowServiceAccount.md) for instructions. If you are executing this tutorial with an NIH CloudLab account, your default Compute Engine service account will have all the IAM roles required to run the Nextflow portion.
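+
+If you need to enable the APIs yourself, the commands would look something like the sketch below. This is a hedged example, not the authoritative procedure; follow the how-to document linked above for the exact steps, and note that the service list and account name are illustrative:
+
+```shell
+# Sketch: enable the APIs this module relies on (illustrative list)
+gcloud services enable batch.googleapis.com \
+    compute.googleapis.com \
+    storage.googleapis.com
+
+# Sketch: create a service account for Nextflow; the name matches the
+# `nextflow-service-account` referenced in the Troubleshooting section
+gcloud iam service-accounts create nextflow-service-account
+```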
+
+**Create the Vertex AI Instance**
+
+Follow the steps highlighted [here](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/docs/vertexai.md) to create a new user-managed notebook in Vertex AI. Follow steps 1-8 and be especially careful to enable idle shutdown as highlighted in step 7. For this module you should select **Debian 11** and **Python 3** in the Environment tab in step 5. In step 6, in the Machine type tab, select **n1-highmem-16** from the dropdown box. This will provide you with 16 vCPUs and 104 GB of RAM, which may feel like a lot but is necessary for TransPi to run.
+
+
+#### Part 2: Adding the Modules to the Notebook
+
+1. From the Launcher in your new VM, click the Terminal option.
+
+2. Next, paste the following git command to get a copy of everything within this repository, including all of the submodules.
+
+> ```git clone https://github.com/NIGMS/Transcriptome-Assembly-Refinement-and-Applications.git```
+3. You are now all set!
+
+**WARNING:** When you are not using the notebook, stop it. This will prevent you from incurring costs while the notebook is idle. You can do this in the same window where you opened the notebook: make sure the notebook is selected, then click **Stop**. When you want to start the notebook again, follow the same process but click **Start** instead.
+
+## **Software Requirements**
+
+All of the software requirements are taken care of and installed within [Submodule_01_prog_setup.ipynb](./Submodule_01_prog_setup.ipynb). The key pieces of software needed are:
+1. [Nextflow workflow system](https://www.nextflow.io/): Nextflow is the workflow management system that TransPi is built on.
+2. [Google Batch API](https://cloud.google.com/batch/docs): Google Batch was enabled as part of the setup process and will be readily available when it is needed.
+3. [Nextflow TransPi Package](https://github.com/palmuc/TransPi): The rest of the software is all downloaded as part of the TransPi package. TransPi is a Nextflow pipeline that carries out many of the standard steps required for transcriptome assembly and annotation. The original TransPi is available from this GitHub [link](https://github.com/palmuc/TransPi). We have made various alterations to the TransPi package and so the TransPi files you will be using throughout this module will be our own altered version.
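+
+If you ever need to install or verify Nextflow by hand from the notebook terminal, the standard installer is Nextflow's documented one-liner (requires Java and network access):
+
+```shell
+# Download the Nextflow launcher into the current directory
+curl -s https://get.nextflow.io | bash
+# Confirm the installation by printing the version
+./nextflow -version
+```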
+
+## **Workflow Diagrams**
+
+
+
+**Figure 3:** Nextflow workflow diagram. (Rivera 2021).
+Image Source: https://github.com/PalMuc/TransPi/blob/master/README.md
+
+Explanation of which notebooks execute which processes:
+
++ Notebooks labeled 0 ([Submodule_00_background.ipynb](./Submodule_00_background.ipynb) and [Submodule_00_Glossary.md](./Submodule_00_Glossary.md)) respectively cover background material and provide a centralized glossary, addressing both the biological problem of transcriptome assembly and an introduction to workflows and container-based computing.
++ Notebook 1 ([Submodule_01_prog_setup.ipynb](./Submodule_01_prog_setup.ipynb)) is used for setting up the environment. It should only need to be run once per machine. (Note that our version of TransPi does not run the `precheck` script; to avoid the headache and wasted time, we have developed a workaround that skips that step.)
++ Notebook 2 ([Submodule_02_basic_assembly.ipynb](./Submodule_02_basic_assembly.ipynb)) carries out a complete run of the Nextflow TransPi assembly workflow on a modest sequence set, producing a small transcriptome.
++ Notebook 3 ([Submodule_03_annotation_only.ipynb](./Submodule_03_annotation_only.ipynb)) carries out an annotation-only run using a prebuilt, but more complete transcriptome.
++ Notebook 4 ([Submodule_04_google_batch_assembly.ipynb](./Submodule_04_google_batch_assembly.ipynb)) carries out the workflow using the Google Batch API.
++ Notebook 5 ([Submodule_05_Bonus_Notebook.ipynb](./Submodule_05_Bonus_Notebook.ipynb)) is a more hands-off notebook to test basic skills taught in this module.
+
+## **Data**
+The test dataset used in the majority of this module is a downsampled version of a dataset that can be obtained in its complete form from the SRA database (Bioproject [**PRJNA318296**](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA318296), GEO Accession [**GSE80221**](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80221)). The data were originally generated by **Hartig et al., 2016**. We downsampled the data files in order to streamline the tutorials and stored them in a Google Cloud Storage bucket. The subsampled data, as individual sample files as well as a concatenated version of these files, are available in our Google Cloud Storage bucket at `gs://nigms-sandbox/nosi-inbremaine-storage/resources/seq2`.
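+
+To browse or fetch the subsampled data yourself, `gsutil` can list and copy from the bucket (the destination directory below is an arbitrary example):
+
+```shell
+# List the subsampled sequence files
+gsutil ls gs://nigms-sandbox/nosi-inbremaine-storage/resources/seq2
+# Copy them in parallel into a local directory of your choosing
+gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/resources/seq2 ./data
+```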
+
+Additional datasets for demonstration of the annotation features of TransPi were obtained from the NCBI Transcriptome Shotgun Assembly archive. These files can be found in our Google Cloud Storage bucket at `gs://nigms-sandbox/nosi-inbremaine-storage/resources/trans`.
+- *Microcaecilia dermatophaga*
+ - Bioproject: [**PRJNA387587**](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA387587)
+ - Originally generated by **Torres-Sánchez M et al., 2019**.
+- *Oncorhynchus mykiss*
+ - Bioproject: [**PRJNA389609**](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA389609)
+ - Originally generated by **Wang J et al., 2016**, **Al-Tobasei R et al., 2016**, and **Salem M et al., 2015**.
+- *Pseudacris regilla*
+ - Bioproject: [**PRJNA163143**](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA163143)
+ - Originally generated by **Laura Robertson, USGS**.
+
+The final submodule ([Submodule_05_Bonus_Notebook.ipynb](./Submodule_05_Bonus_Notebook.ipynb)) uses an additional dataset pulled from the SRA database. We are using the RNA-seq reads only and have subsampled and merged them to a collective 2 million reads. This would not be a good idea for a real analysis, but was done here to reduce cost and runtime. These files are available in our Google Cloud Storage bucket at `gs://nigms-sandbox/nosi-inbremaine-storage/resources/seq2`.
+- *Apis mellifera*
+ - Bioproject: [**PRJNA274674**](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA274674)
+ - Originally generated by **Galbraith DA et al., 2015**.
+
+## **Troubleshooting**
+- If a quiz is not rendering:
+ - Make sure the `pip install` cell was executed in Submodule 00.
+ - Try re-executing `from jupytercards import display_flashcards` or `from jupyterquiz import display_quiz` depending on the quiz type.
+- If a file/directory is not able to be found, make sure that you are in the right directory. If the notebook is idle for a long time, gets reloaded, or restarted, you will need to re-run Step 1 of the notebook. (`%cd /home/jupyter`)
+- Sometimes Nextflow will print `WARN:` followed by a warning message. These warnings are expected and should not produce any errors.
+- Sometimes Nextflow will print `Waiting for file transfers to complete`. This may take a few minutes, but is nothing to worry about.
+- If you are unable to create a bucket using the `gsutil mb` command, check your `nextflow-service-account` roles. Make sure that you have `Storage Admin` added.
+- If you are trying to execute a terminal command in a Jupyter code cell and it is not working, make sure that you have an `!` before the command.
+ - e.g., `mkdir example-1` -> `!mkdir example-1`
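+
+As an example of the bucket-creation tip above, a regional bucket can be made with a single command (the bucket name and region are placeholders; bucket names must be globally unique):
+
+```shell
+# Create a regional bucket; replace the name and region with your own
+gsutil mb -l us-east1 gs://your-unique-bucket-name
+```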
+
+## **Funding**
+
+MDIBL Computational Biology Core efforts are supported by two Institutional Development Awards (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under grant numbers P20GM103423 and P20GM104318.
+
+## **License for Data**
+
+Text and materials are licensed under a Creative Commons CC-BY-NC-SA license. The license allows you to copy, remix, and redistribute any of our publicly available materials, under the condition that you attribute the work (details in the license) and do not profit from it. More information is available [here](https://tilburgsciencehub.com/about).
+
+
+
+This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).
+
+The TransPi Nextflow workflow was developed and released by Ramon Rivera and can be obtained from its [GitHub repository](https://github.com/PalMuc/TransPi).
diff --git a/Submodule_00_Glossary.md b/GoogleCloud/Submodule_00_Glossary.md
similarity index 100%
rename from Submodule_00_Glossary.md
rename to GoogleCloud/Submodule_00_Glossary.md
diff --git a/Submodule_00_background.ipynb b/GoogleCloud/Submodule_00_background.ipynb
similarity index 100%
rename from Submodule_00_background.ipynb
rename to GoogleCloud/Submodule_00_background.ipynb
diff --git a/Submodule_01_prog_setup.ipynb b/GoogleCloud/Submodule_01_prog_setup.ipynb
similarity index 100%
rename from Submodule_01_prog_setup.ipynb
rename to GoogleCloud/Submodule_01_prog_setup.ipynb
diff --git a/Submodule_02_basic_assembly.ipynb b/GoogleCloud/Submodule_02_basic_assembly.ipynb
similarity index 100%
rename from Submodule_02_basic_assembly.ipynb
rename to GoogleCloud/Submodule_02_basic_assembly.ipynb
diff --git a/Submodule_03_annotation_only.ipynb b/GoogleCloud/Submodule_03_annotation_only.ipynb
similarity index 100%
rename from Submodule_03_annotation_only.ipynb
rename to GoogleCloud/Submodule_03_annotation_only.ipynb
diff --git a/Submodule_04_google_batch_assembly.ipynb b/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
similarity index 100%
rename from Submodule_04_google_batch_assembly.ipynb
rename to GoogleCloud/Submodule_04_google_batch_assembly.ipynb
diff --git a/Submodule_05_Bonus_Notebook.ipynb b/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
similarity index 100%
rename from Submodule_05_Bonus_Notebook.ipynb
rename to GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
diff --git a/images/AnnotationProcess.png b/GoogleCloud/images/AnnotationProcess.png
similarity index 100%
rename from images/AnnotationProcess.png
rename to GoogleCloud/images/AnnotationProcess.png
diff --git a/images/MDI-course-card-2.png b/GoogleCloud/images/MDI-course-card-2.png
similarity index 100%
rename from images/MDI-course-card-2.png
rename to GoogleCloud/images/MDI-course-card-2.png
diff --git a/images/RNA-Seq_Notebook_Homepage.png b/GoogleCloud/images/RNA-Seq_Notebook_Homepage.png
similarity index 100%
rename from images/RNA-Seq_Notebook_Homepage.png
rename to GoogleCloud/images/RNA-Seq_Notebook_Homepage.png
diff --git a/images/Setup10.png b/GoogleCloud/images/Setup10.png
similarity index 100%
rename from images/Setup10.png
rename to GoogleCloud/images/Setup10.png
diff --git a/images/Setup11.png b/GoogleCloud/images/Setup11.png
similarity index 100%
rename from images/Setup11.png
rename to GoogleCloud/images/Setup11.png
diff --git a/images/Setup12.png b/GoogleCloud/images/Setup12.png
similarity index 100%
rename from images/Setup12.png
rename to GoogleCloud/images/Setup12.png
diff --git a/images/Setup13.png b/GoogleCloud/images/Setup13.png
similarity index 100%
rename from images/Setup13.png
rename to GoogleCloud/images/Setup13.png
diff --git a/images/Setup14.png b/GoogleCloud/images/Setup14.png
similarity index 100%
rename from images/Setup14.png
rename to GoogleCloud/images/Setup14.png
diff --git a/images/Setup15.png b/GoogleCloud/images/Setup15.png
similarity index 100%
rename from images/Setup15.png
rename to GoogleCloud/images/Setup15.png
diff --git a/images/Setup16.png b/GoogleCloud/images/Setup16.png
similarity index 100%
rename from images/Setup16.png
rename to GoogleCloud/images/Setup16.png
diff --git a/images/Setup17.png b/GoogleCloud/images/Setup17.png
similarity index 100%
rename from images/Setup17.png
rename to GoogleCloud/images/Setup17.png
diff --git a/images/Setup18.png b/GoogleCloud/images/Setup18.png
similarity index 100%
rename from images/Setup18.png
rename to GoogleCloud/images/Setup18.png
diff --git a/images/Setup19.png b/GoogleCloud/images/Setup19.png
similarity index 100%
rename from images/Setup19.png
rename to GoogleCloud/images/Setup19.png
diff --git a/images/Setup2.png b/GoogleCloud/images/Setup2.png
similarity index 100%
rename from images/Setup2.png
rename to GoogleCloud/images/Setup2.png
diff --git a/images/Setup20.png b/GoogleCloud/images/Setup20.png
similarity index 100%
rename from images/Setup20.png
rename to GoogleCloud/images/Setup20.png
diff --git a/images/Setup21.png b/GoogleCloud/images/Setup21.png
similarity index 100%
rename from images/Setup21.png
rename to GoogleCloud/images/Setup21.png
diff --git a/images/Setup22.png b/GoogleCloud/images/Setup22.png
similarity index 100%
rename from images/Setup22.png
rename to GoogleCloud/images/Setup22.png
diff --git a/images/Setup23.png b/GoogleCloud/images/Setup23.png
similarity index 100%
rename from images/Setup23.png
rename to GoogleCloud/images/Setup23.png
diff --git a/images/Setup24.png b/GoogleCloud/images/Setup24.png
similarity index 100%
rename from images/Setup24.png
rename to GoogleCloud/images/Setup24.png
diff --git a/images/Setup25.png b/GoogleCloud/images/Setup25.png
similarity index 100%
rename from images/Setup25.png
rename to GoogleCloud/images/Setup25.png
diff --git a/images/Setup3.png b/GoogleCloud/images/Setup3.png
similarity index 100%
rename from images/Setup3.png
rename to GoogleCloud/images/Setup3.png
diff --git a/images/Setup4.png b/GoogleCloud/images/Setup4.png
similarity index 100%
rename from images/Setup4.png
rename to GoogleCloud/images/Setup4.png
diff --git a/images/Setup5.png b/GoogleCloud/images/Setup5.png
similarity index 100%
rename from images/Setup5.png
rename to GoogleCloud/images/Setup5.png
diff --git a/images/Setup6.png b/GoogleCloud/images/Setup6.png
similarity index 100%
rename from images/Setup6.png
rename to GoogleCloud/images/Setup6.png
diff --git a/images/Setup7.png b/GoogleCloud/images/Setup7.png
similarity index 100%
rename from images/Setup7.png
rename to GoogleCloud/images/Setup7.png
diff --git a/images/Setup8.png b/GoogleCloud/images/Setup8.png
similarity index 100%
rename from images/Setup8.png
rename to GoogleCloud/images/Setup8.png
diff --git a/images/Setup9.png b/GoogleCloud/images/Setup9.png
similarity index 100%
rename from images/Setup9.png
rename to GoogleCloud/images/Setup9.png
diff --git a/images/TransPiWorkflow.png b/GoogleCloud/images/TransPiWorkflow.png
similarity index 100%
rename from images/TransPiWorkflow.png
rename to GoogleCloud/images/TransPiWorkflow.png
diff --git a/images/VMdownsize.jpg b/GoogleCloud/images/VMdownsize.jpg
similarity index 100%
rename from images/VMdownsize.jpg
rename to GoogleCloud/images/VMdownsize.jpg
diff --git a/images/architecture_diagram.png b/GoogleCloud/images/architecture_diagram.png
similarity index 100%
rename from images/architecture_diagram.png
rename to GoogleCloud/images/architecture_diagram.png
diff --git a/images/basic_assembly.png b/GoogleCloud/images/basic_assembly.png
similarity index 100%
rename from images/basic_assembly.png
rename to GoogleCloud/images/basic_assembly.png
diff --git a/images/cellMenu.png b/GoogleCloud/images/cellMenu.png
similarity index 100%
rename from images/cellMenu.png
rename to GoogleCloud/images/cellMenu.png
diff --git a/images/deBruijnGraph.png b/GoogleCloud/images/deBruijnGraph.png
similarity index 100%
rename from images/deBruijnGraph.png
rename to GoogleCloud/images/deBruijnGraph.png
diff --git a/images/fileDemo.png b/GoogleCloud/images/fileDemo.png
similarity index 100%
rename from images/fileDemo.png
rename to GoogleCloud/images/fileDemo.png
diff --git a/images/gcbDiagram.jpg b/GoogleCloud/images/gcbDiagram.jpg
similarity index 100%
rename from images/gcbDiagram.jpg
rename to GoogleCloud/images/gcbDiagram.jpg
diff --git a/images/glsDiagram.png b/GoogleCloud/images/glsDiagram.png
similarity index 100%
rename from images/glsDiagram.png
rename to GoogleCloud/images/glsDiagram.png
diff --git a/images/jupyterRuntime.png b/GoogleCloud/images/jupyterRuntime.png
similarity index 100%
rename from images/jupyterRuntime.png
rename to GoogleCloud/images/jupyterRuntime.png
diff --git a/images/jupyterRuntimeCircle.png b/GoogleCloud/images/jupyterRuntimeCircle.png
similarity index 100%
rename from images/jupyterRuntimeCircle.png
rename to GoogleCloud/images/jupyterRuntimeCircle.png
diff --git a/images/mdibl-compbio-core-logo-eurostyle.jpg b/GoogleCloud/images/mdibl-compbio-core-logo-eurostyle.jpg
similarity index 100%
rename from images/mdibl-compbio-core-logo-eurostyle.jpg
rename to GoogleCloud/images/mdibl-compbio-core-logo-eurostyle.jpg
diff --git a/images/mdibl-compbio-core-logo-square.jpg b/GoogleCloud/images/mdibl-compbio-core-logo-square.jpg
similarity index 100%
rename from images/mdibl-compbio-core-logo-square.jpg
rename to GoogleCloud/images/mdibl-compbio-core-logo-square.jpg
diff --git a/images/module_concept.png b/GoogleCloud/images/module_concept.png
similarity index 100%
rename from images/module_concept.png
rename to GoogleCloud/images/module_concept.png
diff --git a/images/perl-logo.png b/GoogleCloud/images/perl-logo.png
similarity index 100%
rename from images/perl-logo.png
rename to GoogleCloud/images/perl-logo.png
diff --git a/images/rainbowTrout.jpeg b/GoogleCloud/images/rainbowTrout.jpeg
similarity index 100%
rename from images/rainbowTrout.jpeg
rename to GoogleCloud/images/rainbowTrout.jpeg
diff --git a/images/transpi_workflow.png b/GoogleCloud/images/transpi_workflow.png
similarity index 100%
rename from images/transpi_workflow.png
rename to GoogleCloud/images/transpi_workflow.png
diff --git a/images/workflow_concept.png b/GoogleCloud/images/workflow_concept.png
similarity index 100%
rename from images/workflow_concept.png
rename to GoogleCloud/images/workflow_concept.png
diff --git a/quiz-material/00-cp1.json b/GoogleCloud/quiz-material/00-cp1.json
similarity index 100%
rename from quiz-material/00-cp1.json
rename to GoogleCloud/quiz-material/00-cp1.json
diff --git a/quiz-material/00-cp2.json b/GoogleCloud/quiz-material/00-cp2.json
similarity index 100%
rename from quiz-material/00-cp2.json
rename to GoogleCloud/quiz-material/00-cp2.json
diff --git a/quiz-material/00-pc1.json b/GoogleCloud/quiz-material/00-pc1.json
similarity index 100%
rename from quiz-material/00-pc1.json
rename to GoogleCloud/quiz-material/00-pc1.json
diff --git a/quiz-material/01-cp1.json b/GoogleCloud/quiz-material/01-cp1.json
similarity index 100%
rename from quiz-material/01-cp1.json
rename to GoogleCloud/quiz-material/01-cp1.json
diff --git a/quiz-material/02-cp1-1.json b/GoogleCloud/quiz-material/02-cp1-1.json
similarity index 100%
rename from quiz-material/02-cp1-1.json
rename to GoogleCloud/quiz-material/02-cp1-1.json
diff --git a/quiz-material/02-cp1-2.json b/GoogleCloud/quiz-material/02-cp1-2.json
similarity index 100%
rename from quiz-material/02-cp1-2.json
rename to GoogleCloud/quiz-material/02-cp1-2.json
diff --git a/quiz-material/03-cp1-1.json b/GoogleCloud/quiz-material/03-cp1-1.json
similarity index 100%
rename from quiz-material/03-cp1-1.json
rename to GoogleCloud/quiz-material/03-cp1-1.json
diff --git a/quiz-material/03-cp1-2.json b/GoogleCloud/quiz-material/03-cp1-2.json
similarity index 100%
rename from quiz-material/03-cp1-2.json
rename to GoogleCloud/quiz-material/03-cp1-2.json
diff --git a/quiz-material/04-cp1-1.json b/GoogleCloud/quiz-material/04-cp1-1.json
similarity index 100%
rename from quiz-material/04-cp1-1.json
rename to GoogleCloud/quiz-material/04-cp1-1.json
diff --git a/quiz-material/04-cp1-2.json b/GoogleCloud/quiz-material/04-cp1-2.json
similarity index 100%
rename from quiz-material/04-cp1-2.json
rename to GoogleCloud/quiz-material/04-cp1-2.json
diff --git a/quiz-material/04-cp1-3.json b/GoogleCloud/quiz-material/04-cp1-3.json
similarity index 100%
rename from quiz-material/04-cp1-3.json
rename to GoogleCloud/quiz-material/04-cp1-3.json
diff --git a/quiz-material/04-cp1-4.json b/GoogleCloud/quiz-material/04-cp1-4.json
similarity index 100%
rename from quiz-material/04-cp1-4.json
rename to GoogleCloud/quiz-material/04-cp1-4.json
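Patch 01 above is a pure restructuring: every image and quiz file moves from the repository root into `GoogleCloud/`. Inside a repo, `git mv` is the right tool (it stages the renames), but as an illustration only, a minimal stdlib sketch of the same tree move (paths hypothetical):

```python
import shutil
from pathlib import Path

def move_tree(src: str, dst: str) -> None:
    """Move every entry under src into dst, preserving file names."""
    dst_path = Path(dst)
    dst_path.mkdir(parents=True, exist_ok=True)
    for item in Path(src).iterdir():
        shutil.move(str(item), str(dst_path / item.name))

# e.g. move_tree("images", "GoogleCloud/images")
```

Because the file contents are unchanged (100% similarity), git records these as renames rather than delete/add pairs, which is why the diff bodies above are empty.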
From e2216148a3c1aa1830482f708ca36eedc32bf69f Mon Sep 17 00:00:00 2001
From: hadiparsianNIH
Date: Tue, 14 Jan 2025 14:37:37 -0800
Subject: [PATCH 02/16] Changed the format of submodule 00
---
GoogleCloud/Submodule_00_background.ipynb | 58 +++++++++++++++++++++--
test.md | 30 ++++++++++++
2 files changed, 84 insertions(+), 4 deletions(-)
create mode 100644 test.md
diff --git a/GoogleCloud/Submodule_00_background.ipynb b/GoogleCloud/Submodule_00_background.ipynb
index 151f101..d30ea25 100644
--- a/GoogleCloud/Submodule_00_background.ipynb
+++ b/GoogleCloud/Submodule_00_background.ipynb
@@ -14,7 +14,7 @@
"id": "5e6d2086-4dbf-4a61-a5bb-8f08a269f3fa",
"metadata": {},
"source": [
- "## Welcome!\n",
+ "## Overview\n",
"\n",
    "This is a series of notebooks that allows you to explore the biological and computational process of transcriptome assembly. Through these notebooks, you will also learn to leverage powerful tools such as Nextflow and the Google Cloud Life Sciences API to take your computational capabilities to the next level!\n",
"\n",
@@ -25,6 +25,40 @@
"Good luck, and have fun!"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "3518c1a9",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives:\n",
+ "\n",
+ "1. **Assess prior knowledge:** A pre-check quiz verifies foundational understanding of DNA, RNA, transcription, and gene expression.\n",
+ "\n",
+ "2. **Introduce transcriptome assembly:** Learners gain an understanding of what transcriptome assembly is, why RNA sequencing is performed, and the overall workflow involved.\n",
+ "\n",
+ "3. **Explain the process of transcriptome assembly:** This includes understanding preprocessing, sequence assembly using de Bruijn graphs, assembly assessment (internal and external consistency, BUSCO), and refinement techniques.\n",
+ "\n",
+ "4. **Introduce workflow management:** Learners are introduced to the concept of workflows/pipelines in bioinformatics and the role of workflow management systems like Nextflow.\n",
+ "\n",
+ "5. **Explain the use of Docker containers:** The notebook explains the purpose and benefits of using Docker containers for managing software dependencies in bioinformatics.\n",
+ "\n",
+ "6. **Introduce the Google Cloud Life Sciences API:** Learners are introduced to the Google Cloud Life Sciences API and its advantages for managing and executing workflows on cloud computing resources.\n",
+ "\n",
+ "7. **Familiarize learners with Jupyter Notebooks:** The notebook provides instructions on how to navigate and use Jupyter Notebooks, including cell types and execution order."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6a23eec6",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **Basic Biology Knowledge:** A foundational understanding of DNA, RNA, transcription, and gene expression is assumed. The notebook includes quizzes to assess this knowledge.\n",
+ "* **Python Programming:** While the notebook itself doesn't contain complex Python code, familiarity with Python syntax and the Jupyter Notebook environment is helpful.\n",
+ "* **Command Line Interface (CLI) Familiarity:** The notebook mentions using `pip` (a command-line package installer), indicating some CLI knowledge is beneficial, although not strictly required for completing the quizzes and reviewing the material."
+ ]
+ },
{
"cell_type": "markdown",
"id": "22b95a28-fad7-4b6c-99ae-093c323f769c",
@@ -383,14 +417,30 @@
},
{
"cell_type": "markdown",
- "id": "489beca6-4a9e-4a2e-a646-6b276270d810",
+ "id": "8d3cf5c9",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "This introductory Jupyter Notebook provided essential background information and a pre-requisite knowledge check on fundamental molecular biology concepts (DNA, RNA, transcription, gene expression) crucial for understanding transcriptome assembly. The notebook established the context for the subsequent modules, outlining the workflow involving RNA-seq data, transcriptome assembly techniques (including de Bruijn graphs, BUSCO analysis), and the use of Nextflow and Google Cloud Life Sciences API for efficient workflow execution and management. The inclusion of interactive quizzes and video resources enhanced learning and engagement, preparing learners for the practical applications and computational challenges presented in the following notebooks. Successful completion of the checkpoint quizzes demonstrates readiness to proceed to the next stage of the MDIBL Transcriptome Assembly Learning Module."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "421cebc3",
"metadata": {},
"source": [
- "## When you are ready, proceed to the next notebook: [`Submodule_01_prog_setup.ipynb`](./Submodule_01_prog_setup.ipynb)."
+ "## Clean Up\n",
+ "\n",
+ "Remember to proceed to the next notebook [`Submodule_01_prog_setup.ipynb`](./Submodule_01_prog_setup.ipynb) or shut down your instance if you are finished."
]
}
],
- "metadata": {},
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/test.md b/test.md
new file mode 100644
index 0000000..b5de049
--- /dev/null
+++ b/test.md
@@ -0,0 +1,30 @@
+This Jupyter notebook is an introductory module for a transcriptome assembly learning course. Here's a breakdown of its prerequisites, API requirements, and cloud access needs:
+
+**Prerequisites:**
+
+* **Basic Biology Knowledge:** A foundational understanding of DNA, RNA, transcription, and gene expression is assumed. The notebook includes quizzes to assess this knowledge.
+* **Python Programming:** While the notebook itself doesn't contain complex Python code, familiarity with Python syntax and the Jupyter Notebook environment is helpful.
+* **Command Line Interface (CLI) Familiarity:** The notebook mentions using `pip` (a command-line package installer), indicating some CLI knowledge is beneficial, although not strictly required for completing the quizzes and reviewing the material.
+
+
+**APIs:**
+
+* **No APIs are directly used in this introductory notebook.** The notebook *mentions* the Google Cloud Life Sciences API as a tool that will be used in later modules, but it's not utilized within this specific file.
+
+
+**Cloud Platform Account Roles and Access:**
+
+* **None required for this notebook.** The Google Cloud Life Sciences API is mentioned for later modules, implying that access to a Google Cloud Platform account with appropriate permissions will be needed later, but this introductory notebook only requires local execution and doesn't necessitate any cloud interaction.
+
+
+**Necessary Cloud Platform Access:**
+
+* **None for this notebook.** This notebook focuses on background information and introductory concepts. Cloud access would be required for later parts of the course, based on its reference to Google Cloud Life Sciences API, but not for this particular Jupyter notebook.
+
+**Software Installations (mentioned in notebook):**
+
+* `jupyterquiz==2.0.7` (for the quizzes)
+* `jupytercards` (also for quizzes, though not used in this specific notebook)
+
+
+In summary, this notebook is a self-contained introductory lesson. While future modules will require cloud access and specific APIs, this particular file only requires basic biological knowledge, familiarity with Python and Jupyter, and installation of a couple of Python packages.
\ No newline at end of file
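test.md above pins `jupyterquiz==2.0.7` for the checkpoint quizzes, whose questions live in the `quiz-material/*.json` files renamed in the first patch. The actual question schema is not shown in this patch series, so purely as an illustration, here is a stdlib-only sketch of loading and sanity-checking such a file before handing it to `jupyterquiz` (the field name used in the test is an assumption):

```python
import json

def load_quiz(path: str) -> list:
    """Parse a quiz JSON file; jupyterquiz consumes a list of question dicts."""
    with open(path) as fh:
        questions = json.load(fh)
    if not isinstance(questions, list):
        raise ValueError(f"{path}: expected a JSON list of questions")
    return questions
```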
From 42dd2e9edac09247d6f8c966b4f134474bc65e6b Mon Sep 17 00:00:00 2001
From: github-action
Date: Tue, 14 Jan 2025 22:37:53 +0000
Subject: [PATCH 03/16] Github Action: Lint Notebooks
---
GoogleCloud/Submodule_00_background.ipynb | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/GoogleCloud/Submodule_00_background.ipynb b/GoogleCloud/Submodule_00_background.ipynb
index d30ea25..b174a7c 100644
--- a/GoogleCloud/Submodule_00_background.ipynb
+++ b/GoogleCloud/Submodule_00_background.ipynb
@@ -436,11 +436,7 @@
]
}
],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
+ "metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
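The lint patch above mechanically resets the notebook's top-level `metadata`, dropping the `language_info` block that editing tools insert (which patch 02 had added). The real GitHub Action's implementation isn't shown here; assuming it operates on the raw notebook JSON, the normalization step amounts to:

```python
import json

def strip_language_info(nb_text: str) -> str:
    """Remove metadata.language_info from notebook JSON, as the lint step does."""
    nb = json.loads(nb_text)
    nb.get("metadata", {}).pop("language_info", None)
    return json.dumps(nb, indent=1)
```

This churn repeats throughout the series: patch 05 re-adds `language_info` to several notebooks, which the lint will strip again.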
From 2dd6da9f813b3346ae96d9905910c8979ba6a292 Mon Sep 17 00:00:00 2001
From: hadiparsianNIH
Date: Tue, 14 Jan 2025 14:43:14 -0800
Subject: [PATCH 04/16] fixed the typo in submodule 00
---
GoogleCloud/Submodule_00_background.ipynb | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/GoogleCloud/Submodule_00_background.ipynb b/GoogleCloud/Submodule_00_background.ipynb
index b174a7c..99df77a 100644
--- a/GoogleCloud/Submodule_00_background.ipynb
+++ b/GoogleCloud/Submodule_00_background.ipynb
@@ -59,6 +59,14 @@
"* **Command Line Interface (CLI) Familiarity:** The notebook mentions using `pip` (a command-line package installer), indicating some CLI knowledge is beneficial, although not strictly required for completing the quizzes and reviewing the material."
]
},
+ {
+ "cell_type": "markdown",
+ "id": "f6eefc1e",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
{
"cell_type": "markdown",
"id": "22b95a28-fad7-4b6c-99ae-093c323f769c",
From 912ea245e097b1f0d272eb35a72af5d33b709ec3 Mon Sep 17 00:00:00 2001
From: hadiparsianNIH
Date: Tue, 14 Jan 2025 16:17:50 -0800
Subject: [PATCH 05/16] changed the format of the GCP
---
GoogleCloud/Submodule_01_prog_setup.ipynb | 145 +++++++++++++++---
GoogleCloud/Submodule_02_basic_assembly.ipynb | 72 ++++++++-
.../Submodule_03_annotation_only.ipynb | 75 ++++++++-
.../Submodule_04_google_batch_assembly.ipynb | 80 +++++++++-
GoogleCloud/Submodule_05_Bonus_Notebook.ipynb | 78 +++++++++-
test.md | 30 ----
6 files changed, 403 insertions(+), 77 deletions(-)
delete mode 100644 test.md
diff --git a/GoogleCloud/Submodule_01_prog_setup.ipynb b/GoogleCloud/Submodule_01_prog_setup.ipynb
index 50ce29a..7cd93c5 100644
--- a/GoogleCloud/Submodule_01_prog_setup.ipynb
+++ b/GoogleCloud/Submodule_01_prog_setup.ipynb
@@ -6,11 +6,64 @@
"metadata": {},
"source": [
"# MDIBL Transcriptome Assembly Learning Module\n",
- "# Notebook 1: Setup\n",
+ "# Notebook 1: Setup"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f62d616c",
+ "metadata": {},
+ "source": [
+ "## Overview\n",
"\n",
"This notebook is designed to configure your virtual machine (VM) to have the proper tools and data in place to run the transcriptome assembly training module."
]
},
+ {
+ "cell_type": "markdown",
+ "id": "60145056",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives\n",
+ "\n",
+ "1. **Understand and utilize shell commands within Jupyter Notebooks:** The notebook explicitly teaches the difference between `!` and `%` prefixes for executing shell commands, and how to navigate directories using `cd` and `pwd`.\n",
+ "\n",
+ "2. **Set up the necessary software:** Students will install and configure essential tools including:\n",
+ " * Java (a prerequisite for Nextflow).\n",
+ " * Mambaforge (a package manager for bioinformatics tools).\n",
+ " * `sra-tools`, `perl-dbd-sqlite`, and `perl-dbi` (specific bioinformatics packages).\n",
+ " * Nextflow (a workflow management system).\n",
+ " * `gsutil` (for interacting with Google Cloud Storage).\n",
+ "\n",
+ "3. **Download and organize necessary data:** Students will download the TransPi transcriptome assembly software and its associated resources (databases, scripts, configuration files) from a Google Cloud Storage bucket. This includes understanding the directory structure and file organization.\n",
+ "\n",
+ "4. **Manage file permissions:** Students will use the `chmod` command to set executable permissions for the necessary files and directories within the TransPi software.\n",
+ "\n",
+ "5. **Navigate file paths:** The notebook provides examples and explanations for using relative file paths (e.g., `./`, `../`) within shell commands."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "549be731",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **Operating System:** A Linux-based system is assumed (commands like `apt`, `uname` are used). The specific distribution isn't specified but a Debian-based system is likely.\n",
+ "* **Shell Access:** The ability to execute shell commands from within the Jupyter Notebook environment (using `!` and `%`).\n",
+ "* **Java Development Kit (JDK):** Required for Nextflow.\n",
+ "* **Miniforge:** A package manager for installing bioinformatics tools.\n",
+ "* **`gsutil`:** The Google Cloud Storage command-line tool. This is crucial for downloading data from Google Cloud Storage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a92f62a0",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
{
"cell_type": "markdown",
"id": "958495ce-339d-4d4d-a621-9ede79a7363c",
@@ -71,7 +124,7 @@
"metadata": {},
"outputs": [],
"source": [
- "!pwd"
+ "! pwd"
]
},
{
@@ -89,9 +142,9 @@
"metadata": {},
"outputs": [],
"source": [
- "!sudo apt update\n",
- "!sudo apt-get install default-jdk -y\n",
- "!java -version"
+ "! sudo apt update\n",
+ "! sudo apt-get install default-jdk -y\n",
+ "! java -version"
]
},
{
@@ -99,9 +152,7 @@
"id": "7b3ffb16-3395-4c01-9774-ee568e815490",
"metadata": {},
"source": [
- "**Step 3:** Install Mambaforge, which is needed to support the information held within the TransPi databases.\n",
- "\n",
- ">Mambaforge is a package manager."
+ "**Step 3:** Install Miniforge (a package manager), which is needed to support the information held within the TransPi databases."
]
},
{
@@ -111,9 +162,45 @@
"metadata": {},
"outputs": [],
"source": [
- "!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
- "!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge\n",
- "!~/mambaforge/bin/mamba install -c bioconda sra-tools perl-dbd-sqlite perl-dbi -y"
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh\n",
+ "! bash Miniforge3-$(uname)-$(uname -m).sh -b -p $HOME/miniforge"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c5584e2e",
+ "metadata": {},
+ "source": [
+ "Next, add it to the path."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ad030cd1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"] + \"/miniforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7b930ad7",
+ "metadata": {},
+ "source": [
+ "Next, using Miniforge and bioconda, install the tools that will be used in this tutorial."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4d4dd51e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! mamba install -c bioconda sra-tools perl-dbd-sqlite perl-dbi -y"
]
},
{
@@ -131,9 +218,9 @@
"metadata": {},
"outputs": [],
"source": [
- "!curl https://get.nextflow.io | bash\n",
- "!chmod +x nextflow\n",
- "!./nextflow self-update"
+ "! curl https://get.nextflow.io | bash\n",
+ "! chmod +x nextflow\n",
+ "! ./nextflow self-update"
]
},
{
@@ -152,7 +239,7 @@
"metadata": {},
"outputs": [],
"source": [
- "!gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/TransPi ./"
+ "! gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/TransPi ./"
]
},
{
@@ -190,7 +277,7 @@
"metadata": {},
"outputs": [],
"source": [
- "!gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/resources ./"
+ "! gsutil -m cp -r gs://nigms-sandbox/nosi-inbremaine-storage/resources ./"
]
},
{
@@ -234,7 +321,7 @@
"metadata": {},
"outputs": [],
"source": [
- "!chmod -R +x ./TransPi/bin"
+ "! chmod -R +x ./TransPi/bin"
]
},
{
@@ -295,22 +382,30 @@
},
{
"cell_type": "markdown",
- "id": "f80a7bab-98ae-45a6-845f-ad3c4138575a",
+ "id": "ffec658a",
"metadata": {},
"source": [
- "## When you are ready, proceed to the next notebook: [`Submodule_02_basic_assembly.ipynb`](./Submodule_02_basic_assembly.ipynb)."
+ "## Conclusion\n",
+ "\n",
+ "This notebook successfully configured the virtual machine for the MDIBL Transcriptome Assembly Learning Module. We updated the system, installed necessary software including Java, Miniforge, and Nextflow, and downloaded the TransPi program and its associated resources from Google Cloud Storage. The `chmod` command ensured executability of the TransPi scripts. The VM is now prepared for the next notebook, `Submodule_02_basic_assembly.ipynb`, which will delve into the transcriptome assembly process itself. Successful completion of this notebook's steps is crucial for the execution of subsequent modules."
]
},
{
- "cell_type": "code",
- "execution_count": null,
- "id": "934165c2-8fbd-4801-979f-6db5d1e592ea",
+ "cell_type": "markdown",
+ "id": "666c1e4d",
"metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "## Clean Up\n",
+ "\n",
+ "Remember to proceed to the next notebook [`Submodule_02_basic_assembly.ipynb`](./Submodule_02_basic_assembly.ipynb) or shut down your instance if you are finished."
+ ]
}
],
- "metadata": {},
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_02_basic_assembly.ipynb b/GoogleCloud/Submodule_02_basic_assembly.ipynb
index 1ce40fe..ed302d3 100644
--- a/GoogleCloud/Submodule_02_basic_assembly.ipynb
+++ b/GoogleCloud/Submodule_02_basic_assembly.ipynb
@@ -8,6 +8,8 @@
"# MDIBL Transcriptome Assembly Learning Module\n",
"# Notebook 2: Performing a \"Standard\" basic transcriptome assembly\n",
"\n",
+ "## Overview\n",
+ "\n",
"In this notebook, we will set up and run a basic transcriptome assembly, using the analysis pipeline as defined by the TransPi Nextflow workflow. The steps to be carried out are the following, and each is described in more detail in the Background material notebook.\n",
"\n",
"- Sequence Quality Control (QC): removing adapters and low-quality sequences.\n",
@@ -23,12 +25,58 @@
"> **Figure 1:** TransPi workflow for a basic transcriptome assembly run."
]
},
+ {
+ "cell_type": "markdown",
+ "id": "062784ec",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives\n",
+ "\n",
+ "1. **Understanding the TransPi Workflow:** Learners will gain a conceptual understanding of the TransPi workflow, including its individual steps and their order. This involves understanding the purpose of each stage (QC, normalization, assembly, integration, assessment, annotation, and reporting).\n",
+ "\n",
+ "2. **Executing a Transcriptome Assembly:** Learners will learn how to run a transcriptome assembly using Nextflow and the TransPi pipeline, including setting necessary parameters (e.g., k-mer size, read length). They will learn how to interpret the command-line interface for executing Nextflow workflows.\n",
+ "\n",
+ "3. **Interpreting Nextflow Output:** Learners will learn to navigate and understand the directory structure generated by the TransPi workflow. This includes interpreting the output from various tools such as FastQC, FastP, Trinity, TransAbyss, SOAP, rnaSpades, Velvet/Oases, EvidentialGene, rnaQuast, BUSCO, DIAMOND/BLAST, HMMER/Pfam, and TransDecoder. This involves understanding the different types of output files generated and how to extract relevant information from them (e.g., assembly statistics, annotation results).\n",
+ "\n",
+ "4. **Assessing Transcriptome Quality:** Learners will understand how to assess the quality of a transcriptome assembly using metrics generated by rnaQuast and BUSCO.\n",
+ "\n",
+ "5. **Interpreting Annotation Results:** Learners will learn to interpret the results of transcriptome annotation using tools like DIAMOND/BLAST and HMMER/Pfam, understanding what information they provide regarding protein function and domains.\n",
+ "\n",
+ "6. **Utilizing Workflow Management Systems:** Learners will gain practical experience using Nextflow, a workflow management system, to execute a complex bioinformatics pipeline. This includes understanding the benefits of using a defined workflow for reproducibility and efficiency.\n",
+ "\n",
+ "7. **Working with Jupyter Notebooks:** The notebook itself provides a practical example of how to integrate command-line tools within a Jupyter Notebook environment."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "abf9345c",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **Nextflow:** A workflow management system used to execute the TransPi pipeline. \n",
+ "* **Docker:** Used for containerization of the various bioinformatics tools within the workflow. This avoids the need for local installation of numerous packages.\n",
+ "* **TransPi:** The specific Nextflow pipeline for transcriptome assembly. The notebook assumes it's present in the `/home/jupyter` directory.\n",
+ "* **Bioinformatics Tools (within TransPi):** The workflow utilizes several bioinformatics tools. These are packaged within Docker containers, but the notebook expects that TransPi is configured correctly to access and use them:\n",
+ " * FastQC: Sequence quality control.\n",
+ " * FastP: Read preprocessing (trimming, adapter removal).\n",
+ " * Trinity, TransAbyss, SOAPdenovo-Trans, rnaSpades, Velvet/Oases: Transcriptome assemblers.\n",
+ " * EvidentialGene: Transcriptome integration and reduction.\n",
+ " * rnaQuast: Transcriptome assessment.\n",
+ " * BUSCO: Assessment of completeness of the assembled transcriptome.\n",
+ " * DIAMOND/BLAST: Protein alignment for annotation.\n",
+ " * HMMER/Pfam: Protein domain assignment for annotation.\n",
+ " * Bowtie2: Read mapping for assembly validation.\n",
+ " * TransDecoder: ORF prediction and coding region identification.\n",
+ " * Trinotate: Functional annotation of transcripts."
+ ]
+ },
{
"cell_type": "markdown",
"id": "6cd0f4f2-5559-4675-9e97-24b0548b31af",
"metadata": {},
"source": [
- "## Time to get started! \n",
+ "## Get Started \n",
"\n",
"**Step 1:** Make sure you are in the correct local working directory as in `01_prog_setup.ipynb`.\n",
"> It should be `/home/jupyter`."
@@ -278,14 +326,30 @@
},
{
"cell_type": "markdown",
- "id": "b96dd6bb-a8ed-44bf-b1f4-bb284f8f0f3e",
+ "id": "b82f0b3a",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "This Jupyter Notebook demonstrated a complete transcriptome assembly workflow using the TransPi Nextflow pipeline. We successfully executed the pipeline, encompassing quality control, normalization, multiple assembly generation with Trinity, TransAbyss, SOAP, rnaSpades, and Velvet/Oases, integration via EvidentialGene, and subsequent assessment using rnaQuast and BUSCO. The final assembly underwent annotation with DIAMOND/BLAST and HMMER/Pfam, culminating in comprehensive reports detailing the entire process and the resulting transcriptome characteristics. The generated output, accessible in the `basicRun/output` directory, provides a rich dataset for further investigation and analysis, including detailed quality metrics, assembly statistics, and functional annotations. This module provided a practical introduction to automated transcriptome assembly, highlighting the efficiency and reproducibility offered by integrated workflows like TransPi. Further exploration of the detailed output is encouraged, and the subsequent notebook focuses on a more in-depth annotation analysis."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b68484f3",
"metadata": {},
"source": [
- "## When you are ready, proceed to the next notebook: [`Submodule_03_annotation_only.ipynb`](Submodule_03_annotation_only.ipynb)."
+ "## Clean Up\n",
+ "\n",
+ "Remember to proceed to the next notebook [`Submodule_03_annotation_only.ipynb`](Submodule_03_annotation_only.ipynb) or shut down your instance if you are finished."
]
}
],
- "metadata": {},
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_03_annotation_only.ipynb b/GoogleCloud/Submodule_03_annotation_only.ipynb
index 423ca38..de5c72e 100644
--- a/GoogleCloud/Submodule_03_annotation_only.ipynb
+++ b/GoogleCloud/Submodule_03_annotation_only.ipynb
@@ -8,11 +8,64 @@
"# MDIBL Transcriptome Assembly Learning Module\n",
    "# Notebook 3: Using TransPi to Perform an \"Annotation Only\" Run\n",
"\n",
+ "## Overview\n",
+ "\n",
"In the previous notebook, we ran the entire default TransPi workflow, generating a small transcriptome from a test data set. While that is a valid exercise in carrying through the workflow, the downstream steps (annotation and assessment) will be unrealistic in their output, since the test set will only generate a few hundred transcripts. In contrast, a more complete estimate of a vertebrate transcriptome will contain tens to hundreds of thousands of transcripts.\n",
"\n",
"In this notebook, we will start from an assembled transcriptome. We will work with a more realistic example that was generated and submitted to the NCBI Transcriptome Shotgun Assembly archive.\n"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "8f4cd172",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives:\n",
+ "\n",
+ "1. **Understanding the TransPi workflow and its components:** The notebook builds upon previous knowledge of TransPi, focusing on the annotation stage, separating it from the assembly process. It reinforces the understanding of the overall workflow and its different stages.\n",
+ "\n",
+ "2. **Performing an \"annotation-only\" run with TransPi:** The primary objective is to learn how to execute TransPi, specifically utilizing the `--onlyAnn` option to process a pre-assembled transcriptome. This teaches efficient use of the tool and avoids unnecessary recomputation.\n",
+ "\n",
+ "3. **Working with realistic transcriptome data:** The notebook shifts from a small test dataset to a larger, more realistic transcriptome from the NCBI Transcriptome Shotgun Assembly archive. This exposes learners to the scale and characteristics of real-world transcriptome data.\n",
+ "\n",
+ "4. **Using command-line tools for data manipulation:** The notebook uses `grep`, `perl` one-liners, and `docker` commands to count sequences, modify configuration files, and manage containerized applications. This improves proficiency in using these essential bioinformatics tools.\n",
+ "\n",
+ "5. **Interpreting TransPi output:** Learners analyze the `RUN_INFO.txt` file and other output files to understand the analysis parameters and results. This develops skills in interpreting computational biology results.\n",
+ "\n",
+ "6. **Understanding and using containerization (Docker):** The notebook introduces the concept of Docker containers and demonstrates how to utilize a BUSCO container to run the BUSCO analysis, highlighting the benefits of containerization for reproducibility and dependency management. This teaches practical application of containers in bioinformatics.\n",
+ "\n",
+ "7. **Running BUSCO analysis:** Learners execute BUSCO, a crucial tool for assessing the completeness of transcriptome assemblies. This extends their skillset to include running and interpreting BUSCO results.\n",
+ "\n",
+ "8. **Interpreting BUSCO and other annotation results:** The notebook includes checkpoints that challenge learners to interpret the BUSCO results, GO stats, and TransDecoder stats, fostering critical thinking and data interpretation skills.\n",
+ "\n",
+ "9. **Critical evaluation of data sources:** The notebook encourages learners to consider the source and context of the transcriptome data used, prompting reflection on data quality and limitations. This emphasizes responsible use of biological data.\n",
+ "\n",
+ "10. **Independent BUSCO analysis:** The final checkpoint task requires learners to independently run a BUSCO analysis on a new transcriptome, selecting a data source and lineage, and interpreting the results. This assesses the understanding and practical application of the concepts covered in the notebook."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04994736",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **Nextflow:** The core workflow engine used to manage the TransPi pipeline.\n",
+ "* **Perl:** Used for a one-liner to modify the Nextflow configuration file.\n",
+ "* **Docker:** Used to run BUSCO in a containerized environment.\n",
+ "* **BUSCO:** The Benchmarking Universal Single-Copy Orthologs program for assessing genome completeness.\n",
+ "* **TransPi:** The specific transcriptome assembly pipeline. The notebook assumes this is pre-installed or available through Nextflow.\n",
+ "* **Command-line tools:** Basic Unix command-line utilities like `grep`, `ls`, `cat`, `pwd`, etc., are used throughout the notebook."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "16adea33",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -470,14 +523,30 @@
},
{
"cell_type": "markdown",
- "id": "ed64e9fa-7ae6-468c-8605-7600dbf9bbc0",
+ "id": "68cfd48a",
"metadata": {},
"source": [
- "## When you are ready, proceed to the next notebook: [`Submodule_04_gls_assembly.ipynb`](Submodule_04_gls_assembly.ipynb). "
+ "## Conclusion\n",
+ "\n",
+ "This Jupyter Notebook demonstrated the \"annotation only\" run of TransPi, utilizing a pre-assembled transcriptome of *Oncorhynchus mykiss* (Rainbow Trout) containing 31,176 transcripts. By modifying the `nextflow.config` file and leveraging the `--onlyAnn` option, we efficiently performed annotation steps, including Pfam and BLAST analyses, without repeating the assembly process. Furthermore, the notebook introduced the concept of Docker containers, showcasing their use in executing BUSCO analysis for assessing transcriptome completeness. The practical application of BUSCO, along with interpretation of the resulting output files (including GO stats and TransDecoder statistics), emphasized the importance of data context and critical evaluation of transcriptome assembly quality. Finally, the notebook concluded with a hands-on exercise, prompting users to perform their own BUSCO analysis on a different transcriptome, fostering a deeper understanding of the workflow and its applications."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5bc80021",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "\n",
+ "Remember to proceed to the next notebook [`Submodule_04_google_batch_assembly.ipynb`](Submodule_04_google_batch_assembly.ipynb) or shut down your instance if you are finished."
]
}
],
- "metadata": {},
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_04_google_batch_assembly.ipynb b/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
index 45c1743..d1e300c 100644
--- a/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
+++ b/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
@@ -14,6 +14,8 @@
"id": "0512337d-7ade-44c7-832a-ae6970a7d980",
"metadata": {},
"source": [
+ "## Overview\n",
+ "\n",
"So far, all of the computational work executed has been run locally, using the compute resources available within this Jupyter notebook. Although this is functional, it is not the ideal setup for fast, cost-efficient data analysis.\n",
"\n",
"Google Batch is known as a scheduler, which provisions specific compute resources to be allocated for individual processes within our workflow. This provides two primary benefits:\n",
@@ -23,13 +25,59 @@
"Fortunately, Batch and Nextflow are compatible with each other allowing for any Nextflow workflow, including the TransPi workflow that we have been using, to be executable on Batch.\n",
"\n",
"\n",
- ">
\n",
+ ">
\n",
">\n",
"> **Figure 1:** Diagram illustrating the interactions between the components used for the Google Batch run. \n",
"\n",
"For this to work, there are a few quick adjustment steps to make sure everything is set up for a Google Batch run!"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "8b495639",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives:\n",
+ "\n",
+ "1. **Utilize Google Batch for efficient and cost-effective data analysis:** The notebook contrasts local computation with Google Batch, highlighting the benefits of the latter in terms of cost savings (auto-shutdown of unused resources) and speed (parallelization of tasks).\n",
+ "\n",
+ "2. **Integrate Nextflow workflows with Google Batch:** The notebook demonstrates how to configure a Nextflow pipeline (TransPi) to run on Google Batch, emphasizing the compatibility between these tools.\n",
+ "\n",
+ "3. **Manage files using Google Cloud Storage (GCS):** The lesson requires users to create or utilize a GCS bucket to store the necessary files for the TransPi workflow, addressing the challenge of accessing local files from external compute resources.\n",
+ "\n",
+ "4. **Configure a Nextflow pipeline for Google Batch execution:** This involves modifying the `nextflow.config` file to point to the GCS bucket, adjust compute allocations (CPU and memory), and specify the correct Google Batch profile. It shows how to use Perl one-liners for efficient configuration changes.\n",
+ "\n",
+ "5. **Interpret and compare the timelines of local and Google Batch runs:** By comparing the `transpi_timeline.html` files from both local and Google Batch executions, users learn to analyze the performance differences and understand the impact of resource allocation.\n",
+ "\n",
+ "6. **Execute and manage a Nextflow pipeline on Google Batch:** The notebook provides step-by-step instructions for running TransPi on Google Batch using specific command-line arguments and managing the output.\n",
+ "\n",
+ "7. **Understand and utilize Google Cloud commands:** The notebook uses `gcloud` and `gsutil` commands extensively, teaching users basic Google Cloud command-line interactions."
+ ]
+ },
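Objective 4 above describes retargeting `nextflow.config` at a GCS bucket with Perl one-liners. For readers more comfortable in Python, the same kind of in-place substitution can be sketched as below; the bucket name, placeholder URI, and config layout are hypothetical, not the module's actual file:

```python
import re
from pathlib import Path


def point_config_at_bucket(config_path, bucket):
    """Rewrite any gs:// URI in a Nextflow config to use the given bucket.

    A rough Python analogue of a Perl one-liner such as:
        perl -i -pe 's|gs://[^/]+|gs://my-bucket|g' nextflow.config
    """
    text = Path(config_path).read_text()
    # Swap whatever bucket currently appears in gs:// URIs for the user's bucket.
    text = re.sub(r"gs://[A-Za-z0-9._-]+", f"gs://{bucket}", text)
    Path(config_path).write_text(text)
    return text
```

The same pattern extends to adjusting CPU and memory values, since they are also plain-text assignments in the config file.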
+ {
+ "cell_type": "markdown",
+ "id": "1dbd972f",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **A Google Cloud Storage (GCS) Bucket:** A bucket is needed to store the TransPi workflow's input files and output results. The notebook provides options to create a new bucket or use an existing one.\n",
+ "* **Sufficient Compute Resources:** The user needs to have sufficient quota available in their GCP project to handle the compute resources required by the TransPi workflow (CPUs, memory, disk space). The notebook uses a `nextflow.config` file to configure the Google Batch execution.\n",
+ "* **`gcloud` CLI:** The Google Cloud SDK (`gcloud`) command-line tool must be installed and configured to authenticate with the GCP project. The notebook uses `gcloud` commands to interact with GCP services.\n",
+ "* **`gsutil` CLI:** The `gsutil` command-line tool (part of the Google Cloud SDK) is used to interact with GCS.\n",
+ "* **Nextflow:** The Nextflow workflow engine must be installed locally on the Jupyter Notebook environment.\n",
+ "* **TransPi Workflow:** The TransPi Nextflow pipeline code must be available in the Jupyter Notebook environment's file system. The notebook assumes it's in a `TransPi` directory.\n",
+ "* **Perl:** The notebook uses Perl one-liners for file manipulation. Perl must be installed in the Jupyter Notebook environment."
+ ]
+ },
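Before launching a run, it can save time to confirm that these command-line prerequisites are actually on the PATH. A small convenience check (a sketch for this purpose, not part of the module itself) might look like:

```python
import shutil


def missing_tools(required=("gcloud", "gsutil", "nextflow", "perl")):
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


# Report anything that still needs installing before the Google Batch run.
for tool in missing_tools():
    print(f"Missing prerequisite: {tool}")
```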
+ {
+ "cell_type": "markdown",
+ "id": "9449ee77",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -526,19 +574,35 @@
"id": "96722e89-2d6a-4381-ba42-673e9be79a2e",
"metadata": {},
"source": [
- "### At this point, you have the toolkit necessary to run TransPi in various configurations and the baseline knowledge to interpret the output that TransPi produces. You also have the foundational knowledge of Google Cloud resources with the ability to utilize buckets and cloud computing to execute your computational task. Specifically, Batch which not only works with TransPi but also with any other Nextflow pipeline. We urge you to continue exploring TransPi, using different data sets, and also to explore other Nextflow pipelines as well."
+ "##### At this point, you have the toolkit necessary to run TransPi in various configurations and the baseline knowledge to interpret the output that TransPi produces. You also have foundational knowledge of Google Cloud resources, with the ability to use buckets and cloud computing to execute your computational tasks. In particular, Google Batch works not only with TransPi but with any other Nextflow pipeline. We encourage you to continue exploring TransPi with different datasets, and to explore other Nextflow pipelines as well."
]
},
{
- "cell_type": "code",
- "execution_count": null,
- "id": "0ac8a4e6-ad87-438a-9b74-86dd82fb6823",
+ "cell_type": "markdown",
+ "id": "5213f6a1",
"metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "This module demonstrated the execution of the TransPi transcriptome assembly workflow on Google Batch, a significant advancement from local Jupyter Notebook execution. By leveraging Google Batch's scheduling capabilities, we achieved both cost efficiency through automated resource allocation and increased speed through parallelization of computational tasks. The integration of Nextflow with Google Batch streamlined the process, requiring only minor adjustments to the `nextflow.config` file to redirect file paths to Google Cloud Storage (GCS) buckets and optimize compute allocations. Comparison of local and Google Batch run timelines highlighted the benefits of cloud computing for large-scale bioinformatics analyses. This learning module equipped users with the skills to effectively utilize Google Batch for efficient and scalable execution of Nextflow pipelines, paving the way for more complex and data-intensive bioinformatics projects."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2661513f",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "\n",
+ "Proceed to the next notebook [`Submodule_05_Bonus_Notebook.ipynb`](./Submodule_05_Bonus_Notebook.ipynb), or shut down your instance if you are finished."

+ ]
}
],
- "metadata": {},
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb b/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
index 596a1e9..c5d309b 100644
--- a/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
+++ b/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
@@ -14,6 +14,7 @@
"id": "c38bba56-40d9-4ca4-b58b-b9733b424b1f",
"metadata": {},
"source": [
+ "## Overview\n",
"In this notebook, we are going to explore how to run this module with a new dataset. These submodules provide a great framework for running a rigorous and scalable transcriptome assembly, but there are some considerations that must be made in order to run this with your own data. We will walk through that process here so that hopefully, you are able to take these notebooks to your research group and use them for your own analysis."
]
},
@@ -25,6 +26,45 @@
"The data we are using here comes from SRA. In this example, we are using data from an experiment that compared RNA sequences in honeybees with and without viral infections. The BioProject ID is [PRJNA274674](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA274674). This experiment includes 6 RNA-seq samples and 2 methylation-seq samples. We are only considering the RNA-seq data here. Additionally, we have subsampled them to about 2 million reads collectively across all of the samples. In a real analysis this would not be a good idea, but to keep costs and runtimes low we will use the down-sampled files in this demonstration. If you want to explore the full dataset, we recommend pulling the fastq files using the [STRIDES tutorial on SRA downloads](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/notebooks/SRADownload/SRA-Download.ipynb). As with the original example in this module, we have concatenated all 6 files into one set of combined fastq files called joined_R{1,2}.fastq.gz. We have stored the subsampled fastq files in this module's cloud storage bucket."
]
},
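The combined joined_R{1,2}.fastq.gz files were produced by simple concatenation. Because gzip streams remain valid when concatenated byte-for-byte, the merge needs no decompression round-trip; a sketch of the idea (the sample file names here are hypothetical):

```python
import shutil


def concat_fastq_gz(inputs, output):
    """Concatenate gzipped FASTQ files into one combined .fastq.gz.

    Byte-level concatenation of gzip members is itself valid gzip, so this
    has the same effect as `cat a.fastq.gz b.fastq.gz > joined.fastq.gz`.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


# Hypothetical usage for six R1 files:
# concat_fastq_gz([f"sample{i}_R1.fastq.gz" for i in range(1, 7)], "joined_R1.fastq.gz")
```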
+ {
+ "cell_type": "markdown",
+ "id": "ae57ad92",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives:\n",
+ "\n",
+ "1. **Adapting a Nextflow workflow:** The notebook demonstrates how to modify a Nextflow pipeline's configuration to point to a new dataset, highlighting the workflow's reusability and flexibility. This involves understanding how to change input parameters within a configuration file.\n",
+ "\n",
+ "2. **Data preparation and management:** Users learn how to download and manage data from the SRA (Sequence Read Archive) using `gsutil` (although a pre-downloaded, subsampled dataset is provided for convenience). This includes understanding file organization and paths.\n",
+ "\n",
+ "3. **Software installation and environment setup:** The notebook guides users through installing necessary software (Java, Mamba, sra-tools, perl modules, Nextflow) and setting up the computational environment. This emphasizes reproducibility and dependency management.\n",
+ "\n",
+ "4. **Running a transcriptome assembly:** The notebook shows how to execute the TransPi Nextflow pipeline with the new dataset, demonstrating the complete process from data input to (presumably) assembly output."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e6a8c2f6",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **Java:** The notebook installs the default JDK.\n",
+ "* **Miniforge:** Used for package management.\n",
+ "* **sra-tools, perl-dbd-sqlite, perl-dbi:** Bioinformatics tools for working with SRA data.\n",
+ "* **Nextflow:** A workflow management system.\n",
+ "* **Docker:** Either Docker pre-installed on the VM, or permissions to install and run Docker containers.\n",
+ "* **`gsutil`:** The Google Cloud Storage command-line tool."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "27475529",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
{
"cell_type": "markdown",
"id": "dcf2a2d0-bc91-4a2a-9db0-62f1eee91f92",
@@ -73,9 +113,9 @@
"metadata": {},
"outputs": [],
"source": [
- "# install mamba and dependencies\n",
- "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
- "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ "# install Miniforge\n",
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh\n",
+ "! bash Miniforge3-$(uname)-$(uname -m).sh -b -p $HOME/miniforge"
]
},
{
@@ -85,9 +125,9 @@
"metadata": {},
"outputs": [],
"source": [
- "# add mamba to your path\n",
+ "# add Miniforge to your path\n",
"import os\n",
- "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/miniforge/bin\""
]
},
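The cell above extends PATH for the current process only, and the result can be verified immediately with `shutil.which`. A small sketch of the pattern (the install directory is the one assumed by the preceding cells):

```python
import os
import shutil


def add_to_path(directory):
    """Append a directory to PATH for this process and its children."""
    os.environ["PATH"] += os.pathsep + directory


add_to_path(os.path.expanduser("~/miniforge/bin"))
# Resolves to the mamba binary if the Miniforge install succeeded, else None.
print(shutil.which("mamba"))
```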
{
@@ -95,7 +135,7 @@
"id": "39bb00de-3481-4cb0-a2fe-098cfdae51a6",
"metadata": {},
"source": [
- "Use mamba to install: sra-tools perl-dbd-sqlite perl-dbi from channel bioconda\n",
+ "Use mamba (installed with Miniforge) to install sra-tools, perl-dbd-sqlite, and perl-dbi from the bioconda channel.\n",
"\n",
"\n",
" Click for help
\n",
@@ -285,9 +325,33 @@
"source": [
"With the subsampled reads, the assembly should complete in about 2 hours using an n1-highmem-16 machine."
]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38abe476",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "This notebook demonstrated the adaptability of the MDIBL Transcriptome Assembly Learning Module's TransPi pipeline by applying it to a new RNA-Seq dataset from a honeybee viral infection study (PRJNA274674). While utilizing a subsampled dataset for demonstration purposes, the process highlighted the ease of integrating new data into the existing Nextflow workflow. By simply modifying the `nextflow.config` file to specify the new reads' location, the pipeline executed seamlessly, showcasing its robustness and reproducibility. This adaptability makes the module a valuable resource for researchers seeking to perform scalable and rigorous transcriptome assemblies on their own datasets, facilitating efficient and reproducible analyses within their research groups. The successful execution underscores the power of workflow management systems like Nextflow for streamlining bioinformatics analyses."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7f7d2cab",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "\n",
+ "Shut down your instance if you are finished."
+ ]
}
],
- "metadata": {},
+ "metadata": {
+ "language_info": {
+ "name": "python"
+ }
+ },
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/test.md b/test.md
deleted file mode 100644
index b5de049..0000000
--- a/test.md
+++ /dev/null
@@ -1,30 +0,0 @@
-This Jupyter notebook is an introductory module for a transcriptome assembly learning course. Here's a breakdown of its prerequisites, API requirements, and cloud access needs:
-
-**Prerequisites:**
-
-* **Basic Biology Knowledge:** A foundational understanding of DNA, RNA, transcription, and gene expression is assumed. The notebook includes quizzes to assess this knowledge.
-* **Python Programming:** While the notebook itself doesn't contain complex Python code, familiarity with Python syntax and the Jupyter Notebook environment is helpful.
-* **Command Line Interface (CLI) Familiarity:** The notebook mentions using `pip` (a command-line package installer), indicating some CLI knowledge is beneficial, although not strictly required for completing the quizzes and reviewing the material.
-
-
-**APIs:**
-
-* **No APIs are directly used in this introductory notebook.** The notebook *mentions* the Google Cloud Life Sciences API as a tool that will be used in later modules, but it's not utilized within this specific file.
-
-
-**Cloud Platform Account Roles and Access:**
-
-* **None required for this notebook.** The Google Cloud Life Sciences API is mentioned for later modules, implying that access to a Google Cloud Platform account with appropriate permissions will be needed later, but this introductory notebook only requires local execution and doesn't necessitate any cloud interaction.
-
-
-**Necessary Cloud Platform Access:**
-
-* **None for this notebook.** This notebook focuses on background information and introductory concepts. Cloud access would be required for later parts of the course, based on its reference to Google Cloud Life Sciences API, but not for this particular Jupyter notebook.
-
-**Software Installations (mentioned in notebook):**
-
-* `jupyterquiz==2.0.7` (for the quizzes)
-* `jupytercards` (also for quizzes, though not used in this specific notebook)
-
-
-In summary, this notebook is a self-contained introductory lesson. While future modules will require cloud access and specific APIs, this particular file only requires basic biological knowledge, familiarity with Python and Jupyter, and installation of a couple of Python packages.
\ No newline at end of file
From 9cf17a129c0a9a51a90879d8069b85b2fca15f0d Mon Sep 17 00:00:00 2001
From: github-action
Date: Wed, 15 Jan 2025 00:18:10 +0000
Subject: [PATCH 06/16] Github Action: Lint Notebooks
---
GoogleCloud/Submodule_01_prog_setup.ipynb | 6 +-----
GoogleCloud/Submodule_02_basic_assembly.ipynb | 6 +-----
GoogleCloud/Submodule_03_annotation_only.ipynb | 6 +-----
GoogleCloud/Submodule_04_google_batch_assembly.ipynb | 6 +-----
GoogleCloud/Submodule_05_Bonus_Notebook.ipynb | 6 +-----
5 files changed, 5 insertions(+), 25 deletions(-)
diff --git a/GoogleCloud/Submodule_01_prog_setup.ipynb b/GoogleCloud/Submodule_01_prog_setup.ipynb
index 7cd93c5..e06f90c 100644
--- a/GoogleCloud/Submodule_01_prog_setup.ipynb
+++ b/GoogleCloud/Submodule_01_prog_setup.ipynb
@@ -401,11 +401,7 @@
]
}
],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
+ "metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_02_basic_assembly.ipynb b/GoogleCloud/Submodule_02_basic_assembly.ipynb
index ed302d3..fa3fee2 100644
--- a/GoogleCloud/Submodule_02_basic_assembly.ipynb
+++ b/GoogleCloud/Submodule_02_basic_assembly.ipynb
@@ -345,11 +345,7 @@
]
}
],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
+ "metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_03_annotation_only.ipynb b/GoogleCloud/Submodule_03_annotation_only.ipynb
index de5c72e..93db97c 100644
--- a/GoogleCloud/Submodule_03_annotation_only.ipynb
+++ b/GoogleCloud/Submodule_03_annotation_only.ipynb
@@ -542,11 +542,7 @@
]
}
],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
+ "metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_04_google_batch_assembly.ipynb b/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
index d1e300c..e6bf64c 100644
--- a/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
+++ b/GoogleCloud/Submodule_04_google_batch_assembly.ipynb
@@ -598,11 +598,7 @@
]
}
],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
+ "metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
diff --git a/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb b/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
index c5d309b..d884c61 100644
--- a/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
+++ b/GoogleCloud/Submodule_05_Bonus_Notebook.ipynb
@@ -347,11 +347,7 @@
]
}
],
- "metadata": {
- "language_info": {
- "name": "python"
- }
- },
+ "metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
From ac69cf221da19568dd70dbd1786448fc1dbb7be1 Mon Sep 17 00:00:00 2001
From: hadiparsianNIH
Date: Wed, 15 Jan 2025 10:03:29 -0800
Subject: [PATCH 07/16] Created AWS folder
---
AWS/Submodule_00_background.ipynb | 450 ++++++++++++++++++
AWS/images/AnnotationProcess.png | Bin 0 -> 457222 bytes
AWS/images/MDI-course-card-2.png | Bin 0 -> 323296 bytes
AWS/images/RNA-Seq_Notebook_Homepage.png | Bin 0 -> 69105 bytes
AWS/images/Setup10.png | Bin 0 -> 5627 bytes
AWS/images/Setup11.png | Bin 0 -> 4928 bytes
AWS/images/Setup12.png | Bin 0 -> 50195 bytes
AWS/images/Setup13.png | Bin 0 -> 5216 bytes
AWS/images/Setup14.png | Bin 0 -> 5439 bytes
AWS/images/Setup15.png | Bin 0 -> 694639 bytes
AWS/images/Setup16.png | Bin 0 -> 45732 bytes
AWS/images/Setup17.png | Bin 0 -> 5317 bytes
AWS/images/Setup18.png | Bin 0 -> 5014 bytes
AWS/images/Setup19.png | Bin 0 -> 4008 bytes
AWS/images/Setup2.png | Bin 0 -> 4936 bytes
AWS/images/Setup20.png | Bin 0 -> 3684 bytes
AWS/images/Setup21.png | Bin 0 -> 5353 bytes
AWS/images/Setup22.png | Bin 0 -> 5536 bytes
AWS/images/Setup23.png | Bin 0 -> 3695 bytes
AWS/images/Setup24.png | Bin 0 -> 4298 bytes
AWS/images/Setup25.png | Bin 0 -> 4493 bytes
AWS/images/Setup3.png | Bin 0 -> 3472 bytes
AWS/images/Setup4.png | Bin 0 -> 43630 bytes
AWS/images/Setup5.png | Bin 0 -> 6245 bytes
AWS/images/Setup6.png | Bin 0 -> 4887 bytes
AWS/images/Setup7.png | Bin 0 -> 53526 bytes
AWS/images/Setup8.png | Bin 0 -> 5716 bytes
AWS/images/Setup9.png | Bin 0 -> 62133 bytes
AWS/images/TransPiWorkflow.png | Bin 0 -> 126703 bytes
AWS/images/VMdownsize.jpg | Bin 0 -> 274171 bytes
AWS/images/architecture_diagram.png | Bin 0 -> 232517 bytes
AWS/images/basic_assembly.png | Bin 0 -> 259778 bytes
AWS/images/cellMenu.png | Bin 0 -> 11549 bytes
AWS/images/deBruijnGraph.png | Bin 0 -> 152872 bytes
AWS/images/fileDemo.png | Bin 0 -> 23687 bytes
AWS/images/gcbDiagram.jpg | Bin 0 -> 333411 bytes
AWS/images/glsDiagram.png | Bin 0 -> 117910 bytes
AWS/images/jupyterRuntime.png | Bin 0 -> 16718 bytes
AWS/images/jupyterRuntimeCircle.png | Bin 0 -> 25647 bytes
.../mdibl-compbio-core-logo-eurostyle.jpg | Bin 0 -> 204024 bytes
AWS/images/mdibl-compbio-core-logo-square.jpg | Bin 0 -> 417769 bytes
AWS/images/module_concept.png | Bin 0 -> 85200 bytes
AWS/images/perl-logo.png | Bin 0 -> 100345 bytes
AWS/images/rainbowTrout.jpeg | Bin 0 -> 149372 bytes
AWS/images/transpi_workflow.png | Bin 0 -> 362376 bytes
AWS/images/workflow_concept.png | Bin 0 -> 264724 bytes
AWS/quiz-material/00-cp1.json | 117 +++++
AWS/quiz-material/00-cp2.json | 57 +++
AWS/quiz-material/00-pc1.json | 117 +++++
AWS/quiz-material/01-cp1.json | 94 ++++
AWS/quiz-material/02-cp1-1.json | 6 +
AWS/quiz-material/02-cp1-2.json | 6 +
AWS/quiz-material/03-cp1-1.json | 6 +
AWS/quiz-material/03-cp1-2.json | 6 +
AWS/quiz-material/04-cp1-1.json | 6 +
AWS/quiz-material/04-cp1-2.json | 6 +
AWS/quiz-material/04-cp1-3.json | 6 +
AWS/quiz-material/04-cp1-4.json | 6 +
58 files changed, 883 insertions(+)
create mode 100644 AWS/Submodule_00_background.ipynb
create mode 100644 AWS/images/AnnotationProcess.png
create mode 100644 AWS/images/MDI-course-card-2.png
create mode 100644 AWS/images/RNA-Seq_Notebook_Homepage.png
create mode 100644 AWS/images/Setup10.png
create mode 100644 AWS/images/Setup11.png
create mode 100644 AWS/images/Setup12.png
create mode 100644 AWS/images/Setup13.png
create mode 100644 AWS/images/Setup14.png
create mode 100644 AWS/images/Setup15.png
create mode 100644 AWS/images/Setup16.png
create mode 100644 AWS/images/Setup17.png
create mode 100644 AWS/images/Setup18.png
create mode 100644 AWS/images/Setup19.png
create mode 100644 AWS/images/Setup2.png
create mode 100644 AWS/images/Setup20.png
create mode 100644 AWS/images/Setup21.png
create mode 100644 AWS/images/Setup22.png
create mode 100644 AWS/images/Setup23.png
create mode 100644 AWS/images/Setup24.png
create mode 100644 AWS/images/Setup25.png
create mode 100644 AWS/images/Setup3.png
create mode 100644 AWS/images/Setup4.png
create mode 100644 AWS/images/Setup5.png
create mode 100644 AWS/images/Setup6.png
create mode 100644 AWS/images/Setup7.png
create mode 100644 AWS/images/Setup8.png
create mode 100644 AWS/images/Setup9.png
create mode 100644 AWS/images/TransPiWorkflow.png
create mode 100644 AWS/images/VMdownsize.jpg
create mode 100644 AWS/images/architecture_diagram.png
create mode 100644 AWS/images/basic_assembly.png
create mode 100644 AWS/images/cellMenu.png
create mode 100644 AWS/images/deBruijnGraph.png
create mode 100644 AWS/images/fileDemo.png
create mode 100644 AWS/images/gcbDiagram.jpg
create mode 100644 AWS/images/glsDiagram.png
create mode 100644 AWS/images/jupyterRuntime.png
create mode 100644 AWS/images/jupyterRuntimeCircle.png
create mode 100644 AWS/images/mdibl-compbio-core-logo-eurostyle.jpg
create mode 100644 AWS/images/mdibl-compbio-core-logo-square.jpg
create mode 100644 AWS/images/module_concept.png
create mode 100644 AWS/images/perl-logo.png
create mode 100644 AWS/images/rainbowTrout.jpeg
create mode 100644 AWS/images/transpi_workflow.png
create mode 100644 AWS/images/workflow_concept.png
create mode 100644 AWS/quiz-material/00-cp1.json
create mode 100644 AWS/quiz-material/00-cp2.json
create mode 100644 AWS/quiz-material/00-pc1.json
create mode 100644 AWS/quiz-material/01-cp1.json
create mode 100644 AWS/quiz-material/02-cp1-1.json
create mode 100644 AWS/quiz-material/02-cp1-2.json
create mode 100644 AWS/quiz-material/03-cp1-1.json
create mode 100644 AWS/quiz-material/03-cp1-2.json
create mode 100644 AWS/quiz-material/04-cp1-1.json
create mode 100644 AWS/quiz-material/04-cp1-2.json
create mode 100644 AWS/quiz-material/04-cp1-3.json
create mode 100644 AWS/quiz-material/04-cp1-4.json
diff --git a/AWS/Submodule_00_background.ipynb b/AWS/Submodule_00_background.ipynb
new file mode 100644
index 0000000..99df77a
--- /dev/null
+++ b/AWS/Submodule_00_background.ipynb
@@ -0,0 +1,450 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "b2476ae8-9aad-4594-a003-f92ad8b0e126",
+ "metadata": {},
+ "source": [
+ "# MDIBL Transcriptome Assembly Learning Module\n",
+ "# Notebook 0: Background Material"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e6d2086-4dbf-4a61-a5bb-8f08a269f3fa",
+ "metadata": {},
+ "source": [
+ "## Overview\n",
+ "\n",
+ "This series of notebooks lets you explore the biological and computational process of transcriptome assembly. Along the way, you will also learn to leverage powerful tools such as Nextflow and the Google Cloud Life Sciences API to take your computational capabilities to the next level!\n",
+ "\n",
+ "Before you get started, please take the prerequisite quiz below, which checks the background knowledge assumed throughout the rest of these notebooks.\n",
+ "\n",
+ "Throughout the notebooks, there will be periodic quizzes and knowledge checks that you are encouraged to do.\n",
+ "\n",
+ "Good luck, and have fun!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3518c1a9",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives:\n",
+ "\n",
+ "1. **Assess prior knowledge:** A pre-check quiz verifies foundational understanding of DNA, RNA, transcription, and gene expression.\n",
+ "\n",
+ "2. **Introduce transcriptome assembly:** Learners gain an understanding of what transcriptome assembly is, why RNA sequencing is performed, and the overall workflow involved.\n",
+ "\n",
+ "3. **Explain the process of transcriptome assembly:** This includes understanding preprocessing, sequence assembly using de Bruijn graphs, assembly assessment (internal and external consistency, BUSCO), and refinement techniques.\n",
+ "\n",
+ "4. **Introduce workflow management:** Learners are introduced to the concept of workflows/pipelines in bioinformatics and the role of workflow management systems like Nextflow.\n",
+ "\n",
+ "5. **Explain the use of Docker containers:** The notebook explains the purpose and benefits of using Docker containers for managing software dependencies in bioinformatics.\n",
+ "\n",
+ "6. **Introduce the Google Cloud Life Sciences API:** Learners are introduced to the Google Cloud Life Sciences API and its advantages for managing and executing workflows on cloud computing resources.\n",
+ "\n",
+ "7. **Familiarize learners with Jupyter Notebooks:** The notebook provides instructions on how to navigate and use Jupyter Notebooks, including cell types and execution order."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6a23eec6",
+ "metadata": {},
+ "source": [
+ "## Prerequisites\n",
+ "\n",
+ "* **Basic Biology Knowledge:** A foundational understanding of DNA, RNA, transcription, and gene expression is assumed. The notebook includes quizzes to assess this knowledge.\n",
+ "* **Python Programming:** While the notebook itself doesn't contain complex Python code, familiarity with Python syntax and the Jupyter Notebook environment is helpful.\n",
+ "* **Command Line Interface (CLI) Familiarity:** The notebook mentions using `pip` (a command-line package installer), indicating some CLI knowledge is beneficial, although not strictly required for completing the quizzes and reviewing the material."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f6eefc1e",
+ "metadata": {},
+ "source": [
+ "## Get Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22b95a28-fad7-4b6c-99ae-093c323f769c",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Precheck:\n",
+ "
\n",
+ "\n",
+ ">Before you get started, please take this quick quiz that will verify some baseline knowledge on the ideas of DNA, RNA, Transcription, and Gene Expression."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8c479507-0e54-44e1-a727-d13dceaa1c7b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Run this install once to make the quizzes functional.\n",
+ "!pip install jupyterquiz==2.0.7\n",
+ "!pip install jupytercards"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c0a4331b-494c-4054-93c8-ef8433ca7b40",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from jupyterquiz import display_quiz"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a89c8abf-fad1-4b23-b3fc-1a5933193fa0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_quiz(\"quiz-material/00-pc1.json\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e1d5ab0d-1671-47c6-888d-a8d2774df30c",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Note: Some Resources\n",
+ "
\n",
+ "\n",
+ ">If you feel unsure about your knowledge in any of these topics, please reference [Submodule_00_Glossary.md](./Submodule_00_Glossary.md) along with the National Human Genome Research Institute's [Glossary of Genomic and Genetic Terms](https://www.genome.gov/genetics-glossary)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "39b204c4-847a-49b3-afa0-c4da4216ace4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Run this cell to watch the video\n",
+ "from IPython.display import YouTubeVideo\n",
+ "\n",
+ "YouTubeVideo('abw2XAg1e_g', width=800, height=400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45b85d9f-4115-446d-8120-02c988a7769f",
+ "metadata": {},
+ "source": [
+ "## Why do we sequence RNA?\n",
+ "RNA-sequencing (RNA-seq) is the most common means by which biological samples are characterized at the molecular level. In brief, it is a means of measuring which genes have RNA copies (transcripts) present in a sample and in what relative abundance. The sample is prepared in such a way that DNA and proteins are degraded away, and then the remaining RNA is prepared such that it can be read (as a series of DNA bases A, C, G, and T) on a modern sequencer. Sequencing machines are generally classified as short-read, which produce sequence read lengths of 50 to 150 nucleotides, or long-read, which can generate reads of up to tens of thousands of bases. Short-read sequencers have been available longer and remain more capable of high-throughput quantitative output, and these reads are the focus of our work here.\n",
+ "\n",
+ "The standard workflow analysis of RNA-seq data consists of these broad steps:\n",
+ "- Quality assessment and pre-processing\n",
+ "- Assignment of reads to transcripts/genes\n",
+ "- Normalization of reads between samples\n",
+ "- Assessment of differential expression of transcripts/genes between experimental conditions\n",
+ "- Interpretation of the resulting differential expression profiles\n",
+ "\n",
+ "Implicit in the workflow above is the existence of a target transcriptome to which the RNA-seq reads can be compared, aligned, and assigned for quantification. For well-studied organisms, such as human, mouse, zebrafish, or other model organisms, there are abundant reference materials available from such sites as [Ensembl](https://www.ensembl.org/), [NCBI](https://ncbi.nlm.nih.gov/), and the [UCSC Genome Browser](https://genome.ucsc.edu/).\n",
+ "\n",
+ "For less well-studied organisms, no such references are generally available; however, the RNA-seq data itself contains the information necessary to infer not only abundance but also the transcript sequences from which the data was generated. The process of inferring the starting transcripts from the data, termed ***Transcriptome Assembly***, is the focus of this module."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e392c553-1831-4978-af06-83359f8de746",
+ "metadata": {},
+ "source": [
+ "## Transcriptome Sequence Assembly\n",
+ "As a first approximation, sequence assembly of a single molecule (*e.g.*, a chromosome) can be thought of as analogous to the process of reconstructing a picture from smaller, overlapping segments of the picture. Overlapping pieces are identified and matched, extending the construct until an estimation of the complete picture is generated. To make this metaphor a bit more realistic, the subsegments of the original picture are *imperfect*, such that successful construction of the complete picture will require error identification (or at least estimation) and correction.\n",
+ "\n",
+ "In order to extend this analog to transcriptome assembly, imagine that instead of one picture, our smaller segments instead are drawn from many pictures. Now the process of reconstruction will necessarily include a step that attempts to separate the smaller segments into related groups, after which the assembly procedure proceeds.\n",
+ "\n",
+ "#### Preprocessing and Data Cleaning\n",
+ "For reasons described below, stringent quality assessment and filtering of the data is generally carried out before the assembly process is begun. The primary steps include:\n",
+ "- Removal of low-quality score data\n",
+ "- Removal of contaminant sequence data\n",
+ "- Removal of known functional RNA\n",
+ "\n",
+ "#### Sequence Assembly\n",
+ "
\n",
+ "\n",
+ "**Figure 1:** Conceptual diagram of a sequence-defined [de Bruijn graph](https://en.wikipedia.org/wiki/De_Bruijn_graph). (A) Each sequence in an RNA-seq is broken into overlapping *k*-mers. (B) Each *k*-mer becomes a node in the graph, shown in the example with *k*=6. Edges are drawn between nodes that match *k*-1 contiguous nucleotides. (C) Putative transcripts (shown in distinct colors) are represented as traversals of one of the many connected components of the graph generated by the starting sequence set.
\n",
+ "\n",
+ "#### Assembly Assessment\n",
+ "- Internal consistency\n",
+ " - Use of a de Bruijn graph is computationally efficient (especially compared to exhaustive pairwise alignment of all sequence reads), but all \"long-range\" information is weakened.\n",
+ " - The weakening of the long-range information necessitates further QC. The problem is that building complete transcripts from just *k*-mers and their probabilities means that we can generate sequences that are computationally possible but don't exist in the input data. Internal consistency refers to the process of aligning the original input reads to the output transcriptome. Transcripts that do not get sufficient coverage are flagged as probable artifacts.\n",
+ "- External consistency\n",
+ " - Studies of transcriptomes across many organisms have demonstrated common features. By \"external consistency\" we mean matching our new transcriptome to these expectations.\n",
+ " - [BUSCO](https://busco.ezlab.org/) is an innovative analysis and set of tools developed by the [EZlab at the Swiss Insitute of Bioinformatics](https://www.ezlab.org/). The fundamental idea behind BUSCO (**B**enchmarking **U**niversal **S**ingle-**C**opy **O**rthologs) derives from the Zdobnov group's analysis, which showed that for a defined phylogenetic range of organisms, there is a core set of protein-coding genes that are nearly universally present in only a single copy. The BUSCO tools test this assumption.\n",
+ " - The second standard process for external consistency is to align all predicted proteins for the new transcriptome to a complete set of proteins from a well-studied (e.g., fly or mouse) under the assumption that most of the proteins should match.\n",
+ "\n",
+ "#### Assembly Refinement\n",
+ "Assemblies are refined in several different manners:\n",
+ "- Removal of redundant (or likely so) transcripts, based on sequence similarity between assembled forms.\n",
+ "- Limitation to transcripts with predicted/conceptual translated protein sequences that match known proteins in other organisms.\n",
+ "\n",
+ "For Assembly refinement, the TransPi workflow relies primarily on the \"EvidentialGene\" tool."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "725afa76-ab95-4ab5-8b85-6b6795436c0e",
+ "metadata": {},
+ "source": [
+ "## Workflow Execution with Nextflow"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "394e8677-1f82-4e42-9e37-535777128a1a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Run the command below to watch the video\n",
+ "from IPython.display import YouTubeVideo\n",
+ "\n",
+ "YouTubeVideo('FMcZD10Qrbs', width=800, height=400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "624b90a4-732a-47a2-987e-e37d16f124bb",
+ "metadata": {},
+ "source": [
+ "\n",
+ "It is standard practice in modern biotechnology, bioinformatics, and computational biology that most complex analyses are carried out not by a single comprehensive program, but are instead carried out in a defined sequence of multiple programs. The process of running through these steps in the proper order is collectively called a ***workflow*** or ***pipeline***.\n",
+ "\n",
+ "
\n",
+ "\n",
+ "Workflow management systems, *e.g.*, Nextflow, provide a syntax for defining the order of steps and the associated flow of information between steps, while also providing management/control software that can read and carry out these workflows. The workflow control systems (which are generally platform-specific) are responsible for allocating resources, activating analysis steps, and also making sure that all steps occur in the proper order (e.g., only activating a sequence alignment program after the sequence quality control has been performed).\n",
+ "\n",
+ "
\n",
+ "\n",
+ "Workflows can be conceptually broken up into steps or modules (see the figure at left), which formalize the flow of information as inputs and outputs. A workflow conceptually ties the steps/modules together and enforces the dependencies (see the figure above), specifically in that if the output from one step is the input for a later step, the later step is blocked until the earlier step completes."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "77e96ae4-5e6e-4499-9f6e-bda00cbbabc5",
+ "metadata": {},
+ "source": [
+ "## Running Individual Analysis Steps with Docker\n",
+ "One of the most frustrating aspects of carrying out computational biology/bioinformatics programs is installing and maintaining the software programs that we use. These programs are built by a wide variety of research and industrial organizations, and they are built on a wide variety of platforms and utilize an even wider set of supporting libraries and auxiliary components. The reason this causes problems is that the underlying dependencies can conflict with those of other programs or the operating system.\n",
+ "\n",
+ "One of our primary tools for efficient maintenance is a container system such as [Docker](https://www.docker.com/).\n",
+ "#### What are container systems and what are containers?\n",
+ "A container system is a program that creates protected environments within your computer in which programs and their dependencies can be loaded only as long as they are needed to run the program of interest. The container system can load and unload containers as needed. One of the primary benefits of such systems is that once a container has been defined for a specific program, it can be reused repeatedly on the same computer or shared with others through online repositories.\n",
+ "#### Why do we use containers?\n",
+ "We use containers because they allow us to run a broad range of computer programs without having to manage all of their underlying programmatic dependencies. Having a program encapsulated in a container also preserves our ability to continue to use that version of the program, even if either the program or its dependencies are updated."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9497e93b-b398-41fb-87a5-f1082e13a0aa",
+ "metadata": {},
+ "source": [
+ "## Running workflows using the Google Cloud Life Sciences API\n",
+ "The [Google Cloud Life Sciences API (GLS)](https://cloud.google.com/life-sciences) is a service provided by Google that both understands workflows and also controls, including activation, program execution, and deactivation of Google Cloud computing servers.\n",
+ "\n",
+ "#### What do we gain by using GLS\n",
+ "- The key to cost-efficient cloud computing is to only use the resources you need for as long as you need them. \n",
+ "- GLS allows us to control our process from a modest, inexpensive machine that can interface with GLS to provision and use the more expensive machines needed for computing.\n",
+ "- GLS explicitly supports the Nextflow workflow system that we are using, mapping computational tasks onto GCP computing resources."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "97bd4c54-198a-4c94-a959-20d440a02156",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Checkpoint 1:\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b64e6f8a-84ba-4279-b002-25ae4f0755ae",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_quiz(\"quiz-material/00-cp1.json\", shuffle_questions = True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b46f12f-f439-425e-ba98-28bfe1b5ec77",
+ "metadata": {},
+ "source": [
+ "## Jupyter Notebook Introduction\n",
+ "\n",
+ "All of the content contained within this module is presented in Jupyter notebooks which have the `.ipynb` file type. *You are in a Jupyter notebook right now.* Within each notebook is a series of cells that can be individually executed by pressing the `shift + enter` keys at the same time.\n",
+ "\n",
+ "Each cell has options as to how it is executed. For example, the text that you are reading right now in this cell is in the `Markdown` cell type, but there are also `code`, and `raw` cell types. In these modules, you will primarily be seeing `Markdown` and `code` cells. *You can choose what each cell type is by using the drop-down menu at the top of the notebook.*\n",
+ "\n",
+ ">
\n",
+ "\n",
+ "For the code cells, information carries over between cells, but in execution order. This is important because when looking at a series of cells you may be expecting a specific output, but receive a different output due to the order of execution."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02c8af23-fc70-4265-a07f-50ef77c12564",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Example: Follow the steps in the cells below\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "da4d8800-63cb-4dd8-bc2e-b28a52513c76",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Execute 1st:\n",
+ "var1 = 100"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "70147ffa-fe71-40eb-bf83-12e7cb0cfdce",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Execute 2nd and 4th:\n",
+ "print(var1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4bb0a293-4feb-4ca3-b3fd-caf4df3ce6af",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Execute 3rd:\n",
+ "var1 = 'not the same anymore'\n",
+ "# And now run the cell above"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "927a8df2-87f9-45aa-a096-4e1133a97f64",
+ "metadata": {},
+ "source": [
+ ">As you can see, `var1` got overwritten, and when you retroactively re-run the `print(var1)` cell, the output has changed, even though it is above the variable assignment.\n",
+ "\n",
+ "In the following notebooks, there will be some code cells that will take a long time to run. *Sometimes with no output.* So there are two ways to check if the cell is still executing:\n",
+ "\n",
+ "1. The first way to check is to look to the left of the code cell. There will be an indication that looks like this: `[ ]:` If it is empty, then the cell has never been executed yet. If it looks like this: `[*]:`, that means that it is actively executing. And if it looks like this `[53]:`, that means that it has completed executing.\n",
+ "2. The second way, which will check to see if anything in the entire notebook is executing is in the top right of the notebook (Image Below). If the circle is empty, then nothing is actively executing. If the circle is grayed out, then there is something executing.\n",
+ "\n",
+ ">
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "20730a46-7d8c-4dec-a6de-3dcf10fdb888",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Knowledge Check: \n",
+ "
\n",
+ "\n",
+ ">Change the cell below from a code cell to a markdown cell. *Don't forget to execute the cell.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7ebda978-4968-46f6-a330-927a978b43ef",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "change our cell type\n",
+ "# I WANT TO BE BIGGER\n",
+ "*I want to be tilted*\n",
+ "\n",
+ "**I want to be bold**\n",
+ "\n",
+ "`And I want to have a grey background`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d473f3b2-3440-47a5-98b9-950ceb66704e",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Checkpoint 2:\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a080a9dd-9ea6-4d6a-b5d5-ccd3382f09ed",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_quiz(\"quiz-material/00-cp2.json\", shuffle_questions = True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "032f3aa2-8e73-4e64-bda7-a4a35e0f2a99",
+ "metadata": {},
+ "source": [
+ "\n",
+ " \n",
+ " Glossary: \n",
+ "
\n",
+ "\n",
+ "> Within the the file [`Submodule_00_glossary.md`](./Submodule_00_Glossary.md) you will find a compilation of useful terms that will be beneficial to refer to throughout the rest of the learning module."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8d3cf5c9",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "This introductory Jupyter Notebook provided essential background information and a pre-requisite knowledge check on fundamental molecular biology concepts (DNA, RNA, transcription, gene expression) crucial for understanding transcriptome assembly. The notebook established the context for the subsequent modules, outlining the workflow involving RNA-seq data, transcriptome assembly techniques (including de Bruijn graphs, BUSCO analysis), and the use of Nextflow and Google Cloud Life Sciences API for efficient workflow execution and management. The inclusion of interactive quizzes and video resources enhanced learning and engagement, preparing learners for the practical applications and computational challenges presented in the following notebooks. Successful completion of the checkpoint quizzes demonstrates readiness to proceed to the next stage of the MDIBL Transcriptome Assembly Learning Module."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "421cebc3",
+ "metadata": {},
+ "source": [
+ "## Clean Up\n",
+ "\n",
+ "Remember to proceed to the next notebook [`Submodule_01_prog_setup.ipynb`](./Submodule_01_prog_setup.ipynb) or shut down your instance if you are finished."
+ ]
+ }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/AWS/images/AnnotationProcess.png b/AWS/images/AnnotationProcess.png
new file mode 100644
index 0000000000000000000000000000000000000000..11781db54433ebb89aeec7c025b68f0afda7e2f0