Continuous analysis of SARS-CoV-2 genomes makes it possible to monitor the spread of the virus, understand how it evolves over time, and support the implementation of effective public health measures. This course provides an introduction to bioinformatics through the analysis of SARS-CoV-2 genomic data. Participants will analyze viral genomes obtained with different sequencing technologies and determine the lineages of the virus variants.
- Learn to use the command line and the Bash scripting language, developing essential skills for file management and for running bioinformatics tools in Linux environments.
- Apply command-line tools for quality control of sequence data.
- Identify the formats of files commonly used in SARS-CoV-2 sequencing.
- Use bioinformatics tools in SARS-CoV-2 analysis pipelines to process, analyze, and visualize genomic data from Illumina and Nanopore sequencing.
- Use Pangolin for viral lineage identification from existing datasets.
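To make the "file formats" objective concrete, here is a minimal sketch of how a FASTQ file is laid out: each read takes four lines (an `@` header, the bases, a `+` separator, and one quality character per base). It assumes the common unwrapped 4-line layout and Phred+33 quality encoding; the function and record names are hypothetical, and real pipelines use dedicated parsers instead.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ entry: identifier, bases, and decoded Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text.

    Assumes the 4-line layout (no wrapped sequences) and Phred+33 encoding.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of 4 lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        # Phred+33: each character's ASCII code minus 33 is the quality score.
        yield FastqRecord(header[1:], seq, [ord(c) - 33 for c in qual])


# Hypothetical single-read example.
example = """@read_1
ACGTN
+
II#5!
"""

for record in parse_fastq(example):
    print(record.name, record.sequence, record.qualities)
    # prints: read_1 ACGTN [40, 40, 2, 20, 0]
```

The quality characters map to scores: `I` is 40, `#` is 2, and `!` is 0, which is why the ambiguous base `N` carries the lowest score here.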
The course is aimed at students, researchers, or clinical/healthcare professionals. No prior bioinformatics skills are required.
The course uses Google Colaboratory (Colab), which is free but requires a Google account; creating a Google account is also free.
We thank the team responsible for developing the SARS-CoV-2 Bioinformatics for Beginners course, available at the repository https://github.com/WCSCourses/SARS-COV-2_B4B, whose content and resources have been essential for the creation of this tutorial. We also acknowledge the support of Wellcome Connecting Science and COG-Train in the development of the original course.
Module 1: Introduction to Google Colaboratory.
Modules 1, 2, and 3 are located in the Introduction_tutorial repository.
To get started, go to https://colab.research.google.com/.
On the Colab home page, select "File" and then "Open notebook".
There's a tab for "GitHub"; select that tab and paste the following URL into the search bar under "Enter a GitHub URL or search by organization or user":
https://github.com/cabana-online/Introduction_tutorial
After a brief search, you'll see the notebook:
01.Intro_a_colab.ipynb
Select it, and you'll see the notebook open.
To run the cells, you'll need to sign in with your Google account (feel free to create one). Using Colab is free.
Note: There are limitations in the free version, but they won't be reached in this course.
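As an alternative to the "Open notebook" dialog, Colab can open a notebook directly from a GitHub link of the form `https://colab.research.google.com/github/<org>/<repo>/blob/<branch>/<path>`. The sketch below builds such a link; the helper name is hypothetical, and it assumes the notebook sits at the repository root on a branch called `main`, so check the repository and adjust `branch` (for example to `master`) if the link does not open.

```python
from urllib.parse import quote


def colab_github_url(org: str, repo: str, notebook: str, branch: str = "main") -> str:
    """Build a Colab link that opens a notebook stored on GitHub.

    The default branch "main" and the root-level notebook path are assumptions.
    """
    path = quote(notebook)  # escape characters that are not URL-safe
    return f"https://colab.research.google.com/github/{org}/{repo}/blob/{branch}/{path}"


print(colab_github_url("cabana-online", "Introduction_tutorial", "01.Intro_a_colab.ipynb"))
```

Pasting the printed URL into a browser should open the same notebook that the GitHub tab search finds, provided the branch and path assumptions hold.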
Module 2: Linux.
Go to https://colab.research.google.com/.
Select "File" > "Open notebook", open the "GitHub" tab, and enter the repository URL:
https://github.com/cabana-online/Introduction_tutorial
After a brief search, you'll see the notebook:
02.Module_2_linux.ipynb
Select it, and you'll see the notebook open.
Module 3: NGS.
Go to https://colab.research.google.com/.
Select "File" > "Open notebook", open the "GitHub" tab, and enter the repository URL:
https://github.com/cabana-online/Introduction_tutorial
After a brief search, you'll see the notebook:
03.Module_3_NGS.ipynb
Select it, and you'll see the notebook open.
Module 4: Data quality control.
Go to https://colab.research.google.com/.
Select "File" > "Open notebook", open the "GitHub" tab, and enter the repository URL:
https://github.com/cabana-online/Course_SARS_CoV-2
After a brief search, you'll see the notebook:
04.Module_4_QC.ipynb
Select it, and you'll see the notebook open.
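The core idea behind read-level quality control can be sketched in a few lines: decode each read's Phred+33 quality string, then discard reads that are too short or whose mean quality falls below a threshold. This is a simplified illustration with hypothetical reads, helper names, and thresholds; it averages Phred scores arithmetically, whereas dedicated QC tools use more careful statistics and trimming, and suitable thresholds differ between Nanopore and Illumina data.

```python
from statistics import mean
from typing import List, Tuple

# A read as (name, sequence, Phred+33 quality string); hypothetical structure.
Read = Tuple[str, str, str]


def mean_phred(quality: str) -> float:
    """Average Phred score of a read (simple arithmetic mean of the scores)."""
    if not quality:
        return 0.0
    return mean(ord(c) - 33 for c in quality)


def filter_reads(reads: List[Read], min_mean_q: float = 20.0, min_length: int = 5) -> List[Read]:
    """Keep reads meeting illustrative length and mean-quality thresholds."""
    return [
        r for r in reads
        if len(r[1]) >= min_length and mean_phred(r[2]) >= min_mean_q
    ]


# Hypothetical reads: one good, one low quality, one too short.
reads = [
    ("good", "ACGTACGT", "IIIIIIII"),
    ("low_quality", "ACGTACGT", "########"),
    ("too_short", "ACG", "III"),
]

kept = filter_reads(reads)
print([name for name, _, _ in kept])  # prints: ['good']
```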
Module 5: Nanopore and Illumina data analysis.
This module is divided into two parts:
- Part 1: Nanopore data analysis
- Part 2: Illumina data analysis
Go to https://colab.research.google.com/.
Select "File" > "Open notebook", open the "GitHub" tab, and enter the repository URL:
https://github.com/cabana-online/Course_SARS_CoV-2
After a brief search, you'll see the notebooks:
05.Module_5_P1_nanopore.ipynb
05.Module_5_P2_illumina.ipynb
Select each notebook in turn, starting with Part 1, and it will open.
Module 6: Lineages.
Go to https://colab.research.google.com/.
Select "File" > "Open notebook", open the "GitHub" tab, and enter the repository URL:
https://github.com/cabana-online/Course_SARS_CoV-2
After a brief search, you'll see the notebook:
06.Module_6_lineages.ipynb
Select it, and you'll see the notebook open.
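Pangolin typically writes its assignments to a CSV report (by default `lineage_report.csv`) with one row per input sequence. A common first step afterwards is to count how many sequences fall into each lineage. The sketch below assumes the report has `taxon` and `lineage` columns and ignores the rest; the sample names and lineages are made up for illustration and are not real results.

```python
import csv
import io
from collections import Counter


def count_lineages(report_text: str) -> Counter:
    """Count sequences per lineage in a Pangolin-style CSV report.

    Assumes a header row with a 'lineage' column; rows with an empty
    lineage value are skipped.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row["lineage"] for row in reader if row.get("lineage"))


# Hypothetical report content (made-up samples, not real data).
example_report = """taxon,lineage,qc_status
sample_01,BA.2,pass
sample_02,BA.5.2,pass
sample_03,BA.2,pass
sample_04,Unassigned,fail
"""

for lineage, n in count_lineages(example_report).most_common():
    print(f"{lineage}\t{n}")
```

To use a real report instead, read the file's contents (for example with `open("lineage_report.csv").read()`) and pass that text to `count_lineages`.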
Module 7: Phylogenomic analysis.
Go to https://colab.research.google.com/.
Select "File" > "Open notebook", open the "GitHub" tab, and enter the repository URL:
https://github.com/cabana-online/Course_SARS_CoV-2
After a brief search, you'll see the notebook:
07.Module_7_phylogenomics.ipynb