Note: This Syllabus is subject to change.
The goal of this course is to introduce wet-lab biologists on the Ph.D. track to basic skills in bioinformatics and computational genomics, and to help them acquire the basic computer literacy needed to analyze next-generation sequencing experiments. We will cover fundamental concepts in the analysis and interpretation of genomics data, with a focus on single-cell and microbiome technologies, and explore the structure of key data types and their associated file formats. Twelve lectures is far too short a course to be considered comprehensive. Nonetheless, we strive to empower students with the basic skills to advocate for and analyze their own data.
The course takes place over 12 weeks, with 1.5-hour lectures on Fridays from 10:30 AM to 12:00 PM. Most lectures are accompanied by readings, light homework, or both. Assignments must be completed on time for full credit, no exceptions. There is no final exam.
Late assignments can earn at most 50% of full credit and are accepted up to one lecture after the original due date. When homework is not assigned, points will be given for attendance. Readings are indicated on the syllabus. Each question set is worth 10 points and will be graded based on completeness. Links to lecture slides and readings will be provided in the syllabus below.
Attendance is required, no exceptions. You may obtain permission for an absence from the graduate school (email Emma Yates Kassler) ahead of time or, in extenuating circumstances, after the fact. Homework assignments are due on time (see schedule below) regardless of attendance, or one lecture late for half credit. Unexcused absences result in an incomplete grade.
After each lecture, a link will be posted to provide feedback. Attendance will be recorded by the completion of the "5 minute feedback". There will be three questions, with space for brief answers:
- What was the most important concept or take home message you learned from today's class?
- What concepts or topics from today's class do you not understand well enough?
- What suggestions do you have for improving or enhancing today's class?
Frequently asked questions or topics of confusion will be addressed in the subsequent lecture.
Lecture topics:
- Introduction
- Course overview
- History of bioinformatics
- Introduction to Unix
Lecture slides:
Readings:
feedback:
Lecture topics:
- More basic Unix and a tour of the Unix file system
Lecture slides:
feedback:
Homework Assignment 1:
Lecture topics:
- What is reproducible science?
- Git Basics; creating and cloning a repo
- Adding, committing, and pull requests
- Reproducible science: the compilable manuscript (guest: Peter Nguyen)
Lecture slides:
Readings:
- "A Quick Introduction to Version Control with Git and GitHub" Blischak et al., PLOS Computational Biology 2016
- "FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources" Clarke et al., Cell Systems 2020
- "A Beginner's Guide to Conducting Reproducible Research" Alston & Rick, Bulletin of the ESA 2021
- "A Manifesto for Reproducible Science" Munafò et al., Nature Human Behaviour 2017 (optional)
Resources:
feedback:
Homework Assignment 2:
Lecture topics:
- What is PCA?
- Overview of hierarchical clustering and its pitfalls
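As a rough illustration of the PCA topic, the sketch below finds the first principal component of a tiny two-variable data set using only the Python standard library. It is a teaching sketch under stated assumptions, not the method used in the lecture: the data values and all function names are made up for illustration, and power iteration is swapped in for the SVD-based routines that statistical packages normally use.

```python
import math

def center(rows):
    """Subtract each column's mean so every variable is centered on zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (n - 1 denominator) of centered data."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def first_component(cov, iterations=100):
    """Approximate the leading eigenvector of cov by power iteration."""
    v = [1.0] * len(cov)  # arbitrary starting direction (an assumption)
    for _ in range(iterations):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v is the Rayleigh quotient v' C v for a unit vector v.
    variance = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return v, variance

# Hypothetical measurements: five samples, two strongly correlated variables.
samples = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
centered = center(samples)
cov = covariance(centered)
pc1, pc1_variance = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = pc1_variance / total_variance

# Project each sample onto PC1 to get its score along that axis.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print("PC1 direction:", pc1)
print("fraction of variance explained by PC1:", explained)
```

Because the two variables rise together, nearly all of the variance falls along a single direction, which is the situation in which PCA compresses data well.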
Lecture slides:
Readings:
Principal components analysis primer (2022)
Resources:
Step-by-step PCA (same author)
feedback:
Homework Assignment 3:
Lecture topics:
- Understanding NGS file formats
- Understanding NGS quality assessment
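To make the file format and quality topics concrete, here is a minimal sketch that reads FASTQ records and decodes their quality strings. It assumes the common four-lines-per-record layout and the Phred+33 encoding; the function names and the two example reads are hypothetical, and real pipelines would typically rely on an established parser instead.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) stops the sketch
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record near {header!r}")
        yield header[1:], seq, qual

def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]

def mean_quality(qual, offset=33):
    """Average Phred score of a non-empty quality string."""
    scores = phred_scores(qual, offset)
    return sum(scores) / len(scores)

# Two made-up reads: one uniformly high quality, one with poor leading bases.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, seq, phred_scores(qual), mean_quality(qual))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the character `I` (Q40) marks a base call with roughly a 1 in 10,000 chance of being wrong, while `!` (Q0) carries no confidence at all.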
Lecture slides:
feedback:
Homework Assignment 4:
Lecture topics:
- Search for characters or patterns in a text file using the grep command
- Learn to use bedtools to accomplish genome arithmetic.
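The two ideas in this lecture can be sketched in a few lines of Python: filtering lines by a pattern, as grep does, and intersecting genomic intervals, which is one example of genome arithmetic. This is an illustration only and does not reproduce how grep or bedtools are implemented; the function names and coordinates are made up, and intervals are assumed to follow the BED convention of 0-based, half-open coordinates.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

def intersect(a, b):
    """Return the overlapping segments between two interval lists.

    Each interval is (chrom, start, end) with a 0-based, half-open range,
    so intervals that merely touch (end == start) do not overlap.
    """
    hits = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:
                hits.append((chrom_a, start, end))
    return hits

# Hypothetical BED-like lines and intervals.
bed_lines = ["chr1\t100\t200", "chr1\t300\t400", "chr2\t50\t80"]
chr1_only = grep_lines(r"^chr1\t", bed_lines)

peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
genes = [("chr1", 150, 350), ("chr2", 80, 120)]
overlaps = intersect(peaks, genes)

print(chr1_only)
print(overlaps)
```

The nested loop compares every pair of intervals, which is fine for a handful of records but slow for genome-scale files; dedicated tools use sorted input or indexing to avoid that cost.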
Lecture slides:
feedback:
Homework Assignment 5:
Lecture topics:
- Understanding computational context of RNA-Seq.
- Learning to judge data for quality metrics of RNA-Seq.
- Quick differential expression analysis.
Lecture slides:
feedback:
Additional files
Homework Assignment 6:
Lecture topics:
- Single cell analysis workflow
- Challenges of analysis
- Quality control
Lecture slides:
feedback:
Readings and websites:
- Orchestrating Single-Cell Analysis with Bioconductor (OSCA)
- Seurat
- Scanpy - single cell analysis with python
- scTransform
- Pearson Residuals for Normalization
Lecture topics:
- Data integration methods
- Cell type annotation
- Differential Expression
Lecture slides:
feedback:
Readings and websites:
Homework Assignment 7:
Lecture topics:
- 16S rRNA sequence data processing and analysis
Homework Assignment 8: Follow and complete the tutorial for dada2 prior to July 28th. Please also install the following R packages: vegan, phyloseq, lmerTest, lme4, ggplot2, dplyr, ape, reshape2
https://www.dropbox.com/scl/fo/eepu0rvg25n4ycml7livv/h?rlkey=yedzeixm8192u9wn9zgks3fux&dl=0
Microbiome Analysis Part 1 Homework, due August 15
Microbiome Analysis Part 2 Homework
Lecture topics:
- Color Perception
- Visual Accessibility
- Visualization of Data
Lecture slides:
Helpful Books on the Subject:
day | date | lecturer | topic | due |
---|---|---|---|---|
Friday | 05/19 | HAZELETT | History of Bioinformatics | |
Friday | 05/26 | HAZELETT | Basic Unix Skills | |
Friday | 06/02 | HAZELETT | Version Control & Git | Assignment 1 |
Friday | 06/09 | HAZELETT | PCA, heatmaps, clusterProfiler | Assignment 2 |
Friday | 06/16 | COETZEE | NGS File Formats | Assignment 3 |
Friday | 06/23 | COETZEE | Searching in files & Genome Arithmetic | Assignment 4 |
Friday | 06/30 | COETZEE | Advanced Search | Assignment 5 |
Friday | 07/07 | COETZEE | scRNA-seq I (single sample) | Assignment 6 |
Friday | 07/14 | COETZEE | scRNA-seq II (patient cohort) | |
Friday | 07/21 | BREAK | Assignment 8 | |
Friday | 07/28 | VUJKOVIC-CVIJIN | Microbiome I | Assignment 7 |
Friday | 08/04 | VUJKOVIC-CVIJIN | Microbiome II | Assignment 8 |
Friday | 08/11 | COETZEE | Visualization & Color | Assignment 9 |