2020 BIOS 611 - Intro to Data Science

Lecture Zoom Link: https://uncsph.zoom.us/j/96603579699?pwd=Ny9RUHFrL2lNQUUxZEJ1UTNmVFQvdz09
Lab Zoom: https://uncsph.zoom.us/j/96708005184?pwd=SFNBQXpnVHNvNnZLMktxZjJsZkVsUT09

You will need the passwords to join; these are distributed via the Slack channel. You should have received an email invite to the Slack before the course started. Email me if you need help: toups@email.unc.edu

Course Information

The goals of this course:

Familiarity with Data Science Tools like R, Python, git, Make, Docker, etc
Good Data Science Practices

We will cover the entire data science project lifetime, from data ingest and quality control through analysis and reporting. An emphasis will be placed on effectively communicating correct results (even when they are negative) and on giving feedback to colleagues.

To do these tasks effectively we will also focus heavily on using Git, Make and Docker.

About your Instructor

I have both academic and "real world" experience as a scientist, software engineer, and data scientist.

Portfolio

By the end of the semester each student will have produced a portfolio including:

A complete analysis in R demonstrating data wrangling, modeling, and visualization.
An interactive Shiny Dashboard
A hybrid analysis using R, Python, Make and Docker (or Julia).

Slack

This year in particular it will be useful for our class to communicate online using our Slack Channel.

Course Schedule

Fall 2020 classes are unfortunately weird because of COVID-19. Classes will be held Monday and Wednesday from 5:20 pm to 6:35 pm. Recitation/lab time will be Tuesdays from 3:00-4:00 pm. Labs will be a chance to work with me, in person or virtually, on material covered during lecture.

Hyflex

BIOS 611 will be Hyflex this semester. This means that we will hold classes in person and also broadcast them live (presumably over Zoom). Maximum occupancy and social distancing requirements mean that only some students will attend any given class in person, with others tuning in over Zoom and still others watching the recorded lectures at a different time.

Course Schedule

NB - this is the first time I've taught this course. We might deviate from this syllabus.

Day	Time	Class Type	Subject	HW
Aug 10 (Monday) 2020	5:20-6:35 pm	Lecture	Intro and Demo
Aug 11 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation	Compute Resources
Aug 12 (Wednesday) 2020	5:20-6:35 pm	Lecture	Compute Resources & R 1
Aug 17 (Monday) 2020	5:20-6:35 pm	Lecture	Programming Languages via R (R 2) & Datasets	[HW1](https://github.com/Vincent-Toups/datasci611/blob/master/homeworks/hw1.md)
Aug 18 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation	Project 1 Setup
Aug 19 (Wednesday) 2020	5:20-6:35 pm	Lecture	Linux and Bash …
Aug 24 (Monday) 2020	5:20-6:35 pm	Lecture	Docker & Make & Party
Aug 25 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation	Setting up Our Project
Aug 26 (Wednesday) 2020	5:20-6:35 pm	Lecture	Tidy Data & ggplot
Aug 31 (Monday) 2020	5:20-6:35 pm	Lecture	Tidy Data & ggplot 2
Sep 01 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Sep 02 (Wednesday) 2020	5:20-6:35 pm	Lecture	Git Concepts and Practices
Sep 07 (Monday) 2020	5:20-6:35 pm	~	Labor Day
Sep 08 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Sep 09 (Wednesday) 2020	5:20-6:35 pm	Lecture	Relational Data Operations
Sep 14 (Monday) 2020	5:20-6:35 pm	Lecture	Agile Data Science?
Sep 15 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Sep 16 (Wednesday) 2020	5:20-6:35 pm	Lecture	Classification
Sep 21 (Monday) 2020	5:20-6:35 pm	Lecture	Parameter Fitting and Optim
Sep 22 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Sep 23 (Wednesday) 2020	5:20-6:35 pm	Lecture	Model Val & Char
Sep 28 (Monday) 2020	5:20-6:35 pm	Lecture	The Dark Art of Clustering
Sep 29 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Sep 30 (Wednesday) 2020	5:20-6:35 pm	Lecture	Best in Show: Gradient Boosting Machines
Oct 05 (Monday) 2020	5:20-6:35 pm	Lecture	GBMs in Practice
Oct 06 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Oct 07 (Wednesday) 2020	5:20-6:35 pm	Lecture	Project Presentations
Oct 12 (Monday) 2020	5:20-6:35 pm	Lecture	Shiny Introduction
Oct 13 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Oct 14 (Wednesday) 2020	5:20-6:35 pm	Lecture	Docker Recap and Shiny
Oct 19 (Monday) 2020	5:20-6:35 pm	Lecture	Programming Languages and Python
Oct 20 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Oct 21 (Wednesday) 2020	5:20-6:35 pm	Lecture	Pandas, Dplyr, SQL 1
Oct 26 (Monday) 2020	5:20-6:35 pm	Lecture	Pandas, Dplyr, SQL 2
Oct 27 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Oct 28 (Wednesday) 2020	5:20-6:35 pm	Lecture	Scikit Learn
Nov 02 (Monday) 2020	5:20-6:35 pm	Lecture	A Taste of Neural Networks
Nov 03 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Nov 04 (Wednesday) 2020	5:20-6:35 pm	Lecture	Regular Expressions
Nov 09 (Monday) 2020	5:20-6:35 pm	Lecture	Data Science Ethics
Nov 10 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation
Nov 11 (Wednesday) 2020	5:20-6:35 pm	Lecture	Virtual Panel w/ Data Scientists
Nov 16 (Monday) 2020	5:20-6:35 pm	Lecture	Presentations
Nov 17 (Tuesday) 2020	3:00-4:00 pm	Lab/Recitation

Projects

Grades will be based primarily on projects with the following steps:

Students will submit an initial proposal “README” file describing the project
Students will work individually to produce a first draft and submit it on Github
Each student will review a handful of project drafts and provide thoughtful feedback
Students will rate the quality of the feedback received from their peers
Students will submit a final project draft
Graders will review the project for high level organization and readability
Students will give a short presentation about their project (only projects 1 and 3)

The grade will be based on 1) the quality of feedback provided to peers, 2) the grader’s review, and 3) the presentation.

Feedback

Students will give feedback on other students' projects, and that feedback will be graded. Feedback should be succinct, relevant and actionable. It should cover:

Does the project use tidyverse functions to keep code succinct, efficient and readable? Where could a tidyverse function be added to improve the code?
Are the plots appropriate for the data types, the hypotheses being tested, and the points being communicated?
How can the code be organized or documented more clearly?
Is the purpose of the project communicated clearly?
Is the source of the data made clear?
Is the interpretation of figures clearly explained?
Is the purpose and interpretation of analysis steps clearly communicated?
Are overall take-home messages clearly communicated?

The nature of data science is that our results are often uninteresting and/or negative. This is not a problem with a project or presentation. If anything, communicating negative results is even more important, in practice, than communicating positive ones.

Project 1

A “complete” analysis in R, demonstrating data wrangling, modeling, visualization and delivery using R Markdown.

Project 2

An interactive dashboard built with Shiny.

Project 3

A polyglot analysis using R, Python, Make and Docker.

Project Grading

Projects will be graded on the following:

A project should be easily runnable by anyone who checks out the git repository who has Docker installed.
Git commits should be small and cover single changes to the code base after the initial phase of the project.
The git repository shouldn't contain non-code artifacts. All results should be buildable from code and source data alone.
The code should be organized and easy to understand at a high level.
For Project 1, the final result should be a PDF file generated via LaTeX or R Markdown that summarizes the results. For Project 2, the result is a Shiny application.
