This frequentist course focuses on statistical evidence from randomized experiments versus observational studies, along with applications of randomization such as A/B testing for website optimization.
By the end of the course, students are expected to:
- Distinguish between experimentally generated data and observational data, with particular reference to the strength of the ensuing statistical conclusions regarding causality.
- Fit and interpret regression models for observational data, with particular reference to adjustment for potential confounding variables.
- Apply the principle of “block what you can, randomize what you cannot” in designing an A/B testing experiment.
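As a rough illustration of the blocking principle in the last objective, the sketch below randomizes visitors to two variants separately within each block, so that the blocking variable is balanced across arms by design. It is written in Python purely for illustration (the course itself uses R), and the function name `blocked_assignment`, the visitor records, and the `device` blocking variable are all hypothetical.

```python
import random
from collections import defaultdict


def blocked_assignment(units, block_of, arms=("A", "B"), seed=0):
    """Randomly assign units to arms separately within each block.

    `block_of` maps a unit to its block label (a hypothetical helper
    supplied by the caller).
    """
    rng = random.Random(seed)

    # Group units by the variable we can control for (the block).
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for block_units in blocks.values():
        # Randomize what we cannot control: shuffle the order within the block.
        shuffled = block_units[:]
        rng.shuffle(shuffled)
        # Deal shuffled units round-robin, so arm sizes within a block
        # differ by at most one.
        for i, unit in enumerate(shuffled):
            assignment[unit] = arms[i % len(arms)]
    return assignment


# Hypothetical visitors as (visitor_id, device); device is the blocking variable.
visitors = [(i, "mobile" if i % 3 else "desktop") for i in range(12)]
assignment = blocked_assignment(visitors, block_of=lambda v: v[1])
```

With this toy data, each device type contributes equally many visitors to each variant, so a difference between variants cannot be attributed to device mix.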
This is an assignment-based course. The following deliverables will determine your course grade:
Assessment | Weight |
---|---|
Lab Assignment 1 | 12% |
Lab Assignment 2 | 12% |
Lab Assignment 3 | 12% |
Lab Assignment 4 | 12% |
Quiz 1 | 25% |
Quiz 2 | 25% |
Lecture Attendance (iClicker) | 2% |
This course occurs during Block 6 in the 2023/24 school year.
Course notes can be accessed here. You should typically review these notes before each lecture. Optional reading material is also available.
Lecture | Topic | Optional Reading Material |
---|---|---|
1 | Multiple Comparisons | |
2 | Confounding and Randomized versus Non-randomized Studies | |
3 | Randomization and Blocking | |
4 | More Blocking and Power | |
5 | More Power and Early Stopping in A/B Testing | |
6 | Observational Data: Stratifying and Modelling | |
7 | Observational Data: Different Sampling Schemes | |
8 | Matched Case-Control Scheme, Ordinal Regressors, and Final Wrap-Up | |
See the lecture learning objectives for a detailed breakdown of lecture-by-lecture learning objectives.
- Seltman HJ, Experimental Design and Analysis, 2015.
- Oehlert GW, A First Course in Design and Analysis of Experiments, 2010.
- O’Neil, Cathy and Schutt, Rachel. "Causality," Ch. 11 of Doing Data Science: Straight Talk from the Frontline, O’Reilly Media, 2013.
- Tang, Diane, et al. "Overlapping Experiment Infrastructure: More, Better, Faster Experimentation." Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2010.
Further reading:
- Work by Judea Pearl, such as "The Book of Why".
This course is taught in R (we will follow the tidyverse style guide) and Stan, and it assumes a reasonable mathematical, statistical, and programming background. We strongly recommend reviewing the following courses:
- DSCI 551: Descriptive Statistics and Probability for Data Science, for basic statistical and probabilistic concepts, and familiarity with the mathematical notation.
- DSCI 552: Statistical Inference and Computation I, for statistical inference concepts with a frequentist approach.
- DSCI 561: Regression I, for ordinary least squares (OLS).
- DSCI 562: Regression II, for generalized linear models (GLMs).
- DSCI 531: Data Visualization I, for plotting tools using the package `ggplot2`.
See the general MDS policies.
The course is built upon previous years' materials developed by previous instructors.
© 2024 G. Alexi Rodríguez-Arelis, Daniel Chen, Benjamin Bloem-Redd, Tiffany Timbers, and Vincenzo Coia
Software licensed under the MIT License, non-software content licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. See the license file for more information.