Email: cjcarroll [at] usfca [dot] edu
Class Time: Mon/Thurs 10a-12p or 1p-3p in SFD 154-156
Office Hours: Wed 11a-12p on Zoom and Thurs 3:30p-4:30p in person (subject to change during quiz/exam/pitch weeks; watch the course Slack!)
Book: Applied Linear Statistical Models by Kutner, Nachtsheim, Neter, and Li
Syllabus: Link
This course is an intensive introduction to linear models, with a focus on both principles and practice. Topics include simple and multiple linear regression; logistic regression; transformations; regression diagnostics and addressing violations of regression assumptions; and variable selection techniques. The programming language for the course is Python.
*Schedule subject to change. **Note: this is a Tuesday!
Week | Dates | Topics | Reading | Notable Events |
---|---|---|---|---|
Week 1 | 8/21 | Intro, Simple Linear Regression: parameter estimation, mean, variance | Sec. 1.5 & 1.6 | |
Week 2 | 8/25 & 8/28 | SLR: parameter distributions, sum of squares decomposition | Sec. 1.7, 2.1, 2.2, 2.7 | HW1 Due |
Week 3 | 9/1 (No Class - Labor Day) & 9/4 | SLR: confidence and prediction intervals, model assumptions and diagnostics; ANOVA and the F-test | 2.4-5, 3.1-3 | HW2 Due |
Week 4 | 9/8 & 9/11 | Multiple Linear Regression: matrix notation; projection interpretation and geometry of regression | 6.1-6, 7.1-6 | HW3 Due + Exam 1 |
Week 5 | 9/15 & 9/18 | MLR: Feature engineering & Modeling Problems | 3.7, 7.1-6, 8.3 | HW4 Due |
Week 6 | 9/22 & 9/25 | Modeling problems: multicollinearity, assumption violations, influential points, outliers, transformations | Ch. 10 | HW5 Due |
Week 7 | 9/30** & 10/2 | Regression Pitfalls & Model Selection | 6.8, 8.1-2, 9.1-4 | HW6 Due |
Week 8 | 10/6 & 10/9 (Final Project Workday) | Logistic Regression | Ch. 14 | Exam 2 (10/7) + Final Project Due (10/9) |
When in doubt, rely on the following:
- Put the work in and be proactive in asking for help.
- When others are stuck, offer your help in return.
- Do not underestimate the power of office hours!!
- When you are confused, work with a partner & zoom into details.
- When you understand, teach others & zoom out to debrief.
- Use common sense whenever possible.
By the end of this course, you will be able to:
- Understand the structural forms of classical simple and multiple linear and logistic regression models, as well as the assumptions underlying these models;
- Demonstrate familiarity with the principles underlying, and results related to, parameter estimation for these classes of models;
- Formulate and test hypotheses using these basic models, and use them for both prediction and interpretation;
- Use Python to load and clean data, fit regression models, and generate outputs such as ANOVA tables, confidence intervals for parameters, and diagnostic assessments;
- Check or test whether diagnostics conform to the assumptions that underlie classical regression;
- Rigorously identify and manage outliers and influential observations, and understand the potential implications of leaving them uncensored;
- Modify regression models so that fitted residuals more closely conform to the assumptions underlying classical regression, i.e., handle multicollinearity, heteroskedasticity, autocorrelation, a non-normal error term, and specification errors;
- Understand how to use dichotomous dependent variables, indicator (or dummy) independent variables, and interaction terms;
- Communicate the results of a complete and well-reasoned regression analysis as a data analyst; and
- Critically evaluate and apply theoretical and applied concepts in statistical inference, honing your ability to engage with advanced statistical research and methodology.
The class will use Canvas and GitHub to distribute all resources.
Attendance and Professionalism: 5%
Attendance is expected at all live lectures. Absences with permission and a valid, documented excuse will be accepted, but students must still watch the lecture videos and submit any class exercises and activities on Canvas. Students who miss a live lecture with a valid excuse must submit the exercises within 24 hours of class time (by 3pm PST/PDT the next day).
Professional behavior is expected both during class time and outside of class when working with your fellow students. Unprofessional behavior will result in deductions from your professionalism score.
Homework: 15%
- You will be assigned computational and theoretical homework assignments, due on Canvas every Friday by 11:59p Pacific time, with a 48-hour grace period.
- Students are encouraged to discuss and work together on assignments, but each student must turn in their own original work. If there is evidence that the work turned in is not original, which includes copying another student’s homework or using any solutions found online, all credit for that homework set will be forfeited. Homework is not to be posted to online help sites. These sites will be checked frequently.
- No late homework will be accepted after the grace period.
Quizzes: 60%
- There will be 2 written quizzes in Module I (listed as Exam 1 and Exam 2 in the schedule), tentatively scheduled for Thurs, Sept 11 and Tues, Oct 7. Details will be discussed in the classes leading up to each exam.
- To ensure the fairness and integrity of the class, no make-up or early exams will be given. Missing an exam without proper documentation of a personal illness or family emergency will result in a score of zero for that exam. Any documentation must be submitted to the instructor at the student’s earliest convenience, before the exam in question.
Final Project: 20%
The final project will be a computational group case study that brings together the techniques learned throughout the semester. The project description and groups will be published on Canvas. The final project report is due on Oct 9 at 3p.