marcdotson/data-5600

DATA 5600 Introduction to Regression and Machine Learning for Analytics

  • Monday and Wednesday 1:30-2:45 pm
  • Room: HH 260

This course introduces supervised learning for business applications. Students learn about model building, feature engineering, parameter estimation, hyperparameter tuning, model evaluation, and communication. The course focuses on interpretable regression and classification models, including simple, multiple, and logistic regression. Prerequisites: DATA 3100 and DATA 3300.

By the end of this course, you will be able to:

  1. Explain the importance of regression and machine learning in business.
  2. Demonstrate cleaning and structuring data for regression and machine learning.
  3. Build and interpret appropriate models that use data to inform business decisions.
  4. Clearly present model results in a business context.

Office Hours

Tools

This course is heavily focused on skill-building. Each student will need to bring a laptop, either their own or one rented from USU. All course materials will be available on Canvas. While students are welcome to use their preferred tools, the following data stack is recommended and will be demonstrated throughout the course.

Python

Python is a general-purpose, open-source programming language developed by computer scientists. It is one of the most commonly used programming languages for data wrangling, visualization, and modeling.

Positron

Aside from the programming language itself, a code editor or integrated development environment (IDE) is arguably your most important tool as a data analyst or data engineer. Positron is a next-generation data analytics and data engineering IDE. Built on VS Code’s open-source core, Positron combines the multilingual extensibility of VS Code with essential data tools common to language-specific IDEs.

GitHub

Git is a powerful collaboration and version control system. While it is the industry standard for software development, it is easy to adapt this framework to provide structure for data analytics as well. GitHub is an online hosting service where each project lives in its own repository, maintained by me as the mentor. After a project is completed, you can copy (i.e., fork) the project repository to maintain yourself. Learning to use Git and GitHub not only aids in collaboration but will ultimately allow you to develop an online portfolio of work.

Quarto

Quarto is an open-source publishing system for communicating results through presentations, reports, dashboards, and more.

AI

Learning is our goal. In DATA 5600, generative AI tools can be very useful. They can help debug code, provide example quiz questions for you to test your knowledge, and help explain concepts in new ways. I am very excited about these tools and encourage their use for learning. You will need to use your judgement to determine how to best utilize these tools. They “succeed as a tool, fail as a crutch” (Carly Fox from our department), and while you can use AI tools to help you with assignments, make sure you are using them to facilitate your learning and not to just check a box. “Let’s get better together faster” (Lianne Wappett from our department).

For more details, refer to the data stack training.

Studying

Students should consider the following study tips.

  1. Prepare for class by studying the assigned materials, taking notes, and coming with questions.
  2. Take notes, ask questions, and participate in class discussions.
  3. Consistently apply what you’re learning to your project.
  4. Work with classmates and utilize office hours.
  5. Use your class repository to organize all notes and project work.

Students should consider the following coding tips.

  1. Learn by doing: Code in class and complete exercises.
  2. Pay careful attention to details.
  3. Don’t code from scratch. Start with previous work and solutions.
  4. Look at and emulate good code.
  5. Literally sketch what transformed data should look like in the end.

Assessment

All assignments are due on Canvas at the start of the class session they are associated with in the schedule.

Letter grades will follow the standard rubric.

A   93-100%    B-  80-82%    D+  67-69%
A-  90-92%     C+  77-79%    D   63-66%
B+  87-89%     C   73-76%    D-  60-62%
B   83-86%     C-  70-72%    E   0-59%

No credit will be given for late work unless an arrangement is made prior to the deadline. Students are encouraged to review their graded work and ask questions to avoid repeated mistakes. Grades will be determined as follows.

Exercises (20%)

Each lecture ends with an exercise that is due at the beginning of the following class, when a student will be called on at random to share their solution. While students are encouraged to work together, each student is required to submit their own work. Additionally, for each exercise, every student will be randomly assigned to review one other student’s exercise.

Students won’t get credit if they don’t submit their solution on time, aren’t present and prepared to share when called on at random, or don’t complete their randomly assigned review.

Quizzes (30%)

While students are encouraged to study in groups, each of the quizzes is to be completed individually, without assistance from other classmates. Each quiz will be available on Canvas during the associated lectures and is intended to be used as a gauge of how well students understand the course material.

Projects (50%)

Students will complete two group projects, the first focused on multiple regression and the second on classification. For each project, the group will give a presentation and submit a technical report developed in a GitHub repository. The week before the presentations, each group will submit a draft of its slides to get feedback and have time for revision. The other students in the class, as well as the group members themselves, will help evaluate each of the presentations.
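As a rough illustration of how the category weights and the letter-grade rubric above combine, here is a minimal Python sketch. The names `WEIGHTS`, `CUTOFFS`, `weighted_score`, and `letter_grade` are hypothetical and chosen only for this example. The syllabus does not state a rounding policy, so the sketch assumes a score earns the highest letter whose lower bound it meets, with no rounding. Canvas remains the authoritative record of grades.

```python
# Category weights in percent, taken from the Assessment section.
WEIGHTS = {"exercises": 20, "quizzes": 30, "projects": 50}

# Lower bound (in percent) of each letter grade, taken from the rubric.
# Assumption: no rounding; a score gets the highest band whose lower bound it meets.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"), (0, "E"),
]


def weighted_score(scores):
    """Combine per-category percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percent):
    """Map a course percentage to a letter using the lower bounds above."""
    for lower, letter in CUTOFFS:
        if percent >= lower:
            return letter
    return "E"  # Only reached for negative input.


# Made-up category percentages for illustration only.
example = {"exercises": 95, "quizzes": 80, "projects": 90}
total = weighted_score(example)
print(total, letter_grade(total))
```

With these made-up scores, the weighted total is (20 × 95 + 30 × 80 + 50 × 90) / 100 = 88, which falls in the B+ band.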

Schedule

Please note that the instructor reserves the right to change the syllabus, including the schedule, at any time and for any reason. In such circumstances, he will provide students sufficient advance notice as it relates to assignment deadlines.

Week 01

  • Regression and Machine Learning
  • Modeling Workflow

Week 02

  • Decisions and Data

Week 03
