module 5 #49

Merged
merged 5 commits into from
Jun 2, 2025
19 changes: 19 additions & 0 deletions _quarto.yml
@@ -127,6 +127,25 @@ website:
- text: '   7.1. Exercises'
href: modules/module4/module4-25-testing_your_svm_rbf_knowledge.qmd
- href: modules/module4/module4-28-what_did_we_just_learn.qmd
- section: "**M5. Preprocessing Numerical Features, Pipelines and Hyperparameter Optimization**"
contents:
- href: modules/module5/module5-00-module_learning_outcomes.qmd
- href: modules/module5/module5-01-the_importance_of_preprocessing.qmd
- text: '   1.1. Exercises'
href: modules/module5/module5-02-questions_on_why.qmd
- href: modules/module5/module5-05-case_study:_preprocessing_with_imputation.qmd
- text: '   2.1. Exercises'
href: modules/module5/module5-06-imputation.qmd
- href: modules/module5/module5-09-case_study:_preprocessing_with_scaling.qmd
- text: '   3.1. Exercises'
href: modules/module5/module5-10-name_that_scaling_method.qmd
- href: modules/module5/module5-13-case_study:_pipelines.qmd
- text: '   4.1. Exercises'
href: modules/module5/module5-14-pipeline_questions.qmd
- href: modules/module5/module5-17-automated_hyperparameter_optimization.qmd
- text: '   5.1. Exercises'
href: modules/module5/module5-18-exhaustive_or_randomized_grid_search.qmd
- href: modules/module5/module5-21-what_did_we_just_learn.qmd

# Since we are declaring options for two formats here (html and revealjs)
# each qmd file needs to include a yaml block including which format to use for that file.
20,641 changes: 20,641 additions & 0 deletions data/housing.csv

Large diffs are not rendered by default.

29 changes: 29 additions & 0 deletions modules/module5/module5-00-module_learning_outcomes.qmd
@@ -0,0 +1,29 @@
---
format:
html:
page-layout: full
---

# 0. Module Learning Outcomes

::: {.panel-tabset .nav-pills}

## Video

<iframe
class="video"
src="https://www.youtube.com/embed/9zkvbdTvS6w"
title="Module 5 Video - Module Learning Outcomes"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

## Slides

<iframe
class="slide-deck"
src="slides/module5_00.html"
></iframe>

:::
29 changes: 29 additions & 0 deletions modules/module5/module5-01-the_importance_of_preprocessing.qmd
@@ -0,0 +1,29 @@
---
format:
html:
page-layout: full
---

# 1. The Importance of Preprocessing

::: {.panel-tabset .nav-pills}

## Video

<iframe
class="video"
src="https://www.youtube.com/embed/aBpgJh0ikoA?start=2473&end=3131&rel=0"
title="Module 5 Video - The Importance of Preprocessing"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

## Slides

<iframe
class="slide-deck"
src="slides/module5_01.html"
></iframe>

:::
104 changes: 104 additions & 0 deletions modules/module5/module5-02-questions_on_why.qmd
@@ -0,0 +1,104 @@
---
format: html
---

<script src='../../src/quiz.js'></script>

# 1.1. Exercises

## Questions on Why

<div id='mcq1'></div>
<script>
generateQuiz(
'mcq1',
'Question 1',
'Which model will still produce meaningful predictions when the columns are on different scales?',
{
'Decision Trees': 'You are right! Decision Trees visit a single feature at a time, unlike 𝑘-NN models, which calculate distances using the features all together.',
'𝑘-NN': 'This works in a similar way to SVM, where distance is calculated and features are observed together and not independently of each other.',
'SVM': 'This works in a similar way to 𝑘-NN, where distance is calculated and features are observed together and not independently of each other.',
},
'Decision Trees',
);
</script>
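
The feedback for Question 1 can be illustrated with a small sketch. It uses only the Python standard library and hand-rolled stand-ins for the models, not scikit-learn: `nearest_label` is a 1-nearest-neighbour rule with Euclidean distance, and `stump_label` is a single threshold split on one feature. The toy data, the function names, and the rescaling factor are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: (height in metres, salary in dollars) -> class label.
train = [((1.60, 50_300), "A"), ((1.95, 50_000), "B")]
query = (1.62, 50_100)

def rescale(point, factors):
    """Multiply each feature by its factor (e.g. dollars -> thousands of dollars)."""
    return tuple(x * f for x, f in zip(point, factors))

def nearest_label(train, query):
    """1-NN: label of the training point closest to the query (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def stump_label(train, query, feature):
    """One-split 'tree': threshold at the midpoint of the two training values of one feature."""
    (low_point, low_label), (high_point, high_label) = sorted(
        train, key=lambda item: item[0][feature]
    )
    threshold = (low_point[feature] + high_point[feature]) / 2
    return low_label if query[feature] <= threshold else high_label

# Express salary in thousands of dollars; height is left unchanged.
factors = (1.0, 0.001)
scaled_train = [(rescale(point, factors), label) for point, label in train]
scaled_query = rescale(query, factors)

# The distance-based prediction flips once salary no longer dominates the distance.
print(nearest_label(train, query), nearest_label(scaled_train, scaled_query))  # B A

# The single-feature split moves its threshold with the data, so the prediction is unchanged.
print(stump_label(train, query, 1), stump_label(scaled_train, scaled_query, 1))  # B B
```

In the raw units the salary column dominates the distance, so the nearest neighbour is decided almost entirely by salary. The threshold split only compares one feature against a cut point learned from that same feature, so rescaling the column leaves the partition unchanged.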

<div id='mcq2'></div>
<script>
generateQuiz(
'mcq2',
'Complete the following statement',
'Preprocessing is done ____.',
{
'To the model before training': 'It is done before training but not to the model.',
'To the data before training the model': '',
'To the model after training': 'It’s not done to the model or after training.',
'To the data after training the model': 'You are half right, but it’s not done after training the model.',
},
'To the data before training the model',
);
</script>

## Motivation True and False

<div id='mcq3'></div>
<script>
generateQuiz(
'mcq3',
'Question 1',
'Columns with lower magnitudes compared to columns with higher magnitudes are less important when making predictions.',
{
'True': 'These types of models find examples that are most similar to the test example in the <i>training</i> set.',
'False': 'Great! Just because a feature has smaller values does not mean it’s less informative.',
},
'False',
);
</script>

<div id='mcq4'></div>
<script>
generateQuiz(
'mcq4',
'Question 2',
'A model that is less sensitive to the scale of the data is more robust.',
{
'True': 'Nailed it!',
'False': 'Models that are more sensitive to scale can be problematic.',
},
'True',
);
</script>

## Preprocessing Questions

<div id='mcq5'></div>
<script>
generateQuiz(
'mcq5',
'Question 1',
'<code>StandardScaler</code> is a type of what?',
{
'Converter': 'You are close but the terminology is not correct.',
'Categorizer': 'Not quite.',
'Model': 'We are not modeling with <code>StandardScaler</code> since it is <i>transforming</i> the data and not predicting on it.',
'Transformer': '',
},
'Transformer',
);
</script>

<div id='mcq6'></div>
<script>
generateQuiz(
'mcq6',
'Question 2',
'What data does <code>StandardScaler</code> alter?',
{
'Training only': 'It definitely alters the training set but that’s not all.',
'Testing only': 'It does alter the test data but it doesn’t stop there.',
'Both training and testing': '',
'Neither training nor testing': 'We need to scale our data though!',
},
'Both training and testing',
);
</script>
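
As a rough illustration of the last question, the sketch below standardizes a column by hand using only the Python standard library. It is not scikit-learn code; it mirrors the idea behind `StandardScaler`, where the mean and (population) standard deviation are learned from the training data and then applied to both the training and the test data. The column values and the `standardize` helper are hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical values of one numeric column.
train_values = [2.0, 4.0, 6.0, 8.0]
test_values = [5.0, 11.0]

# "Fit": learn the statistics from the training data only.
mean = fmean(train_values)
std = pstdev(train_values)  # population standard deviation

def standardize(values, mean, std):
    """Subtract the training mean and divide by the training standard deviation."""
    return [(value - mean) / std for value in values]

# "Transform": apply the same training statistics to both splits.
train_scaled = standardize(train_values, mean, std)
test_scaled = standardize(test_values, mean, std)

print(train_scaled)
print(test_scaled)
```

The test column is altered too, but only with statistics learned from the training column, so no information from the test set leaks into the fitting step.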
29 changes: 29 additions & 0 deletions modules/module5/module5-05-case_study:_preprocessing_with_imputation.qmd
@@ -0,0 +1,29 @@
---
format:
html:
page-layout: full
---

# 2. Case Study: Preprocessing with Imputation

::: {.panel-tabset .nav-pills}

## Video

<iframe
class="video"
src="https://www.youtube.com/embed/aBpgJh0ikoA?start=3137&end=3722&rel=0"
title="Module 5 Video - Case Study: Preprocessing with Imputation"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>

## Slides

<iframe
class="slide-deck"
src="slides/module5_05.html"
></iframe>

:::