In this course, we will learn some advanced machine learning methods in the context of natural language processing (NLP) applications, including Markov chains, hidden Markov models, recurrent neural networks, and self-attention and transformers.
By the end of the course, students will be able to:
- Perform basic text preprocessing.
- Define Markov chains and apply them for generation and inference.
- Explain the concept of stationary distribution in Markov chains.
- Describe Hidden Markov Models (HMMs) and perform inference and decoding.
- Summarize Recurrent Neural Networks (RNNs) and their variations.
- Explain self-attention and transformers and apply them in NLP tasks.
The following deliverables will determine your course grade:
Assessment | Weight | Where to submit |
---|---|---|
Lab Assignment 1 | 12% | Gradescope |
Lab Assignment 2 | 12% | Gradescope |
Lab Assignment 3 | 12% | Gradescope |
Lab Assignment 4 | 12% | Gradescope |
Class participation | 2% | iClicker Cloud |
Quiz 1 | 25% | Canvas |
Quiz 2 | 25% | Canvas |
See Calendar for the due dates.
This course will be run in person. We will meet three times every week: twice for lectures and once for the lab. You can refer to the Calendar for lecture and lab times and locations. Lectures will be a combination of traditional live lecturing, class activities, and pre-recorded videos. Drafts of the lecture notes for each week will be made available early in the week.
This course occurs during Block 6 in the 2024/25 school year.
Lecture | Topic | Assigned videos/Readings | Resources and optional readings |
---|---|---|---|
0 | Course Information | 📹 | |
1 | Markov Models | | |
2 | Language models, PageRank, text preprocessing | 📹 | |
3 | Hidden Markov models | | |
4 | HMMs (decoding and inference) | (optional) HMM Baum-Welch (unlisted) | |
5 | Introduction to Recurrent Neural Networks (RNNs) | | |
6 | Introduction to Transformers | 📹 | |
7 | Applications of Transformers | | |
8 | Large Language Models | | |
The labs are going to be in person. There will be a lot of opportunity for discussion and getting help during lab sessions. Please make good use of this time.
We are providing you with a `conda` environment file which is available here. You can download this file, create a `conda` environment for the course, and activate it as follows.

```
conda env create -f env-dsci-575.yml
conda activate 575
```
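If you want to double-check that the environment was created and that `python` resolves to its interpreter, you can run the following optional sanity checks (not part of the required setup):

```
# List all conda environments; the active one is marked with a *
conda env list

# Confirm the Python version provided by the activated environment
python --version
```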
We've only attempted to install this environment file on a few machines, and you may encounter issues with certain packages from the `yml` file when executing the commands above. This is not uncommon and may suggest that the specified package version is not yet available for your operating system via `conda`. When this occurs, you have a couple of options (see the sketch after this list for an example):

- Modify your local copy of the `yml` file to remove the line containing that package, and create the environment without it.
- Activate the environment and install the package manually with either `conda install` or `pip install` in the environment.
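For example, suppose a package with the placeholder name `somepkg` (not an actual entry in the course `yml` file) fails to install on your machine. After removing its line from your local copy of the `yml` file, the workflow might look like this:

```
# Recreate the environment from the edited yml file and activate it
conda env create -f env-dsci-575.yml
conda activate 575

# Install the missing package manually; try conda first
conda install somepkg
# or fall back to pip if conda cannot find a compatible version
pip install somepkg
```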
Note that this is not a complete list of the packages we'll be using in the course, and there might be a few packages you will be installing using `conda install` later in the course. But this is a good enough list to get you started.
We are all here to help you learn and succeed in the course and the program. Here is how we'll be communicating with each other during the course.
If any clarification on the lecture material or lab questions is needed, I'll post a message on our course channel and tag you. It is your responsibility to read the messages whenever you are tagged. (I know that you have many things to keep track of. You do not have to read all the messages, but please make sure to read the messages carefully whenever you are tagged.)
If you have questions about the lecture material or lab questions please post them on the course Slack channel rather than direct messaging me or the TAs. Here are the advantages of doing so:
- You'll get a quicker response.
- Your classmates will benefit from the discussion.
When you ask your question on the course channel, please avoid tagging the instructor unless it's specific to the instructor (e.g., if you notice a mistake in the lecture notes). If you tag a specific person, other members of the teaching team and your colleagues are discouraged from responding, which will decrease the response rate on the channel.
Please use a consistent convention when you ask questions on Slack to facilitate easy searching for others or future you. For example, if you want to ask a question on Exercise 3.2 from Lab 1, start your post with the label `lab1-ex3.2`. Or if you have a question on lecture 2 material, start your post with the label `lecture2`. Once the question is answered/solved, you can add a "(solved)" tag before the label (e.g., `(solved) lab1-ex3.2`). Do not delete your post even if you figure out the answer on your own. The question and the discussion can still be beneficial to others.
For each deliverable, after I return grades, I'll let you know who has graded what by posting in our course Slack and opening an issue in the course GitHub repository. If you have questions related to grading:
- First, make sure your concerns are reasonable (read the "Reasonable grading concerns" policy).
- If you believe that your request is reasonable, open a regrade request on Gradescope.
- If you are unable to resolve the issue with the TA, send a Slack message to the instructor, including the appropriate TA in the conversation.
I am open to a conversation with you. If you want to talk about anything sensitive, please direct message me on Slack (and tag me) rather than posting it on the course channel. It might take a while for me to get back to you, but I'll try my best to respond as soon as possible.
- Google NLP API
- Stanford CS224d: Deep Learning for Natural Language Processing
- LDA2vec: Word Embeddings in Topic Models
- 7 Types of Artificial Neural Networks for Natural Language Processing
- https://distill.pub/
- Model-Based Machine Learning
- RNNs in TensorFlow, a practical guide and undocumented features
- A list of readings about RNNs
- For NLP in R, see Julia Silge's blog posts on sentiment analysis of Jane Austen novels: part 1, part 2, part 3, part 4.
- RNN resources
- Jurafsky, D., & Martin, J. H. Speech and language processing.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge, MA: MIT Press.
- Jacob Eisenstein. Natural Language Processing
- Goldberg, Y. (2017). Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, 10(1), 1-309.
- Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. O'Reilly Media, Inc.
Please see the general MDS policies.