Materials for the course Machine Learning for Molecular Engineering (3/7/10/20.C01/C51)

coleygroup/ML4MolEng

Machine Learning for Molecular Engineering (3.C01/3.C51, 7.C01/7.C51, 10.C01/10.C51, 20.C01/20.C51)

Materials and problem sets for the course Machine Learning for Molecular Engineering taught at MIT.

Instructors: Prof. Connor Coley, Prof. Rafael Gomez-Bombarelli, Prof. Ernest Fraenkel, Prof. Joey Davis

Warning: These assignments are a work in progress and will change. Do not start working on an assignment until it has been released on Canvas.

Problem Sets

PS0

All (3.C01/3.C51, 7.C01/7.C51, 10.C01/10.C51, 20.C01/20.C51)

Ungraded problem set to practice using Google Colab and NumPy.

PS1

A linear classification problem to get you started in the course. You will use logistic regression to diagnose cancer (data size: ~10^2), apply L1 and L2 regularization, and examine the effect each has on your regression results.
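As a rough illustration of why L1 and L2 penalties behave differently, the sketch below fits logistic regression by (proximal) gradient descent in NumPy on synthetic data in which only the first three of twenty features are informative. The data, the hypothetical helper `fit_logistic`, and the hyperparameters are assumptions for illustration, not the problem set's dataset or solution. L1 (applied via soft-thresholding) tends to drive uninformative coefficients exactly to zero, while L2 only shrinks them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, penalty="l2", lam=0.05, lr=0.1, n_iter=3000):
    """Hypothetical helper: minimize mean log-loss + penalty by (proximal) gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        if penalty == "l2":
            # Ridge: the gradient of (lam / 2) * ||w||^2 is lam * w, so weights shrink smoothly
            w = w - lr * (grad_w + lam * w)
        elif penalty == "l1":
            # Lasso: plain gradient step, then soft-thresholding (proximal operator of lam * ||w||_1)
            w = w - lr * grad_w
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        else:
            raise ValueError(f"unknown penalty: {penalty!r}")
        b -= lr * grad_b  # the bias term is not penalized
    return w, b


# Synthetic data (an assumption): 200 samples, 20 features, only the first 3 informative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -2.0, 1.5]
y = (X @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)

w_l1, _ = fit_logistic(X, y, penalty="l1")
w_l2, _ = fit_logistic(X, y, penalty="l2")
print("exactly-zero coefficients with L1:", int(np.sum(w_l1 == 0)))
print("exactly-zero coefficients with L2:", int(np.sum(w_l2 == 0)))
```

The L2 fit keeps every coefficient nonzero, whereas the L1 fit typically zeroes out most of the noise features, which is why L1 is often used for feature selection.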

Perovskites (3.C01/3.C51, 10.C01/10.C51)

You will then apply an MLP regressor to predict properties of perovskites (data size: ~10^3) and compare different representations of the chemical composition of perovskite crystals.
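One simple way to turn a chemical formula into a fixed-length input for an MLP is a vector of elemental fractions over a chosen element vocabulary. The pure-Python sketch below assumes parenthesis-free formulas; the element list and function names are hypothetical, and the problem set may compare other representations (for example, tabulated elemental properties).

```python
import re

# Element symbol followed by an optional (possibly fractional) count, e.g. "Ti", "O3", "Sn0.5"
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Parse a parenthesis-free formula such as 'SrTiO3' into {element: count}."""
    counts = {}
    for element, amount in TOKEN.findall(formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_vector(formula, elements):
    """Fraction of each element in `elements`; raises if the formula uses an unlisted element."""
    counts = parse_formula(formula)
    unknown = set(counts) - set(elements)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


ELEMENTS = ["Cs", "Sr", "Ba", "Pb", "Sn", "Ti", "O", "Br", "I"]  # illustrative vocabulary
print(composition_vector("SrTiO3", ELEMENTS))  # an oxide perovskite
print(composition_vector("CsPbI3", ELEMENTS))  # a halide perovskite
```

Because the vector only records how much of each element is present, it says nothing about which element sits on which crystal site, one reason to compare it against richer representations.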

MHC (7.C01/7.C51, 20.C01/20.C51)

You will then apply an MLP regressor to predict MHC binding to peptides (data size: ~10^3) and compare different representations of peptide amino acid sequences.
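To see why the choice of sequence representation matters, the sketch below contrasts a position-aware one-hot encoding with a position-agnostic amino-acid composition. Both encodings, the function names, and the example peptide are assumptions for illustration; the problem set may use different representations.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(peptide):
    """Position-aware encoding: a (length, 20) matrix, flattened into an MLP input vector."""
    x = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for pos, aa in enumerate(peptide):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def aa_composition(peptide):
    """Position-agnostic encoding: the fraction of each amino acid in the peptide."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in peptide:
        counts[AA_INDEX[aa]] += 1.0
    return counts / len(peptide)


a, b = "SIINFEKL", "LKEFNIIS"  # same residues, reversed order
print(np.array_equal(one_hot(a), one_hot(b)))                # False: order is encoded
print(np.array_equal(aa_composition(a), aa_composition(b)))  # True: order is lost
```

The one-hot vector grows with peptide length and requires a fixed length (or padding) for an MLP, while the composition vector is always 20-dimensional but cannot distinguish sequences that differ only in residue order.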

PS2

For this problem set, you will build a sequence-based model to predict DNA binding sequences identified by ChIP-seq (data size: ~10^4).
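Sequence-based models of DNA binding commonly one-hot encode each base and slide 1D convolutional filters along the sequence, so that each filter acts like a learnable motif detector. The NumPy sketch below shows that core operation with a hand-set filter; the motif, the sequence, and the function names are illustrative assumptions, not the problem set's model.

```python
import numpy as np

BASES = "ACGT"


def one_hot_dna(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = np.array([BASES.index(base) for base in seq])
    return np.eye(len(BASES))[idx]


def scan(seq, filt):
    """Slide a (width, 4) filter along the sequence like a 1D convolution (valid padding, stride 1)."""
    x = one_hot_dna(seq)
    width = filt.shape[0]
    return np.array([np.sum(x[i:i + width] * filt) for i in range(len(seq) - width + 1)])


# A filter that exactly matches the illustrative motif "GATA"; a trained filter would be real-valued
motif_filter = one_hot_dna("GATA")
scores = scan("CCGATACCTT", motif_filter)
print(scores, "best offset:", int(np.argmax(scores)))  # the highest score (4.0) is at offset 2
```

In a trained network, many such filters are learned from the labeled sequences, and their per-position scores are typically passed through a nonlinearity and pooled before a final classification layer.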

Bubbles (3.C01/3.C51, 10.C01/10.C51)

Next, you will train a model to perform segmentation on images of bubbles (data size: ~10^2) arising from the surface of a catalyst, with the goal of using the results to improve catalyst design.

Cells (7.C01/7.C51, 20.C01/20.C51)

Next, you will train a model to perform segmentation on images of cells (data size: ~10^2), with the goal of better understanding cell morphology.
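Whether the images show bubbles or cells, a segmentation model's output is usually judged by how well its predicted mask overlaps a ground-truth mask. The NumPy sketch below implements two common overlap metrics, intersection over union (IoU) and the Dice coefficient, on toy masks; whether the problem sets use these exact metrics is an assumption.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two boolean masks (1.0 if both masks are empty)."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else np.logical_and(pred, target).sum() / union


def dice(pred, target):
    """Dice coefficient 2|A and B| / (|A| + |B|) of two boolean masks (1.0 if both are empty)."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    total = pred.sum() + target.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(pred, target).sum() / total


target = np.zeros((6, 6), bool)
target[1:4, 1:4] = True  # ground-truth object: 9 pixels
pred = np.zeros((6, 6), bool)
pred[2:5, 2:5] = True    # prediction shifted by one pixel: 9 pixels, 4 of them overlapping
print(f"IoU = {iou(pred, target):.3f}, Dice = {dice(pred, target):.3f}")  # IoU = 0.286, Dice = 0.444
```

For any pair of masks the two metrics are related by Dice = 2 * IoU / (1 + IoU), so they rank predictions identically; Dice is simply the more lenient number.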

PS3

Molecular Properties (3.C01/3.C51, 10.C01/10.C51)

In this problem set, you will train a 2D graph neural network (GNN) to predict solubility from molecular features, and investigate its shortcomings in predicting chirality-dependent properties, namely $IC_{50}$.

SLGNN (7.C01/7.C51, 20.C01/20.C51)

In this problem set, you will train a graph neural network (GNN) to predict synthetic lethal (SL) interactions between gene pairs.
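One common way to frame SL prediction with a GNN is as link prediction: compute an embedding for each gene by message passing over a gene graph, then score a candidate pair from the two embeddings. The NumPy sketch below shows a single untrained GCN-style layer (symmetric normalization with self-loops) followed by a dot-product pair score; the toy graph, random features, and weights are hypothetical, and the problem set's architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy gene graph: 5 genes joined by undirected edges (e.g., known interactions)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
n_genes, n_feat, n_hidden = 5, 8, 4

A = np.zeros((n_genes, n_genes))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A_hat = A + np.eye(n_genes)  # add self-loops so each gene keeps its own features
deg_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = deg_inv_sqrt[:, None] * A_hat * deg_inv_sqrt[None, :]  # D^-1/2 (A + I) D^-1/2

X = rng.normal(size=(n_genes, n_feat))   # placeholder node features
W = rng.normal(size=(n_feat, n_hidden))  # untrained layer weights


def gcn_layer(A_norm, X, W):
    """One graph-convolution step: aggregate normalized neighbor features, transform, apply ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)


def pair_score(H, i, j):
    """Score a gene pair as sigmoid(h_i . h_j); training would push known SL pairs toward 1."""
    return 1.0 / (1.0 + np.exp(-(H[i] @ H[j])))


H = gcn_layer(A_norm, X, W)
print(f"score(gene 0, gene 3) = {pair_score(H, 0, 3):.3f}")
```

In practice the weights would be trained, for example with binary cross-entropy on labeled SL and non-SL pairs, and several layers would usually be stacked so each embedding reflects a wider neighborhood.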
