This repo was built for the CS6140 Machine Learning (Fall) final project. We used the Microsoft Malware Prediction dataset from Kaggle, available at https://www.kaggle.com/competitions/microsoft-malware-prediction/overview. Save the dataset either in Google Drive or under the "./data" directory.
Authors:
- Tian Ma
- Wenyu Pan
Run the ipynb files in the repo in the following order:
- EDA
- Data clean
- Data encoding
- Any of the models, in no specific order
You can keep the dataset either in Google Drive or in the "./data" directory, but check the paths in each ipynb file before running it. If you run the files locally, use the "./data" paths so that the processed datasets are saved to and loaded from the correct directory.
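The path choice described above can be sketched as a small helper placed at the top of a notebook. This is only an illustration, not the code used in this repo: the names `running_in_colab`, `get_data_dir`, `DRIVE_DATA_DIR`, and `LOCAL_DATA_DIR` are hypothetical, the Drive folder is an assumed location, and `train.csv` is an assumed file name for the downloaded data.

```python
import sys
from pathlib import Path

# Hypothetical locations; adjust them to wherever the dataset was actually saved.
DRIVE_DATA_DIR = Path("/content/drive/MyDrive/data")  # assumed Google Drive folder
LOCAL_DATA_DIR = Path("./data")                       # local directory named in this README


def running_in_colab() -> bool:
    """Return True when the notebook runs inside Google Colab."""
    # Colab loads the google.colab package into the kernel at startup.
    return "google.colab" in sys.modules


def get_data_dir() -> Path:
    """Return the directory used to save and load the datasets."""
    if running_in_colab():
        from google.colab import drive

        drive.mount("/content/drive")  # asks for authorization on first use
        return DRIVE_DATA_DIR
    # Local run: make sure ./data exists before anything is written to it.
    LOCAL_DATA_DIR.mkdir(parents=True, exist_ok=True)
    return LOCAL_DATA_DIR


data_dir = get_data_dir()
train_path = data_dir / "train.csv"  # assumed file name
```

With a helper like this, each notebook could build its paths from `data_dir` instead of hard-coding them, so switching between Google Drive and a local run would only require changing one place.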
Models:
- Logistic Regression
- Random Forest
- LightGBM
- Keras
The final report is saved as "Final Report.pdf".