Faulty-training-data Resiliency in Machine Learning Models using Ensemble Techniques

By Xining Li (71361042) (xininglica@gmail.com) & Debashis Kayal (12326609) (debkayal@student.ubc.ca)

Project Brief:

Ensemble learning is a machine learning approach that combines multiple learning algorithms to improve prediction quality. Its original goal was to improve the inference accuracy of supervised learning by combining the results of multiple decision makers. At the same time, faults in training data sets are becoming commonplace. Are ensemble learning methods resilient to such faults? This project is an empirical analysis of ensemble learning approaches in which the training data set is subjected to faults.
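
To make the idea concrete, here is a minimal sketch of one possible fault model (label flipping) and one possible way of combining decision makers (majority voting). It uses only the Python standard library. The function names `flip_labels` and `majority_vote`, the `fault_rate` parameter, and the choice of fault model are assumptions made for illustration; they are not taken from the project code, which may inject faults and combine models differently.

```python
import random
from collections import Counter


def flip_labels(labels, fault_rate, classes, seed=0):
    """Return a copy of `labels` where a fraction `fault_rate` of the
    entries is replaced by a different, randomly chosen class.

    Hypothetical helper: simulates mislabeled training data.
    """
    rng = random.Random(seed)
    n_faulty = int(round(fault_rate * len(labels)))
    faulty_idx = rng.sample(range(len(labels)), n_faulty)
    noisy = list(labels)
    for i in faulty_idx:
        # Pick any class except the current one, so the label really changes.
        wrong_classes = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(wrong_classes)
    return noisy


def majority_vote(predictions):
    """Combine per-model prediction lists into one prediction per sample.

    `predictions` holds one list per model, all of the same length.
    Ties go to the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

In an experiment along these lines, each model would be trained on the labels returned by `flip_labels`, and the accuracy of `majority_vote` over the models' predictions would be compared with the accuracy of each model alone as `fault_rate` grows.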

Steps To Execute The Code:

1. Clone the Git repository
$ git clone https://github.com/xiningli/ubc-capstone.git
This creates a folder called "ubc-capstone".

2. Create a new folder named "all-data" under "ubc-capstone"
user@machine-os:/path/<gitrepofolder>/ubc-capstone$ mkdir all-data
Stay in "ubc-capstone": the commands in the following steps are run from there.

3. Download and unzip the MNIST dataset from our Google Drive location using the commands below (CIFAR10 is pulled directly by the code, and the WINE data set is included in the Git repository)

user@machine-os:/path/<gitrepofolder>/ubc-capstone$ sudo wget -O all-data/minst.zip https://drive.google.com/u/0/uc\?id\=1b1-wmx02a5_5bWxXCNDOKwyhTJq0LnLM\&export\=download
user@machine-os:/path/<gitrepofolder>/ubc-capstone$ unzip all-data/minst.zip -d all-data/minst
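
A download from a sharing link can return an HTML page instead of the archive, in which case unzip fails. The following optional sketch checks that a file is a readable zip archive before it is extracted. The helper name `looks_like_zip` is hypothetical and not part of the repository.

```python
import zipfile


def looks_like_zip(path):
    """Return True if `path` is a zip archive whose members all pass
    the CRC check, and False otherwise (including for a missing file)."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

For example, `looks_like_zip("all-data/minst.zip")` should return True after a successful download.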

4. Set up the environment (virtualenv)
Prerequisites: python3.8, virtualenv, pip3
[for prerequisites:
 $ sudo apt install python3.8
 $ virtualenv venv
 $ source venv/bin/activate]
 
(venv) user@machine-os:/path/<gitrepofolder>/ubc-capstone$ sh resolve-dependencies.sh

Note: The resolve-dependencies.sh script pulls in all required Python libraries, including the key libraries needed to run our ML code: numpy, scikit-learn, matplotlib, pandas, xgboost, keras, and tensorflow. It might take a while to complete.
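
Once the script has finished, a quick way to confirm that the libraries are available inside the virtualenv is to check that each one can be located by the import system. This is a minimal sketch, not part of the repository; `missing_modules` is a hypothetical helper, and the import names below are assumed from the package list above (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names assumed for the packages named in the note above.
REQUIRED_MODULES = [
    "numpy", "sklearn", "matplotlib", "pandas",
    "xgboost", "keras", "tensorflow",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found,
    without actually importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules(REQUIRED_MODULES))` inside the activated virtualenv should print an empty list if everything was installed.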

5. Experiment #1: Running the script for the MNIST dataset
(venv) user@machine-os:/path/<gitrepofolder>/ubc-capstone$ sh run-minst.sh

Running the script for the WINE dataset
(venv) user@machine-os:/path/<gitrepofolder>/ubc-capstone$ sh run-wine.sh

6. Experiment #2: Running the script for the CIFAR dataset, which the code pulls from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
(venv) user@machine-os:/path/<gitrepofolder>/ubc-capstone$ sh run-cifar.sh

The output screenshots are captured at https://github.com/xiningli/ubc-capstone/blob/master/OutputScreengrabs_RunningTheCode.docx

Additional Notes:

Code File Name: Brief Description

wine/wine.py
minst/main.py
cifar10/main.py
cifar10-mislabeled/main.py

run-cifar.sh
run-minst.sh
run-wine.sh
