
alexebaker/decision-trees


Decision Trees

UNM CS 429/529 Machine Learning Project 1: Decision Trees

Details

Details about this project can be found on Kaggle

Usage

NOTE: This code works with either Python 2 or Python 3.

The main entry point for this project is dtree.py. Use the -h flag from any command to see help:

$ python dtree.py -h
usage: dtree.py [-h] [--training-data TRAINING_DATA]
                [--testing-data TESTING_DATA]
                [--classification-file CLASSIFICATION_FILE] [--gini-index]
                [--alpha ALPHA]

Creates decision trees based on a given data file.

optional arguments:
  -h, --help            show this help message and exit
  --training-data TRAINING_DATA
                        Path to the training data file.
  --testing-data TESTING_DATA
                        Path to the test data file.
  --classification-file CLASSIFICATION_FILE
                        Path to the classification file to write the results
                        of the testing data.
  --gini-index          Whether or not to use gini-index instead of
                        information gain.
  --alpha ALPHA         Alpha to use in chi-squared calculations.
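
The help text above has the shape that argparse generates. As a rough sketch of a parser that accepts the same flags: the function name build_parser and the None defaults for the three file paths are assumptions made for illustration and are not taken from dtree.py. Only the flag names, help strings, and the 0.05 alpha default come from this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="dtree.py",
        description="Creates decision trees based on a given data file.",
    )
    # File path defaults are assumed; the real script may ship its own.
    parser.add_argument("--training-data", default=None,
                        help="Path to the training data file.")
    parser.add_argument("--testing-data", default=None,
                        help="Path to the test data file.")
    parser.add_argument("--classification-file", default=None,
                        help="Path to the classification file to write the "
                             "results of the testing data.")
    # store_true makes --gini-index a switch that defaults to False.
    parser.add_argument("--gini-index", action="store_true",
                        help="Whether or not to use gini-index instead of "
                             "information gain.")
    parser.add_argument("--alpha", type=float, default=0.05,
                        help="Alpha to use in chi-squared calculations.")
    return parser


args = build_parser().parse_args(["--alpha", "0.01", "--gini-index"])
print(args.alpha, args.gini_index)
```

argparse converts the dashes in flag names to underscores, so --training-data is read back as args.training_data.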

Create a decision tree for a given data file:

python dtree.py --training-data data/training.csv --testing-data data/testing.csv

By default, information gain is used with an alpha value of 0.05. You can change either setting from the command line.

To use gini-index, simply add the --gini-index flag to any command:

python dtree.py --training-data data/training.csv --testing-data data/testing.csv --gini-index
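
To illustrate what the --gini-index flag switches between, here is a minimal sketch of the two split criteria using their textbook definitions. The function names entropy, gini, and gain are hypothetical, and whether dtree.py organizes the calculation this way is an assumption.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = float(len(labels))
    return -sum((n / total) * log(n / total, 2)
                for n in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = float(len(labels))
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gain(labels, partitions, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    total = float(len(labels))
    remainder = sum(len(part) / total * impurity(part) for part in partitions)
    return impurity(labels) - remainder


labels = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]  # a split that separates the classes
print(gain(labels, split, impurity=entropy))  # 1.0
print(gain(labels, split, impurity=gini))     # 0.5
```

With either measure, the attribute whose split yields the largest gain would be chosen at each node.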

If you want to use an alpha value other than 0.05, you can specify it with the --alpha parameter:

python dtree.py --training-data data/training.csv --testing-data data/testing.csv --alpha 0.01
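
As a sketch of how alpha typically enters a chi-squared split test: the statistic compares the class counts observed in each child with the counts expected if the split were unrelated to the class, and the split is kept only when the statistic exceeds the critical value for the chosen alpha. The function name chi_square_statistic and the small lookup table are hypothetical; the table holds standard critical values for one degree of freedom only, and how dtree.py obtains its critical values is not stated in this README.

```python
def chi_square_statistic(parent_counts, children_counts):
    """Chi-squared statistic for a candidate split.

    parent_counts: dict mapping class label -> count at the parent node.
    children_counts: list of such dicts, one per child node.
    """
    parent_total = float(sum(parent_counts.values()))
    statistic = 0.0
    for child in children_counts:
        child_total = sum(child.values())
        for label, parent_count in parent_counts.items():
            # Count expected if the child kept the parent's class mix.
            expected = child_total * parent_count / parent_total
            if expected > 0:
                observed = child.get(label, 0)
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Standard critical values for 1 degree of freedom
# (two classes, two children), keyed by alpha.
CRITICAL_DF1 = {0.1: 2.706, 0.05: 3.841, 0.01: 6.635}

parent = {"yes": 10, "no": 10}
strong = [{"yes": 9, "no": 1}, {"yes": 1, "no": 9}]
weak = [{"yes": 5, "no": 5}, {"yes": 5, "no": 5}]

print(chi_square_statistic(parent, strong) > CRITICAL_DF1[0.05])  # True
print(chi_square_statistic(parent, weak) > CRITICAL_DF1[0.05])    # False
```

A smaller alpha raises the critical value, so fewer splits pass the test and the resulting tree is smaller.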

You can also use the --gini-index and --alpha flags together:

python dtree.py --training-data data/training.csv --testing-data data/testing.csv --alpha 0.1 --gini-index

Documentation

This module's documentation is compiled by Sphinx and is located in the docs/ directory. Sphinx first needs to be installed into a virtualenv.

First, initialize the virtualenv:

virtualenv .venv

Next, activate the virtualenv in your current shell:

source .venv/bin/activate

Now, install the python requirements:

pip install -r requirements.txt

You can deactivate the virtualenv with the following command. However, make sure the virtualenv is active when you build the documentation:

deactivate

Now you can build the documentation. With the virtualenv active, run the docs target of the Makefile:

source .venv/bin/activate
make docs

Once the documentation is built, it can be viewed in your browser by running the open-docs.py script:

python open-docs.py

TODO

  • Implement ID3
  • Implement Gini Index
  • Implement Information Gain
  • Implement Chi-Square testing in ID3
  • Write up final report

Authors
