Project Owner: Tanya Tandon
QA Owner: Saurabh Annadate
Vision: In the history of the world, today is the best time for a dream to come true. With the internet shrinking distances across the world, Kickstarter is one of the fastest and easiest ways to kickstart a great idea. This project aligns with Kickstarter's mission to "help bring creative projects to life" by predicting whether a Kickstarter campaign will be successful. It will help anyone launching a campaign on Kickstarter: users with failed campaigns can leverage the platform to explore alternate goals and alternate completion dates and understand the tweaks needed to make success more likely.
Mission: Develop a classifier that predicts whether a Kickstarter project will be successful. This will help users evaluate whether their campaign will be able to raise money even before they launch it. The platform will further suggest alternate completion dates and alternate goals that are more likely to succeed, helping users modify their campaign for a much better chance of success.
Success criteria:
- Classification performance of at least 60% (measured using AUC or misclassification rate)
- Inbound traffic of at least 100 Kickstarter campaign starters
- Above 70% positive feedback on the UI design
Theme 1: Model Development
Epic 1: Design a data set that best represents all aspects needed to classify a Kickstarter project
- Story 1: Download the relevant data sets from Kaggle, keeping in mind that it should be possible to join them later
- Story 2: Connect all these data sets
- Story 3: Design new features
Epic 2: Data preparation
- Story 1: Exploratory Data Analysis
- Story 2: Data Cleaning
Epic 3: Model Development
- Story 1: Build logistic regression model
- Story 2: Build boosted trees
- Story 3: Build Random Forest
- Story 4: Compare measures of success (AUC, misclassification rate, recall, etc.) for each model on new data (see the sketch after this list)
- Story 5: Iterate on the model for improvement
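As a rough illustration of Stories 1-5, the sketch below fits the three candidate models with scikit-learn and compares AUC and misclassification rate on held-out data. The file path and feature/label columns are placeholders, not the project's actual schema.

```python
# Hedged sketch: fit and compare the three candidate classifiers.
# The CSV path and feature/label columns are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/clean/kickstarter.csv")            # hypothetical cleaned data set
X, y = df[["goal", "duration_days"]], df["success"]       # hypothetical columns
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "boosted trees": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(n_estimators=200),
}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    misclass = (model.predict(X_te) != y_te).mean()
    print(f"{name}: AUC={auc:.3f}, misclassification rate={misclass:.3f}")
```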
Epic 4: Create unit test scripts to test model functionality
Theme 2: Application Deployment
Epic 1: Set up the deployment pipeline
- Story 1: Set up requirements.txt from the current environment state to make the environment reproducible
- Story 2: Set up S3 bucket
- Story 3: Initialize RDS database
- Story 4: Deploy the flask app
Epic 2: Set up the front-end UI where users will interact with the model
- Story 1: Set up a basic front-end interface with all the required options
- Story 2: Improve the basic front-end interface to include engaging GIFs on the home page
Epic 3: Create UAT test scripts to verify that the UI takes selections and produces output as expected
- Story 1: Check model functionality
- Story 2: Check exception handling by the UI
Epic 4: Create logging for easy debugging and error notifications
Sprint Sizing Legend:
- 0 points - quick chore
- 1 point ~ 1 hour (small)
- 2 points ~ 2.5 hours (slightly bigger than small)
- 3 points ~ 1/2 day (medium)
- 4 points ~ 1 day (large)
- 8 points - big and needs to be broken down further when it comes to execution (okay as a placeholder for future work)
Backlog:
- Theme1.epic1.story1 (2pts) - (completed)
- Theme1.epic1.story2 (2pts) - (completed)
- Theme1.epic1.story3 (2pts) - (completed)
- Theme1.epic2.story1 (2pts) - (completed)
- Theme1.epic2.story2 (2pts) - (completed)
- Theme1.epic3.story1 (2pts) - (completed)
- Theme1.epic3.story2 (2pts) - (completed)
- Theme1.epic3.story3 (2pts) - (completed)
- Theme1.epic3.story4 (1pt) - (completed)
- Theme1.epic3.story5 (2pts) - (completed)
- Theme1.epic4 (3pts)
- Theme2.epic1.story1 (2pts) - (completed)
- Theme2.epic1.story2 (1pt) - (completed)
- Theme2.epic1.story3 (1pt) - (completed)
- Theme2.epic1.story4 (1pt) - (completed)
- Theme2.epic1.story5 (1pt) - (completed)
- Theme2.epic2.story1 (3pts) - (completed)
- Theme2.epic2.story2 (4pts) - (completed)
- Theme2.epic3.story1 (1pt) - (completed)
- Theme2.epic3.story2 (1pt) - (completed)
- Theme2.epic4.story1 (2pts) - (completed)
- Theme2.epic4 (2pts)
Icebox:
Epic 1: Develop a recommendation bar to suggest alternate goals and alternate completion dates for project success (a sketch follows below)
- Story 1: Integrate the model with this functionality
- Story 2: Integrate the UI with this functionality
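A minimal sketch of how such recommendations could work, assuming a fitted scikit-learn-style classifier whose input is a (goal, duration) feature pair; the feature layout, the candidate grid, and the label encoding are all illustrative assumptions.

```python
# Hedged sketch: search nearby goal amounts and campaign durations for
# configurations the trained model predicts will succeed.
# `model`, the two-feature layout, and the label encoding are assumptions.
import itertools

def suggest_alternatives(model, goal, duration_days):
    """Return (goal, duration) pairs the classifier predicts as successful."""
    goals = [goal * f for f in (0.5, 0.75, 1.0)]              # try lowering the goal
    durations = [duration_days + d for d in (0, 7, 14, 30)]   # try longer campaigns
    suggestions = []
    for g, d in itertools.product(goals, durations):
        if model.predict([[g, d]])[0] == 1:                   # 1 = predicted success
            suggestions.append((round(g), d))
    return suggestions
```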
```
├── README.md                 <- You are here
│
├── app
│   ├── static/               <- CSS, JS files that remain static
│   ├── templates/            <- HTML (or other code) that is templated and changes based on a set of inputs
│   ├── models.py             <- Creates the data model for the database connected to the Flask app
│   ├── __init__.py           <- Initializes the Flask app and database connection
│
├── config                    <- Directory for yaml configuration files for model training, scoring, etc.
│   ├── logging/              <- Configuration files for python loggers
│
├── data                      <- Folder that contains data used or generated. Only the external/ and sample/ subdirectories are tracked by git.
│   ├── archive/              <- Place for archived data that is no longer used. Not synced with git.
│   ├── external/             <- External data sources, will be synced with git
│   ├── sample/               <- Sample data used for code development and testing, will be synced with git
│
├── docs                      <- A default Sphinx project; see sphinx-doc.org for details.
│
├── figures                   <- Generated graphics and figures to be used in reporting.
│
├── models                    <- Trained model objects (TMOs), model predictions, and/or model summaries
│   ├── archive               <- No longer current models. This directory is included in the .gitignore and is not tracked by git.
│
├── notebooks
│   ├── develop               <- Current notebooks being used in development.
│   ├── deliver               <- Notebooks shared with others.
│   ├── archive               <- Development notebooks no longer being used.
│   ├── template.ipynb        <- Template notebook for analysis with useful imports and helper functions.
│
├── src                       <- Source code for the project
│   ├── archive/              <- No longer current scripts.
│   ├── helpers/              <- Helper scripts used in main src files
│   ├── sql/                  <- SQL source code
│   ├── add_songs.py          <- Script for creating a (temporary) MySQL database and adding songs to it
│   ├── ingest_data.py        <- Script for ingesting data from different sources
│   ├── generate_features.py  <- Script for cleaning and transforming data and generating features for use in training and scoring.
│   ├── train_model.py        <- Script for training machine learning model(s)
│   ├── score_model.py        <- Script for scoring new predictions using a trained model.
│   ├── postprocess.py        <- Script for postprocessing predictions and model results
│   ├── evaluate_model.py     <- Script for evaluating model performance
│
├── test                      <- Files necessary for running model tests (see documentation below)
├── run.py                    <- Simplifies the execution of one or more of the src scripts
├── app.py                    <- Flask wrapper for running the model
├── config.py                 <- Configuration file for Flask app
├── requirements.txt          <- Python package dependencies
```
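For orientation, a hedged sketch of what `app/__init__.py` might contain; the configuration key and database URI below are assumptions, not the project's actual code.

```python
# Hedged sketch of app/__init__.py: initialize the Flask app and its
# database connection. The config key and URI are assumptions.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__, template_folder="templates", static_folder="static")
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///Database/msia423.db"  # illustrative default
db = SQLAlchemy(app)
```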
This application can be run both on a local system and on AWS. Steps for deploying the app in both settings are given below.
Whether you run on `AWS` or `Local`, you need to set up your AWS account. There are two ways to do it.
You can check whether you already have an AWS configuration by going to your terminal and typing `cd ~/.aws`. If the directory exists, you are all set; otherwise, follow step 1.2:
```bash
pip install awscli
aws configure
```
Enter your `AWS Access Key ID` and `AWS Secret Access Key` when prompted, and press enter to accept the defaults for region and output format.
In `config/config.py`:
- Make sure that the `WHERE` variable is set to `Local`
- Ensure that the `HOST` variable is set to `127.0.0.1`
- Keep the default `PORT` of `3000`; however, change it if this port is not available
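For reference, these settings might look like the following in `config/config.py` (a sketch based on the values above; the real file may define additional variables):

```python
# Local-run settings for config/config.py (values taken from the steps
# above; the real file may define additional variables).
WHERE = "Local"     # switch to "AWS" for a cloud deployment
HOST = "127.0.0.1"  # IP address the Flask app binds to
PORT = 3000         # change if port 3000 is unavailable
```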
In the `Makefile`:
- Ensure that the `WHERE` variable is set to `Local`
- Ensure that the `BUCKET` variable is set to `Scripts` if running on Windows, else `bin` if on Linux or MacOS
RDS configuration: In order to use RDS, the RDS credentials need to be added to the OS environment. Add the following variables to the environment:
- `MYSQL_USER`: Username to access the RDS instance
- `MYSQL_PASSWORD`: Password to access the RDS instance
- `MYSQL_HOST`: RDS instance endpoint
- `MYSQL_PORT`: Port number to access the instance
- `MYSQL_DB`: Name of the database
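As a sketch, these variables are typically assembled into a SQLAlchemy connection string along the following lines (the `mysql+pymysql` driver is an assumption):

```python
# Hedged sketch: build a SQLAlchemy engine string from the RDS
# environment variables listed above. The pymysql driver is an assumption.
import os

engine_string = "mysql+pymysql://{user}:{pw}@{host}:{port}/{db}".format(
    user=os.environ["MYSQL_USER"],
    pw=os.environ["MYSQL_PASSWORD"],
    host=os.environ["MYSQL_HOST"],
    port=os.environ["MYSQL_PORT"],
    db=os.environ["MYSQL_DB"],
)
```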
In the `Makefile`:
- Make sure that the `WHERE` variable is set to `AWS`
- Set the `YOUR_S3_BUCKET` variable to the S3 bucket that you intend to use
- Make sure that the `BUCKET` variable is set to `Scripts` if running on a Windows server, else `bin` if on a Linux server
In the command line, type:
```bash
make all
```
To set up a virtual environment with `virtualenv`:
```bash
pip install virtualenv
virtualenv Kickstarter
source Kickstarter/bin/activate
pip install -r requirements.txt
```
Alternatively, with `conda`:
```bash
conda create -n Kickstarter python=3.7.3
conda activate Kickstarter
pip install -r requirements.txt
```
Or with Make:
```bash
make venv
```
Original source: https://www.kaggle.com/kemical/kickstarter-projects
The data has also been loaded into a public AWS S3 bucket named `bossbucket`.
To load the data locally, run the following command in bash:
```bash
python run.py loadS3
```
Running this code will download the raw data from the S3 bucket and put it in `/Data/raw/`.
With Make:
```bash
make load_data
```
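Under the hood, the download step presumably looks something like this boto3 sketch (the object key and local filename are hypothetical):

```python
# Hedged sketch of the S3 download performed by `run.py loadS3`;
# the object key and local path are illustrative.
import boto3

s3 = boto3.client("s3")
s3.download_file("bossbucket", "raw/ks-projects.csv", "Data/raw/ks-projects.csv")
```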
To load the data on AWS, run the following command in bash:
```bash
python run.py loadS3 --where=AWS --bucket=<destination_bucket_name>
```
Running this code will download the raw data from the S3 bucket and put it in `<destination_bucket_name>/raw/`.
With Make:
```bash
make load_data
```
To create the database locally, run the following command in bash:
```bash
python run.py create_db
```
Running this code will create a SQLite database for logging app usage at `/Database/msia423.db`. The table name is `Userinput`.
With Make:
```bash
make create_db
```
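A hedged sketch of what the `Userinput` data model could look like with SQLAlchemy; the column names are assumptions, and the actual schema lives in `app/models.py`.

```python
# Hedged sketch of the Userinput data model; column names are assumptions
# (the actual schema lives in app/models.py).
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Userinput(Base):
    __tablename__ = "Userinput"
    id = Column(Integer, primary_key=True)
    goal = Column(Integer)           # campaign goal entered by the user
    category = Column(String(100))   # campaign category selected in the UI

engine = create_engine("sqlite:///Database/msia423.db")
Base.metadata.create_all(engine)
```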
To create the database on AWS (RDS), make sure the RDS environment variables described above are set, then run the following command in the command line:
```bash
python run.py create_db --where=AWS
```
With Make:
```bash
make create_db
```
Running this code will create the database in the given RDS instance.
To clean the data locally, run the following command in the command line:
```bash
python run.py clean_data
```
With Make:
```bash
make clean_data
```
Running this code will clean the raw data and create a cleaned dataset at `/Data/clean/`.
To clean the data on AWS, run the following command in the command line:
```bash
python run.py clean_data --where=AWS --bucket=<destination_bucket_name>
```
With Make:
```bash
make clean_data
```
Running this code will clean the raw data from the S3 bucket and put the result in `<destination_bucket_name>/clean/`.
To train the model locally, run the following command in the command line:
```bash
python run.py Model_fitting
```
With Make:
```bash
make train_model
```
Running this code will train the prediction model and dump it at `/models/`.
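The final step of training presumably pickles the fitted model object into `models/`, along these lines (the filename and the toy training call are placeholders):

```python
# Hedged sketch of persisting the trained model; the filename and the
# stand-in training step are illustrative.
import pickle
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0, 0], [1, 1]], [0, 1])  # stand-in for the real training step
with open("models/kickstarter_model.pkl", "wb") as f:
    pickle.dump(model, f)
```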
To train the model on AWS, run the following command in the command line:
```bash
python run.py Model_fitting --where=AWS --bucket=<destination_bucket_name>
```
With Make:
```bash
make train_model
```
Running this code will train the prediction model and put it in `<destination_bucket_name>/models/`.
To run the app, run the following command in the command line:
```bash
python run.py run_app
```
With Make:
```bash
make app
```
Running this code will start the Flask app at the IP and port specified in `config/config.py`.
In order to unit test the functions, take the following steps:
- Navigate to the tests folder
- Run the following command in the command line:
```bash
pytest test_all_functions.py
```
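As an illustration, a test in `test_all_functions.py` might look like the sketch below; the cleaning behavior it checks is an assumption, and `dropna()` stands in for the project's actual cleaning function.

```python
# Hedged sketch of a unit test; the behavior checked is an assumption,
# and dropna() stands in for the project's cleaning function.
import pandas as pd

def test_cleaning_drops_rows_with_missing_goal():
    raw = pd.DataFrame({"goal": [1000.0, None], "state": ["successful", "failed"]})
    cleaned = raw.dropna(subset=["goal"])
    assert len(cleaned) == 1
    assert cleaned["state"].iloc[0] == "successful"
```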
All logs are saved at `logs/logfile.log`.
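A minimal sketch of a logger writing to that file (the format string and log level are illustrative; the project's actual logger configuration lives under `config/logging/`):

```python
# Hedged sketch of logging setup; the format and level are illustrative.
import logging

logging.basicConfig(
    filename="logs/logfile.log",
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logging.getLogger(__name__).info("App started")
```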