An Ensemble method using NB-SVM and LSTM algorithms
Toxic Comment Classification Challenge: identify and classify toxic online comments.
Data Description
You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:
toxic, severe_toxic, obscene, threat, insult, identity_hate
Create a model that predicts the probability of each type of toxicity for each comment.
File descriptions:
train.csv - the training set, contains comments with their binary labels
test.csv - the test set; you must predict the toxicity probabilities for these comments. To deter hand labeling, the test set contains some comments that are not included in scoring.
sample_submission.csv - a sample submission file in the correct format
test_labels.csv - labels for the test data; a value of -1 indicates the comment was not used for scoring (note: this file was added after the competition closed)
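A minimal loading sketch for these files is shown below. It assumes the CSVs have been extracted into a local data/ directory (the path is an assumption, adjust as needed) and illustrates how the six label columns and the unscored (-1) test rows can be handled.

```python
# Sketch: load the competition files and inspect the label columns.
# Assumes the CSVs live in a local "data/" directory (path is an assumption).
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

train = pd.read_csv("data/train.csv")        # id, comment_text, six binary labels
test = pd.read_csv("data/test.csv")          # id, comment_text
test_labels = pd.read_csv("data/test_labels.csv")

# Rows with -1 in test_labels were not used for scoring; drop them when
# evaluating locally against the post-competition labels.
scored = test_labels[(test_labels[LABELS] != -1).all(axis=1)]

print(train[LABELS].mean())    # fraction of comments flagged per toxicity type
print(len(test), len(scored))  # total test comments vs. scored comments
```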
The dataset can be downloaded from https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
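To make the prediction target concrete, the sketch below trains a simple TF-IDF + logistic-regression baseline that outputs one probability per label in the sample_submission.csv format. This is only an illustration of the task, not the NB-SVM/LSTM ensemble used in this project, and the data/ paths are assumptions.

```python
# Sketch: illustrative multi-label baseline (not the NB-SVM/LSTM ensemble).
# One classifier per toxicity label, each emitting P(label = 1) per comment.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

train = pd.read_csv("data/train.csv")
test = pd.read_csv("data/test.csv")

# Word-level TF-IDF features shared by all six classifiers.
vectorizer = TfidfVectorizer(max_features=50000, sublinear_tf=True)
X_train = vectorizer.fit_transform(train["comment_text"])
X_test = vectorizer.transform(test["comment_text"])

submission = pd.DataFrame({"id": test["id"]})
for label in LABELS:
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, train[label])
    submission[label] = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

submission.to_csv("submission.csv", index=False)
```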