This repository contains our solution for the Kaggle Jigsaw - Agile Community Rules Classification competition.
The Jigsaw Agile Community Rules Classification competition challenges participants to build binary classifiers that predict whether a Reddit comment violates specific subreddit rules. The goal is to assist moderators in upholding community-specific norms by automatically identifying rule violations.
- Task: Binary classification to predict rule violation probability
- Dataset: Moderated Reddit comments with hypothetical rules
- Evaluation: Accuracy in identifying rule violations
- Challenge: Diverse norms and expectations across different subreddits
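To make the task and metric concrete, here is a minimal sketch of how predicted violation probabilities could be scored for accuracy. It assumes a 0.5 decision threshold and labels of 1 (violates the rule) or 0 (does not). The function name, the threshold, and the example values are illustrative assumptions, not part of the competition materials.

```python
from typing import Sequence


def accuracy_from_probabilities(
    probabilities: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,  # assumed cut-off; not specified by the competition
) -> float:
    """Return the fraction of comments whose thresholded prediction matches the label."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    # A comment is predicted as a violation when its probability meets the threshold.
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical scores for four comments; 1 = violates the rule, 0 = does not.
example_probabilities = [0.91, 0.12, 0.55, 0.40]
example_labels = [1, 0, 0, 1]
example_accuracy = accuracy_from_probabilities(example_probabilities, example_labels)
```

Because the model outputs probabilities rather than hard labels, the threshold is a tunable choice; a different cut-off may suit subreddits with stricter or looser norms.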
├── data/ # Competition datasets and processed data
├── notebooks/ # Jupyter notebooks for analysis and modeling
├── src/ # Source code for preprocessing and modeling
├── models/ # Trained models and artifacts
├── results/ # Performance metrics and analysis results
├── config/ # Configuration files
└── requirements.txt # Python dependencies
- Python 3.8+
- Git
- Clone the repository:
    git clone <repository-url>
    cd Kaggle-Jigsaw
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Download the dataset:
  - Download the competition dataset from Kaggle
  - Place the files in the data/ directory
- Run exploratory data analysis:
    jupyter notebook notebooks/
- Train models:
    python src/train.py
- Generate predictions:
    python src/predict.py
Our approach includes:
- Data Preprocessing: Text cleaning and normalization
- Feature Engineering: Rule-specific and comment-specific feature extraction
- Model Development: Experimentation with various ML approaches
- Evaluation: Comprehensive performance assessment and validation
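As an illustration of the preprocessing and feature-engineering steps listed above, the following is a minimal standard-library sketch. It is not the code in src/; the function names, the `<url>` placeholder, and the three features are assumptions chosen only to show the idea of combining comment-specific features with a rule-specific overlap feature.

```python
import re
import unicodedata
from typing import Dict

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
WHITESPACE_PATTERN = re.compile(r"\s+")
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def clean_text(text: str) -> str:
    """Normalize Unicode, lowercase, mask URLs, and collapse whitespace."""
    # NFKC folds compatibility characters (e.g. full-width letters) to canonical forms.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace each URL with a placeholder so link presence survives cleaning.
    text = URL_PATTERN.sub(" <url> ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


def extract_features(rule: str, comment: str) -> Dict[str, float]:
    """Build a small feature dictionary for one (rule, comment) pair."""
    clean_rule = clean_text(rule)
    clean_comment = clean_text(comment)
    rule_vocab = set(TOKEN_PATTERN.findall(clean_rule))
    # Note: the "<url>" placeholder contributes the token "url" to this list.
    comment_tokens = TOKEN_PATTERN.findall(clean_comment)
    shared = rule_vocab & set(comment_tokens)
    return {
        # Comment-specific features.
        "comment_token_count": float(len(comment_tokens)),
        "url_count": float(clean_comment.count("<url>")),
        # Rule-specific feature: share of rule words that also appear in the comment.
        "rule_overlap_ratio": len(shared) / len(rule_vocab) if rule_vocab else 0.0,
    }


# Hypothetical rule and comment, used only to demonstrate the functions.
example_rule = "No advertising or spam links."
example_comment = "Check out   my shop at https://example.com  for BEST deals, no spam!"
example_features = extract_features(example_rule, example_comment)
```

Hand-built features like these are mainly useful as a quick baseline or as inputs alongside a text model; the overlap ratio in particular is a crude signal, since a comment can violate a rule without reusing any of its words.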
[Performance metrics and rankings will be updated here as we progress]
This is a competition repository. Before contributing, please review the competition rules and guidelines.
- Kaggle and Jigsaw for organizing this competition
- The Reddit community for providing the dataset
- Previous Jigsaw competition winners for inspiration
This project is licensed under the MIT License - see the LICENSE file for details.
Competition Link: Kaggle Jigsaw - Agile Community Rules Classification