rydeveraumn/Kaggle-Jigsaw

Kaggle Jigsaw - Agile Community Rules Classification

This repository contains our solution for the Kaggle Jigsaw - Agile Community Rules Classification competition.

Competition Overview

The Jigsaw Agile Community Rules Classification competition challenges participants to build binary classifiers that predict whether a Reddit comment violates specific subreddit rules. The goal is to assist moderators in upholding community-specific norms by automatically identifying rule violations.

Key Details:

  • Task: Binary classification to predict rule violation probability
  • Dataset: Moderated Reddit comments with hypothetical rules
  • Evaluation: Predictions are scored on how well they identify rule violations (see the competition page for the official metric)
  • Challenge: Diverse norms and expectations across different subreddits
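To make the task concrete, here is a minimal sketch of scoring predicted violation probabilities against binary labels. The function name, the 0.5 threshold, and the sample values are illustrative assumptions, not part of this repository, and the official leaderboard metric is defined on the competition page and may differ from plain accuracy.

```python
from typing import Sequence


def accuracy_at_threshold(
    probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> float:
    """Fraction of comments whose thresholded prediction matches the label."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one prediction")
    # A comment is flagged as a violation when its probability reaches the threshold.
    predicted = [1 if p >= threshold else 0 for p in probs]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


# Hypothetical predictions for four comments (1 = violates the rule).
probs = [0.91, 0.12, 0.55, 0.40]
labels = [1, 0, 0, 1]
print(accuracy_at_threshold(probs, labels))  # prints 0.5
```

Lowering the threshold flags more comments as violations, so the same probabilities can yield a different score; that sensitivity is one reason to keep the model output as a probability rather than a hard label.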

Repository Structure

├── data/               # Competition datasets and processed data
├── notebooks/          # Jupyter notebooks for analysis and modeling
├── src/                # Source code for preprocessing and modeling
├── models/             # Trained models and artifacts
├── results/            # Performance metrics and analysis results
├── config/             # Configuration files
└── requirements.txt    # Python dependencies

Getting Started

Prerequisites

  • Python 3.8+
  • Git

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd Kaggle-Jigsaw
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Download the dataset:

    • Download the competition dataset from Kaggle
    • Place the files in the data/ directory
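After placing the files, a quick check that they landed in the right directory can save a failed training run. The sketch below assumes the competition data ships as CSV files directly inside data/; the helper name `list_data_files` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List


def list_data_files(data_dir: str = "data") -> List[Path]:
    """Return the CSV files found directly inside data_dir, sorted by name."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"expected a directory at {root}")
    # Assumption: the dataset is delivered as .csv files at the top level.
    return sorted(root.glob("*.csv"))
```

Calling `list_data_files()` from the repository root returns the paths it found, or raises `FileNotFoundError` if data/ does not exist yet.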

Usage

  1. Run exploratory data analysis:

    jupyter notebook notebooks/
  2. Train models:

    python src/train.py
  3. Generate predictions:

    python src/predict.py

Methodology

Our approach includes:

  • Data Preprocessing: Text cleaning and normalization
  • Feature Engineering: Rule-specific and comment-specific feature extraction
  • Model Development: Experimentation with various ML approaches
  • Evaluation: Comprehensive performance assessment and validation
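As an illustration of the first two steps, the sketch below shows one plausible form of text cleaning and of rule-specific and comment-specific features, using only the standard library. It is not the code in src/: the function names, the `<url>` placeholder, and the chosen features are all assumptions made for the example.

```python
import re
import unicodedata
from typing import Dict, List

_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_text(text: str) -> str:
    """Normalize Unicode, lowercase, mask URLs, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = _URL_RE.sub("<url>", text)
    return _WS_RE.sub(" ", text).strip()


def tokenize(cleaned: str) -> List[str]:
    """Split cleaned text into word tokens, ignoring the URL placeholder."""
    return _TOKEN_RE.findall(cleaned.replace("<url>", " "))


def extract_features(rule: str, comment: str) -> Dict[str, float]:
    """Build a small feature dict from one (rule, comment) pair."""
    rule_clean = clean_text(rule)
    comment_clean = clean_text(comment)
    rule_vocab = set(tokenize(rule_clean))
    comment_tokens = tokenize(comment_clean)
    # Share of comment tokens that also appear in the rule text.
    overlap = sum(1 for tok in comment_tokens if tok in rule_vocab)
    ratio = overlap / len(comment_tokens) if comment_tokens else 0.0
    return {
        "comment_num_tokens": float(len(comment_tokens)),
        "comment_has_url": float("<url>" in comment_clean),
        "rule_num_unique_tokens": float(len(rule_vocab)),
        "rule_overlap_ratio": ratio,
    }
```

For example, `extract_features("No advertising or spam", "Buy cheap spam here http://spam.example")` yields four comment tokens, a URL flag of 1.0, and an overlap ratio of 0.25. Features like these could feed any of the model families explored under Model Development.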

Competition Performance

[Performance metrics and rankings will be updated here as we progress]

Contributing

This is a competition repository. Please review the competition rules and guidelines before contributing.

Acknowledgments

  • Kaggle and Jigsaw for organizing this competition
  • The Reddit community for providing the dataset
  • Previous Jigsaw competition winners for inspiration

License

This project is licensed under the MIT License - see the LICENSE file for details.


Competition Link: Kaggle Jigsaw - Agile Community Rules Classification
