My work for the GO DATA SCIENCE 4.0 (GO DS 4.0) Hackathon hosted on Zindi (Feb 15 - Feb 16, 2024). The challenge focused on multi-class text classification of mental health discussions. I placed in the top 22% of participants by fine-tuning transformer models, including BERT and RoBERTa.

samehaisaa/GO-DataScience-4.0---Mental-Health-Text-Classification-

GO DS 4.0 - Mental Health Text Classification

License: MIT Python 3.9 Zindi

This repository contains my solution for the GO DATA SCIENCE 4.0 Hackathon hosted on Zindi, where I achieved 42nd place out of 194 participants. The challenge focused on classifying mental health-related text discussions into predefined categories using Natural Language Processing (NLP) techniques.


🏅 Competition Results

Final Leaderboard Performance

  • Rank: 42 out of 194 participants
  • Validation Accuracy: 74.4%
  • Public Leaderboard Score: 0.7686
  • Private Leaderboard Score: 0.7528

Top Performers (Excerpt)

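| Rank | Team Name | Public Score | Private Score |
|------|-----------|--------------|---------------|
| 1 | Recursive Duo | 0.8189 | 0.7996 |
| ... | ... | ... | ... |
| 9 | one crew | 0.7786 | 0.7792 |
| 25 | Llama | 0.7610 | 0.7648 |
| 43 | SamehAissa (Me) | 0.7686 | 0.7528 |
| 44 | ... | ... | ... |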

📊 Analysis

Key Observations

  1. Top Scores:

    • The winning team achieved a public score of 0.8189 and a private score of 0.7996.
  2. My Performance:

    • Achieved a public score of 0.7686 and a private score of 0.7528.
    • Ranked 42nd, placing in the top 22% of participants.
  3. Leaderboard Insights:

    • A small gap between public and private scores suggests that a model generalizes to unseen data rather than overfitting the public leaderboard split.
    • The field was tight, with only small score differences separating the top teams.

🏆 Competition Overview

Problem Statement

The goal was to develop a model that accurately classifies text entries (titles and content) from online discussions into categories representing mental health issues. Each entry in the dataset included:

  • id: Unique identifier
  • title: Discussion title
  • content: Main body of the text
  • target: Mental health category (only in training data)

Example Data Entry

| id | title | content | target |
|----|-------|---------|--------|
| 101 | Feeling Hopeless and Lost | I've been struggling with depression for a while... | Depression |
| 102 | Panic Attacks Are Getting Worse | Lately, my panic attacks have been more frequent... | Anxiety |
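
Rows like these need to be merged into a single text field, with their labels mapped to integers, before a model can use them. The snippet below is a minimal, library-free sketch of that idea, not the code used in this repository: the helper names (`clean`, `build_text`, `encode_labels`), the whitespace-only cleaning rule, and the simulated missing `content` value are all assumptions made for illustration.

```python
import re

# Two illustrative rows in the same shape as the example entries above.
rows = [
    {"id": 101, "title": "Feeling Hopeless and Lost",
     "content": "I've been struggling with depression for a while...",
     "target": "Depression"},
    {"id": 102, "title": "Panic Attacks Are Getting Worse",
     "content": None,  # simulated missing body (assumption for the demo)
     "target": "Anxiety"},
]


def clean(text):
    """Treat None as empty, collapse whitespace runs, and strip the ends."""
    return re.sub(r"\s+", " ", text or "").strip()


def build_text(row):
    """Join title and content into one model input, skipping empty parts."""
    parts = [clean(row.get("title")), clean(row.get("content"))]
    return " ".join(p for p in parts if p)


def encode_labels(labels):
    """Map each distinct label to an integer id, assigned in sorted order."""
    label2id = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [label2id[name] for name in labels], label2id


texts = [build_text(r) for r in rows]
label_ids, label2id = encode_labels([r["target"] for r in rows])
```

Sorting the label names before assigning ids keeps the mapping stable across runs, which matters when predictions are decoded back to category names for a submission file.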

Evaluation Metric

Submissions were evaluated on accuracy, with accuracy on the private leaderboard split serving as the primary metric for the final ranking.
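
For reference, accuracy is the fraction of entries whose predicted category matches the true one. The function below is a small sketch of that definition; the name `accuracy` and the toy labels are illustrative and are not taken from the competition data.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: three of the four predictions are correct.
y_true = ["Depression", "Anxiety", "Anxiety", "Depression"]
y_pred = ["Depression", "Anxiety", "Depression", "Depression"]
score = accuracy(y_true, y_pred)
```

Because accuracy weights every entry equally, rare categories contribute little to the score, which is one reason to pair it with class weighting during training.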


Key Steps

  1. Data Preprocessing:

    • Combined title and content into a single text feature.
    • Handled missing values and cleaned text data.
    • Encoded target labels into numerical format.
  2. Modeling:

    • Experimented with BERT and RoBERTa architectures.
    • Implemented class weighting to handle imbalanced data.
    • Used EDA (Easy Data Augmentation) text augmentation to improve generalization.
  3. Training:

    • Fine-tuned transformer models using Hugging Face's Trainer API.
    • Applied Focal Loss to focus on hard-to-classify examples.
    • Used Test-Time Augmentation (TTA) for robust predictions.
  4. Evaluation:

    • Achieved 74.4% accuracy on the validation set and ~76.8% accuracy on the public leaderboard.
    • Secured 42nd place on the final leaderboard.
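
The class weighting and Focal Loss mentioned in the steps above can be sketched as follows. This is a plain-Python illustration for a single example, not the training code from this repository, which would more likely express the same loss on tensors inside a custom Trainer. The inverse-frequency weighting formula, the default `gamma=2.0`, and the function names are assumptions.

```python
import math
from collections import Counter


def class_weights(label_ids, num_classes):
    """Inverse-frequency weights: n_samples / (num_classes * class_count)."""
    counts = Counter(label_ids)
    n = len(label_ids)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, weights=None, gamma=2.0):
    """Focal loss for one example: -w_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    w_t = weights[target] if weights is not None else 1.0
    return -w_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Class 1 is rare, so it receives the larger weight.
weights = class_weights([0, 0, 0, 1], num_classes=2)

easy = focal_loss([4.0, 0.0], target=0, weights=weights)  # confident and right
hard = focal_loss([0.0, 4.0], target=0, weights=weights)  # confident and wrong
```

With `gamma=0` and no weights the expression reduces to ordinary cross-entropy; larger `gamma` values shrink the loss on examples the model already classifies confidently, so training concentrates on the hard ones.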

🔮 Future Improvements

  1. Deployment:

    • Build a Streamlit/Gradio app for real-time predictions.
    • Deploy the model using FastAPI or Flask.
  2. Explainability:

    • Use SHAP or LIME to explain model predictions.
    • Visualize attention weights for transformer models.
  3. Advanced Models:

    • Experiment with DeBERTa, GPT-based models, or ensemble methods.
    • Use knowledge distillation to compress an ensemble of models into a single, smaller model.
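
As a rough illustration of the distillation idea, the sketch below averages the softened class probabilities of several teacher models and scores a student against them. Nothing here has been implemented in this repository: the function names, the temperature of 2.0, and the example logits are all hypothetical, and a real setup would usually mix this term with the ordinary loss on the true labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the softened class probabilities of several teachers."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    q = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / max(q_c, 1e-12))
             for p, q_c in zip(soft_targets, q) if p > 0)
    return kl * temperature ** 2


# Hypothetical logits from two teachers for one three-class example.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
targets = ensemble_soft_targets(teachers)
loss = distillation_loss([-1.0, 0.5, 2.0], targets)
```

The loss is zero when the student reproduces the teachers' averaged distribution and grows as the two diverge, which is what lets one model absorb the behaviour of an ensemble.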
