Hierarchical News Classification

A Two-Level Classification System for News Articles

¹ New York University

Overview

This project implements a hierarchical news classification system built on BERT embeddings and a two-level cascade architecture. The system first assigns each news article to a broad category (Level 1) and then to a specific subcategory within that category (Level 2). The central difficulty of this setup is error propagation: a Level-1 mistake sends the article to the wrong Level-2 classifier. The approach addresses this with confidence management and cross-level error correction. The goal is more accurate and robust news categorization that also stays computationally efficient and interpretable.
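The cascade can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's predict.py. Each classifier is modeled as a plain callable that returns a {label: probability} dict, standing in for a fine-tuned BERT model. The names classify_hierarchical, threshold and top_k are hypothetical. The correction rule works as follows: when Level-1 confidence is low, it rescores the top-k Level-1 candidates by P(category) * P(subcategory | category). This is one plausible reading of the error-correction idea, not necessarily the authors' exact method.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface: a classifier maps article text to {label: probability}.
# In the real system these would wrap fine-tuned BERT models.
Classifier = Callable[[str], Dict[str, float]]


def classify_hierarchical(
    text: str,
    level1: Classifier,
    level2: Dict[str, Classifier],
    threshold: float = 0.5,
    top_k: int = 2,
) -> Tuple[str, Optional[str], float]:
    """Return (category, subcategory, joint confidence) for one article."""
    l1_probs = level1(text)
    ranked = sorted(l1_probs.items(), key=lambda kv: kv[1], reverse=True)
    best_cat, best_p = ranked[0]

    # Confident Level-1 prediction: only follow the top category.
    # Low confidence: let the Level-2 models of the top-k categories compete,
    # scoring each path by P(category) * P(subcategory | category).
    candidates = ranked[:1] if best_p >= threshold else ranked[:top_k]

    best: Tuple[str, Optional[str], float] = (best_cat, None, best_p)
    best_score = -1.0
    for cat, p_cat in candidates:
        sub_model = level2.get(cat)
        if sub_model is None:
            # No Level-2 model for this category: keep the Level-1 label only.
            sub, score = None, p_cat
        else:
            sub_probs = sub_model(text)
            sub, p_sub = max(sub_probs.items(), key=lambda kv: kv[1])
            score = p_cat * p_sub
        if score > best_score:
            best, best_score = (cat, sub, score), score
    return best
```

Consider a Level-1 output of 0.45 sports and 0.40 business with a threshold of 0.5. Because the top score falls below the threshold, the low-confidence branch runs. It picks business whenever the business Level-2 model is confident enough to outweigh the small Level-1 gap.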

Key Innovations

Adaptive Confidence Thresholds: Dynamically adjusts confidence thresholds based on historical accuracy to mitigate error propagation
Data Balancing Strategy: Upsamples under-represented subcategories to improve classification of rare classes
Hierarchical Error Correction: Introduces a feedback mechanism between classification levels, allowing low-confidence primary classifications to be corrected
Differentiated Training Parameters: Optimizes hyperparameters separately for each classification level
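One way to realize adaptive confidence thresholds is to keep a per-category record of recent Level-1 accuracy. Each category's threshold then moves up or down relative to a target accuracy. The sketch below assumes that design. The class name AdaptiveThresholds and the parameters base, target_acc, step and window are hypothetical. The linear update rule is an illustrative choice and not necessarily the rule used in thresholds.py.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class AdaptiveThresholds:
    """Hypothetical per-category confidence thresholds driven by recent accuracy.

    A category whose recent Level-1 predictions were often wrong gets a higher
    threshold, so more of its predictions count as low-confidence.
    """

    def __init__(self, base: float = 0.5, target_acc: float = 0.9,
                 step: float = 0.5, window: int = 100,
                 lo: float = 0.3, hi: float = 0.95) -> None:
        self.base, self.target_acc, self.step = base, target_acc, step
        self.lo, self.hi = lo, hi
        # Sliding window of correct/incorrect outcomes per predicted category.
        self.history: Dict[str, Deque[bool]] = defaultdict(
            lambda: deque(maxlen=window))

    def update(self, predicted: str, correct: bool) -> None:
        """Record whether a Level-1 prediction for this category was right."""
        self.history[predicted].append(correct)

    def threshold(self, category: str) -> float:
        hist = self.history.get(category)
        if not hist:
            return self.base  # no evidence yet: use the base threshold
        accuracy = sum(hist) / len(hist)
        # Raise the bar when accuracy is below target, lower it when above,
        # then clip to [lo, hi].
        t = self.base + self.step * (self.target_acc - accuracy)
        return min(self.hi, max(self.lo, t))
```

Raising the threshold for an unreliable category sends more of its predictions to the correction step instead of passing them straight to Level 2. A category that is consistently right gets a slightly lower bar.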

Model Architecture

File Structure

config.py: Configuration parameters for models and training
data.py: Data loading, preprocessing, and balancing functions
thresholds.py: Implementation of adaptive threshold mechanism
train_l1.py: Training script for Level-1 classifier
train_l2.py: Training script for Level-2 classifiers
predict.py: Model inference and hierarchical classification
app.py: Streamlit web application for interactive classification
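To upsample under-represented subcategories, each rare class can be resampled with replacement until it reaches a minimum count. This is a minimal, dependency-free sketch that assumes examples are (text, subcategory) pairs. The names upsample_minority and min_count are hypothetical, and data.py may use pandas for the same step.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (article text, subcategory label)


def upsample_minority(examples: List[Example], min_count: int,
                      seed: int = 0) -> List[Example]:
    """Duplicate examples of rare subcategories until each has min_count rows.

    Classes that already have at least min_count rows are left unchanged.
    """
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)

    balanced: List[Example] = []
    for rows in by_label.values():
        balanced.extend(rows)
        deficit = min_count - len(rows)
        if deficit > 0:
            # Sample with replacement from this class's own examples.
            balanced.extend(rng.choices(rows, k=deficit))
    rng.shuffle(balanced)
    return balanced
```

Apply upsampling to the training split only, after the train/validation split, so that duplicated rows cannot leak into evaluation.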

Requirements

Python 3.7+
PyTorch >= 1.9.0
Transformers >= 4.12.0
Pandas >= 1.3.0
NumPy >= 1.20.0
Scikit-learn >= 0.24.0
Datasets >= 1.11.0
Evaluate >= 0.2.0
Streamlit >= 1.8.0

Usage

# Install dependencies
pip install -r requirements.txt

# Train Level-1 classifier
python train_l1.py

# Train Level-2 classifiers
python train_l2.py

# Run interactive prediction
python predict.py

# Launch the web interface
streamlit run app.py
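Level 1 and Level 2 are trained by separate scripts, so one natural way to implement differentiated training parameters is to keep one parameter set per level in config.py. The layout below is hypothetical. The TrainConfig fields, the bert-base-uncased checkpoint name and all numeric values are placeholders for illustration, not the project's tuned settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical hyperparameter bundle for one classification level."""
    model_name: str
    learning_rate: float
    epochs: int
    batch_size: int
    max_length: int


# Placeholder values for illustration only.
LEVEL1 = TrainConfig("bert-base-uncased", 2e-5, 3, 32, 256)
LEVEL2 = TrainConfig("bert-base-uncased", 3e-5, 5, 16, 256)


def config_for(level: int) -> TrainConfig:
    """Return the parameter set for Level 1 or Level 2."""
    if level not in (1, 2):
        raise ValueError(f"unknown level: {level}")
    return LEVEL1 if level == 1 else LEVEL2
```

Each Level-2 model sees only the articles of one broad category. It may therefore benefit from different settings than the Level-1 model, for example more epochs on its smaller training set. The right values depend on the data.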

Evaluation Results

Web Interface

The Streamlit application provides an interactive interface for classifying news articles and visualizing confidence scores for both levels.
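The confidence scores shown for each level are assumed here to be softmax probabilities over the classifier's output logits. This is an assumption, since the app's code is not shown. A minimal conversion, using the hypothetical helper name softmax_confidences, could look like this:

```python
import math
from typing import Dict, List


def softmax_confidences(logits: List[float], labels: List[str]) -> Dict[str, float]:
    """Turn raw classifier logits into a {label: probability} map for display."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}
```

The resulting dictionary holds one probability per label, summing to 1. Each level's dictionary can then be rendered as its own bar chart.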

gracee-chen/NLP_Hierarchical-news-classification

Folders and files

Latest commit

History

Repository files navigation

Hierarchical News Classification

A Two-Level Classification System for News Articles

Overview

Key Innovations

Model Architecture

Files Structure

Requirements

Usage

Evaluation Results

Web Interface

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages