This project provides a modular pipeline for classifying text data into sentiments (positive, negative, neutral) using the OpenAI API.
It is adaptable to multiple domains such as product reviews, news headlines, or any short text format.
Built with Python, the system supports both generative AI-based classification and traditional evaluation via scikit-learn.
- Apply Generative AI to classify text data sentiment.
- Support multiple domains via reusable configurations (
nyt
,reviews
, etc.). - Enable clean modularization: preprocessing → classification → evaluation.
- Provide a CLI interface for production-ready usage.
- Python 3.8+
- OpenAI API (
gpt-3.5-turbo
) - Pandas – data manipulation
- scikit-learn – evaluation metrics
- dotenv – API key management
- argparse – CLI interface
- Jupyter Notebook – EDA (optional)
project-root/
├── data/
│ ├── raw/ # Unprocessed input data
│ └── processed/ # Cleaned and classified output
│
├── notebooks/ # EDA and insights (optional)
│
├── src/
│ ├── config/ # Modular configuration per domain
│ │ ├── base_config.py
│ │ ├── reviews_config.py
│ │ └── nyt_config.py
│ ├── data_preprocessing.py
│ ├── classifier.py
│ ├── evaluation.py
│ └── cli_runner.py # Main CLI script
│
├── .env # API key (not committed)
├── requirements.txt
└── README.md
git clone https://github.com/YOUR_USERNAME/Sentiment-Pipeline-With-OpenAI.git
cd Sentiment-Pipeline-With-OpenAI
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
Create a file called .env
in the root directory:
OPENAI_API_KEY=your-key-here
python src/cli_runner.py --domain reviews
Available domains:
reviews
– product review sentiment analysisnyt
– sentiment classification of NYT headlines
- File:
data/raw/reviews.csv
- Column required:
cleaned_text
- File:
data/raw/nyt_articles.json
- Field required:
headline
- Classified data in:
data/processed/<domain>_classified.csv
- If
has_labels=True
in config, an evaluation report is printed.
- Support for local models (e.g., BERT)
- Streamlit frontend
- Logging and metrics dashboard
- Custom prompt tuning via config
MIT License © 2025 William Yohei