This project implements a news summarization system built on the T5-base model, fine-tuned on the CNN/DailyMail dataset from Hugging Face. It covers data preprocessing, exploratory data analysis (EDA), model fine-tuning, and evaluation using ROUGE scores.
Dataset used: CNN/DailyMail
- Train Size: 287,113 samples
- Validation Size: 13,368 samples
- Test Size: 11,490 samples
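
As a quick sanity check, the split sizes listed above can be compared against what the Hugging Face `datasets` library returns. This is a minimal sketch, not the project's own loading code: the dataset id `cnn_dailymail` and the config `3.0.0` are assumptions (the project does not state which version it uses), and `load_cnn_dailymail` and `size_mismatches` are hypothetical helper names.

```python
# Split sizes as listed in this README.
EXPECTED_SIZES = {"train": 287_113, "validation": 13_368, "test": 11_490}


def load_cnn_dailymail(version: str = "3.0.0"):
    """Download CNN/DailyMail with the Hugging Face `datasets` library.

    The config name "3.0.0" is an assumption; adjust it to whatever
    version the notebook actually loads.
    """
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("cnn_dailymail", version)


def size_mismatches(actual_sizes: dict) -> dict:
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        split: (expected, actual_sizes.get(split))
        for split, expected in EXPECTED_SIZES.items()
        if actual_sizes.get(split) != expected
    }
```

To use it, call `dataset = load_cnn_dailymail()`, build `{split: dataset[split].num_rows for split in dataset}`, and pass that dictionary to `size_mismatches`. An empty result means the downloaded splits match the numbers above.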
| Metric | Score |
|---|---|
| ROUGE-1 | 0.2969 |
| ROUGE-2 | 0.1204 |
| ROUGE-L | 0.2483 |
| ROUGE-Lsum | 0.2483 |
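
To make the ROUGE-1 and ROUGE-2 numbers easier to interpret, here is a simplified sketch of how an n-gram overlap F-measure is computed. It is an illustration only, not the evaluation code used in this project: it assumes the reported scores are F-measures, tokenizes by lowercasing and splitting on whitespace, and skips the stemming that standard ROUGE tooling can apply. The function names `ngrams` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F-measure between a reference and a candidate summary."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2)` shares three of five bigrams in each direction, so precision and recall are both 0.6 and the F-measure is 0.6 as well.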
git clone https://github.com/omkar-79/news-summarization.git
cd news-summarization
# For Python 3.x
python3 -m venv venv
source venv/bin/activate # On Linux/Mac
# OR
venv\Scripts\activate # On Windows
pip install -r requirements.txt
- Open the .ipynb file using Jupyter Notebook or Jupyter Lab.
- Run all the cells to fine-tune the model.
- The fine-tuned model will be saved to:
./t5_finetuned
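
Once the model has been saved, it can be loaded back for inference with the `transformers` library. The sketch below makes several assumptions that the notebook may not share: that the tokenizer was saved into the same `./t5_finetuned` directory, that inputs were prefixed with T5's usual `summarize: ` task prefix during fine-tuning, and that the generation settings (`num_beams`, `max_new_tokens`, the 512-token input limit) are reasonable defaults. `build_input` and `summarize` are hypothetical helper names.

```python
MODEL_DIR = "./t5_finetuned"  # output path named in this README


def build_input(article: str) -> str:
    """Collapse whitespace and prepend the T5 summarization prefix (assumed)."""
    return "summarize: " + " ".join(article.split())


def summarize(article: str, model_dir: str = MODEL_DIR, max_new_tokens: int = 128) -> str:
    """Generate a summary with the fine-tuned model saved in `model_dir`."""
    # Imported lazily; requires `transformers` and PyTorch to be installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumes the tokenizer files were saved alongside the model weights.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    inputs = tokenizer(
        build_input(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,  # illustrative input limit
    )
    # Beam search with illustrative settings; tune to match the notebook.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` returns the generated summary as a plain string. For repeated calls, load the tokenizer and model once and reuse them rather than reloading on every request.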
python app.py
After running app.py, open index.html in your browser to access the application.