Hey! This is my NLP lab project. It fetches news articles with NewsAPI and summarizes them with the Pegasus model.
- First, install the required packages:
pip install -r requirements.txt
- Download the model (this might take a few minutes):
python download_model.py
- Create a file named .env in the project folder and add your NewsAPI key (sign up at https://newsapi.org/ to generate an API key):
NEWS_API=your_api_key_here
- Run the program:
python main.py
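If you're curious how the key from .env can be picked up in Python, here's a minimal sketch. It is not necessarily what main.py does: the function name `read_news_api_key` is made up for illustration, and it assumes python-dotenv (listed in the requirements) is what loads the file.

```python
import os


def read_news_api_key(environ=None):
    """Return the NewsAPI key stored under NEWS_API, or raise with a hint.

    Hypothetical helper for illustration; the project may read the key differently.
    """
    if environ is None:
        try:
            # python-dotenv copies KEY=VALUE pairs from .env into os.environ
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass  # fall back to variables already exported in the shell
        environ = os.environ

    key = environ.get("NEWS_API", "").strip()
    # Treat the placeholder from the instructions as "not configured"
    if not key or key == "your_api_key_here":
        raise RuntimeError("NEWS_API is not set; add it to .env in the project folder")
    return key
```

Failing early with a clear message is handy here, because a missing or placeholder key otherwise only shows up later as an authentication error from NewsAPI.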
- Fetches news articles using NewsAPI
- Extracts and processes article content
- Generates summaries using the Pegasus model
- Supports processing multiple articles in one run
- Tracks progress and handles errors
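To give a feel for the last two points, here's a rough sketch of processing several articles with a progress bar while keeping one bad article from stopping the run. It's an illustration under assumptions, not the code in summarise.py: `summarise_all`, the `(title, text)` input shape, and the `summarise` callable are all hypothetical.

```python
def summarise_all(articles, summarise):
    """Summarise each article, collecting failures instead of stopping.

    `articles` is a list of (title, text) pairs and `summarise` is any
    callable mapping text to a summary; both are hypothetical stand-ins.
    """
    try:
        from tqdm import tqdm  # progress bar, listed in requirements.txt
    except ImportError:
        def tqdm(iterable, **kwargs):  # no-op fallback if tqdm is missing
            return iterable

    summaries, failures = {}, {}
    for title, text in tqdm(articles, desc="Summarising"):
        try:
            summaries[title] = summarise(text)
        except Exception as exc:
            # Record the error and keep going with the next article
            failures[title] = str(exc)
    return summaries, failures
```

In the real pipeline, `summarise` would be whatever function wraps the Pegasus model; returning the failures separately makes it easy to report which articles were skipped and why.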
- Clone the repository:
git clone <your-repository-url>
cd News-Extractor-Summarizer
- Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the project root and add your NewsAPI key:
NEWS_API=your_api_key_here
- Download the Pegasus model:
python download_model.py
- Run the pipeline:
python main.py
Or run individual components:
python crawl.py # Fetch articles
python content.py # Process content
python summarise.py # Generate summaries
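If you'd rather drive the three stages from a script than type them by hand, something like the sketch below would work. It only assumes the scripts named above can be run in that order from the project folder; `run_stages` is a made-up helper, and main.py may chain the stages differently.

```python
import subprocess
import sys

# Stage scripts in the order listed above
STAGES = ["crawl.py", "content.py", "summarise.py"]


def run_stages(stages=STAGES, run=subprocess.run):
    """Run each script with the current interpreter; stop at the first failure.

    Returns the list of scripts that finished with exit code 0.
    """
    completed = []
    for script in stages:
        result = run([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}")
            break
        completed.append(script)
    return completed


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "run":
    run_stages()
```

Using `sys.executable` keeps the stages inside the same virtual environment you activated, and the `run` parameter is only there so the loop can be exercised without actually launching the scripts.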
News-Extractor-Summarizer/
├── crawl.py # Article fetching
├── content.py # Content processing
├── summarise.py # Summary generation
├── main.py # Main pipeline
├── download_model.py # Model downloader
├── requirements.txt # Dependencies
└── README.md # Documentation
- newsapi-python
- transformers
- pandas
- numpy
- colorama
- tqdm
- python-dotenv
MIT License