News Topic Prediction

This project is about predicting the topic of news articles using the AG News dataset. The prediction model is built using Logistic Regression and is deployed as a Flask web application.

Project Structure

This structure is modular and organized, making it easy to maintain and scale the project. Each folder and file has a clear purpose, ensuring a clean and efficient workflow.

project/
│
├── logs/                  # Logs for tracking execution (e.g., training logs)
│
├── model/                 # Main folder for model-related files
│   ├── data/              # Data folder
│   │   ├── processed/     # Processed data (e.g., after preprocessing)
│   │   └── raw/           # Raw data (e.g., original datasets)
│   │
│   ├── notebooks/         # Jupyter Notebooks for analysis and experiments
│   │   └── toping_of_news.ipynb
│   │
│   ├── pipelines/         # Folder for data processing and prediction pipelines
│   │   ├── predict_pipeline.py  # Pipeline for making predictions
│   │   └── train_pipeline.py    # Pipeline for training the model
│   │
│   ├── saved_models/      # Folder for saved model files
│   │
│   ├── scripts/           # Scripts for data processing and model training
│   │   ├── __init__.py    # Package initialization file
│   │   ├── data_preprocessing.py  # Script for data preprocessing
│   │   ├── model_evaluation.py    # Script for model evaluation
│   │   └── model_training.py      # Script for model training
│   │
│   ├── utils/             # Utility functions and helpers
│   │   ├── __init__.py    # Package initialization file
│   │   ├── logger.py      # Utility for logging
│   │   └── pickler.py     # Utility for serialization/deserialization (e.g., pickle)
│   │
│   ├── templates/         # HTML templates for the web interface
│   │   └── index.html     # Main HTML template for the web interface
│   │
│   ├── .gitignore         # File to exclude specific files/folders from Git
│   ├── app.py             # Main Flask application file
│   ├── config.py          # Project configuration (paths, parameters)
│   ├── README.md          # Project documentation
│   └── requirements.txt   # Project dependencies

Installation

Clone the repository:

git clone https://github.com/nadirg2/nlp_topings.git

Navigate to the project directory:
```
cd nlp_topings
```
Install the required packages:
```
pip install -r requirements.txt
```

Usage

Run the Flask application:
```
python run.py
```
Open your web browser and go to http://127.0.0.1:5000.

Dataset

The AG News dataset is used for training and testing the model. It contains news articles categorized into four classes: World, Sports, Business, and Sci/Tech.

Model

The Logistic Regression model and TF-IDF vectorizer is used for predicting the topic of news articles. The model is trained on the AG News dataset and saved as model.pkl. Also implemented text preprocessing with using lemmatization, stop-word removal, and tokenization.

License

This project is licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

News Topic Prediction

Project Structure

Installation

Usage

Dataset

Model

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
model		model
templates		templates
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
config.py		config.py
requirements.txt		requirements.txt

License

nadirg2/nlp_topings

Folders and files

Latest commit

History

Repository files navigation

News Topic Prediction

Project Structure

Installation

Usage

Dataset

Model

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages