This project implements a comprehensive solution for analyzing sentiment in financial news and ranking articles by their financial importance. It uses a deep learning model (an LSTM with BERT embeddings) alongside a traditional machine learning baseline (Logistic Regression) to predict sentiment scores for news articles. The system is containerized with Docker, deployable to Azure Container Apps for scalable cloud hosting, and integrates CI/CD via GitHub Actions for automated testing and quality assurance.
- Overview
- Project Structure
- Features
- Primary Model Architecture
- Setup
- Test Suite
- Continuous Integration
- Run with Streamlit Application
- API Usage and Postman API Testing
- Dockerization
- Azure Deployment
- MLflow Integration and DagsHub
This project analyzes financial news articles to determine sentiment (positive, negative, neutral) and ranks companies based on their news sentiment scores. By processing real-time news data, the system provides valuable insights for investors and financial analysts to make informed decisions. The application architecture supports both local deployment via Docker and cloud deployment through Azure Container Apps for production environments. Additionally, the project implements continuous integration through GitHub Actions workflows that automatically run tests on code changes, ensuring code quality and functionality across the entire application.
```
financial_news_sentiment_analysis_ranking/
├── assets/                      # Screenshots and images for documentation
├── data/                        # Input data files (financial_news.csv, news.csv)
├── deployment/                  # Deployment-related code
│   ├── app.py                   # FastAPI application for serving predictions
│   ├── run.py                   # Script to run both API and Streamlit services
│   └── streamlit_app.py         # Streamlit web interface
├── models/                      # Saved models and tokenizers
├── notebooks/                   # Jupyter notebooks
├── src/                         # Core source code
│   ├── data_loader.py           # Functions to load and prepare datasets
│   ├── experiment_tracking.py   # MLflow integration
│   ├── main.py                  # Main entry point and orchestration
│   ├── model.py                 # Model architecture definitions
│   ├── preprocessing.py         # Text preprocessing utilities
│   ├── ranking.py               # News ranking functionality
│   ├── run_experiment.py        # MLflow experiment runner
│   ├── sentiment_analysis.py    # Sentiment prediction and analysis
│   └── train.py                 # Model training functions
└── tests/                       # Unit tests
```
- Text Preprocessing: Lowercasing, punctuation removal, stopword removal
- Multiple Model Types: LSTM with BERT embeddings and Logistic Regression
- Sentiment Analysis: Predict positive, negative, and neutral sentiment scores
- News Ranking: Rank financial news by importance
- Model Persistence: Save and load trained models and tokenizers
- Experiment Tracking: Track experiments with MLflow
- API Deployment: Serve predictions through FastAPI
- Web Interface: User-friendly interface with Streamlit
- Containerization: Docker support for easy deployment
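As an illustration of the preprocessing feature above, here is a minimal sketch. The project's actual implementation lives in `src/preprocessing.py` and likely uses a full stopword list (e.g. NLTK's); the small stopword set below is illustrative only:

```python
import string

# Illustrative stopword set; the real pipeline likely uses NLTK's list.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on"}

def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, and remove stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(preprocess("Shares of ACME Corp. are UP 5% on earnings!"))
# shares acme corp up 5 earnings
```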
The primary model is an LSTM network with BERT embeddings that processes financial news text and outputs sentiment scores. The architecture includes:
- Text tokenization and padding
- Embedding layer (using BERT embeddings)
- Bidirectional LSTM layers
- Dropout for regularization
- Dense output layer for sentiment prediction
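The layer stack above can be sketched in code. This is an illustrative PyTorch version that assumes BERT embeddings are precomputed and fed to the LSTM; the project's actual model is defined in `src/model.py` and may use a different framework or layer sizes:

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Illustrative bidirectional LSTM head over BERT-style embeddings."""
    def __init__(self, embed_dim=768, hidden=128, num_classes=3, dropout=0.3):
        super().__init__()
        # Inputs are assumed to be pre-padded sequences of BERT embeddings.
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(dropout)           # regularization
        self.fc = nn.Linear(2 * hidden, num_classes)  # dense output layer

    def forward(self, x):             # x: (batch, seq_len, embed_dim)
        out, _ = self.lstm(x)
        pooled = out[:, -1, :]        # last timestep of the BiLSTM output
        return self.fc(self.dropout(pooled))

model = SentimentLSTM()
logits = model(torch.randn(2, 16, 768))   # 2 articles, 16 tokens each
print(logits.shape)                       # torch.Size([2, 3])
```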
1. Clone the repository:

   ```bash
   git clone https://github.com/Abdelrahman-Elshahed/financial_news_sentiment_analysis_ranking.git
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate   # On Windows use: venv\Scripts\activate
   ```

3. Install the Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```
The tests/ directory contains tests for the Financial News Sentiment Analysis & Ranking project.
- conftest.py: Shared pytest fixtures (sample data, mock model, mock tokenizer)
- test_data_loader.py: Tests for data loading and preparation functions
- test_preprocessing.py: Tests for text preprocessing utilities
- test_sentiment_analysis.py: Tests for sentiment prediction functionality
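A fixture and test in this style (hypothetical names and data; the real fixtures live in `conftest.py`) might look like:

```python
import pytest

def make_sample_articles():
    """Hypothetical sample data mirroring the fixtures in conftest.py."""
    return [
        {"headline": "ACME beats earnings estimates", "label": "positive"},
        {"headline": "ACME shares plunge on weak outlook", "label": "negative"},
    ]

@pytest.fixture
def sample_articles():
    return make_sample_articles()

def test_sample_articles_have_valid_labels(sample_articles):
    # Every sample must carry one of the three sentiment labels.
    assert all(a["label"] in {"positive", "negative", "neutral"}
               for a in sample_articles)
```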
This project employs GitHub Actions for continuous integration, automatically running tests on every push and pull request to ensure code quality and functionality. The CI pipeline executes the complete test suite using pytest, validating the core components of the application:
- Data loading and preprocessing functionality
- Text preprocessing utilities
- Sentiment analysis prediction accuracy
The workflow is configured in `ci-tests.yml` and verifies that all tests pass before code is merged into the main branch. This automated testing approach helps catch issues early in the development process and ensures consistent functionality across all components of the sentiment analysis system.
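A minimal workflow of this shape (illustrative; the actual `ci-tests.yml` may pin different action versions and Python releases) would be:

```yaml
name: CI Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python -m pytest tests/
```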
- To run tests locally:

  ```bash
  # From the project root
  python -m pytest tests/
  ```
- Run the combined service (API + Streamlit):

  ```bash
  cd deployment
  python run.py
  ```

- Or run the two services separately (each command blocks, so use two terminals):

  ```bash
  cd deployment
  uvicorn app:app --host 0.0.0.0 --port 8000   # terminal 1
  streamlit run streamlit_app.py               # terminal 2
  ```

- The application will be available at http://localhost:8501.
- GET /: Health check
- POST /predict: Predict sentiment for a single news article
- POST /predict_batch: Predict sentiment for multiple news articles
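The `/predict` endpoint can be exercised from Python roughly as follows. The request field name `"text"` is an assumption here; check the actual request schema in `deployment/app.py`:

```python
import json
from urllib import request

def predict(text: str, url: str = "http://localhost:8000/predict") -> dict:
    """POST a single article to the running API and return the parsed JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the service running (python run.py), you would call:
# scores = predict("ACME Corp shares rally after strong quarterly results")
```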
- Build the Docker image:

  ```bash
  docker build -t financial-news-sentiment .
  ```

- Run the container:

  ```bash
  docker run -p 8000:8000 -p 8501:8501 financial-news-sentiment
  ```
The Docker image is also published on Docker Hub.

Pull the image:

```bash
docker pull bodaaa/financial-news-sentiment:latest
```

Run the container:

```bash
docker run -p 8000:8000 -p 8501:8501 bodaaa/financial-news-sentiment:latest
```
The application is deployed on Azure Container Apps for scalable cloud hosting with public access. Azure Container Apps provides a fully managed serverless container service that enables you to run microservices and containerized applications on a serverless platform. The deployment exposes both the FastAPI backend (port 8000) and Streamlit frontend (port 8501) through a single public endpoint.
Visit the deployed application at: https://financial-news-sentiment.jollymushroom-663aaf82.westus2.azurecontainerapps.io/
The project uses MLflow to track experiments, including:
- Model parameters
- Training and validation metrics
- Saved model artifacts
Access the MLflow UI to compare different model configurations and results.
- Experiment runs are also tracked on DagsHub.