This repository hosts the backend for the Capital Markets Loaders service. It is designed to handle the extraction, transformation, and loading (ETL) of financial data from various sources.
The Capital Markets Loaders Service is responsible for:
- Extracting market data from Yahoo Finance for equities, bonds, real estate, and commodities.
- Extracting cryptocurrency data from Binance API.
- Extracting macroeconomic data from the FRED API.
- Scraping and processing financial news from Yahoo News.
- Scraping and processing Reddit submissions from financial subreddits using PRAW (Python Reddit API Wrapper).
- Extracting stablecoin market cap data from CoinGecko.
- Generating portfolio performance (emulation).
- Transforming and loading the extracted data into MongoDB for further analysis.
MongoDB stands out as an ideal database solution for this Capital Markets Loaders service due to its exceptional ability to handle diverse data types within a single database platform:
- Market Data (Time Series Collections): The service retrieves market data for various asset classes (equities, bonds, real estate, commodities) from Yahoo Finance. MongoDB's Time Series collections are perfectly suited for this data, offering optimized storage, efficient querying, and automatic data expiration for historical market information (see the sketch below).
- Cryptocurrency Data (Time Series Collections): Real-time cryptocurrency data from the Binance API is stored in MongoDB Time Series collections, providing efficient storage and querying capabilities for high-frequency crypto market data.
- Financial News (Unstructured Data & Vector Search): MongoDB excels at storing and querying unstructured data like financial news articles. Additionally, the document model seamlessly accommodates article embeddings generated by the service, enabling powerful Vector Search capabilities for semantic similarity searches and AI-driven insights.
- Social Media Sentiment (Reddit Data & Vector Search): Reddit submissions from financial subreddits are stored with their embeddings and sentiment scores, leveraging MongoDB's document model and Vector Search capabilities to analyze social sentiment trends.
- Macroeconomic Indicators: Data from the FRED API is efficiently stored in standard MongoDB collections, allowing for flexible querying and aggregation of economic indicators that can be correlated with market movements.
- Stablecoin Market Caps: CoinGecko stablecoin market cap data is stored in MongoDB collections, providing insights into the stablecoin ecosystem and liquidity metrics.
- Portfolio Performance Data: The emulated portfolio performance data fits naturally into MongoDB's document model, enabling efficient storage and retrieval of complex investment performance metrics.
This versatility eliminates the need for multiple specialized databases, reducing architectural complexity while providing performance-optimized storage for each data type—a significant advantage for financial applications that process and analyze diverse datasets.
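As a minimal sketch of the Time Series setup, the snippet below creates a market data collection with `pymongo`. The collection name comes from this README; the `timestamp` time field, `symbol` meta field, and granularity are illustrative assumptions rather than the loaders' actual schema.

```python
import os

from pymongo import MongoClient

# Connect using the URI from the .env file described in the configuration section below.
client = MongoClient(os.environ["MONGODB_URI"])
db = client["agentic_capital_markets"]

# Create a Time Series collection for market data.
# Field names here are assumptions; adjust them to match the loaders' documents.
db.create_collection(
    "yfinanceMarketData",
    timeseries={
        "timeField": "timestamp",  # when the data point was observed
        "metaField": "symbol",     # which asset the data point belongs to
        "granularity": "hours",
    },
)
```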
- Easy: MongoDB's document model naturally fits with object-oriented programming, utilizing BSON documents that closely resemble JSON. This design simplifies the management of complex data structures such as user accounts, allowing developers to build features like account creation, retrieval, and updates with greater ease.
- Fast: Following the principle that "data that is accessed together should be stored together," MongoDB enhances query performance. This approach ensures that related data, like user and account information, can be quickly retrieved, optimizing the speed of operations such as account look-ups or status checks, which is crucial in services demanding real-time access to operational data.
- Flexible: MongoDB's schema flexibility allows account models to evolve with changing business requirements. This adaptability lets financial services update account structures or add features without expensive and disruptive schema migrations, avoiding the costly downtime often associated with structural changes.
- Versatile: The document model in MongoDB effectively handles a wide variety of data types, such as strings, numbers, booleans, arrays, objects, and even vectors. This versatility empowers applications to manage diverse account-related data, facilitating comprehensive solutions that integrate user, account, and transactional data seamlessly.
- Time Series collections (More info): For storing market data in a time series format.
- Atlas Vector Search (More info): For enabling vector search on financial news data.
- MongoDB Atlas for the database.
- FastAPI for the backend framework.
- Poetry for dependency management.
- Uvicorn for ASGI server.
- Docker for containerization.
- yfinance for extracting market data from Yahoo Finance.
- pyfredapi for extracting macroeconomic data from the FRED API.
- requests for making HTTP requests to Binance and CoinGecko APIs.
- praw for accessing Reddit's API to scrape financial subreddit submissions.
- pandas for data manipulation.
- scheduler for job scheduling.
- transformers for NLP tasks.
- FinBERT for sentiment score calculation.
- voyage-finance-2 for generating article embeddings.
- Yahoo Finance Market Data ETL: Extracts, transforms, and loads market data for various asset types (equities, bonds, real estate, commodities) using the `yfinance` Python package.
- Binance API Crypto Data ETL: Extracts, transforms, and loads cryptocurrency market data for major crypto assets using direct HTTP requests to Binance API endpoints.
- FRED API Macroeconomic Data ETL: Extracts, transforms, and loads macroeconomic data using the `pyfredapi` Python package.
- Financial News Processing: Scrapes financial news from Yahoo News, generates embeddings using the `voyage-finance-2` model from Voyage AI, and calculates sentiment scores using FinBERT, a pre-trained NLP model for analyzing the sentiment of financial text.
- Reddit Submissions Processing: Scrapes submissions from financial subreddits using PRAW, generates embeddings using `voyage-finance-2`, and calculates sentiment scores using FinBERT.
- CoinGecko Stablecoin Market Caps: Extracts daily stablecoin market cap data using direct HTTP requests to CoinGecko API endpoints.
- Portfolio Performance Generation: Generates simulated portfolio performance data based on asset allocations and market data.
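To make the market data ETL concrete, here is a simplified sketch of pulling daily prices with `yfinance` and writing them into the `yfinanceMarketData` collection. It is not the service's actual loader; the ticker list, lookback period, and document field names are assumptions.

```python
import os

import yfinance as yf
from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URI"])
collection = client["agentic_capital_markets"]["yfinanceMarketData"]

# Hypothetical subset of the portfolio's tickers.
for symbol in ["QQQ", "HYG", "VGIT"]:
    # Download recent daily OHLCV data from Yahoo Finance.
    history = yf.Ticker(symbol).history(period="5d", interval="1d")

    documents = [
        {
            "timestamp": index.to_pydatetime(),  # time field of the Time Series collection
            "symbol": symbol,                    # meta field
            "open": float(row["Open"]),
            "high": float(row["High"]),
            "low": float(row["Low"]),
            "close": float(row["Close"]),
            "volume": int(row["Volume"]),
        }
        for index, row in history.iterrows()
    ]
    if documents:
        collection.insert_many(documents)
```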
The financial news sentiment analysis follows a sophisticated pipeline to extract meaningful insights from news articles:
- Data Ingestion: The system scrapes financial news articles from Yahoo Search, storing them in the `financial_news` collection in MongoDB. For Phase 1, we use a fixed dataset of approximately 255 articles covering the 10 assets in the portfolio (about 20 articles per asset).
- Text Processing: For each article, we construct a comprehensive `article_string` by concatenating multiple fields, for example:
  Headline: QQQ Leads Inflows as VGIT, HYG Jump: ETF Flows as of Feb. 27
  \n Description: Top 10 Creations (All ETFs) Ticker Name Net Flows ($, mm) AUM ($, mm) AUM % Change QQQ Invesco QQQ...
  \n Source: etf.com · via Yahoo Finance
  \n Ticker: HYG
  \n Link: https://finance.yahoo.com/news/qqq-leads-inflows-vgit-hyg-005429051.html?fr=sycsrp_catchall
- Sentiment Analysis: The `article_string` is processed by FinBERT, a financial-domain-specific language model trained to understand financial text sentiment. This generates a sentiment score for each article.
- Data Enrichment: The sentiment scores are stored back in the `financial_news` collection, associating each article with its computed sentiment.
- Vector Embedding Generation: The same `article_string` is passed to the `voyage-finance-2` model, generating a 1024-dimensional vector representation (`article_embedding`) that captures the semantic meaning of the article.
- Semantic Search Implementation: Using MongoDB's Vector Search capability, the system can find semantically similar news articles based on these embeddings, identifying both explicit mentions of a ticker symbol and contextually relevant articles that don't directly reference it.
- Portfolio Sentiment Calculation: For each asset in the portfolio, the system calculates an average sentiment score from its related articles, providing a consolidated sentiment indicator that helps assess market perception of that asset.
This approach enables both explicit keyword matching and deeper semantic understanding of financial news, offering more comprehensive insights than traditional text-based searches.
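A condensed sketch of the scoring and embedding steps is shown below. It assumes the Hugging Face `transformers` pipeline with the `ProsusAI/finbert` checkpoint and the `voyageai` Python client; the production loaders may wire these up differently.

```python
import os

import voyageai
from transformers import pipeline

# FinBERT sentiment classifier. ProsusAI/finbert is a commonly used checkpoint;
# the exact model used by this service is an assumption here.
finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")

# Voyage AI client for the voyage-finance-2 embedding model.
voyage = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])

def process_article(article_string: str) -> dict:
    """Return the sentiment score and embedding for one article_string."""
    sentiment = finbert(article_string)[0]  # e.g. {"label": "positive", "score": 0.93}
    embedding = voyage.embed([article_string], model="voyage-finance-2").embeddings[0]  # 1024 floats
    return {
        "sentiment_label": sentiment["label"],
        "sentiment_score": sentiment["score"],
        "article_embedding": embedding,
    }
```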
The Reddit sentiment analysis pipeline processes social media data to capture market sentiment from retail investors:
- Data Extraction: The system uses PRAW (Python Reddit API Wrapper) to scrape submissions from financial subreddits, collecting posts related to the tracked assets.
- Content Processing: For each Reddit submission, a comprehensive text string is created by combining the title and text content of the post.
- Embedding Generation: Similar to news articles, each submission is processed through the `voyage-finance-2` model to generate a 1024-dimensional vector embedding, capturing the semantic meaning of the Reddit post.
- Sentiment Analysis: The submission text is analyzed using FinBERT to generate sentiment scores, providing insights into the community's perception of specific assets.
- Data Enrichment: The embeddings and sentiment scores are stored in the `subredditSubmissions` collection, enabling both keyword-based and semantic searches.
- Time-based Cleanup: A cleaner process removes Reddit submissions older than a specified threshold to maintain data relevance and optimize storage.
This social sentiment data complements traditional financial news analysis by providing grassroots investor sentiment, which can be particularly valuable for identifying emerging trends or retail investor behavior patterns.
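For reference, extracting submissions with PRAW looks roughly like the sketch below. The subreddit name and the combined-text format are assumptions; the scraper in this repository may select subreddits and fields differently. The credentials map to the environment variables listed in the configuration section.

```python
import os

import praw

# Authenticate against the Reddit API using the credentials from the .env file.
reddit = praw.Reddit(
    client_id=os.environ["REDDIT_CLIENT_ID"],
    client_secret=os.environ["REDDIT_SECRET"],
    username=os.environ["REDDIT_USERNAME"],
    password=os.environ["REDDIT_PASSWORD"],
    user_agent=os.environ["REDDIT_USER_AGENT"],
)

# "wallstreetbets" is only an example; the loaders define their own subreddit list.
for submission in reddit.subreddit("wallstreetbets").new(limit=25):
    # Combine title and body, mirroring the "Content Processing" step above.
    submission_text = f"{submission.title}\n{submission.selftext}"
    print(submission.id, submission.created_utc, submission_text[:80])
```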
The service uses the `scheduler` Python package to schedule and manage ETL processes with the following schedule (all times in UTC):
- Financial News Processing: Weekly on Mondays at 4:00 AM UTC
- Yahoo Finance Market Data ETL: Tuesday-Saturday at 4:00 AM UTC
- PyFredAPI Macroeconomic Data ETL: Daily at 4:05 AM UTC
- Portfolio Performance Generation: Daily at 4:10 AM UTC
- CoinGecko Stablecoin Market Caps: Daily at 4:15 AM UTC
- Reddit Submissions Processing: Daily at 4:20 AM UTC
- Reddit Embedder (re-processing): Daily at 4:40 AM UTC
- Reddit Sentiment Analysis (re-processing): Daily at 4:45 AM UTC
- Reddit Data Cleanup: Daily at 5:00 AM UTC
- Binance API Crypto Data ETL: Daily at 5:10 AM UTC
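A minimal sketch of how such a schedule could be expressed with the `scheduler` package is shown below; it assumes the DigonIO `scheduler` library and placeholder job functions, while the actual job wiring lives in the service's scheduler module.

```python
import datetime as dt
import time

from scheduler import Scheduler
from scheduler.trigger import Monday

def run_financial_news_etl():
    ...  # placeholder for the real ETL entry point

def run_macro_etl():
    ...  # placeholder for the real ETL entry point

# Schedule jobs in UTC, mirroring the times listed above.
schedule = Scheduler(tzinfo=dt.timezone.utc)
schedule.weekly(Monday(dt.time(hour=4, tzinfo=dt.timezone.utc)), run_financial_news_etl)
schedule.daily(dt.time(hour=4, minute=5, tzinfo=dt.timezone.utc), run_macro_etl)

# Run pending jobs in a simple loop.
while True:
    schedule.exec_jobs()
    time.sleep(30)
```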
Before you begin, ensure you have met the following requirements:
- MongoDB Atlas account - Register Here
- Python 3.10 or higher
- Poetry (install via Poetry's official documentation)
- Log in to MongoDB Atlas and create a database named `agentic_capital_markets`. Ensure the name is reflected in the environment variables.
- Create the following collections (sample data for most of them can be imported from the JSON files under `backend/loaders/db/collections/`, as sketched after this list):
  - `financial_news` (for storing financial news data) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.financial_news.json` file.
  - `pyfredapiMacroeconomicIndicators` (for storing macroeconomic data) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.pyfredapiMacroeconomicIndicators.json` file.
  - `yfinanceMarketData` (for storing market data) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.yfinanceMarketData.json` file. Additionally, there are more backup files in `backend/loaders/backup/*` that you can use to populate the collection.
  - `binanceCryptoData` (for storing cryptocurrency data) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.binanceCryptoData.json` file.
  - `subredditSubmissions` (for storing Reddit submissions with sentiment) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.subredditSubmissions.json` file.
  - `stablecoin_market_caps` (for storing stablecoin market cap data) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.stablecoin_market_caps.json` file.
  - `portfolio_allocation` (for storing portfolio allocation data)
  - `portfolio_performance` (for storing portfolio performance data) - You can import sample data into this collection from the `backend/loaders/db/collections/agentic_capital_markets.portfolio_performance.json` file.
  - `chartMappings` (for storing chart mappings)
Note: For creating the Time Series collections, you can run the `mdb_timeseries_coll_creator.py` Python script located in the `backend/loaders/db/` directory. Make sure to parametrize the script accordingly.
- Create vector search indexes for the following collections:
  - `financial_news` collection
  - `subredditSubmissions` collection
Note: For creating the vector search indexes, you can run the `mdb_vector_search_idx_creator.py` Python script located in the `backend/loaders/db/` directory. Make sure to parametrize the script accordingly for each collection.
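Once the index on `financial_news` exists (named `financial_news_VS_IDX` on the `article_embedding` field, per the environment variables below), semantic search runs through a `$vectorSearch` aggregation stage. The sketch below uses a placeholder query vector, and the projected field names are assumptions.

```python
import os

from pymongo import MongoClient

collection = MongoClient(os.environ["MONGODB_URI"])["agentic_capital_markets"]["financial_news"]

# query_vector would normally come from embedding the user's query with voyage-finance-2.
query_vector = [0.0] * 1024  # placeholder 1024-dimensional vector

pipeline = [
    {
        "$vectorSearch": {
            "index": "financial_news_VS_IDX",
            "path": "article_embedding",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    # "headline" and "ticker" are assumed field names for illustration.
    {"$project": {"headline": 1, "ticker": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc)
```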
Follow MongoDB's guide to create a user with `readWrite` access to the `agentic_capital_markets` database.
Create a `.env` file in the `/backend` directory with the following content:
MONGODB_URI="your_mongodb_uri"
DATABASE_NAME="agentic_capital_markets"
APP_NAME="ist.demo.capital_markets.loaders"
VOYAGE_API_KEY=
FRED_API_KEY=
PORTFOLIO_PERFORMANCE_COLLECTION="portfolio_performance"
YFINANCE_TIMESERIES_COLLECTION="yfinanceMarketData"
BINANCE_TIMESERIES_COLLECTION="binanceCryptoData"
PYFREDAPI_COLLECTION="pyfredapiMacroeconomicIndicators"
NEWS_COLLECTION="financial_news"
VECTOR_INDEX_NAME="financial_news_VS_IDX"
VECTOR_FIELD="article_embedding"
SCRAPE_NUM_ARTICLES=1
REDDIT_CLIENT_ID=
REDDIT_SECRET=
REDDIT_USERNAME=
REDDIT_PASSWORD=
REDDIT_USER_AGENT=
REDDIT_REDIRECT_URI="http://localhost:8080"
REDDIT_STATE_RANDOM_STRING=
REDDIT_PERMANENT_AUTHORIZATION_CODE=
REDDIT_AUTH_URI=
ASSET_MAPPINGS_COLLECTION="assetMappings"
SUBREDDIT_SUBMISSIONS_COLLECTION="subredditSubmissions"
COINGECKO_STABLECOIN_COLLECTION="stablecoin_market_caps"
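If you want to sanity-check the configuration before starting the service, a quick way is the sketch below. It assumes `python-dotenv`, which is a common companion to FastAPI projects but not necessarily a dependency of this repository.

```python
import os

from dotenv import load_dotenv
from pymongo import MongoClient

# Read backend/.env into the process environment.
load_dotenv("backend/.env")

client = MongoClient(os.environ["MONGODB_URI"], appname=os.environ.get("APP_NAME"))
db = client[os.environ["DATABASE_NAME"]]

# Ping the cluster and list the collections created earlier.
client.admin.command("ping")
print(db.list_collection_names())
```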
- Open a terminal in the project root directory.
- Run the following commands:
make poetry_start
make poetry_install
- Verify that the `.venv` folder has been generated within the `/backend` directory.
To start the backend service, run:
poetry run uvicorn main:app --host 0.0.0.0 --port 8004
The default port is `8004`; modify the `--port` flag if needed.
Run the following command in the root directory:
make build
To remove the container and image:
make clean
The service provides a comprehensive set of API endpoints for managing data loads and backfills. You can access the interactive API documentation (Swagger UI) by visiting:
http://localhost:<PORT_NUMBER>/docs
E.g. http://localhost:8004/docs
The API includes endpoints for:
- Loading and backfilling Yahoo Finance market data (by date or symbol)
- Loading and backfilling Binance cryptocurrency data (by date or symbol)
- Loading and backfilling PyFredAPI macroeconomic data (by date or series)
- Loading and backfilling portfolio performance data
- Loading CoinGecko stablecoin market cap data
- Loading recent financial news
- Loading and processing Reddit submissions (with embeddings and sentiment)
- Managing individual ETL processes
- Viewing scheduler overview
Note: Make sure to replace `<PORT_NUMBER>` with the port number you are using and ensure the backend is running. The Swagger UI provides detailed information about request/response schemas and allows you to test endpoints directly.
- Check that you've created a `.env` file containing the required environment variables.
This project is for educational and demonstration purposes.