This project implements a structured and automated pipeline for retrieving, transforming, and storing Credit Default Swap (CDS) pricing data published by the LCH Group. It scrapes pricing content from the LCH post-trade website, processes the data into a normalized format, and loads it into a Microsoft SQL Server database for downstream use in risk analysis, reconciliation, and compliance reporting.
This pipeline serves the following objectives:
- Retrieve daily CDS pricing data, including fixed rate curves, tiers, maturities, and contract identifiers.
- Transform the raw JSON into a clean, structured tabular format.
- Persist the resulting dataset into a SQL Server environment for enterprise-scale consumption.
The processed data supports functions across finance, operations, and regulatory reporting teams.
Pricing data is publicly provided by LCH via its CDSClear post-trade portal:
- Source: https://www.lseg.com/en/post-trade/clearing/lch-services/cdsclear/essentials/pricing-data
- Data Format: JSON, retrieved via hyperlinks discovered in the pricing portal
- Coverage: Credit Index and Single Name CDS curves, series-based valuation
Fields included in the transformed dataset:

- `valuation_date`
- `instrument`
- `instrument_name`
- `series`
- `version`
- `contractual_definitions`
- `fixed_rate`
- `maturity`
- `index_factor`
- `price`
- `ticker`
- `tier`
- `doc_clause`
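A transformation along these lines could produce the schema above. This is a minimal sketch; the raw camelCase field names and the sample record are assumptions for illustration, not the actual LCH payload:

```python
import pandas as pd

# Hypothetical raw records as they might appear in a discovered JSON file;
# the source field names below are illustrative assumptions.
raw = [
    {
        "valuationDate": "2024-05-31",
        "instrument": "ITRAXX-EUROPE",
        "instrumentName": "iTraxx Europe Main",
        "series": 41,
        "version": 1,
        "contractualDefinitions": "2014",
        "fixedRate": 0.01,
        "maturity": "2029-06-20",
        "indexFactor": 1.0,
        "price": 101.25,
        "ticker": "ITXEB",
        "tier": "SNRFOR",
        "docClause": "MM14",
    }
]

def transform(records: list[dict]) -> pd.DataFrame:
    """Normalize raw JSON records into the documented tabular schema."""
    df = pd.DataFrame(records)
    # Map camelCase source fields to the snake_case output columns.
    df = df.rename(columns={
        "valuationDate": "valuation_date",
        "instrumentName": "instrument_name",
        "contractualDefinitions": "contractual_definitions",
        "fixedRate": "fixed_rate",
        "indexFactor": "index_factor",
        "docClause": "doc_clause",
    })
    # Parse date-like columns into proper datetimes for downstream SQL typing.
    for col in ("valuation_date", "maturity"):
        df[col] = pd.to_datetime(df[col])
    return df
```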
The ingestion process is organized into the following stages:

1. **Discovery**: web scraping logic extracts URLs pointing to relevant JSON pricing files.
2. **Data Retrieval**: the scraper downloads each JSON file and converts it to a DataFrame.
3. **Data Transformation**: the transformation layer handles field normalization, date parsing, and column mapping.
4. **Database Insertion**: the transformed dataset is inserted into SQL Server using batch insertion with retry support.
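The discovery stage might be sketched as below. This is a hedged sketch, not the project's actual `scraper/engine.py`: the assumption that pricing files are identifiable by a `.json` suffix in the portal's anchor tags is illustrative.

```python
import requests
from bs4 import BeautifulSoup

PRICING_URL = (
    "https://www.lseg.com/en/post-trade/clearing/lch-services/"
    "cdsclear/essentials/pricing-data"
)

def discover_json_links(html: str) -> list[str]:
    """Extract hyperlinks that point at JSON pricing files from portal HTML."""
    soup = BeautifulSoup(html, "lxml")
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        if a["href"].lower().endswith(".json")
    ]

def fetch_pricing_links() -> list[str]:
    """Download the portal page and return discovered JSON file URLs."""
    response = requests.get(PRICING_URL, timeout=30)
    response.raise_for_status()
    return discover_json_links(response.text)
```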
```text
lch-client-main/
├── main.py              # Entry point
├── config/              # Environment and logger setup
│   ├── settings.py
│   └── logger.py
├── scraper/             # LCH pricing discovery and JSON loader
│   └── engine.py
├── transformer/         # Pricing transformation routines
│   └── agent.py
├── database/            # MSSQL helper functions
├── utils/               # Retry logic and inserter abstraction
├── .env.sample          # Example environment variables
├── Dockerfile           # Container setup
└── requirements.txt     # Python dependencies
```
The application is configured via a `.env` file modeled after `.env.sample`. The following variables are required:
| Variable | Description |
|---|---|
| `LOG_LEVEL` | Logging level (`INFO`, `DEBUG`, etc.) |
| `OUTPUT_TABLE` | SQL Server table for data storage |
| `MSSQL_SERVER`, `MSSQL_DATABASE`, `MSSQL_USERNAME`, `MSSQL_PASSWORD` | Database connection configuration |
| `INSERTER_MAX_RETRIES` | Maximum retry attempts for failed inserts |
| `REQUEST_MAX_RETRIES`, `REQUEST_BACKOFF_FACTOR` | Control exponential backoff behavior during data fetch |
To run the pipeline in a containerized environment:

```shell
docker build -t lch-client .
docker run --env-file .env lch-client
```
Install dependencies via:

```shell
pip install -r requirements.txt
```
Key libraries used:

- `requests`, `beautifulsoup4`, `lxml` for scraping and parsing
- `pandas` for data manipulation
- `SQLAlchemy`, `pyodbc` for SQL Server integration
- `python-decouple` for configuration management
Once the environment is set:

```shell
python main.py
```
This will:
- Discover CDS JSON files
- Extract, clean, and transform pricing data
- Insert the result into the specified database table
This project is distributed under the MIT License. Users are responsible for ensuring compliance with LCH Group’s published data usage terms.