This project implements a structured and automated pipeline for retrieving, transforming, and storing Credit Default Swap (CDS) pricing data published by the LCH Group. It scrapes pricing content from the LCH post-trade website, processes the data into a normalized format, and loads it into a Microsoft SQL Server database for downstream use in risk analysis, reconciliation, and compliance reporting.
This pipeline serves the following objectives:
- Retrieve daily CDS pricing data, including fixed rate curves, tiers, maturities, and contract identifiers.
- Transform the raw JSON into a clean, structured tabular format.
- Persist the resulting dataset into a SQL Server environment for enterprise-scale consumption.
The processed data supports functions across finance, operations, and regulatory reporting teams.
Pricing data is publicly provided by LCH via its CDSClear post-trade portal:
- Source: https://www.lseg.com/en/post-trade/clearing/lch-services/cdsclear/essentials/pricing-data
- Data Format: JSON, retrieved via hyperlinks discovered in the pricing portal
- Coverage: Credit Index and Single Name CDS curves, series-based valuation
Fields included in the transformed dataset:

- `valuation_date`
- `instrument`
- `instrument_name`
- `series`
- `version`
- `contractual_definitions`
- `fixed_rate`
- `maturity`
- `index_factor`
- `price`
- `ticker`
- `tier`
- `doc_clause`
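A transformation along these lines could produce the schema above. This is a minimal sketch; the raw camelCase field names and the sample record are assumptions for illustration, not the actual LCH payload:

```python
import pandas as pd

# Hypothetical raw records as they might appear in a discovered JSON file;
# the source field names below are illustrative assumptions.
raw = [
    {
        "valuationDate": "2024-05-31",
        "instrument": "ITRAXX-EUROPE",
        "instrumentName": "iTraxx Europe Main",
        "series": 41,
        "version": 1,
        "contractualDefinitions": "2014",
        "fixedRate": 0.01,
        "maturity": "2029-06-20",
        "indexFactor": 1.0,
        "price": 101.25,
        "ticker": "ITXEB",
        "tier": "SNRFOR",
        "docClause": "MM14",
    }
]

def transform(records: list[dict]) -> pd.DataFrame:
    """Normalize raw JSON records into the documented tabular schema."""
    df = pd.DataFrame(records)
    # Map camelCase source fields to the snake_case output columns.
    df = df.rename(columns={
        "valuationDate": "valuation_date",
        "instrumentName": "instrument_name",
        "contractualDefinitions": "contractual_definitions",
        "fixedRate": "fixed_rate",
        "indexFactor": "index_factor",
        "docClause": "doc_clause",
    })
    # Parse date-like columns into proper datetimes for downstream SQL typing.
    for col in ("valuation_date", "maturity"):
        df[col] = pd.to_datetime(df[col])
    return df
```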
The ingestion process is organized into the following stages:

1. **Discovery**: web scraping logic extracts URLs pointing to relevant JSON pricing files.
2. **Data Retrieval**: the scraper downloads each JSON file and converts it to a DataFrame.
3. **Data Transformation**: the transformation layer handles field normalization, date parsing, and column mapping.
4. **Database Insertion**: the transformed dataset is inserted into SQL Server using batch insertion with retry support.
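The discovery stage might be sketched as below. This is a hedged sketch, not the project's actual `scraper/engine.py`: the assumption that pricing files are identifiable by a `.json` suffix in the portal's anchor tags is illustrative.

```python
import requests
from bs4 import BeautifulSoup

PRICING_URL = (
    "https://www.lseg.com/en/post-trade/clearing/lch-services/"
    "cdsclear/essentials/pricing-data"
)

def discover_json_links(html: str) -> list[str]:
    """Extract hyperlinks that point at JSON pricing files from portal HTML."""
    soup = BeautifulSoup(html, "lxml")
    return [
        a["href"]
        for a in soup.find_all("a", href=True)
        if a["href"].lower().endswith(".json")
    ]

def fetch_pricing_links() -> list[str]:
    """Download the portal page and return discovered JSON file URLs."""
    response = requests.get(PRICING_URL, timeout=30)
    response.raise_for_status()
    return discover_json_links(response.text)
```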
```text
lch-client-main/
├── main.py              # Entry point
├── config/              # Environment and logger setup
│   ├── settings.py
│   └── logger.py
├── scraper/             # LCH pricing discovery and JSON loader
│   └── engine.py
├── transformer/         # Pricing transformation routines
│   └── agent.py
├── database/            # MSSQL helper functions
├── utils/               # Retry logic and inserter abstraction
├── .env.sample          # Example environment variables
├── Dockerfile           # Container setup
└── requirements.txt     # Python dependencies
```
The application is configured via a `.env` file modeled after `.env.sample`. The following variables are required:
| Variable | Description |
|---|---|
| `LOG_LEVEL` | Logging level (`INFO`, `DEBUG`, etc.) |
| `OUTPUT_TABLE` | SQL Server table for data storage |
| `MSSQL_SERVER`, `MSSQL_DATABASE`, `MSSQL_USERNAME`, `MSSQL_PASSWORD` | Database connection configuration |
| `INSERTER_MAX_RETRIES` | Maximum retry attempts for failed inserts |
| `REQUEST_MAX_RETRIES`, `REQUEST_BACKOFF_FACTOR` | Control exponential backoff behavior during data fetch |
To run the pipeline in a containerized environment:

```shell
docker build -t lch-client .
docker run --env-file .env lch-client
```
Install dependencies via:

```shell
pip install -r requirements.txt
```
Key libraries used:

- `requests`, `beautifulsoup4`, `lxml` for scraping and parsing
- `pandas` for data manipulation
- `SQLAlchemy`, `pyodbc` for SQL Server integration
- `python-decouple` for configuration management
Once the environment is set:

```shell
python main.py
```
This will:
- Discover CDS JSON files
- Extract, clean, and transform pricing data
- Insert the result into the specified database table
This project is distributed under the MIT License. Users are responsible for ensuring compliance with LCH Group’s published data usage terms.