This project involves the creation of an Extract, Load, Transform (ELT) data pipeline that fetches data from the Polygon Finance website using its API, loads it into a PostgreSQL database, transfers it to a Google Cloud Storage (GCS) bucket, and finally moves it to BigQuery, the analytical database. The entire pipeline is orchestrated with Airflow using the Celery executor on Docker Compose. The tasks are grouped into three categories: stock, forex, and crypto, each with three different workflows: extraction to PostgreSQL, upload to GCS, and upload to BigQuery. In addition, a guide on how to load data from the Polygon Stock API to a PostgreSQL destination is found here.
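For a flavour of that first step, here is a minimal sketch of fetching daily aggregates from the Polygon API and inserting them into PostgreSQL. The table name, column list, and credential handling are illustrative assumptions rather than the exact code used in this project; see the linked guide for the full walkthrough.

```python
# Minimal sketch of the extract-to-PostgreSQL step (illustrative only).
# The table name, column list, and connection details are assumptions,
# not the exact implementation used in this project.
import os
import requests
import psycopg2

POLYGON_API_KEY = os.environ["POLYGON_API_KEY"]

def fetch_daily_aggregates(ticker: str, start: str, end: str) -> list[dict]:
    """Fetch daily OHLC aggregates for a ticker from the Polygon API."""
    url = f"https://api.polygon.io/v2/aggs/ticker/{ticker}/range/1/day/{start}/{end}"
    resp = requests.get(url, params={"adjusted": "true", "apiKey": POLYGON_API_KEY})
    resp.raise_for_status()
    return resp.json().get("results", [])

def load_to_postgres(ticker: str, rows: list[dict]) -> None:
    """Insert the fetched aggregates into a raw stock table in PostgreSQL."""
    conn = psycopg2.connect(os.environ["POSTGRES_DSN"])  # e.g. postgresql://user:pass@host/db
    with conn, conn.cursor() as cur:
        for r in rows:
            cur.execute(
                """
                INSERT INTO raw_stock_daily (ticker, open, high, low, close, volume, ts)
                VALUES (%s, %s, %s, %s, %s, %s, to_timestamp(%s / 1000.0))
                """,
                (ticker, r["o"], r["h"], r["l"], r["c"], r["v"], r["t"]),
            )
    conn.close()

if __name__ == "__main__":
    bars = fetch_daily_aggregates("AAPL", "2023-01-01", "2023-01-31")
    load_to_postgres("AAPL", bars)
```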
Check out the links below for recordings of my presentation of the project:
- For non-technical/business individuals: Non-Technical_link
- For technical individuals, Part 1: Technical_link-part_1
- For technical individuals, Part 2: Technical_link-part_2
The pipeline provides a robust and automated way to gather, process, and analyze financial data from various markets, including stocks, forex, and cryptocurrencies. This data can be used by businesses and investors to make informed decisions based on real-time and historical market trends. The pipeline's automation reduces the time and effort required to gather and process this data, allowing businesses to focus more on analysis and decision-making.
For a detailed understanding of the data fields in the tables containing daily open, high, low, and close (OHLC) data, refer to the provided Data Dictionary Link. The dictionary encompasses information for stocks/equities, forex, and cryptocurrencies, with specific details for each market, such as adjustments for splits, request IDs, exchange symbols, and more.
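As a rough companion to the data dictionary (not a replacement for it), the sketch below shows how a single daily aggregate record could map from Polygon's short response keys to descriptive column names; the chosen field names are assumptions for illustration.

```python
# Illustrative mapping of a daily aggregate record to descriptive column
# names; the field selection is an assumption -- see the Data Dictionary
# link for the authoritative definitions.
from typing import TypedDict

class DailyBar(TypedDict):
    open: float            # "o"  - opening price for the day
    high: float            # "h"  - highest price for the day
    low: float             # "l"  - lowest price for the day
    close: float           # "c"  - closing price for the day
    volume: float          # "v"  - trading volume
    vwap: float            # "vw" - volume-weighted average price
    transactions: int      # "n"  - number of transactions in the aggregate
    timestamp_ms: int      # "t"  - start of the aggregate window (Unix ms)

def to_daily_bar(raw: dict) -> DailyBar:
    """Rename the short response keys into descriptive names used downstream."""
    return DailyBar(
        open=raw["o"], high=raw["h"], low=raw["l"], close=raw["c"],
        volume=raw["v"], vwap=raw["vw"], transactions=raw["n"],
        timestamp_ms=raw["t"],
    )
```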
A notable enhancement to this pipeline is the incorporation of robust data quality checks and testing within dbt. These checks ensure the reliability and accuracy of the processed data, contributing to the overall integrity of the analytics and insights derived from the pipeline.
The dbt tool is used to perform transformations and generate metrics. For each of the three financial data categories (stock, crypto, forex), a metric was developed to compute the average trading prices and the number of transactions from the growing historical data. These metrics are represented in the agg_stock_avg, agg_crypto_avg, and agg_forex_avg tables.
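As a hedged example of how these metric tables might be consumed downstream, the snippet below reads one of them with the BigQuery Python client; the project and dataset names are placeholders, not the ones used in this pipeline.

```python
# Illustrative read of a dbt-generated metric table; the project and dataset
# names below are placeholders, not the ones used in this pipeline.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT *
    FROM `my-gcp-project.finance_analytics.agg_stock_avg`
    LIMIT 10
"""

for row in client.query(query).result():
    print(dict(row))
```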
The pipeline uses various technologies and tools to perform its tasks. The Polygon Finance API is used to extract the data, which is then loaded into a PostgreSQL database. The data is then transferred to a GCS bucket using the PostgresToGCSOperator in Airflow. Subsequently, the GCSToBigQueryOperator is employed to move the data from GCS to BigQuery. Finally, dbt is used within BigQuery for further transformations, metrics generation, and rigorous data quality checks. The entire flow is orchestrated with Airflow, from data extraction down to the final dbt stage (a dbt job is triggered after the data finishes loading into BigQuery).
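To make that orchestration concrete, here is a minimal sketch of what one task group (stock) could look like in the DAG, with the dbt job triggered once loading finishes. Connection IDs, bucket and dataset names, and the dbt Cloud job ID are placeholders, and the real DAG repeats the same pattern for forex and crypto.

```python
# Minimal sketch of the orchestration for one market (stock); the same pattern
# is repeated for forex and crypto. Connection IDs, bucket/dataset names, and
# the dbt Cloud job ID are placeholders, not the project's actual values.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup
from airflow.providers.google.cloud.transfers.postgres_to_gcs import PostgresToGCSOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

def extract_stock_to_postgres(**_):
    """Call the Polygon API and load the results into PostgreSQL (see sketch above)."""
    ...

with DAG(
    dag_id="polygon_finance_elt",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    with TaskGroup(group_id="stock") as stock_group:
        extract = PythonOperator(
            task_id="extract_to_postgres",
            python_callable=extract_stock_to_postgres,
        )
        to_gcs = PostgresToGCSOperator(
            task_id="postgres_to_gcs",
            postgres_conn_id="postgres_default",
            sql="SELECT * FROM raw_stock_daily",
            bucket="finance-pipeline-bucket",
            filename="stock/raw_stock_daily.csv",
            export_format="csv",
        )
        to_bq = GCSToBigQueryOperator(
            task_id="gcs_to_bigquery",
            bucket="finance-pipeline-bucket",
            source_objects=["stock/raw_stock_daily.csv"],
            destination_project_dataset_table="my-gcp-project.finance_raw.stock_daily",
            source_format="CSV",
            autodetect=True,
            write_disposition="WRITE_APPEND",
        )
        extract >> to_gcs >> to_bq

    run_dbt_job = DbtCloudRunJobOperator(
        task_id="trigger_dbt_job",
        job_id=12345,  # placeholder dbt Cloud job ID
    )

    stock_group >> run_dbt_job
```

In the full DAG, the forex and crypto task groups would follow the same extract → GCS → BigQuery chain, all fanning into the single dbt trigger.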
Below are images showing the Airflow logs, complete table lineage, data at BigQuery, and data at GCS. These images will provide a visual representation of the pipeline's operation and the data it processes.
- Raw tables in BigQuery
A separate README file provides detailed instructions on how to install and set up Airflow with the necessary connections. Check out the Airflow setup instructions here.
Like any data engineer, I'm always looking to improve and scale my data solutions to solve business problems efficiently. Some improvements I have in mind:
- Expansion of metrics based on specific business needs.
- Integration of additional financial markets for broader insights.
- Optimization of pipeline performance for scalability.
This project stands as a robust and automated solution, offering a streamlined approach to gather, process, and analyze financial data from diverse markets. Beyond providing valuable insights, it serves as a testament to the power and flexibility of modern data pipeline technologies. Open to feedback and improvement, the project maintains an adaptive stance, with opportunities for further enhancements, scalability, and a strong commitment to data quality. Feel free to contribute through pull requests to shape the evolution of this dynamic data engineering solution.