A cloud‑native distributed compression platform that splits large files into chunks, compresses them in parallel across a cluster of worker nodes, and seamlessly merges the results.

ntdkhiem/cloud-distributed-compression-platform

Cloud Distributed Compression Platform

As a complete noob in cloud, I set this up as a challenge for myself: learn everything by building a cool, enterprise-scale thing from the ground up in Go.

A cloud‑native distributed compression platform that splits large files into chunks, compresses them in parallel across a cluster of worker nodes, and seamlessly merges the results.
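The split, compress-in-parallel, and merge flow described above can be sketched on a single machine with goroutines standing in for worker nodes. This is only an illustration under assumptions, not the project's implementation: the function names (`splitChunks`, `compressChunk`, `compressParallel`, `decompress`) are hypothetical, and `compress/gzip` from the standard library is swapped in as a placeholder codec because concatenated gzip members decode as one stream, which makes the merge step a plain concatenation.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// splitChunks cuts data into pieces of at most chunkSize bytes.
// It assumes chunkSize > 0 (checked by the caller below).
func splitChunks(data []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// compressChunk gzips one chunk. gzip is a placeholder codec here.
func compressChunk(chunk []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(chunk); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// compressParallel compresses each chunk in its own goroutine
// (a stand-in for a worker node) and merges results in chunk order.
func compressParallel(data []byte, chunkSize int) ([]byte, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	chunks := splitChunks(data, chunkSize)
	results := make([][]byte, len(chunks))
	errs := make([]error, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		go func(i int, c []byte) {
			defer wg.Done()
			results[i], errs[i] = compressChunk(c)
		}(i, c)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return bytes.Join(results, nil), nil
}

// decompress reads the concatenated gzip members back as one stream.
func decompress(merged []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(merged))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	data := bytes.Repeat([]byte("pied piper "), 1000)
	merged, err := compressParallel(data, 4096)
	if err != nil {
		panic(err)
	}
	restored, err := decompress(merged)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), len(merged), bytes.Equal(data, restored))
}
```

Each goroutine writes only to its own slot in `results`, so the merge keeps the original chunk order without extra locking. In a real cluster the slot index would have to travel with the chunk so the merger can reorder results that arrive out of order.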

Inspired by the Silicon Valley series and codingchallenges.fyi :)

YouTube Code-Along

TODO

  • Implement Huffman coding for lossless data compression.
  • Create a pipeline that takes in data for encoding or decoding with the algorithm.
  • Create simple tests for the algorithm and the pipeline.
  • Split input files larger than 100MB into chunks for distributed compression.
  • Turn the pipeline into a REST API that accepts files (or streams) and returns compressed results.
  • Containerize the service and add proper environment configs.
  • Orchestrate the compression service on a cluster using Kubernetes.
  • Expose metrics with Prometheus and see them in Grafana.
  • Implement logging & tracing with OpenTelemetry.
  • Split large files into chunks and process them in parallel across pods using the pub/sub pattern.
  • Deploy the system to Google Cloud Platform:
      a) Use Google Kubernetes Engine.
      b) Store input/output in Google Cloud Storage.
      c) Use Pub/Sub for chunk distribution.
      d) Add CI/CD with Cloud Build.
      e) Secure the infrastructure by implementing proper IAM policies.
  • (maybe) Support algorithms other than Huffman.
  • (maybe) Look into adaptive arithmetic coding, which is said to be better suited to distributed compression.
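
For the first TODO item, here is a minimal sketch of how a Huffman code table can be built in Go with `container/heap`. It is an assumption about one possible shape of that step, not the repository's code: the names `node`, `minHeap`, `buildTree`, and `assignCodes` are hypothetical, and codes are kept as strings of '0' and '1' for readability rather than packed into bits.

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is a leaf (symbol set, no children) or an internal node.
type node struct {
	symbol      byte
	freq        int
	left, right *node
}

// minHeap orders nodes by ascending frequency for container/heap.
type minHeap []*node

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].freq < h[j].freq }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := old[len(old)-1]
	*h = old[:len(old)-1]
	return n
}

// buildTree counts byte frequencies, then repeatedly merges the two
// least frequent nodes until one root remains. Returns nil for empty input.
func buildTree(data []byte) *node {
	freq := map[byte]int{}
	for _, b := range data {
		freq[b]++
	}
	h := &minHeap{}
	for s, f := range freq {
		*h = append(*h, &node{symbol: s, freq: f})
	}
	heap.Init(h)
	for h.Len() > 1 {
		a := heap.Pop(h).(*node)
		b := heap.Pop(h).(*node)
		heap.Push(h, &node{freq: a.freq + b.freq, left: a, right: b})
	}
	if h.Len() == 0 {
		return nil
	}
	return heap.Pop(h).(*node)
}

// assignCodes walks the tree: left appends "0", right appends "1".
func assignCodes(n *node, prefix string, codes map[byte]string) {
	if n == nil {
		return
	}
	if n.left == nil && n.right == nil {
		if prefix == "" {
			prefix = "0" // input with a single distinct symbol
		}
		codes[n.symbol] = prefix
		return
	}
	assignCodes(n.left, prefix+"0", codes)
	assignCodes(n.right, prefix+"1", codes)
}

func main() {
	data := []byte("aaaabbc")
	codes := map[byte]string{}
	assignCodes(buildTree(data), "", codes)
	bits := 0
	for _, b := range data {
		bits += len(codes[b])
	}
	fmt.Printf("%d bytes -> %d bits\n", len(data), bits)
}
```

For the input "aaaabbc" the frequencies are a=4, b=2, c=1, so 'a' gets a 1-bit code and 'b' and 'c' get 2-bit codes, for 10 bits in total. When frequencies tie, map iteration order can change which symbol gets which code, so a decoder needs the same table (or tree) shipped alongside the compressed data.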
