- Clone the repo
git clone https://github.com/imjuliengaupin/sparkler.git
⚙️ Features
- Modular and configurable to work with a locally installed, pseudo-distributed Apache Hadoop machine cluster
- Apache Spark structured event streaming with Apache Kafka
- Distributed Extract-Transform-Load (ETL) data processing with Apache Spark
- A custom Suite class (built on object-oriented abstraction) for creating independent, modular objects that share common functionality and connect to different databases, extracting data into a DataFrame object so that transformations can be applied with the DataFrame API
- A custom Suite class (built on object-oriented abstraction) for creating independent, modular objects that share common functionality and read the contents of different file types into a DataFrame object so that transformations can be applied with the DataFrame API
See the open issues for a full list of proposed features (and known issues).
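The Suite classes listed in the features are not shown in this README, so the following is only a minimal sketch of how such an abstraction could look, not the project's actual implementation. The class names (`Suite`, `JdbcSuite`, `CsvSuite`), their methods, and their constructor parameters are all hypothetical. The only real API assumed is Spark's `DataFrameReader` chain (`spark.read.format(...).options(...).load()`), and `spark` is assumed to be an existing `SparkSession`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Suite(ABC):
    """Hypothetical base class: subclasses only describe *where* the data lives."""

    def __init__(self, spark: Any) -> None:
        # Assumed to be a pyspark.sql.SparkSession created elsewhere.
        self.spark = spark

    @abstractmethod
    def reader_format(self) -> str:
        """Name of the Spark data source, e.g. "jdbc" or "csv"."""

    @abstractmethod
    def reader_options(self) -> Dict[str, str]:
        """Options passed to the Spark DataFrameReader."""

    def extract(self):
        """Shared logic: build a DataFrame from the subclass's source."""
        return (
            self.spark.read.format(self.reader_format())
            .options(**self.reader_options())
            .load()
        )


class JdbcSuite(Suite):
    """Hypothetical suite for a database reachable over JDBC."""

    def __init__(self, spark: Any, url: str, table: str, user: str, password: str) -> None:
        super().__init__(spark)
        self.url, self.table, self.user, self.password = url, table, user, password

    def reader_format(self) -> str:
        return "jdbc"

    def reader_options(self) -> Dict[str, str]:
        return {
            "url": self.url,
            "dbtable": self.table,
            "user": self.user,
            "password": self.password,
        }


class CsvSuite(Suite):
    """Hypothetical suite for CSV files."""

    def __init__(self, spark: Any, path: str) -> None:
        super().__init__(spark)
        self.path = path

    def reader_format(self) -> str:
        return "csv"

    def reader_options(self) -> Dict[str, str]:
        # "path" tells the reader which file(s) to load; "header" treats
        # the first row as column names.
        return {"path": self.path, "header": "true"}
```

Under these assumptions, `CsvSuite(spark, "data/events.csv").extract()` and `JdbcSuite(spark, url, table, user, password).extract()` would both return a DataFrame, so the same DataFrame API transformations can follow either call regardless of the source.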
🔁 CI/CD
💻 Demo
If you find this project interesting and want to share your own insights, enhancements, or bug fixes, please feel free to contribute!
- Fork the project
- Create your feature branch
git checkout -b feature/branchname
- Commit your changes
git commit -m 'description'
- Push your feature branch
git push origin feature/branchname
- Open a pull request
📝 License
Distributed under the MIT License. See LICENSE for more information.