- Contents
- Introduction
- Why should I care about Software Engineering principles?
- Project contents
- Getting started
It is becoming increasingly important for Data Scientists to follow software engineering principles. However, there is no clear path for learning them, and most of the material available online is better suited to Data Engineers. This project aims to provide a clear path for Data Scientists to become better software engineers.
Software engineering principles help us build robust pipelines that are easy to maintain and well-documented.
They can help us reduce errors:
- Errors can decrease the value our models are generating.
- Errors can lead to key stakeholders getting frustrated and losing trust.
- Errors can delay new projects.
They make collaboration easier:
- If any Data Scientist can fix the pipeline, not just its author, fixes and changes are easier to plan.
- If any Data Scientist can maintain the pipeline, the author leaving the business is less disruptive.
Core software engineering principles have been broken down into small, easy-to-understand best practices. Clone this project and work through it as a tutorial first, then keep it as a reference for later.
Task | Best practice |
---|---|
Initial setup | Python |
Initial setup | IDE |
Initial setup | Git |
Initial setup | Virtual environment |
Writing SQL code | Coding standards and Linter |
Writing SQL code | Data quality reports |
Writing SQL code | Expectation tests |
Writing Python code | Coding standards and Linter |
Writing Python code | Docstrings |
Writing Python code | Logging |
Writing Python code | Exception handling |
Writing Python code | Unit testing |
Versioning | DVC |
Versioning | MLflow |
Documentation | Markdown files |
Documentation | SQL documentation |
Documentation | Auto-generation |
Peer reviews | Merge requests |
Peer reviews | Merge templates |
Model deployment | Airflow, Docker, Kubernetes |
Model deployment | Docker example |
Model deployment | Airflow example |
Programming | SQL |
Programming | Python classes |
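
To give a flavour of the "Writing Python code" practices listed above, here is a minimal sketch that combines a docstring, logging, exception handling, and a unit test in one place. The function names (`conversion_rate`, `safe_conversion_rate`) and the metric itself are hypothetical examples chosen for illustration; they are not taken from this project's code.

```python
import logging

logger = logging.getLogger(__name__)


def conversion_rate(conversions: int, visits: int) -> float:
    """Return the share of visits that ended in a conversion.

    Hypothetical example function, used only to illustrate the practices.

    Args:
        conversions: Number of visits that converted.
        visits: Total number of visits; must be positive.

    Returns:
        The conversion rate as a float between 0 and 1.

    Raises:
        ValueError: If the inputs are out of range.
    """
    if visits <= 0:
        raise ValueError(f"visits must be positive, got {visits}")
    if not 0 <= conversions <= visits:
        raise ValueError(
            f"conversions must be between 0 and visits, got {conversions}"
        )
    rate = conversions / visits
    # Logging: record what the pipeline computed, without using print().
    logger.info("Conversion rate %.3f (%d of %d)", rate, conversions, visits)
    return rate


def safe_conversion_rate(
    conversions: int, visits: int, default: float = 0.0
) -> float:
    """Return the conversion rate, or `default` if the inputs are invalid."""
    try:
        return conversion_rate(conversions, visits)
    except ValueError:
        # Exception handling: log the traceback, then fall back explicitly.
        logger.exception("Could not compute conversion rate; using default")
        return default


def test_conversion_rate() -> None:
    """Unit test: check one known input/output pair."""
    assert conversion_rate(25, 100) == 0.25
```

A test runner such as pytest would pick up `test_conversion_rate` automatically if this lived in a file named `test_*.py`; the individual sections of the project cover each practice in more depth.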
Go to the initial setup section to get started.