raz1470/ds_engineering_path

Contents

Introduction

It is becoming increasingly important for Data Scientists to follow software engineering principles. However, there is no clear path for doing so, and most of the material available online is better suited to Data Engineers. This project aims to provide a clear path for Data Scientists to become better software engineers.

Why should I care about Software Engineering principles?

Software engineering principles help us build robust pipelines that are easy to maintain and well-documented.

They can help us reduce errors:

  • Errors can decrease the value our models are generating.
  • Errors can lead to key stakeholders getting frustrated and losing trust.
  • Errors can delay new projects.

They make collaboration easier:

  • If any Data Scientist can fix the pipeline, not just its author, fixes and changes are easier to plan.
  • If any Data Scientist can maintain the pipeline, the author leaving the business is less disruptive.

Project contents

Core software engineering principles have been broken down into small, easy-to-understand best practices. This project can be cloned and used first as a tutorial and later as a reference.

| Task | Best practice |
| --- | --- |
| Initial setup | Python |
| Initial setup | IDE |
| Initial setup | Git |
| Initial setup | Virtual environment |
| Writing SQL code | Coding standards and Linter |
| Writing SQL code | Data quality reports |
| Writing SQL code | Expectation tests |
| Writing Python code | Coding standards and Linter |
| Writing Python code | Docstrings |
| Writing Python code | Logging |
| Writing Python code | Exception handling |
| Writing Python code | Unit testing |
| Versioning | DVC |
| Versioning | MLFlow |
| Documentation | Markdown files |
| Documentation | SQL documentation |
| Documentation | Auto-generation |
| Peer reviews | Merge requests |
| Peer reviews | Merge templates |
| Model deployment | Airflow, Docker, Kubernetes |
| Model deployment | Docker example |
| Model deployment | Airflow example |
| Programming | SQL |
| Programming | Python classes |
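
To give a flavour of the "Writing Python code" practices listed above, here is a minimal sketch that combines a docstring, logging, exception handling, and a unit test in one small function. It is not taken from this project's tutorials; the function name `mean_absolute_error` and the test name are hypothetical and chosen only for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def mean_absolute_error(actual, predicted):
    """Return the mean absolute error between two equal-length sequences.

    Hypothetical helper used only to illustrate the best practices.

    Raises:
        ValueError: If the inputs are empty or differ in length.
    """
    # Exception handling: fail early with a clear message on bad input.
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")

    error = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

    # Logging: record what happened instead of printing to the console.
    logger.info("Computed MAE over %d rows", len(actual))
    return error


def test_mean_absolute_error():
    """Unit test: checks a known result and the error path."""
    assert abs(mean_absolute_error([1, 2, 3], [1, 2, 5]) - 2 / 3) < 1e-9
    try:
        mean_absolute_error([1, 2], [1])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for mismatched lengths")
```

A test runner such as pytest would pick up `test_mean_absolute_error` automatically; it can also be called directly as a plain function.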

Getting started

Go to the initial setup section to get started.
