
Commit 14e4912 (1 parent: bc72133)
Author: Francesco Calcavecchia

    update pre-commit config

3 files changed: +48 −49 lines

`.pre-commit-config.yaml` (9 additions, 25 deletions)

```diff
@@ -2,7 +2,7 @@ default_language_version:
   python: python3
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v5.0.0
     hooks:
       - id: check-added-large-files
       - id: check-ast
@@ -12,29 +12,13 @@ repos:
       - id: check-json
       - id: check-toml
       - id: check-yaml
-        exclude: mkdocs.yml
-  - repo: https://github.com/psf/black
-    rev: 23.1.0
-    hooks:
-      - id: black
-        exclude: test/data/schema/wrong_syntax.py
-  - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.0.0
-    hooks:
-      - id: mypy
-        exclude: test/data/schema/wrong_syntax.py
-  - repo: https://github.com/dosisod/refurb
-    rev: v1.11.1
-    hooks:
-      - id: refurb
-        exclude: test/data/schema/wrong_syntax.py
-  - repo: https://github.com/charliermarsh/ruff-pre-commit
-    rev: 'v0.0.247'
-    hooks:
-      - id: ruff
-        args: [--fix, --exit-non-zero-on-fix]
   - repo: https://github.com/tcort/markdown-link-check
-    rev: 'v3.11.2'
+    rev: "v3.13.7"
+    hooks:
+      - id: markdown-link-check
+        args: [-q]
+  - repo: https://github.com/executablebooks/mdformat
+    rev: 0.7.22
     hooks:
-      - id: markdown-link-check
-        args: [-q]
+      - id: mdformat
+        args: ["--wrap", "120"]
```

`docs/examples.md` (3 additions, 3 deletions)

```diff
@@ -1,5 +1,5 @@
 # Examples
 
-In [Energy DaC](https://gitlab.com/data-as-code/energy-dac-example) you can pip install some energy-related data as code.
-The Readme will guide you through a demo.
-You can also inspect the repo to see how the DaC package was built using `dac`.
+In [Energy DaC](https://gitlab.com/data-as-code/energy-dac-example) you can pip install some energy-related data as
+code. The Readme will guide you through a demo. You can also inspect the repo to see how the DaC package was built using
+`dac`.
```

`docs/index.md` (36 additions, 21 deletions)

````diff
@@ -13,10 +13,13 @@ Data-as-Code (DaC) `dac` is a tool that supports the distribution of data as (py
 ## How will the Data Scientists use a DaC package?
 
 Say that the Data Engineers prepared the `demo-data` as code for you. Then you will install the code in your environment
+
 ```sh
 python -m pip install demo-data
 ```
+
 and then you will be able to access the data simply with
+
 ```python
 from demo_data import load
 
@@ -25,56 +28,68 @@ data = load()
 
 Data can be in any format. There is no constraint of any kind.
 
-Not only accessing data will be this easy but, depending on how data were prepared, you may also have access to useful metadata. How?
+Not only accessing data will be this easy but, depending on how data were prepared, you may also have access to useful
+metadata. How?
+
 ```python
 from demo_data import Schema
 ```
 
 With the schema you could, for example
 
-* access the column names (e.g. `Schema.my_column`)
-* unit test your functions by getting a data example with `Schema.example()`
+- access the column names (e.g. `Schema.my_column`)
+- unit test your functions by getting a data example with `Schema.example()`
 
 ## How can a Data Engineer provide a DaC python package?
 
 Install this library
+
 ```sh
 python -m pip install dac
 ```
+
 and use the command `dac pack` (run `dac pack --help` for detailed instructions).
 
 On a high level, the most important elements you must provide are:
 
-* python code to load the data
-* a `Schema` class that at very least contains a `validate` method, but possibly also
+- python code to load the data
 
-  - data field names (column names, if data is tabular)
-  - an `example` method
+- a `Schema` class that at very least contains a `validate` method, but possibly also
 
-* python dependencies
+  - data field names (column names, if data is tabular)
+  - an `example` method
 
-!!! hint "Use `pandera` to define the Schema"
+- python dependencies
 
-    If the data type you are using is supported by [`pandera`](https://pandera.readthedocs.io/en/stable/index.html) consider using a [`DataFrameModel`](https://pandera.readthedocs.io/en/stable/dataframe_models.html) to define the Schema.
+!!! hint "Use `pandera` to define the Schema"
 
+```
+If the data type you are using is supported by [`pandera`](https://pandera.readthedocs.io/en/stable/index.html) consider using a [`DataFrameModel`](https://pandera.readthedocs.io/en/stable/dataframe_models.html) to define the Schema.
+```
 
 ## What are the advantages of distributing data in this way?
 
-* The code needed to load the data, the data source, and locations are abstracted away from the user.
-This mean that the data engineer can start from local files, transition to SQL database, cloud file storage, or kafka topic, without having the user to notice it or need to adapt its code.
+- The code needed to load the data, the data source, and locations are abstracted away from the user. This mean that the
+  data engineer can start from local files, transition to SQL database, cloud file storage, or kafka topic, without
+  having the user to notice it or need to adapt its code.
 
-* *If you provide data field names in `Schema`* (e.g. `Schema.column_1`), the user code will not contain hard-coded column names, and changes in data source field names won't impact the user.
+- *If you provide data field names in `Schema`* (e.g. `Schema.column_1`), the user code will not contain hard-coded
+  column names, and changes in data source field names won't impact the user.
 
-* *If you provide the `Schema.example` method*, users will be able to build robust code by writing unit testing for their functions effortlessly.
+- *If you provide the `Schema.example` method*, users will be able to build robust code by writing unit testing for
+  their functions effortlessly.
 
-* Semantic versioning can be used to communicate significant changes:
+- Semantic versioning can be used to communicate significant changes:
 
-  * a patch update corresponds to a fix in the data: its intended content is unchanged
-  * a minor update corresponds to a change in the data that does not break the schema
-  * a major update corresponds to a change in the schema, or any other breaking change
+  - a patch update corresponds to a fix in the data: its intended content is unchanged
+  - a minor update corresponds to a change in the data that does not break the schema
+  - a major update corresponds to a change in the schema, or any other breaking change
 
-  In this way data pipelines can subscribe to the appropriate updates. Furthermore, it will be easy to keep releasing data updates maintaining retro-compatibility (one can keep deploying `1.X.Y` updates even after version `2` has been rolled-out).
+  In this way data pipelines can subscribe to the appropriate updates. Furthermore, it will be easy to keep releasing
+  data updates maintaining retro-compatibility (one can keep deploying `1.X.Y` updates even after version `2` has been
+  rolled-out).
 
-* Description of the data and columns can be included in the schema, and will therefore reach the user together with the data.
+- Description of the data and columns can be included in the schema, and will therefore reach the user together with the
+  data.
 
-* Users will always know where to look for data: the PyPi index.
+- Users will always know where to look for data: the PyPi index.
````
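The `Schema` contract that `docs/index.md` describes (data field names, a `validate` method, an `example` method for unit tests) can be sketched with the standard library alone. This is an illustrative assumption of what such a class might look like, not the actual `demo-data` or `dac` API; all names (`Schema`, `my_column`, `other_column`) are hypothetical:

```python
# Illustrative sketch of the DaC Schema contract: field names as class
# attributes, a validate() method, and an example() method for unit tests.
# All names here (Schema, my_column, other_column) are hypothetical.


class Schema:
    # Field names exposed as attributes, so user code never hard-codes them.
    my_column = "my_column"
    other_column = "other_column"

    @classmethod
    def fields(cls):
        return {cls.my_column, cls.other_column}

    @classmethod
    def validate(cls, data):
        """Raise if any record does not carry exactly the expected fields."""
        for record in data:
            if set(record) != cls.fields():
                raise ValueError(f"unexpected fields in record: {set(record)}")
        return data

    @classmethod
    def example(cls):
        """A tiny valid sample, handy for unit-testing downstream functions."""
        return [{cls.my_column: 1, cls.other_column: "a"}]


# User-side pattern: validate real data, or test functions against the example.
validated = Schema.validate(Schema.example())
```

Because field names live on the class, a rename at the data source only changes the string values inside `Schema`, while user code written against `Schema.my_column` keeps working, which is the decoupling the docs advertise.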
