This project contains a working Streamlit web app to visualize and work with geo data from the RIVM dataset of chemical concentrations measured in water samples. The project also contains several notebooks that were used, among other things, to explore the dataset, to experiment with factorization machines for filling gaps in the sparse dataset, to explore the hydrological structure of the Dutch waters in order to connect the different measurement locations, and to perform feature engineering on the dataset.
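The notebooks' factorization-machine experiments are not reproduced here, but the core prediction step can be sketched. A second-order factorization machine scores a sparse feature vector (for example, a hypothetical one-hot encoding of location and substance) with a bias, linear weights and pairwise interactions between low-rank latent factors. This lets it estimate values for location/substance combinations that were never measured. This is a minimal plain-Python sketch of the standard model equation, not the code used in the notebooks. The encoding and the weights in the usage note are made up. A naive pairwise version is included to show that the O(k·n) reformulation gives the same result.

```python
from typing import Sequence


def fm_predict(
    x: Sequence[float], w0: float, w: Sequence[float], v: Sequence[Sequence[float]]
) -> float:
    """Second-order factorization machine prediction.

    x:  feature vector of length n (e.g. a hypothetical one-hot location + substance)
    w0: global bias
    w:  linear weights, length n
    v:  latent factors, n rows of k values each
    """
    n = len(x)
    k = len(v[0]) if n else 0
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # sum_{i<j} <v_i, v_j> x_i x_j, rewritten so it costs O(k * n):
    # 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += s * s - s_sq
    return linear + 0.5 * pairwise


def fm_predict_naive(
    x: Sequence[float], w0: float, w: Sequence[float], v: Sequence[Sequence[float]]
) -> float:
    """Same model, computed directly over all pairs i < j (O(k * n^2)), for comparison."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total
```

With x = [1, 0, 0, 1] (say, location A and substance B in a four-feature one-hot encoding), only the interaction between features 0 and 3 contributes. In practice w0, w and v would be learned from the observed measurements.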
The project is set to be public, as requested by the client, the Nationale Politie.
This project contains basic Python project features:
- structure of the Python project
- readme and changelog (README.md, CHANGELOG.md)
- package namespacing (e.g. ff)
- running multiple checks:
  - code style
  - type checking with mypy
  - formatting and linting with ruff
Follow the instructions in the explanation folder regarding the set-up of a Python project on your local machine.
Currently, you will need to download the dataset into the data folder. The data can be downloaded from Azure Blob Storage through a SAS (shared access signature), which can be requested from one of the repo admins. Use the Azure Storage Explorer to view and download the data with the SAS: download the data directory from the ff-politie-pollution blob, then move the downloaded data to the data folder, keeping the file structure as-is.
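A quick way to confirm that the download landed in the right place is a small check like the one below. It is a hypothetical helper, not part of the ff package. It only verifies that the data folder exists and contains at least one file, because the expected file structure is not documented here.

```python
from pathlib import Path


def data_folder_ready(data_dir: Path = Path("data")) -> bool:
    """Return True if data_dir exists and holds at least one file in any subfolder.

    Hypothetical sanity check; it does not validate the blob's file structure.
    """
    if not data_dir.is_dir():
        return False
    return any(path.is_file() for path in data_dir.rglob("*"))
```

Run it from the repository root, for example with `python -c "from pathlib import Path; ..."` or from a notebook, before starting the Streamlit app.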
To use this repository, perform the following steps before you start developing:
- Clone the repository.
- Install a supported Python version (see pyproject.toml). See HERE.
- Install the poetry dependency manager as described HERE.
- Make sure you have poetry installed.
  - More info can be found in knowledgebase/tooling/poetry.md.
- Make sure you work with pyenv and that you have the correct Python version (check with python --version).
  - More info can be found in knowledgebase/tooling/pyenv.md.
  - The correct Python version can be found in the .python-version file.
- Make sure you work in a Python virtual environment.
  - On macOS and Linux, poetry can create a virtual environment for you automatically.
  - On Windows, I would create one manually via venv: python -m venv .venv
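The two environment requirements above (the pyenv-pinned Python version and an active virtual environment) can be checked from Python itself. This is a minimal sketch with hypothetical helper names that are not part of this repository. It assumes .python-version holds a plain numeric version such as 3.11 or 3.11.4 on its first line; pyenv also accepts names like system, which this sketch does not handle.

```python
import sys
from pathlib import Path


def pinned_python_version(version_file: Path = Path(".python-version")) -> tuple[int, ...]:
    """Read the version pinned by pyenv, e.g. "3.11.4" -> (3, 11, 4).

    Assumes the first line is a plain numeric version (hypothetical simplification).
    """
    first_line = version_file.read_text().strip().splitlines()[0].strip()
    return tuple(int(part) for part in first_line.split("."))


def python_version_matches(version_file: Path = Path(".python-version")) -> bool:
    """Compare only as many components as are pinned, so "3.11" matches any 3.11.x."""
    wanted = pinned_python_version(version_file)
    return tuple(sys.version_info[: len(wanted)]) == wanted


def in_virtual_environment() -> bool:
    """Inside a venv-style environment (including Poetry's), sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix
```

Calling python_version_matches() and in_virtual_environment() from the repository root should both return True once the set-up steps are done.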
Then install the dependencies and the pre-commit hooks. On macOS and Linux:
poetry install
poetry run pre-commit install
On Windows, a Makefile might not work for you, so we've added scripts/make.ps1 for Windows. It should work almost the same way.
To run all checks (tests, linting, etc.), run:
make check
To run the tests, use the following command:
make pytest
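As an illustration of what a test module could look like, here is a sketch that assumes make pytest runs pytest with its default discovery (files named test_*.py, functions named test_*). The module name and the unit-conversion helper are hypothetical and not part of the ff package; in a real test the helper would be imported from the package instead of being defined inline. The type annotations also give mypy something to check.

```python
# tests/test_units.py  (hypothetical module)
import math


def to_micrograms_per_litre(value: float, unit: str) -> float:
    """Hypothetical helper: convert a concentration to ug/l."""
    factors = {"ng/l": 0.001, "ug/l": 1.0, "mg/l": 1000.0}
    return value * factors[unit]


def test_nanograms_are_scaled_down() -> None:
    assert math.isclose(to_micrograms_per_litre(500.0, "ng/l"), 0.5)


def test_milligrams_are_scaled_up() -> None:
    assert math.isclose(to_micrograms_per_litre(0.002, "mg/l"), 2.0)


def test_unknown_unit_raises() -> None:
    try:
        to_micrograms_per_litre(1.0, "ppm")
    except KeyError:
        return
    raise AssertionError("expected KeyError for an unsupported unit")
```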
To run type checking with mypy, use:
make type-check
ruff can be used to automatically format code, so you don't have to worry about the nitty-gritty of code style. To check the code with ruff, run:
make check_ruff
To fix issues in the repository, you can format everything with ruff via this command:
make format
To contribute, create a pull request. Before merging, all checks must pass; see the GitHub Actions workflows.
Please update CHANGELOG.md and the version in pyproject.toml accordingly.
Keywords: Streamlit, Data Science, Web App, Geo Data, Water pollution, Outlier detection.