This is the RAP rework of the Specialised Services Devices Programming (SSDP) reporting pipeline.
Reproducible Analytical Pipelines (RAP) is a set of tools, principles, and techniques to help you improve your analytical processes. Learn more about RAP on the RAP Community of Practice website.
To use this repository you will need access to:
- Python 3.12
It is recommended you have access to:
- A Linux based development environment: e.g. WSL, GitHub Codespaces, etc.
To learn about what this means, and how to use Git, see the Git guide.
```shell
git clone https://github.com/nhsengland/devices_rap.git
```
Either use pip or make (if you are using a Linux-based development environment, make will already be installed). For more information on how to use virtual environments and why they are important, see the virtual environments guide.
If using Windows PowerShell:

```shell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -r requirements.txt
```
If using Linux:

```shell
python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
```
There is a handy Makefile with commands you can run via `make` to set up and run the environment:

```shell
make create-environment
source .venv/bin/activate
make requirements
```
These commands:

- Create a virtual environment in `.venv`
- Activate the environment
- Update pip
- Install the packages from `requirements.txt`
Make can also do other things, which will be touched on later; in the meantime, use `make help` to see the commands that can be run.
For Visual Studio Code you will need to change your default interpreter to the virtual environment you just created (`.venv`). To do this use the shortcut Ctrl-Shift-P, search for Python: Select Interpreter, and select `.venv` from the list. VS Code may also offer to select the new environment for you as soon as it is created.
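Alternatively, the interpreter can be pinned in the workspace settings file; a minimal sketch, assuming a Linux-style `.venv` layout (this file is not part of the repository):

```json
{
    // Point VS Code's Python extension at the project's virtual environment.
    // On Windows the path would be ".venv\\Scripts\\python.exe".
    "python.defaultInterpreterPath": ".venv/bin/python"
}
```

This would live in `.vscode/settings.json` at the repository root.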
Pre-commits allow us to automatically check our code before we commit changes. This can be important for ensuring security and quality in our code. Currently two hooks run:
- Gitleaks - a SAST tool for detecting and preventing hardcoded secrets like passwords, API keys, and tokens in git repos.
- Linting checks - runs the `make lint` command, which uses Flake8 and Black to ensure that the repository is maintaining expected coding standards, such as PEP8.
To set up the pre-commits run the following commands:

- If using a shell with `make` installed:

```shell
make pre-commits
```

- Otherwise:

```shell
pre-commit autoupdate
pre-commit install
pre-commit run --all-files
```
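For reference, a hook set like the one described above is configured via a `.pre-commit-config.yaml` file; a minimal sketch (the `rev` value and the local `make lint` hook are illustrative, not the repository's actual configuration):

```yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.4        # pin to a released tag; `pre-commit autoupdate` bumps this
    hooks:
      - id: gitleaks    # scans staged changes for hardcoded secrets
  - repo: local
    hooks:
      - id: lint
        name: lint
        entry: make lint  # delegates to the repository's Flake8/Black checks
        language: system
        types: [python]
```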
Now the pipeline is set up and ready to run, use the following command to run the pipeline:

If using a shell with `make` installed:

```shell
make run_pipeline
```

Otherwise:

```shell
python devices_rap/pipeline.py
```
When developing the code (or before you run the code), it is important to test the code to ensure it is working as expected. Regularly run the relevant commands.
If using a shell with `make` installed:

```shell
make test
# To only run unit tests
make unittest
# To only run end-to-end tests
make e2e
```

Otherwise run `pytest` directly:

```shell
pytest
# To only run unit tests
pytest tests/unittests
# To only run end-to-end tests
pytest tests/e2e_tests
```
```
DEVICES_RAP
├── LICENSE
├── Makefile
├── README.md
├── docs
│   ├── README.md
│   ├── docs
│   │   ├── getting-started.md
│   │   └── index.md
│   └── mkdocs.yml
├── models
├── notebooks
├── pyproject.toml
├── devices_rap
│   ├── __init__.py
│   ├── config.py
│   ├── data_in.py
│   ├── data_out.py
│   ├── pipeline.py
│   ├── utils
│   └── utils.py
├── references
├── reports
│   └── figures
├── requirements.txt
├── setup.cfg
└── tests
    ├── e2e_tests
    │   └── test_e2e_placeholder.py
    └── unittests
        ├── test_data_in.py
        ├── test_data_out.py
        ├── test_pipeline.py
        └── test_utils.py
```
To update the project diagram, run the following command:

```shell
make clean tree
```

Copy and paste the resulting tree over the old tree above.
This is the highest level of the repository, containing the general files of the project, such as:

- LICENCE - This tells others what they can and can't do with the project code.
- Makefile - Defines a series of convenient commands like `make create-environment` and `make run_pipeline`.
- README.md - Top-level information about this project for users and developers.
- pyproject.toml - Project configuration file with package metadata and configuration for tools like Black.
- requirements.txt - The requirements file for reproducing the analysis environment.
- setup.cfg - Configuration file for Flake8, a linting tool (makes sure the code is laid out cleanly).
The root directory also contains empty folders ready for future use:
- docs - project documentation written in markdown
- notebooks - exploratory notebooks (blend of code, text, and graphs). Do not use to store source code.
- references - data dictionaries, manuals, and all other explanatory materials.
- reports - Generated analysis as HTML, PDF, LaTeX, etc. Contains a figures directory for graphics and figures to be used in reporting.
This folder contains saved checkpoints of data at various points in the pipeline. While we would ideally process the data in one fell swoop, it makes sense to save the data at intermediate stages. This caches the data for other parts of the pipeline to use, and lets us look at it later for QA and debugging. We will also save historical data in this folder to allow us to perform backtesting (ensuring the pipeline replicates the manual process we are trying to recreate).
The sub folders in the data folder include:
- external - Data from third party sources.
- interim - Intermediate data that has been transformed.
- processed - The final, canonical data sets ready for output or modelling.
- raw - The original, immutable data dump.
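As an illustration of the checkpointing pattern described above (the folder names match the layout above, but the `save_checkpoint`/`load_checkpoint` helpers are hypothetical, not part of the pipeline):

```python
import json
from pathlib import Path

DATA_DIR = Path("data")  # assumed root of the data folders listed above


def save_checkpoint(records: list[dict], stage: str, name: str) -> Path:
    """Save a checkpoint of the data under data/<stage>/<name>.json."""
    path = DATA_DIR / stage / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, indent=2))
    return path


def load_checkpoint(stage: str, name: str) -> list[dict]:
    """Reload a previously saved checkpoint, e.g. for QA or backtesting."""
    path = DATA_DIR / stage / f"{name}.json"
    return json.loads(path.read_text())


# Example: persist transformed data to the interim stage, then reload it.
cleaned = [{"device": "pump", "count": 3}]
save_checkpoint(cleaned, "interim", "devices_cleaned")
assert load_checkpoint("interim", "devices_cleaned") == cleaned
```

Saving checkpoints as plain files like this is what makes it possible to diff an interim data set against a historical one during backtesting.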
This contains the source code for use in this project. This is where the code to do everything important should sit.
- __init__.py - Makes devices_rap a Python module.
- config.py - Stores useful variables and configuration settings (note: it should not contain sensitive or secret information; that should be stored in a .env file).
- pipeline.py - This is the main file, containing functions to run parts of the pipeline (e.g. if only wanting to run one report) or all of it.
There are other files in here, however, these should include information about what they contain at the top of the file as a module docstring.
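As a rough sketch of how such a pipeline module is often structured (the report names and functions here are hypothetical; the real `pipeline.py` may differ):

```python
def run_report(report_name: str, data: list[dict]) -> dict:
    """Run a single report over the input data (hypothetical example)."""
    total = sum(row.get("count", 0) for row in data)
    return {"report": report_name, "total": total}


def run_pipeline(data: list[dict]) -> list[dict]:
    """Run every report in the pipeline, one after another."""
    reports = ["activity", "devices"]  # illustrative report names
    return [run_report(name, data) for name in reports]


if __name__ == "__main__":
    # Running the module directly runs the full pipeline, which is what
    # `python devices_rap/pipeline.py` relies on.
    sample = [{"count": 2}, {"count": 5}]
    print(run_pipeline(sample))
```

Splitting the module into one function per report is what allows a single report to be rerun without executing the whole pipeline.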
This repository uses `pytest` for testing the pipeline. The tests for a RAP pipeline can be divided into two types: end-to-end tests and unit tests.
End-to-end tests, also known as integration tests or backtests, test large portions of the pipeline at once. This is to ensure discrete parts of the pipeline can work together and that the pipeline will work under "real-world" conditions. We might cut out/mock connections to external resources (e.g. SQL Servers or distributing reports via email).
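For example, an external connection can be replaced with a mock using the standard library's `unittest.mock`, so the test never touches a real mail server (the `send_report` function here is hypothetical, not part of the pipeline):

```python
from unittest.mock import MagicMock


def send_report(report: dict, mailer) -> str:
    """Distribute a report via an email client (hypothetical helper)."""
    mailer.send(subject=report["report"], body=str(report["total"]))
    return "sent"


# In an end-to-end test we swap the real email client for a mock, then
# check the pipeline asked it to send exactly the report we expected.
mock_mailer = MagicMock()
status = send_report({"report": "devices", "total": 7}, mock_mailer)
assert status == "sent"
mock_mailer.send.assert_called_once_with(subject="devices", body="7")
```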
Part of the end-to-end tests will include backtests, where we use historical inputs and outputs to ensure the pipeline correctly replicates the historical process.
Unit tests test discrete parts of the pipeline, usually one function at a time. Each function will have multiple tests, covering both the expected and unexpected cases the function will encounter.
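A unit test for a simple helper might look like this (the `clean_count` function is a hypothetical example; in the repository these tests would live under `tests/unittests` and be run by pytest):

```python
def clean_count(value) -> int:
    """Parse a count field, treating blanks as zero (hypothetical helper)."""
    if value in (None, ""):
        return 0
    return int(value)


# Expected cases: well-formed inputs.
def test_clean_count_parses_integers():
    assert clean_count("3") == 3
    assert clean_count(7) == 7


# Unexpected cases: blanks and junk input.
def test_clean_count_defaults_blanks_to_zero():
    assert clean_count("") == 0
    assert clean_count(None) == 0


def test_clean_count_rejects_junk():
    try:
        clean_count("three")
    except ValueError:
        pass  # junk input should raise, not silently succeed
    else:
        raise AssertionError("expected ValueError")
```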
Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.
HTML and Markdown documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.