Investigating Privacy Concerns and Mitigations for Healthcare Language and Foundation Models Extended

NHS England Data Science Team - PhD Internship Project

About the Project

This repository holds code that extends the "Investigating Privacy Concerns and Mitigations for Healthcare Language and Foundation Models" project (priv-lm-health).

⚠️ This repository is experimental: models trained or attacked with this code are not suitable for deployment in a production environment without further testing and evaluation. ⚠️

This work was conducted as part of an NHS England Data Science PhD Internship project by Jenny Chim between July and December 2024.

Link to original project proposal.

Note: Only public or fake data are shared in this repository.

Project Structure

This repository contains code to:
- Construct the instruction-tuning dataset (data_processing/)
- Run memorisation experiments (memorisation/)
- Run experiments to assess privacy in clinical documentation (privacy_in_context/)

The accompanying report is available in the reports/ folder. More information about code usage can be found in each sub-directory (see Usage below).

Getting Started

Installation

To get a local copy up and running, follow these steps.

To clone the repo:

git clone git@github.com:nhsengland/pvt_p71_privLMextended.git

Each sub-directory has its own dependencies, listed in a requirements.txt file. To create a suitable environment, change into the sub-directory of interest and run:

python -m venv <env_name>
source <env_name>/bin/activate
pip install -r requirements.txt
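The `source <env_name>/bin/activate` step above assumes a POSIX shell. As a minimal sketch (not part of the repository), the same per-sub-directory setup can be scripted with Python's standard-library `venv` module, which also works on Windows. The function names `env_python` and `setup_subdirectory`, and the default environment name `.venv`, are illustrative assumptions.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a virtual environment (Windows vs POSIX layout)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def setup_subdirectory(subdir: str, env_name: str = ".venv") -> Path:
    """Hypothetical helper: create an environment inside `subdir` and install its requirements."""
    subdir_path = Path(subdir).resolve()
    env_dir = subdir_path / env_name
    # Equivalent of `python -m venv <env_name>`, with pip bootstrapped into the environment.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    # Equivalent of `pip install -r requirements.txt`, run with the environment's own
    # interpreter so no separate activation step is needed.
    subprocess.run(
        [str(env_python(env_dir)), "-m", "pip", "install", "-r", "requirements.txt"],
        cwd=subdir_path,
        check=True,
    )
    return env_dir
```

For example, `setup_subdirectory("memorisation")` would create `memorisation/.venv` and install that sub-directory's requirements; invoking the environment's interpreter directly avoids shell-specific activation scripts.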

Although some of the model training code runs experiments with larger models (e.g. meta-llama/Llama-3.1-70B), the code base is designed to work with compact models as well. To use one, substitute the model names with an alternative hosted on the Hugging Face Hub, e.g. HuggingFaceTB/SmolLM2-135M-Instruct.
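As an illustration only, the substitution can be kept explicit with a small lookup table, and the resulting name passed to the standard Hugging Face `transformers` loading calls. The `COMPACT_SUBSTITUTES` mapping and the `resolve_model_name` and `generate` helpers are hypothetical and not taken from this repository's scripts; loading a model requires `transformers` and `torch` to be installed.

```python
# Illustrative mapping only: a large checkpoint named in this README paired with a
# compact stand-in. Extend it with whichever models your experiments reference.
COMPACT_SUBSTITUTES = {
    "meta-llama/Llama-3.1-70B": "HuggingFaceTB/SmolLM2-135M-Instruct",
}


def resolve_model_name(model_name: str, use_compact: bool) -> str:
    """Return a compact substitute for `model_name` when requested and one is listed."""
    if use_compact:
        return COMPACT_SUBSTITUTES.get(model_name, model_name)
    return model_name


def generate(model_name: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Load a causal LM from the Hugging Face Hub and decode a short continuation."""
    # Imported lazily so the name-resolution helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")  # PyTorch tensors
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate(resolve_model_name("meta-llama/Llama-3.1-70B", use_compact=True), "Hello, my name is")` would download and run the 135M-parameter SmolLM2 model instead of the 70B Llama model, which is practical on a laptop CPU.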

Usage

Refer to sub-directories for work package specific instructions.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

See CONTRIBUTING.md for detailed guidance.

License

Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

See LICENCE for more information.

Contact

To find out more about the NHS England Data Science Team, visit our project website or get in touch at datascience@nhs.net.
