🏡 Air source heat pump cost estimator 🏡

The asf_hp_cost_estimator_model repository contains the code to model and predict the cost of an air source heat pump:

in residential properties
installed as part of a retrofit (heat pumps installed in new builds or as part of a cluster of installations are excluded)
in houses or bungalows (flats are excluded from the analysis)
in properties with 2 or more habitable rooms and a floor area between 20 and 500 m²
in Scotland, Wales and the English regions
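
The inclusion criteria above can be read as a simple record filter. The sketch below is illustrative only: every field name (`is_domestic`, `is_new_build`, `is_cluster`, `property_type`, `n_habitable_rooms`, `floor_area_m2`) is hypothetical and not taken from the repository, the floor area bounds are assumed to be inclusive, and the regional criterion is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Installation:
    # All field names are hypothetical, chosen for illustration only.
    is_domestic: bool
    is_new_build: bool
    is_cluster: bool
    property_type: str  # e.g. "house", "bungalow", "flat"
    n_habitable_rooms: int
    floor_area_m2: float


def in_scope(rec: Installation) -> bool:
    """Return True if the record meets the inclusion criteria listed above."""
    return (
        rec.is_domestic
        and not rec.is_new_build  # retrofits only
        and not rec.is_cluster  # clustered installations are excluded
        and rec.property_type in {"house", "bungalow"}  # flats are excluded
        and rec.n_habitable_rooms >= 2
        and 20 <= rec.floor_area_m2 <= 500  # bounds assumed inclusive
    )
```

For example, `in_scope(Installation(True, False, False, "flat", 4, 95.0))` is `False` because flats are excluded.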

🚀 Modelling the cost of an air source heat pump

Gradient boosting regressors with a quantile loss are fitted to produce prediction intervals for the cost of an air source heat pump. Two models are fitted, one for the 10th percentile and one for the 90th percentile, and together their predictions form an 80% prediction interval.
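
To illustrate why models for the 10th and 90th percentiles bound an interval, the sketch below implements the quantile (pinball) loss that quantile regression minimises and finds the best constant prediction at each level. It uses only the standard library and made-up costs; the function names are hypothetical and this is not the repository's training code.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean quantile (pinball) loss for a quantile level alpha in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)


def best_constant(y_true, alpha, candidates):
    """Pick the constant prediction with the lowest pinball loss."""
    return min(
        candidates,
        key=lambda c: pinball_loss(y_true, [c] * len(y_true), alpha),
    )


# Made-up installation costs in pounds, for illustration only.
costs = [7000, 8000, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 13000, 15000]
lower = best_constant(costs, 0.1, costs)  # low quantile: most costs lie above it
upper = best_constant(costs, 0.9, costs)  # high quantile: most costs lie below it
inside = sum(lower <= c <= upper for c in costs)
```

Minimising the loss at a low and a high quantile level yields a lower and an upper bound. A gradient boosting model does the same thing, except that the bounds depend on the property's features rather than being constants.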

The target variable is the overall cost of installation and the predictors include:

Total floor area
Number of habitable rooms (2 to 8+)
Number of days between 2007 and HP installation (as a measure of time)
Property built form: detached, semi detached, mid terrace and end terrace
Property type: bungalow and house
Construction age band: pre-1929, 1930-1965, 1966-1982, 1983-2006 and 2007 onwards
Region: Scotland, Wales, London, East Midlands, West Midlands, East of England, South East, South West, North West, North East and Yorkshire and the Humber.
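
Two of the predictors above are derived rather than read directly from the data. The sketch below shows one plausible way to compute them. The source only says "2007", so the reference date of 1 January 2007 is an assumption, as are the function names and the idea that "8+" is encoded by capping the room count at 8.

```python
from datetime import date

# Assumption: the source only says "2007"; 1 January 2007 is used for illustration.
REFERENCE_DATE = date(2007, 1, 1)


def days_since_reference(installation_date: date) -> int:
    """Number of days between the reference date and the installation date."""
    return (installation_date - REFERENCE_DATE).days


def cap_rooms(n_rooms: int, cap: int = 8) -> int:
    """Group properties with `cap` or more habitable rooms into a single bucket."""
    return min(n_rooms, cap)
```

Encoding time as a day count lets the model pick up gradual changes in installation cost without treating each year as a separate category.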

🆕 Latest data

The latest model in use by the cost estimator tool was trained on data up to Q1 2025 (March 2025).

🧩 Data sources

Microgeneration Certification Scheme (MCS) data on heat pump installations

This is a subset of the MCS Installations Database (MID), and contains one record for each MCS certificate associated with a heat pump installation. The dataset contains records of both domestic and non-domestic air source, water/ground source and other types of heat pump installations. Features in the dataset include:

information about the property: address, heat and water demand
characteristics of the heat pump installed: type, model, manufacturer, capacity, flow temperature, SCOP
information about the installation: commissioning date, overall cost of installation

The overall installation cost is the full cost of installation including materials and labour, not just the cost of the heat pump unit. Note that this is the cost before deducting government grants such as the Boiler Upgrade Scheme (BUS) grant or the Home Energy Scotland (HES) grant.

MID data is used with permission from MCS and subject to the conditions of a data sharing agreement.

Energy Performance Certificates (EPC) register data about homes

Property data comes from the EPC registers for England and Wales and for Scotland. The EPC registers provide data on building characteristics and energy efficiency measures, including:

Property address and other location information
Property characteristics such as number of rooms, property type and built form
Heating system(s) installed
Energy efficiency ratings

The EPC register datasets are open data and accessible to everyone.

Additional data

Location lookups

The following location lookups are used:

Inflation and price indices

The "CPI INDEX 05.3 : Household appliances, fitting and repairs 2015=100" series is sourced from the ONS inflation and price indices data.
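
A price index of this kind is typically used to express costs from different periods in the prices of a single reference period. The sketch below shows that arithmetic, assuming costs are rescaled by the ratio of the reference-period index to the index at installation. The index values are made up, the function name is hypothetical, and the repository may apply the adjustment differently.

```python
def adjust_to_reference_prices(
    cost: float, cpi_at_installation: float, cpi_reference: float
) -> float:
    """Rescale a nominal cost to reference-period prices using a price index."""
    if cpi_at_installation <= 0 or cpi_reference <= 0:
        raise ValueError("CPI values must be positive")
    return cost * cpi_reference / cpi_at_installation


# Made-up index values on a 2015=100 scale, for illustration only.
adjusted = adjust_to_reference_prices(
    10_000.0, cpi_at_installation=100.0, cpi_reference=125.0
)
```

With these illustrative numbers, a cost of £10,000 becomes £12,500 in reference-period prices.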

⚒️ Data processing & joining

The underlying dataset used to model the cost of an air source heat pump installation is the MCS installations dataset enhanced with EPC information about properties. The MCS and EPC datasets are cleaned and preprocessed before being joined, and installations without EPC property information are removed from the analysis. The code for preprocessing and joining MCS to EPC is available in the asf_core_data GitHub repository.
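
Removing installations without EPC property information amounts to an inner join. The sketch below shows the idea on made-up records. The join key `property_id` and the function name are hypothetical, and the actual matching logic lives in asf_core_data and may be considerably more involved (for example, address matching).

```python
def join_installations_to_epc(installations, epc_records, key="property_id"):
    """Inner join: keep only installations with a matching EPC record."""
    epc_by_key = {rec[key]: rec for rec in epc_records}
    joined = []
    for inst in installations:
        epc = epc_by_key.get(inst.get(key))
        if epc is None:
            continue  # no EPC property information: dropped from the analysis
        joined.append({**epc, **inst})  # installation fields win on name clashes
    return joined


# Made-up records; "property_id" is a hypothetical join key.
installations = [
    {"property_id": "A", "cost": 11000},
    {"property_id": "B", "cost": 9500},
]
epc_records = [
    {"property_id": "A", "total_floor_area": 92.0, "built_form": "detached"},
]
joined = join_installations_to_epc(installations, epc_records)
```

Here installation "B" has no EPC record, so only installation "A" survives the join.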

🗂️ Repository structure

The repository structure and key scripts are highlighted below:

asf_hp_cost_estimator_model
├───config/
│    Configuration scripts
│    ├─ base.yaml
├───getters/
│    Scripts with functions to load data from S3
│    ├─ data_getters.py
├───pipeline/
│    Subdirs with scripts to process data and produce outputs
│    ├─ data_processing/ - further data processing prior to modelling
│    │    ├─ process_installations_data.py
│    ├─ model_training/ - model training scripts
│    │    ├─ fit_cost_prediction_intervals.py
│    ├─ model_evaluation/ - scripts for model evaluation
│    │    ├─ cross_validation.py
│    ├─ hyperparameter_tuning/ - scripts for hyperparameter tuning
│    │    ├─ tune_hyperparameters.py
│    ├─ README.md - instructions to run the different pipelines
├───utils/
│    Utils for plotting and evaluation
│    ├─ plotting_utils.py
│    ├─ model_evaluation_utils.py
├───notebooks/
│    Notebooks for data and model exploration

📋 Instructions for retraining the model

These are instructions for data scientists at Nesta.

When data for a new quarter becomes available, follow the steps below to retrain the cost models (after the data has been processed with asf_core_data).

1. Open an issue in this GitHub repository, such as "Retrain model with QX 202Y data".
2. Update asf_hp_cost_estimator_model/config/base.yaml:
   - cpi_reference_year: update the CPI reference year accordingly
   - Location data sources: review and update location sources as required
   - mcs_epc_filename_date: update with the newest date of MCS-EPC data processing
3. Re-run the hyperparameter tuning pipeline:
   - Run python asf_hp_cost_estimator_model/pipeline/hyperparameter_tuning/tune_hyperparameters.py
   - Take note of the hyperparameters logged
4. Update asf_hp_cost_estimator_model/config/base.yaml after tuning hyperparameters:
   - Change hyper_parameters according to the hyperparameters logged in the previous step
5. Re-run the cross-validation pipeline:
   - Run python asf_hp_cost_estimator_model/pipeline/model_evaluation/cross_validation.py
   - Assess the results logged
6. Retrain the models:
   - Run python asf_hp_cost_estimator_model/pipeline/model_training/fit_cost_prediction_intervals.py
   - Models are saved to S3
7. Update sections "🆕 Latest data" and "🧩 Data sources" of this README.md to reflect the changes.
8. Let the tech/design team know that the model has been updated, so that they can restart the API.

⚙️ Setup

Meet the data science cookiecutter requirements, in brief:
- Install: direnv and conda
Run make install to configure the development environment:
- Set up the conda environment
- Configure pre-commit

📢 Contributor guidelines

Technical and working style guidelines

Project based on Nesta's data science project template (Read the docs here).
