Support units of measurement in PyMC #7812

drbenvincent · 2025-06-08T12:24:01Z

drbenvincent
Jun 8, 2025
Collaborator

Many statistical models (especially in scientific, engineering, and health applications) are built around physical quantities that carry units (e.g., meters, seconds, kilograms). Currently, PyMC treats all variables as unitless, which may lead to misinterpretations or errors when combining data with different units or when interpreting model parameters.

Adding optional support for units would:

Improve model readability and transparency
Enable automated unit-checking to catch errors (e.g., adding kg to m)
Facilitate interpretation of parameter estimates and priors

This could be implemented via integration with existing Python libraries like pint.

In general, the idea would be to make specifying units optional. Initially, it might be necessary to require units for all data and parameters, so that PyMC can check them for consistency and catch errors.

This would be useful as we could check unit consistency, but also make errors less likely (e.g. expressing slope priors in the wrong units).

It could be very fun to explore unit inference, where you specify the units of the data but not of the parameters. For example, if we are regressing weight ~ age, where weight is in kg and age is in years, the model could infer that the slope is in units of kg/year and the intercept is in units of kg.
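A minimal sketch of that inference step, in plain Python so it runs without pint. The `{base_unit: exponent}` representation and the helper names `divide` and `infer_linear_model_units` are assumptions made up for illustration, not anything PyMC or pint provides.

```python
def divide(numerator, denominator):
    """Divide two units given as {base_unit: exponent} dicts."""
    result = dict(numerator)
    for base, power in denominator.items():
        result[base] = result.get(base, 0) - power
    # Drop base units whose exponents cancelled out
    return {base: power for base, power in result.items() if power != 0}


def infer_linear_model_units(outcome_unit, predictor_unit):
    """For outcome = intercept + slope * predictor, infer the parameter units."""
    return {
        # The intercept is added to the outcome scale, so it shares its unit
        "intercept": dict(outcome_unit),
        # slope * predictor must land on the outcome scale
        "slope": divide(outcome_unit, predictor_unit),
    }


weight_unit = {"kg": 1}
age_unit = {"year": 1}
inferred = infer_linear_model_units(weight_unit, age_unit)
print(inferred)  # {'intercept': {'kg': 1}, 'slope': {'kg': 1, 'year': -1}}
```

Here `{'kg': 1, 'year': -1}` stands for kg/year. A real implementation would presumably have to walk the whole model graph rather than special-case a linear predictor.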

I'll leave it there - this is an initial proposal which is intended to spark discussion.

ricardoV94 · 2025-06-08T12:42:51Z

ricardoV94
Jun 8, 2025
Maintainer

Code examples?

2 replies

drbenvincent Jun 8, 2025
Collaborator Author

Something vaguely like this?

import pint
import pymc as pm

ureg = pint.UnitRegistry()

# Proposed API sketch: PyMC does not currently accept pint quantities
with pm.Model() as model:
    # pm.Data has to be created inside a model context
    age = pm.Data("age", [10, 20, 30, 40] * ureg.year)
    weight = pm.Data("weight", [50, 60, 65, 67] * ureg.kg)

    intercept = pm.Normal("intercept", mu=1.2 * ureg.kg, sigma=0.2 * ureg.kg)
    beta = pm.Normal("beta", mu=0.5 * ureg.kg / ureg.year, sigma=0.1 * ureg.kg / ureg.year)
    mu = pm.Deterministic("mu", intercept + beta * age)  # PyMC infers mu should be in kg
    pm.Normal("obs", mu=mu, sigma=0.5 * ureg.kg, observed=weight)

PyMC does unit checks and throws errors if there are incompatibilities
PyMC optionally infers units of any nodes where units are not provided, or throws an error if it is not possible, asking for units of more variables.
In theory I guess you could allow the intercept's mu to be provided in kg and the sigma in another weight unit and auto-convert, but perhaps emit a warning.
Units would be incorporated in the idata
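To make the unit-check behaviour concrete, here is a rough, self-contained sketch of the kind of check the first point describes. `Quantity` and `UnitError` are hypothetical toy classes written for this example; they are not the pint or PyMC API, and a real integration would more likely delegate this to pint.

```python
class UnitError(TypeError):
    """Raised when quantities with incompatible units are combined."""


class Quantity:
    """A value tagged with units stored as {base_unit: exponent}."""

    def __init__(self, value, dims):
        self.value = value
        # Drop cancelled units so that kg/year * year compares equal to kg
        self.dims = {k: v for k, v in dims.items() if v != 0}

    def __add__(self, other):
        # Addition only makes sense when both sides carry the same units
        if self.dims != other.dims:
            raise UnitError(f"cannot add {self.dims} to {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds the exponents of each base unit
        dims = dict(self.dims)
        for k, v in other.dims.items():
            dims[k] = dims.get(k, 0) + v
        return Quantity(self.value * other.value, dims)


intercept = Quantity(1.2, {"kg": 1})
beta = Quantity(0.5, {"kg": 1, "year": -1})
age = Quantity(10, {"year": 1})

mu = intercept + beta * age  # units work out to kg
print(mu.dims)

try:
    intercept + age  # kg + year: should be rejected
    rejected = False
except UnitError:
    rejected = True
print(rejected)
```

The first expression passes because `beta * age` reduces to kg before the addition; the second is the "adding kg to years" class of mistake that the check is meant to catch.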

Armavica Jun 13, 2025
Maintainer

A few thoughts:

"PyMC optionally infers units of any nodes where units are not provided": I am not sure how this would work. How would you distinguish a dimensionless variable from a variable with unspecified units?
"PyMC infers mu should be in kg": would there be a way to impose that? something like pm.Deterministic("mu", [...], unit=ureg.kg) that would throw an error if the expression is incompatible?
Perhaps it could also allow unit=ureg.gram, which is compatible because it's also a mass, and make the conversion transparently?
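One possible way to handle both questions, sketched in plain Python: treat `None` as "unspecified" and keep an explicit "dimensionless" unit, then let a declared unit either convert or raise. The `UNITS` table, the `resolve_unit` helper, and the idea of a `unit=` argument are all assumptions for illustration; none of this exists in PyMC today.

```python
# Hypothetical unit table: name -> (scale relative to grams or years, dimension)
UNITS = {
    "kg": (1000.0, "mass"),
    "g": (1.0, "mass"),
    "year": (1.0, "time"),
    "dimensionless": (1.0, "none"),
}


def resolve_unit(inferred, declared=None):
    """Return (unit, factor) for a node.

    inferred: unit name derived from the expression, or None if unknown.
    declared: unit name the user asked for (hypothetical unit= argument).
    factor:   multiply values in the inferred unit by this to get the result unit.
    """
    if declared is None:
        return inferred, 1.0  # may still be None, i.e. genuinely unspecified
    if inferred is None:
        return declared, 1.0  # nothing to check against yet
    scale_in, dim_in = UNITS[inferred]
    scale_out, dim_out = UNITS[declared]
    if dim_in != dim_out:
        raise ValueError(f"{inferred} ({dim_in}) is incompatible with {declared} ({dim_out})")
    return declared, scale_in / scale_out


print(resolve_unit(None))             # (None, 1.0): unspecified
print(resolve_unit("dimensionless"))  # ('dimensionless', 1.0): explicitly unitless
print(resolve_unit("kg", "g"))        # ('g', 1000.0): compatible, converted
```

With this convention, `resolve_unit("kg", "year")` raises a `ValueError`, which is the "throw an error if the expression is incompatible" case.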

ErikRingen · 2025-06-13T10:51:42Z

ErikRingen
Jun 13, 2025

Chiming in here to say that I really like the idea of explicit units. Mis-managing units is a really common error in data analysis, leading to mistakes in published papers (a couple off the top of my head: https://www.pnas.org/doi/10.1073/pnas.1900438116, https://www.sciencedirect.com/science/article/pii/S004565352402811X).

0 replies

jessegrabowski · 2025-06-13T14:29:27Z

jessegrabowski
Jun 13, 2025
Maintainer

I could have sworn there was another discussion thread on this somewhere else started by @williambdean where I put some thoughts on this, but I can't find it now.

First, I love this idea, and I would like to have it. I think there's a ton of powerful stuff we can do with automatic reparameterization if we know units and we know conversions between the units a scientist wants to "think" in and units that are naturally more compatible for sampling. These could form the basis for RV transformations (with appropriate Jacobian correction), the same way we handle sampling RVs that don't live on R+.

That said, I think it's something that should be developed on top of pytensor first. We really want to be able to reason graphically about metadata. I've been thinking mostly about mathematical properties like "strictly positive" or "real", or matrix structure like "lower triangular", "banded", "block diagonal". But I think units also fit very naturally into this structure, and it's an incredibly exciting direction to go in.
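As a small worked example of the Jacobian correction involved, here is a sketch for the simplest case: a pure rescaling from kg to g. The function names and the numbers are made up for illustration, and this uses only the standard library rather than any PyMC transform machinery.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


G_PER_KG = 1000.0


def logpdf_in_grams(x_g, mu_kg, sigma_kg):
    """Density of a value given in grams under a prior specified in kg."""
    x_kg = x_g / G_PER_KG
    # Change of variables: add log |d x_kg / d x_g| = -log(1000)
    log_jacobian = -math.log(G_PER_KG)
    return normal_logpdf(x_kg, mu_kg, sigma_kg) + log_jacobian


# Prior stated directly in grams vs. stated in kg and transformed
direct = normal_logpdf(65_000.0, 60_000.0, 5_000.0)
via_transform = logpdf_in_grams(65_000.0, 60.0, 5.0)
print(direct, via_transform)
```

The two log densities agree up to floating-point error, which is what the correction term is there to guarantee. Without the `log_jacobian` term they would differ by log(1000).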
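To illustrate what reasoning about metadata on a graph could look like, here is a toy sketch. The `Node` class and the `var`, `exp`, and `mul` helpers are invented for this example and are not pytensor's API; the point is only that each op can state how tags such as "real" or "positive" propagate from inputs to outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node in a toy expression graph carrying a set of property tags."""
    op: str
    inputs: tuple = ()
    tags: frozenset = frozenset()


def var(name, *tags):
    return Node(op=f"var:{name}", tags=frozenset(tags))


def exp(x):
    # exp of a real number is real and strictly positive
    tags = {"real", "positive"} if "real" in x.tags else set()
    return Node("exp", (x,), frozenset(tags))


def mul(a, b):
    # A product is real if both factors are real, positive if both are positive
    tags = a.tags & b.tags & {"real", "positive"}
    return Node("mul", (a, b), frozenset(tags))


x = var("x", "real")
y = mul(exp(x), exp(x))  # positive * positive
z = mul(exp(x), x)       # positive * real of unknown sign
print(sorted(y.tags))    # ['positive', 'real']
print(sorted(z.tags))    # ['real']
```

Units could plausibly ride along the same mechanism, with each op defining a propagation rule for units in the same way it does for these tags.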

2 replies

williambdean Jun 13, 2025

This one? arviz-devs/preliz#674

jessegrabowski Jun 13, 2025
Maintainer

Yes exactly!
