Support units of measurement in PyMC #7812
Replies: 3 comments 4 replies
-
Code examples?
-
Chiming in here to say that I really like the idea of explicit units. Mismanaging units is a really common error in data analysis, and it has led to mistakes in published papers (a couple off the top of my head: https://www.pnas.org/doi/10.1073/pnas.1900438116, https://www.sciencedirect.com/science/article/pii/S004565352402811X).
-
I could have sworn there was another discussion thread on this somewhere else, started by @williambdean, where I put some thoughts on this, but I can't find it now.
First, I love this idea, and I would like to have it. I think there's a ton of powerful stuff we can do with automatic reparameterization if we know the units and we know the conversions between the units a scientist wants to "think" in and units that are naturally more compatible with sampling. These conversions could form the basis for RV transformations (with the appropriate Jacobian correction), the same way we handle sampling RVs that don't live on the real line (e.g. RVs on R+).
That said, I think it's something that should be developed on top of PyTensor first. We really want to be able to reason about metadata at the graph level. I had been thinking mostly about mathematical properties like "strictly positive" or "real", or matrix structure like "lower triangular", "banded", or "block diagonal". But I think units also fit very naturally into this structure, and it's an incredibly exciting direction to go in.
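To make the reparameterization idea a bit more concrete, here is a minimal sketch in plain Python (no PyMC or PyTensor involved) of a unit conversion used as an RV transformation. It assumes a hypothetical prior written in grams while the sampler works in kilograms; the names `prior_logp_grams`, `sampler_logp_kg`, and `GRAMS_PER_KG` are illustrative only and are not part of PyMC's transform API. The change of variables mass_g = 1000 * mass_kg contributes a log-Jacobian of log(1000).

```python
import math


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical setup: the prior is stated in the units the scientist
# "thinks" in (grams), but the sampler works on a value in kilograms.
GRAMS_PER_KG = 1000.0


def prior_logp_grams(mass_g: float) -> float:
    # Prior stated in grams: Normal(70_000 g, 5_000 g).
    return normal_logpdf(mass_g, 70_000.0, 5_000.0)


def sampler_logp_kg(mass_kg: float) -> float:
    # Change of variables: mass_g = GRAMS_PER_KG * mass_kg.
    # The density on the sampled (kg) scale needs the log-Jacobian
    # log|d mass_g / d mass_kg| = log(GRAMS_PER_KG).
    mass_g = GRAMS_PER_KG * mass_kg
    return prior_logp_grams(mass_g) + math.log(GRAMS_PER_KG)


# With the Jacobian term, the kg-scale density is exactly Normal(70, 5).
print(sampler_logp_kg(72.3), normal_logpdf(72.3, 70.0, 5.0))
```

Because this conversion is linear, the Jacobian term is a constant. A nonlinear reparameterization, such as the log transform PyMC applies to positive RVs, has a value-dependent Jacobian, which is where automatic handling would matter most.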
-
Many statistical models (especially in scientific, engineering, and health applications) are built around physical quantities that carry units (e.g., meters, seconds, kilograms). Currently, PyMC treats all variables as unitless, which may lead to misinterpretations or errors when combining data with different units or when interpreting model parameters.
Adding optional support for units would:
This could be implemented via integration with existing Python libraries such as pint.
In general, the idea would be to optionally specify the units. Initially, it might be necessary to specify the units of all data and parameters, and PyMC could then help with unit checking to avoid errors.
This would be useful because we could check unit consistency and also make errors less likely (e.g. expressing slope priors in the wrong units).
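As a rough illustration of the kind of check this would enable, here is a toy sketch in pure Python. It is a hypothetical stand-in for a real units library such as pint, not pint's API and not an existing PyMC feature: `Quantity`, `UnitError`, and `_combine` are made-up names, units are tracked as base-unit exponents, and parameters are treated as point values rather than priors for simplicity. Adding quantities requires identical units, while multiplying and dividing combine exponents, so a slope written "per day" instead of "per year" is caught immediately.

```python
from dataclasses import dataclass, field
from typing import Dict


class UnitError(ValueError):
    """Raised when quantities with incompatible units are combined."""


def _combine(a: Dict[str, int], b: Dict[str, int], sign: int) -> Dict[str, int]:
    """Add (sign=+1) or subtract (sign=-1) unit exponents, dropping zeros."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + sign * exp
    return {u: e for u, e in out.items() if e != 0}


@dataclass
class Quantity:
    value: float
    units: Dict[str, int] = field(default_factory=dict)  # e.g. {"kg": 1, "year": -1}

    def __add__(self, other: "Quantity") -> "Quantity":
        # Addition is only defined for identical units.
        if self.units != other.units:
            raise UnitError(f"cannot add {self.units} and {other.units}")
        return Quantity(self.value + other.value, dict(self.units))

    def __mul__(self, other: "Quantity") -> "Quantity":
        # Multiplication adds unit exponents.
        return Quantity(self.value * other.value, _combine(self.units, other.units, +1))

    def __truediv__(self, other: "Quantity") -> "Quantity":
        # Division subtracts unit exponents.
        return Quantity(self.value / other.value, _combine(self.units, other.units, -1))


age = Quantity(30.0, {"year": 1})
intercept = Quantity(3.0, {"kg": 1})
slope_ok = Quantity(0.5, {"kg": 1, "year": -1})
slope_wrong = Quantity(0.5, {"kg": 1, "day": -1})  # slope written per day by mistake

weight = intercept + slope_ok * age  # kg + (kg/year) * year = kg
print(weight)  # Quantity(value=18.0, units={'kg': 1})

try:
    intercept + slope_wrong * age  # kg + (kg/day) * year does not type-check
    caught = False
except UnitError as err:
    caught = True
    print("caught:", err)
```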
It could be very fun to explore unit inference. For example, suppose you specify the units of the data but not of the parameters, and we are regressing
weight ~ age
where weight is in kg and age is in years. The model could infer that the slope is in units of kg/year and the intercept is in units of kg.
I'll leave it there. This is an initial proposal intended to spark discussion.
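For the simple additive case, the inference described in the proposal can be sketched by hand: every additive term must carry the outcome's units, so the intercept shares the outcome's units and each slope has units of outcome / predictor. This is a hypothetical helper in plain Python (`infer_linear_units` and `divide_units` are made-up names), not an existing PyMC or pint function; a general version would need to propagate units through the whole model graph rather than assume a linear formula.

```python
from typing import Dict

Units = Dict[str, int]  # base unit -> exponent, e.g. {"kg": 1, "year": -1}


def divide_units(num: Units, den: Units) -> Units:
    """Units of num / den: subtract exponents and drop any that cancel."""
    out = dict(num)
    for unit, exp in den.items():
        out[unit] = out.get(unit, 0) - exp
    return {u: e for u, e in out.items() if e != 0}


def infer_linear_units(outcome: Units, predictors: Dict[str, Units]) -> Dict[str, Units]:
    """Infer parameter units for outcome ~ intercept + sum_i slope_i * predictor_i.

    Each additive term must have the outcome's units, so the intercept
    takes the outcome's units and each slope is outcome / predictor.
    """
    inferred = {"intercept": dict(outcome)}
    for name, units in predictors.items():
        inferred[f"slope_{name}"] = divide_units(outcome, units)
    return inferred


# weight ~ age, with weight in kg and age in years
params = infer_linear_units({"kg": 1}, {"age": {"year": 1}})
print(params)  # {'intercept': {'kg': 1}, 'slope_age': {'kg': 1, 'year': -1}}
```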