Adding time series functionality to MD analysis #5098

mrshirts · 2025-07-17T09:52:02Z

mrshirts
Jul 17, 2025

Is your feature request related to a problem?

It would be good to have basic timeseries functionality for biomolecular simulations in MDAnalysis, specifically equilibration detection for timeseries that should be stationary, and estimation of the correlation time, which sets the spacing between effectively uncorrelated samples. Some of this functionality is in pymbar.timeseries, but that is really not the best place for it, since many data series that do not come from free energy calculations also need timeseries functionality.

Describe the solution you'd like

We propose adding a timeseries module to MDAnalysis.

It would probably look like the timeseries module in pymbar, though that needs to have a number of upgrades. We would take the equilibration detection from https://github.com/fjclark/red, rather than what is used in pymbar. Ideally it would not operate on MD universe objects, but more just numpy arrays of observations so as to be used on all sorts of observables. We are doing some tests on better ways of calculating the autocorrelation time as well, which will be documented.

Finlay Clark, Toni Mey, and I have volunteered to do a big chunk of the work in getting this in.

Describe alternatives you've considered

The timeseries functionality in statsmodels is far too extensive and too complicated for the relatively simple use cases that people running biomolecular simulations usually have.

Additional context

IAlibay · 2025-07-17T10:03:55Z

IAlibay
Jul 17, 2025
Maintainer

Ideally it would not operate on MD universe objects, but more just numpy arrays of observations so as to be used on all sorts of observables

Please don't take this as a rebuttal of the idea, but this somewhat triggers my go-to question for these types of feature expansions. Could you provide some insight into why this is better suited to live inside MDAnalysis rather than in a separate package that folks could use on any timeseries extracted by a given toolkit?

Possible ways to frame this question:

What benefit does sitting in MDAnalysis yield to the timeseries tooling?
Timeseries analyses generally benefit from acceleration (e.g. Cython, JAX, CUDA, etc.). Would this timeseries module be able to live with the constraints of the current core dependencies of MDAnalysis?
Flipping it the other way around, would this timeseries tooling be able to deal with the baggage that comes with MDAnalysis (slower releases, tighter control by the core developers on how things are done, etc.)?

mrshirts · 2025-07-17T12:26:52Z

mrshirts
Jul 17, 2025
Author

I think the idea is that biomolecular simulation people need good timeseries analysis since all of the
In theory, one could write a lightweight timeseries package outside of MDanalysis, but the question is, who would maintain it?

The sort of autocorrelation analysis in statsmodels is way too heavy and hard to use. Having something that is more lightweight but meets specific biosimulation needs would be useful.

I doubt there will be that much development after the initial pass, as there are relatively few methods that would be useful.

I think that FFT-based autocorrelation calculations are likely fast enough.
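
For readers unfamiliar with the FFT approach mentioned here, the following is a minimal sketch of how a normalized autocorrelation function can be computed from a plain NumPy array in O(N log N). The function name `autocorrelation_fft` and the choice of the unbiased (n - t) normalization are assumptions made for illustration; this is not code from pymbar, red, or MDAnalysis.

```python
import numpy as np


def autocorrelation_fft(x):
    """Normalized autocorrelation function of a 1D series, computed via FFT.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    # Zero-pad to a power of two >= 2n - 1 so the circular correlation
    # computed by the FFT does not wrap around onto itself.
    nfft = 1 << (2 * n - 1).bit_length()
    f = np.fft.rfft(dx, n=nfft)
    acov = np.fft.irfft(f * np.conjugate(f), n=nfft)[:n]
    # Lag t is a sum over (n - t) pairs; divide to get an unbiased estimate.
    acov /= np.arange(n, 0, -1)
    if acov[0] == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    return acov / acov[0]
```

The function takes any 1D array of observations (energies, distances, RMSD values) and returns one value per lag, with the value at lag 0 equal to 1 by construction.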

mrshirts · 2025-07-17T12:28:23Z

mrshirts
Jul 17, 2025
Author

If anyone has an alternative place where it could live, suggestions are welcome! We have had a hard time figuring it out.

IAlibay · 2025-07-17T12:46:37Z

IAlibay
Jul 17, 2025
Maintainer

In theory, one could write a lightweight timeseries package outside of MDanalysis, but the question is, who would maintain it?

Generally "it should be in MDAnalysis because we can't maintain it" isn't the best approach to things. MDAnalysis already struggles with its own maintenance burden, so it's often hard to justify more without some kind of planned "synergy".

From what you're saying, it does sound (edit: to me) like a standalone package would be better. I would be happy to help part-maintain a standalone package such as this if I'm involved in its development. It might even be possible to put it in either the MDAnalysis or OMSF namespaces.

P.S. If it helps clarify things, I personally am thinking of things like GridDataFormats or even PyEDR, that sit outside of MDAnalysis core but can potentially be dependencies if needed. That's a much more sustainable development model in my opinion.

IAlibay · 2025-07-17T12:51:59Z

IAlibay
Jul 17, 2025
Maintainer

Just to clarify, the above is my opinion on things; I would be keen to hear from other @MDAnalysis/coredevs. It may be that there are some more direct plans for the MDAnalysis library that would benefit from this sitting in the repo.

orbeckst · 2025-07-17T17:14:40Z

orbeckst
Jul 17, 2025
Maintainer

From my perspective, MDAKits were really meant to be the place for tools such as the timeseries analysis that you're proposing, for all the reasons written in the paper.

Importantly, it's always possible to move functionality from a kit into the MDA core. This is easier than deciding later to remove functionality. As @IAlibay said, our release cycles are quite slow.

orbeckst · 2025-07-17T17:19:03Z

orbeckst
Jul 17, 2025
Maintainer

I personally am thinking of things like GridDataFormats or even PyEDR, that sit outside of MDAnalysis core but can potentially be dependencies if needed. That's a much more sustainable development model in my opinion.

If the MDA core needs timeseries functionality as a dependency then that's a sensible thing to do. I don't know if there's anything within mda.lib (correlations?) that we would be moving into an external package as well.

IAlibay · 2025-07-17T18:01:42Z

IAlibay
Jul 17, 2025
Maintainer

MDAnalysis.lib.correlation

I had completely forgotten about that code. Yes, 100%: moving that out to its own package with all the necessary optimizations would be amazing.

In my mind, something like distopia but for timeseries analyses would be great!

ppxasjsm · 2025-07-18T14:23:32Z

ppxasjsm
Jul 18, 2025

Just to chime in a little. I think the main point is that having a time series analysis module that is separate from pymbar but closely linked to MDAnalysis would be great. There are many reasons why one might want to use time series analysis, but at the moment it is quite clunky to do and requires the whole of pymbar as a dependency. I don't think anyone wants a maintenance nightmare. I am not super familiar with the difference between the MDAnalysis core and MDAKits. MDAKits sounds like it could be a better place to go than core. Another motivation would be to make the code easier to use and maintain, also by us, and maybe to gain traction in the community for contributions. At the moment the barrier to entry for maintaining things in pymbar is quite high, I would say.

Thanks @orbeckst for suggesting MDAKits!

orbeckst · 2025-07-18T17:37:39Z

orbeckst
Jul 18, 2025
Maintainer

@IAlibay already mentioned that one of our biggest concerns is taking on maintenance. In practice it's very, very difficult to get "community members" to take up a package. You need someone who really needs it, i.e., enlightened self-interest; at least that's my experience.

MDAKits were meant for tools "that use MDAnalysis". Originally I thought that the proposed timeseries functionality was meant to be closely tied to MDAnalysis. However, if it's a more general package (like deeptime) that's agnostic of MDA data structures then an MDAKit would not be the proper home. Nevertheless, if MDAnalysis uses the package somewhere (e.g., for correlation functions) then we would have an interest in having it as one of our dependencies (similar to the packages that @IAlibay mentioned in #5084 (comment) such as MDAnalysis/GridDataFormats). Then there could be MDAnalysis/timeseries-analysis (... catchier name?? ;-) ).

mrshirts · 2025-07-19T22:20:03Z

mrshirts
Jul 19, 2025
Author

I probably could have explained things better. I believe the thinking was that it would be good to have a lightweight way to perform equilibration detection, compute correlation times, and subsample MANY types of biophysics data, and it seemed to us that many other molecular simulation analysis tasks could benefit from something better, hence thinking it would fit more naturally in MDAnalysis.

We've identified a few things we need to do for both equilibration detection and correlation time calculation for pymbar, so we thought it was a good time to revisit this code module, which is needed for good free energy calculations but is not at all restricted to them. It's ALSO not restricted to molecular structures; it applies to any timeseries that will be used for some computation.

Statsmodels and deeptime are good at calculating autocorrelations, but there seem not to be examples of what one would do with those, or of how to make sure they are calculated correctly. Maybe what we are doing is too simplistic for them?

Well, I did find this:

https://github.com/deeptime-ml/deeptime/blob/126e443222234fefaf59c13ed3c47516fc898b79/deeptime/util/stats.py#L292

Though they just use the "stop integrating when it goes through zero" criterion, which we know is not great.
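
To make the truncation rule concrete, here is a minimal sketch of a statistical inefficiency estimate that sums the normalized autocorrelation function until it first becomes non-positive, followed by a simple subsampling helper. The names `statistical_inefficiency_zero_cross` and `subsample_indices`, the (1 - t/n) weighting, and the stride choice are assumptions for illustration; this is not the deeptime, pymbar, or red implementation, and it shows exactly the simple rule described above as "not great", not a recommended estimator.

```python
import numpy as np


def statistical_inefficiency_zero_cross(x):
    """Estimate g = 1 + 2 * (sum of C(t)), truncated at the first C(t) <= 0.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        raise ValueError("constant series has no defined correlation time")
    g = 1.0
    for t in range(1, n - 1):
        # Normalized autocorrelation at lag t (direct sum over n - t pairs).
        c = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break  # the zero-crossing truncation rule
        # (1 - t/n) down-weights long lags for a finite-length series.
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def subsample_indices(n, g):
    """Indices of roughly uncorrelated samples, spaced ceil(g) apart."""
    stride = int(np.ceil(g))
    return np.arange(0, n, stride)
```

A typical use would be `g = statistical_inefficiency_zero_cross(a)` followed by `a[subsample_indices(a.size, g)]` for any 1D observable `a`. Because the sum stops at the first noisy zero crossing, the estimate can be cut short or run long depending on where the noise happens to cross.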

IAlibay · 2025-07-19T22:55:46Z

IAlibay
Jul 19, 2025
Maintainer

I think we might be at a point in this conversation where we are roughly aligned on the idea / need of this timeseries tooling, but maybe not so much on what the "implementation" might look like. Might I suggest a brief call amongst interested parties to discuss this?

Statsmodels and deeptime are good at calculating autocorrelations, but there seem not to be examples of what one would do with those, or of how to make sure they are calculated correctly. Maybe what we are doing is too simplistic for them?

Just in case it helps, I didn't interpret @orbeckst's comment re: deeptime to be "deeptime should be used for timeseries analysis"; it was more an example of "a package where the type of data it interacts with isn't directly an MDAnalysis object" (unlike MDAKits that ingest Universes and AtomGroups).

orbeckst · 2025-07-20T04:35:20Z

orbeckst
Jul 20, 2025
Maintainer

(Just like Irfan said.)

fjclark · 2025-08-05T08:17:47Z

fjclark
Aug 5, 2025

Might I suggest a brief call amongst interested parties to discuss this?

I've sent round an availability poll (although I'm aware that @ppxasjsm is likely unavailable). Please let me know if you haven't been included but are interested in joining.

1 reply

orbeckst Aug 8, 2025
Maintainer

Thanks for the great discussion today @fjclark @mrshirts @IAlibay. As discussed, I moved the issue to the discussions.
