-
Please don't take this as a rebuttal of the idea, but this somewhat triggers my go-to question for these types of feature expansions. Could you provide some insight into why this is better suited to live inside MDAnalysis rather than in a separate package that folks could use on any timeseries extracted by a given toolkit? Possible ways to frame this question:
-
I think the idea is that biomolecular simulation people need good timeseries analysis, since the sorts of autocorrelation analysis in statsmodels are way too heavy and hard to use. Having something that is lighter-weight but meets specific biosimulation needs would be useful. I doubt there will be much development after the initial pass, as there are relatively few methods that would be useful. I think FFT-based autocorrelation is likely fast enough.
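For context on the FFT point: the autocorrelation function of an N-frame series can be computed in O(N log N) with an FFT instead of O(N²) with a direct sum over lags. A minimal NumPy-only sketch follows; the name `autocorrelation_fft` is hypothetical and not part of pymbar or MDAnalysis. It zero-pads to 2N so the FFT's circular correlation equals the linear one, and it assumes the unbiased estimator (divide lag t by N - t).

```python
import numpy as np


def autocorrelation_fft(x):
    """Normalized autocorrelation C(t) of a 1D timeseries, computed via FFT.

    Hypothetical helper for illustration; not an existing pymbar/MDAnalysis API.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    # Zero-pad to 2N so the circular correlation from the FFT has no wrap-around
    # terms for lags 0..N-1, i.e. it equals the linear correlation.
    spectrum = np.fft.rfft(dx, n=2 * n)
    acov = np.fft.irfft(spectrum * np.conjugate(spectrum), n=2 * n)[:n]
    # Unbiased estimator: lag t has N - t overlapping pairs.
    acov /= np.arange(n, 0, -1)
    return acov / acov[0]


# Usage: C(0) is 1 by construction; for white noise C(t > 0) fluctuates near 0.
rng = np.random.default_rng(0)
c = autocorrelation_fft(rng.normal(size=10_000))
```

One design note: the unbiased estimator becomes very noisy at large lags (few overlapping pairs), which is why the rule for where to stop integrating C(t) matters.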
-
If anyone has an alternative place where it could live, suggestions are welcome! We have had a hard time figuring it out.
-
Generally "it should be in MDAnalysis because we can't maintain it" isn't the best approach to things. MDAnalysis already struggles with its own maintenance burden, so it's often hard to justify taking on more without some kind of planned "synergy". From what you're saying, it does sound (edit: to me) like a standalone package would be better. I would be happy to help part-maintain a standalone package such as this if I'm involved in its development. It might even be possible to put it in either the MDAnalysis or OMSF namespaces. P.S. If it helps clarify things, I'm personally thinking of things like GridDataFormats or even PyEDR, which sit outside of MDAnalysis core but can potentially become dependencies if needed. That's a much more sustainable development model, in my opinion.
-
Just to clarify, the above is my opinion on things; I would be keen to hear from other @MDAnalysis/coredevs. It may be that there are more direct plans for the MDAnalysis library that would benefit from this sitting in the repo.
-
From my perspective, MDAKits were really meant to be the place for tools such as the timeseries analysis that you're proposing, for all the reasons laid out in the paper. Importantly, it's always possible to move functionality from a kit into the MDA core; this is easier than deciding later to remove functionality. As @IAlibay said, our release cycles are quite slow.
-
If the MDA core needs timeseries functionality as a dependency, then that's a sensible thing to do. I don't know if there's anything within mda.lib (correlations?) that we would be moving into an external package as well.
-
I had completely forgotten about that code. Yes, 100%: moving that out to its own package with all the necessary optimizations would be amazing. In my mind, something like distopia but for timeseries analyses would be great!
-
Just to chime in a little. I think the main point is that a timeseries analysis module that is separate from pymbar but closely linked to MDAnalysis would be great. There are many reasons why one might want to use timeseries analysis, and at the moment it's quite clunky to do and requires the whole of pymbar as a dependency. I don't think anyone wants a maintenance nightmare. I am not super familiar with the difference between MDA core and MDAKits, but MDAKits sounds like it could be a good place to go rather than core. Another motivation would be to make the code easier for us to use and maintain, and maybe to gain traction in the community for contributions. At the moment, the barrier to entry for maintaining things in pymbar is quite high, I would say. Thanks @orbeckst for suggesting MDAKits!
-
@IAlibay already mentioned that one of our biggest concerns is taking on maintenance. In practice it's very, very difficult to get "community members" to take up a package. You need someone who really needs it, i.e., enlightened self-interest; at least, that's my experience. MDAKits were meant for tools "that use MDAnalysis". Originally I thought that the proposed timeseries functionality was meant to be closely tied to MDAnalysis. However, if it's a more general package (like deeptime) that's agnostic of MDA data structures, then an MDAKit would not be the proper home. Nevertheless, if MDAnalysis uses the package somewhere (e.g., for correlation functions), then we would have an interest in having it as one of our dependencies (similar to the packages that @IAlibay mentioned in #5084 (comment), such as MDAnalysis/GridDataFormats). Then there could be MDAnalysis/timeseries-analysis (... catchier name?? ;-) ).
-
I probably could have explained things better. I believe the thinking was that it would be good to have a lightweight way to perform equilibration detection, compute correlation times, and subsample MANY types of biophysics data, and it seemed to us that many other molecular simulation analysis tasks could benefit from something better, hence thinking it would fit more naturally in MDAnalysis. We've identified a few things we need to do for both equilibration detection and correlation time calculation for pymbar, so we thought it was a good time to revisit this code module, which is needed for good free energy calculations but is not at all restricted to them. It's ALSO not restricted to molecular structures; it applies to any timeseries that will be used in some computation. Statsmodels and deeptime are good at calculating autocorrelations, but there seem not to be examples of what one would do with them, or of how to make sure they are calculated correctly. Maybe what we are doing is too simplistic for them? Well, I did find this: Though they just use the "stop integrating when it goes through zero" approach, which we know is not great.
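On the truncation point: the statistical inefficiency is g = 1 + 2 Σ C(t) (pymbar weights each term by (1 - t/N)), and the question is where to stop summing the noisy tail of C(t). The sketch below contrasts the "stop at the first zero crossing" rule with Sokal's self-consistent window (use the smallest lag M with M ≥ c·g(M)), the approach behind, e.g., emcee's `integrated_time` with c = 5. This is an illustration under those assumptions, not the estimator the pymbar/red authors have settled on; all function names are hypothetical.

```python
import numpy as np


def _acf(x):
    """Normalized autocorrelation via zero-padded FFT (unbiased estimator)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    spec = np.fft.rfft(dx, n=2 * n)
    acov = np.fft.irfft(spec * np.conjugate(spec), n=2 * n)[:n] / np.arange(n, 0, -1)
    return acov / acov[0]


def g_zero_crossing(x):
    """g = 1 + 2 * sum_t (1 - t/N) C(t), stopping at the first lag where C(t) <= 0."""
    c = _acf(x)
    n = c.size
    g = 1.0
    for t in range(1, n):
        if c[t] <= 0.0:
            break
        g += 2.0 * c[t] * (1.0 - t / n)
    return max(g, 1.0)


def g_sokal_window(x, c_window=5.0):
    """g(M) = 1 + 2 * sum_{t=1}^{M} C(t), with M the smallest lag such that M >= c_window * g(M)."""
    c = _acf(x)
    g_of_m = 2.0 * np.cumsum(c) - 1.0  # cumsum includes C(0) = 1
    lags = np.arange(c.size)
    ok = lags >= c_window * g_of_m
    m = int(np.argmax(ok)) if ok.any() else c.size - 1
    return max(float(g_of_m[m]), 1.0)


# Usage on an AR(1) series x_t = phi * x_{t-1} + noise, whose exact g is (1 + phi) / (1 - phi).
rng = np.random.default_rng(42)
phi, n = 0.9, 100_000
noise = rng.normal(size=n)
x = np.empty(n)
x[0] = noise[0] / np.sqrt(1.0 - phi**2)  # start from the stationary distribution
for i in range(1, n):
    x[i] = phi * x[i - 1] + noise[i]
g_exact = (1.0 + phi) / (1.0 - phi)  # = 19
```

On a clean, exponentially decaying AR(1) example both rules should land near g ≈ 19; they can diverge on series whose C(t) has a slowly decaying or oscillating tail, where the first zero crossing may come too early or too late.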
-
I think we might be at a point in this conversation where we are roughly aligned on the idea / need of this timeseries tooling, but maybe not so much on what the "implementation" might look like. Might I suggest a brief call amongst interested parties to discuss this?
Just in case it helps, I didn't interpret @orbeckst's comment re: deeptime as "deeptime should be used for timeseries analysis"; it was more an example of "a package where the type of data it interacts with isn't directly an MDAnalysis object" (unlike MDAKits that ingest
-
(Just like Irfan said.)
-
I've sent round an availability poll (although I'm aware that @ppxasjsm is likely unavailable). Please let me know if you haven't been included but are interested in joining.
-
Is your feature request related to a problem?
It would be good to have the capability to do basic timeseries functions for biomolecular simulations in MDAnalysis, specifically equilibration detection for timeseries that should be stationary, and determining the correlation time, i.e., the spacing between uncorrelated samples. Some of this functionality is in pymbar.timeseries, but it's really not the best place for it, since many other data series that don't come from free energy calculations also need timeseries functionality.
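Once a statistical inefficiency g (in frames) is in hand, "uncorrelated samples" usually means keeping roughly one frame every g frames. A minimal sketch, assuming g has already been estimated (e.g. from the autocorrelation function); the name `subsample_uncorrelated` is hypothetical, although the rounded-stride idea is similar in spirit to pymbar's `subsample_correlated_data`.

```python
import numpy as np


def subsample_uncorrelated(x, g):
    """Keep roughly one frame every g frames of a timeseries (hypothetical helper).

    x : array-like whose first axis is time (any observable, not an MDAnalysis object)
    g : statistical inefficiency in frames, assumed to be estimated beforehand
    Returns (indices, x[indices]).
    """
    x = np.asarray(x)
    if g < 1.0:
        raise ValueError("statistical inefficiency g must be >= 1")
    n_frames = x.shape[0]
    n_steps = int(np.ceil(n_frames / g)) + 1
    # Round k * g to the nearest frame; unique() guards against duplicate indices.
    indices = np.unique(np.round(np.arange(n_steps) * g).astype(int))
    indices = indices[indices < n_frames]
    return indices, x[indices]


# Usage: with g = 2.5, frames 0, 2, 5, 8, 10, ... are kept (NumPy rounds halves to even).
idx, kept = subsample_uncorrelated(np.arange(100.0), 2.5)
```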
Describe the solution you'd like
We propose adding a timeseries module to MDAnalysis.
It would probably look like the timeseries module in pymbar, though that module needs a number of upgrades. We would take the equilibration detection from https://github.com/fjclark/red rather than what is used in pymbar. Ideally it would operate not on MDAnalysis Universe objects but on plain NumPy arrays of observations, so that it can be used on all sorts of observables. We are also testing better ways of calculating the autocorrelation time, which will be documented.
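For comparison with what red would replace: the heuristic behind pymbar's `detect_equilibration` picks the start time t0 that maximizes the effective number of uncorrelated samples, N_eff(t0) = (N - t0) / g(t0), where g(t0) is the statistical inefficiency of the data after t0. Below is a minimal sketch of that baseline on a plain NumPy array, assuming the zero-crossing estimator for g and a coarse grid of candidate t0. The function names and the `nskip`/`min_tail` parameters are illustrative, and this is neither pymbar's code nor red's method.

```python
import numpy as np


def statistical_inefficiency(x):
    """g = 1 + 2 * sum_t (1 - t/N) C(t), truncated at the first lag with C(t) <= 0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    spec = np.fft.rfft(dx, n=2 * n)
    acov = np.fft.irfft(spec * np.conjugate(spec), n=2 * n)[:n] / np.arange(n, 0, -1)
    if acov[0] <= 0.0:  # constant segment: no fluctuations to correlate
        return 1.0
    c = acov / acov[0]
    g = 1.0
    for t in range(1, n):
        if c[t] <= 0.0:
            break
        g += 2.0 * c[t] * (1.0 - t / n)
    return max(g, 1.0)


def detect_equilibration_max_neff(x, nskip=10, min_tail=50):
    """Return (t0, g, n_eff) maximizing n_eff = (N - t0) / g(x[t0:]) over a grid of t0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    best = (0, 1.0, -np.inf)
    # Only consider start times that leave at least min_tail frames to analyse.
    for t0 in range(0, n - min_tail, nskip):
        g = statistical_inefficiency(x[t0:])
        n_eff = (n - t0) / g
        if n_eff > best[2]:
            best = (t0, g, n_eff)
    return best


# Usage: white noise with a +5 offset in the first 500 of 5000 frames (a crude "unequilibrated" start).
rng = np.random.default_rng(7)
x = rng.normal(size=5000)
x[:500] += 5.0
t0, g, n_eff = detect_equilibration_max_neff(x)
```

Because this only takes an array, the same call works for any observable extracted from a trajectory (or from anywhere else), which is the array-in design described above.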
Finlay Clark, Toni Mey, and myself have volunteered to do a big chunk of the work in getting this in.
Describe alternatives you've considered
The timeseries functionality in statsmodels is far too extensive and complicated for the relatively simple use cases that people running biomolecular simulations usually need.
Additional context