Time series reconstruction with ModelList (Multi-Output) GP Regression #2357
-
I am trying to reconstruct a time series (OMP) using another series (ERA5) as a proxy, adapting this example. I expected the model to fill in the missing OMP values at the instants where the proxy series has values, but that's not what I get. Do I need to make any changes to the example code?
Replies: 3 comments 6 replies
-
I don't know much about the specifics of your domain, but you usually need to represent time in several ways to encode enough structure. Otherwise, you'd just be using GPs to "smooth" the data. If you're representing time as a single monotonically increasing value (e.g., Unix time), the GP will just revert to its mean function wherever there is no nearby data. I recommend one-hot encoding various temporal dimensions like the day-of-week (0-6), hour-of-day (0-23), and whatever else you think would capture variability for your domain. Then you can use an ARD kernel, which learns a separate lengthscale for each input dimension, to fit those dimensions against your dependent variable.

Alternatively, you can experiment with the PeriodicKernel to see if it captures any periodic trends in your data, but I recommend doing this only with a monotonically increasing variable like Unix time. I also don't see a clear periodic trend in your data, so this may not be the best approach. The kernel cookbook (and David Duvenaud's dissertation) is an easy read if you want to better understand how to use and combine different kernels to achieve your goals in time-series analysis.
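To make the one-hot suggestion above concrete, here is a minimal sketch using only the Python standard library. The helper names (`one_hot`, `encode_timestamp`) and the hourly timestamps are made up for illustration, and the sketch assumes the timestamps are Unix seconds interpreted in UTC; adapt it to however your OMP/ERA5 time axis is actually stored.

```python
from datetime import datetime, timezone


def one_hot(index, size):
    """Return a list of `size` floats with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_timestamp(unix_seconds):
    """Encode one Unix timestamp as a 31-dimensional feature row:
    7 day-of-week indicators followed by 24 hour-of-day indicators (UTC)."""
    t = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return one_hot(t.weekday(), 7) + one_hot(t.hour, 24)


# Hypothetical data: 48 hourly timestamps starting 2021-01-04 00:00 UTC (a Monday).
start = 1609718400
timestamps = [start + 3600 * k for k in range(48)]

# One feature row per timestamp; shape is (48, 31) once stacked.
features = [encode_timestamp(ts) for ts in timestamps]
```

The rows could then be converted to an `(n, 31)` tensor and used as `train_x`. In GPyTorch, passing `ard_num_dims=31` to a kernel such as `RBFKernel` gives one lengthscale per column, which is the ARD setup described above.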
-
Hi @gpleiss, yes! I'm trying the Spectral Delta Kernel, but I got an error. I don't understand why train_x needs to have at least two dimensions.
-
Thank you, @gpleiss. It works, but the result is still too smooth and needs further adjustment.