Replies: 28 comments
-
You basically have two options (possibly others, but these are the two I can think of right now):
Hope this helps.
-
@Balandat Thanks for both suggestions! I think suggestion 1 is particularly simple and is exactly what I need. However, I am stuck on the finer details of implementing it. What I see is that if I just subset the first output of the mean and the first
Error produced:
-
If you override the forward method, then you won't be able to fit this model properly with the GPyTorch internals. What I was getting at is a simple BoTorch model variant that leaves the actual GPyTorch inference untouched and simply extracts the appropriate outcome from the multi-output posterior. (I guess the term "wrapper" in this context may be debatable, but it works...)
Then, say you have a trained
The subsetting of the output is quite cumbersome right now, because of how GPyTorch represents multi-task MVNs. I have an upstream PR that should simplify this and provide some convenience methods; I'm hoping to work on this some more soon: cornellius-gp/gpytorch#1083
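To make the idea concrete, here is a minimal sketch of that kind of wrapper in plain numpy (purely illustrative -- the class name, method, and the interleaved output layout are assumptions here, not the actual BoTorch `Model` API): the joint multi-output posterior mean and covariance are subset down to a single outcome.

```python
import numpy as np

class SingleOutcomeWrapper:
    """Illustrative wrapper: extract one outcome from a joint
    multi-output posterior whose outputs are interleaved per point.
    (Hypothetical class, not the BoTorch API.)"""

    def __init__(self, mean, cov, num_outputs):
        self.mean = np.asarray(mean)  # shape (n * num_outputs,)
        self.cov = np.asarray(cov)    # shape (n * num_outputs, n * num_outputs)
        self.t = num_outputs

    def outcome(self, i):
        # Take every t-th entry starting at i: the block for output i.
        idx = np.arange(i, len(self.mean), self.t)
        return self.mean[idx], self.cov[np.ix_(idx, idx)]

# Usage: 2 points, 2 outputs, interleaved as [y0(x1), y1(x1), y0(x2), y1(x2)]
mean = np.array([1.0, 10.0, 2.0, 20.0])
cov = np.diag([0.1, 1.0, 0.2, 2.0])
m0, c0 = SingleOutcomeWrapper(mean, cov, 2).outcome(0)
print(m0)           # [1. 2.]
print(np.diag(c0))  # [0.1 0.2]
```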
-
Ah, I see -- thanks! Just FYI, with a wrapper like your
-
Hmm, not sure what causes this. Note that for full generality you want to pass
With this, first running your code (up to failure), then running my code, and then doing the following works for me:
-
Thanks @Balandat. Just to clarify, I used the
for which I get the same error as above. (Edit) I think the error is due to MES() only being able to take inputs of size
-
Sorry, I must have pasted the wrong snippet; what I meant to say is that this works for me:
What version of BoTorch are you using? (Side note: I am running into some singularity issues with posterior sampling in this simple 1d example, but those are unrelated to the model wrapper.)
-
I am using v
I am aware of the singularity issue with the Cholesky decomposition that you're talking about. I went ahead and added a nugget of
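For reference, the nugget/jitter trick being discussed can be sketched as follows (illustrative numpy, not GPyTorch's actual `psd_safe_cholesky`): a small multiple of the identity is added to the diagonal, increasing until the Cholesky factorization succeeds.

```python
import numpy as np

def jittered_cholesky(K, jitter=1e-8, max_tries=6):
    """Try a Cholesky factorization, adding increasing diagonal
    jitter until it succeeds (illustrative sketch)."""
    try:
        return np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        pass
    for i in range(max_tries):
        eps = jitter * (10 ** i)
        try:
            return np.linalg.cholesky(K + eps * np.eye(K.shape[0]))
        except np.linalg.LinAlgError:
            continue
    raise np.linalg.LinAlgError("matrix not PD even with jitter")

# A singular (rank-1) covariance matrix: plain Cholesky fails, jitter succeeds.
K = np.ones((3, 3))
L = jittered_cholesky(K)
print(np.allclose(L @ L.T, K, atol=1e-3))  # True
```

The trade-off is that the jitter slightly biases the posterior, so it is usually kept as small as possible.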
-
I am not sure if we fixed MES since then, but the code does run as above with the wrapper model and
Re singularity: I can't really point to any issues directly, but exactly what kind of model were you trying to fit in GPyTorch? Depending on the amount of data, the hyperparameters may end up with degenerate values if they are unconstrained / do not have a prior. I would need a repro of your model to understand exactly what's going on (if this is just about the model, then let's maybe move that discussion to the GPyTorch GitHub).
-
Okay, thanks, I will update BoTorch and give it a shot. Like you said, I will open a new issue on the GPyTorch GitHub about the singularity issue.
-
@r-ashwin has this been resolved?
-
Yes, it is. Thank you very much!
-
@Balandat quick question on the
The reason I ask is because, in my case, for the same training data, with the approach above, instantiating
With or without gradients, the auto-covariance of the training data should be the same, unless the kernel hyperparameters are different. I understand that the kernel hyperparameters will not necessarily be the same, but when I actually checked their values, they were not small enough to cause any singularity issue in the covariance matrix. I would appreciate your thoughts on this. A full reproducible example is attached. Thanks!
-
@Balandat Just following up on the previous question, in case you had a chance to take a look at it. I noticed that the error appears for some test functions and not for others. I guess my question comes down to this: how can I subset the covariance matrix corresponding to the observations only, out of the joint (obs. + grad) covariance matrix? Thanks very much!
-
We are, essentially. The
I'm having some trouble running the attached nb - some vars are not defined. Also, the nb doesn't include the full stack trace, so it's hard to figure out what's going on without actually running into this myself - mind updating the nb? My hunch is that MES is trying to instantiate a PyTorch
-
Sorry about that - I have now updated the notebook. I am using v0.3.3, which is indeed using the safe_cholesky. As you said, MES is indeed evaluating the posterior covariance matrix on the candidate set (see part of the stack trace below). However,
This raises a couple of questions: (1) because the size of the candidate set can potentially cause problems, is MES recommended only for low-dimensional problems? (2) Is there a recommended size for the
Also, I tried training a SingleTaskGP on Branin with up to 5000 training points without singularity issues. Since you say the covar matrix
-
So the current implementation of MES uses samples at a discrete set of points to draw max posterior samples. This is indeed limited in scalability as the dimension of the problem grows. An alternative is to use Random Fourier Features (RFFs) or decoupled sampling as described e.g. in https://arxiv.org/abs/2011.04026, but that adds additional complexity, since now we have to optimize each approximate posterior draw for each sample (each of which is a highly nonconvex high-dimensional optimization problem in its own right). We have some of this in the works and plan to add functionality for using this in MES in the future.
I don't have a lot of specific data on this, unfortunately. Generally, the denser the set of discrete points, the better the quality of the max posterior samples will be. So usually one cranks this up until it either gets too slow or causes numerical issues.
Great point, this looks like it's a bug...
Well, in your example with the
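For context on the RFF idea mentioned above, here is a minimal sketch (plain numpy, not the BoTorch implementation) of approximating a unit-lengthscale RBF kernel with random Fourier features; posterior draws then become finite-dimensional functions that are cheap to optimize.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 4096                   # input dim, number of random features
W = rng.standard_normal((D, d))  # spectral samples for a unit-lengthscale RBF
b = rng.uniform(0, 2 * np.pi, D)

def phi(X):
    # Random Fourier feature map: k(x, y) ~= phi(x) @ phi(y)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

x = np.array([[0.3, -0.1]])
y = np.array([[0.5, 0.4]])
exact = np.exp(-np.sum((x - y) ** 2) / 2.0)  # RBF kernel value
approx = float(phi(x) @ phi(y).T)
print(abs(exact - approx))  # small; error shrinks as O(1 / sqrt(D))
```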
-
Thanks.
Hmm... in any case, it is the ill-conditioning of this
-
No, it’s the full posterior covariance matrix
-
The first approach above (the one that subselects the covariance matrix) does not look correct to me, as far as I understand (where you pick every
Edit: the error seems to be due to the mismatch between the 3 outputs in the model and the scalarized output, but I have no idea where the fix for this would be.
-
I don't think I understand. There is no indexing going on in the prior covariance matrix, and the indexing into the posterior covariance happens on the computed one (after all inverses/solves have already been done).
Thanks for the repro. Looking at the MES code, I see there is some shape funkiness going on that I don't fully understand; I will have to take a closer look. It also appears that the code is doing some repeated work that we shouldn't be doing. Let me see if I can clean this up.
-
Thanks @Balandat !
In the above lines in your wrapper class (assuming we have
-
This should choose every third column/row, starting with the first one. So this should correspond to the covariance across the observations.
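To illustrate the indexing being discussed (assuming a joint covariance that interleaves, per training point, the observation followed by two gradient entries -- that layout is an assumption here):

```python
import numpy as np

t = 3  # outputs per point: [obs, dy/dx1, dy/dx2]
n = 4  # training points
# Label each joint entry by (point, output) so the subset can be checked.
labels = np.array([(i, j) for i in range(n) for j in range(t)])
# Dummy joint covariance matrix, just to demonstrate the shapes.
K = np.add.outer(np.arange(n * t), np.arange(n * t)).astype(float)

K_obs = K[::3, ::3]  # every third row/column, starting at the first
print(K_obs.shape)        # (4, 4) -- one row/column per training point
print(labels[::3][:, 1])  # [0 0 0 0] -- every selected entry is an observation
```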
-
That's the part I disagree with, but let me see if I can set up a case to verify that.
-
If you're worried about the structure of the covariance (whether it's cross-point covariances stacked for each output, or cross-output covariances stacked for each data point), this shouldn't matter here, since you only evaluate at a single point.
-
Okay, I think having the scalarized version might still be useful, since it is more generic. So whenever you get a chance to review my repro, please let me know. Thanks very much! Much appreciated!
-
@Balandat Any luck with the repro? I guess I am a bit stuck because trying to fix the dimension mismatch leads to further errors and it is not clear how everything propagates. 😇
-
One thing I see in your nb is that you're passing a
I would suggest taking a look at https://github.com/pytorch/botorch/blob/master/botorch/acquisition/objective.py#L29. You should be able to instantiate this in the constructor and then just push the posteriors returned by all
Sorry, I'm pretty caught up in other stuff right now, so I won't be able to test this myself until next week or so.
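On the scalarization point: mathematically, a linear scalarization of a multi-output Gaussian posterior stays Gaussian, with mean wᵀμ and variance wᵀΣw. A quick numpy sketch with hypothetical numbers (illustrative only -- BoTorch's objective classes apply this to posterior objects for you):

```python
import numpy as np

# Joint posterior over 3 outputs at a single test point (made-up numbers).
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 3.0]])
w = np.array([0.5, 0.3, 0.2])  # scalarization weights

mean_s = w @ mu        # scalarized mean: w^T mu
var_s = w @ Sigma @ w  # scalarized variance: w^T Sigma w
print(round(mean_s, 3))  # 1.7
print(round(var_s, 3))   # 0.666
```

Because the result is a one-dimensional Gaussian, single-output acquisition functions can then be applied to it directly.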
-
If you are submitting a bug report or feature request, please use the respective
issue template.
Issue description
I am trying to use the MultiTaskGP model from GPyTorch with BoTorch's qMaxValueEntropy. I get the UnsupportedError because the objective kwarg is not supported. See the error below:
```
---------------------------------------------------------------------------
```
System Info
Please provide information about your setup, including:
- BoTorch version: 0.2.5
- GPyTorch version: 1.1.1
- PyTorch version: 1.5.0+cpu
- OS: windows