Hi, it seems that your models have two input branches, one for the words and one for the image descriptor. In the paper, however, there is a single input: the image descriptor is fed to the LSTM at the first time step, and the (embedded) words are fed at the following time steps.
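To make the single-input idea concrete, here is a rough NumPy sketch of how the LSTM input sequence could be assembled. It is not taken from the paper's code or from this repo. The function name `build_lstm_inputs`, the matrices `W_img` and `W_emb`, and all sizes are made up for illustration, and I am assuming the image descriptor is linearly projected to the same size as the word embeddings so it can occupy the first time step.

```python
import numpy as np

def build_lstm_inputs(image_descriptor, word_ids, W_img, W_emb):
    """Return a (T + 1, embed_dim) sequence: image step first, then word steps."""
    # Assumption: project the image descriptor into the word-embedding space.
    image_step = image_descriptor @ W_img      # shape: (embed_dim,)
    # Look up the embedding of each word id.
    word_steps = W_emb[word_ids]               # shape: (T, embed_dim)
    # One input stream: time step 0 is the image, steps 1..T are the words.
    return np.vstack([image_step[None, :], word_steps])

# Hypothetical sizes and random weights, only to show the shapes.
rng = np.random.default_rng(0)
feat_dim, embed_dim, vocab_size = 4096, 256, 1000
W_img = rng.standard_normal((feat_dim, embed_dim)) * 0.01
W_emb = rng.standard_normal((vocab_size, embed_dim)) * 0.01
image_descriptor = rng.standard_normal(feat_dim)
word_ids = np.array([1, 42, 7, 2])

inputs = build_lstm_inputs(image_descriptor, word_ids, W_img, W_emb)
print(inputs.shape)  # (5, 256)
```

The resulting array would then go into a single LSTM, so there is no separate image branch to merge with the word branch.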