Implement the continuous cache LSTM model of Grave et al.

See section 4.3.2 of the project proposal (https://ai-on.org/pdf/larochelle-few-shot-distribution-learning.pdf). This may require some adaptation of the original continuous cache model, but probably not much.
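As a starting point, here is a minimal sketch of the *original* continuous cache from Grave et al. (2016), not the adapted variant from section 4.3.2, which it does not try to reproduce. The cache stores past LSTM hidden states h_i paired with the token x_{i+1} that followed each one. The cache distribution is p_cache(w) ∝ Σ_i 1{w = x_{i+1}} exp(θ · h_tᵀ h_i), and it is linearly interpolated with the model's softmax: p(w) = (1 − λ) p_vocab(w) + λ p_cache(w). The function names `cache_distribution` and `interpolate` and the parameter names `theta` and `lam` are hypothetical. Plain Python lists are used instead of a tensor library to keep the sketch self-contained.

```python
import math
from typing import Dict, List, Sequence


def cache_distribution(
    h_t: Sequence[float],
    past_hidden: List[Sequence[float]],
    next_tokens: List[int],
    theta: float,
) -> Dict[int, float]:
    """Sketch of the continuous cache distribution (Grave et al., 2016).

    past_hidden[i] is a stored hidden state h_i, and next_tokens[i] is the
    token x_{i+1} that followed it. Returns p_cache(w) for each cached token.
    """
    if not past_hidden:
        return {}  # empty cache: nothing to attend to
    # Score each stored state by theta * <h_t, h_i>.
    scores = [theta * sum(a * b for a, b in zip(h_t, h_i)) for h_i in past_hidden]
    # Softmax over cache slots (subtract the max for numerical stability).
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    probs: Dict[int, float] = {}
    # A token stored in several slots accumulates the weight of each slot.
    for w, tok in zip(weights, next_tokens):
        probs[tok] = probs.get(tok, 0.0) + w / z
    return probs


def interpolate(
    p_vocab: Sequence[float], p_cache: Dict[int, float], lam: float
) -> List[float]:
    """Linear interpolation: p(w) = (1 - lam) * p_vocab(w) + lam * p_cache(w)."""
    if not p_cache:
        return list(p_vocab)  # no cache yet: fall back to the LSTM softmax
    return [(1.0 - lam) * p + lam * p_cache.get(w, 0.0) for w, p in enumerate(p_vocab)]
```

For example, with `h_t = [1.0, 0.0]`, cached states `[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]` and next tokens `[2, 0, 2]`, token 2 gets most of the cache mass because both states that preceded it align with `h_t`. In this formulation θ and λ are scalar hyperparameters, typically tuned on held-out data; the adaptation described in the proposal may change how either is set.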