References:

- https://arxiv.org/pdf/1711.11053.pdf
- [MXNet implementation](https://github.com/awslabs/gluon-ts/tree/4126386da1c71a371a77fe824e5092645dc2d2db/src/gluonts/model/seq2seq) (in my opinion this is a little too scattered)

Milestones for this one:

- [ ] Implement core modules (with an option for different encoders, but more compact than the MXNet one)
- [ ] Implement a LightningModule to handle training; this should take care of the "forking" logic described in the paper
- [ ] Estimator wrapper
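
To make the "forking" milestone more concrete, here is a rough sketch of how the training targets could be laid out, based on my reading of the paper: every time step acts as a forecast creation time, and the decoder at that step is trained against the next `horizon` observations. The names `forking_targets` and `quantile_loss`, the zero padding, and the boolean mask are all assumptions made for illustration. They are not taken from the paper or from the MXNet code, and the sketch uses plain Python rather than tensors to keep it dependency-free.

```python
from typing import List, Sequence, Tuple


def forking_targets(
    series: Sequence[float], horizon: int
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Build one target window per time step (hypothetical helper).

    targets[t][k] holds series[t + 1 + k] when that index exists, else 0.0.
    mask[t][k] is True only for entries backed by a real observation, so
    padded positions can be excluded from the loss (an assumption here).
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    n = len(series)
    targets: List[List[float]] = []
    mask: List[List[bool]] = []
    for t in range(n):
        row: List[float] = []
        row_mask: List[bool] = []
        for k in range(horizon):
            idx = t + 1 + k
            valid = idx < n
            row.append(float(series[idx]) if valid else 0.0)
            row_mask.append(valid)
        targets.append(row)
        mask.append(row_mask)
    return targets, mask


def quantile_loss(y: float, y_hat: float, q: float) -> float:
    """Pinball (quantile) loss for a single observation at level q."""
    diff = y - y_hat
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1.0) * diff)
```

In the LightningModule, the same layout would presumably be built with tensor unfolding inside the training step, with the mask applied before averaging the loss over time steps, horizons, and quantiles.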