Applying the Reinformer architecture to minimal RNNs

Hi there,
I have a repository where we have been trying to get the Reinformer to work with [a minimal Recurrent Network](https://arxiv.org/abs/2410.01201).

However, on both the Hopper and the medium-expert datasets, we cannot get anywhere near the performance of the Reinformer (or, for that matter, of the original minimal RNNs).

Do you have any idea what could be wrong? I have implemented dropout in the minRNN, similar to what you did for MultiHeadAttention, which gave some slight improvements, but we are still some way from the results I expected based on the Reinformer and minRNN baselines.
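
To make the setup concrete, here is a rough, hypothetical sketch of a minGRU layer in sequential mode, following the update rule from the linked paper: $z_t = \sigma(\mathrm{Linear}(x_t))$, $\tilde{h}_t = \mathrm{Linear}(x_t)$, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$. It is pure Python for readability, not the code in the repo. The names `min_gru`, `linear` and `dropout` are made up for illustration, and applying (inverted) dropout only to the emitted output, rather than to the recurrent state, is an assumption about where dropout goes, loosely analogous to residual dropout after attention.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(W, b, x):
    # W: d_h rows of length d_in, b: length d_h, x: length d_in
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def dropout(v, p, training, rng):
    # Inverted dropout: zero each element with prob p, scale survivors by 1/(1-p)
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(v)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def min_gru(xs, Wz, bz, Wh, bh, h0, p_drop=0.0, training=False, seed=0):
    """Hypothetical sequential minGRU over a list of input vectors xs."""
    rng = random.Random(seed)
    h = list(h0)
    outputs = []
    for x in xs:
        z = [sigmoid(a) for a in linear(Wz, bz, x)]  # gate depends only on x_t
        h_tilde = linear(Wh, bh, x)                  # candidate depends only on x_t
        h = [(1.0 - zi) * hi + zi * ht for zi, hi, ht in zip(z, h, h_tilde)]
        # Assumption: dropout on the emitted output only; the recurrent state
        # h is carried forward undropped
        outputs.append(dropout(h, p_drop, training, rng))
    return outputs, h


# Usage: 1-d input and state, gate fixed at 0.5, candidate equal to the input
outs, h_final = min_gru([[2.0], [4.0]], [[0.0]], [0.0], [[1.0]], [0.0], [0.0])
print(outs, h_final)
```

If dropout were instead applied to `h` inside the loop, a zeroed element would also be fed into the next step's update, so its effect would compound across the sequence; that is one placement difference worth checking against the attention version. The sketch also omits the parallel-scan form the paper uses for training.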

If you are curious, the repo can be found [here](https://github.com/WhyAreYouJay/Re-Minimal-RNN/tree/main).

Either way, cool repo!
