Korbinian Pöppel¹˒², Richard Freinschlag¹, Thomas Schmied¹, Wei Lin¹, Sepp Hochreiter¹˒²
¹ ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
² NXAI GmbH, Linz, Austria
This repository contains the core pLSTM (parallelizable Linear Source Transition Mark networks) implementations in flax.nnx, flax.linen, and torch.
pLSTMs inherit ideas from Multi-Dimensional RNNs (Graves et al. 2007) and linear RNNs.
By making the recurrence linear and changing the gating structure to Source, Transition, and Mark gates, we introduce a multi-dimensional parallel associative scan that parallelizes computation on general directed acyclic graphs (DAGs).
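To make the role of linearity concrete, here is a minimal pure-Python sketch of an associative scan for the one-dimensional case only, not the multi-dimensional DAG scan implemented in this repository. It assumes a scalar recurrence `h_t = a_t * h_{t-1} + b_t`; the names `combine`, `sequential_scan`, and `associative_scan` are hypothetical and are not part of the `plstm` API.

```python
from typing import List, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map h -> a * h + b


def combine(first: Affine, second: Affine) -> Affine:
    """Compose two affine maps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(a: List[float], b: List[float]) -> List[float]:
    """Reference loop: h_t = a_t * h_{t-1} + b_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def associative_scan(a: List[float], b: List[float]) -> List[float]:
    """Inclusive scan by recursive doubling over the `combine` operator."""
    elems = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        # Every position within one pass is independent of the others,
        # so a pass could be executed in parallel; there are O(log T) passes.
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        offset *= 2
    # The offset part of each prefix map equals h_t for an initial state of 0.
    return [b_t for _, b_t in elems]
```

Because composing affine maps is associative, the prefixes can be combined in any bracketing, which is what allows the logarithmic-depth evaluation. A nonlinear recurrence has no such compact composition rule.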
pLSTMs also solve the vanishing/exploding gradient/activation problem on DAGs, similar to how the LSTM tackled them for RNNs on sequences.
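The following toy example illustrates why stability is harder on a DAG than on a sequence. It is a sketch under stated assumptions, not the pLSTM gating scheme: a 2D grid where each cell receives the state of its upper and left neighbours, with constant scalar transitions and a single unit source at the origin. The function `grid_recurrence` and its parameters are hypothetical names.

```python
from typing import List


def grid_recurrence(n: int, t_right: float, t_down: float) -> List[List[float]]:
    """Linear recurrence on an n x n grid DAG with a unit source at (0, 0).

    h[i][j] = t_down * h[i-1][j] + t_right * h[i][j-1]
    """
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                h[i][j] = 1.0  # the only source term in this toy setup
                continue
            up = h[i - 1][j] if i > 0 else 0.0
            left = h[i][j - 1] if j > 0 else 0.0
            h[i][j] = t_down * up + t_right * left
    return h


# Both transitions equal to 1: the state counts paths, so it grows
# combinatorially with the distance from the source.
exploding = grid_recurrence(6, 1.0, 1.0)

# Transitions that sum to 1: the state stays bounded by the source magnitude.
bounded = grid_recurrence(6, 0.5, 0.5)
```

On a sequence, a transition of magnitude 1 preserves the state. On the grid, the same choice sums over all paths between two cells, so `exploding[i][j]` equals the binomial coefficient C(i+j, i). Constraining the outgoing transitions so that they sum to at most 1 is one simple way to keep the state bounded in this toy setting.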
All layers within pLSTM can be configured using the config classes in plstm.config, which are composed by way of the compoconf library.
pLSTM offers implementations across multiple popular deep learning frameworks:
- nnx
- linen
- torch
Please note that plstm_graph is currently only implemented in torch.
MD-RNNs:
- Graves et al. 2007: Multi-Dimensional Recurrent Neural Networks http://arxiv.org/abs/0705.2011
Linear RNNs (among many others):
- Schlag et al. 2021: Linear Transformers are Secretly Fast Weight Programmers http://arxiv.org/abs/2102.11174
- Orvieto et al. 2023: Resurrecting Recurrent Neural Networks for Long Sequences http://arxiv.org/abs/2303.06349
- Gu and Dao 2023: Mamba: Linear-time sequence modeling with selective state spaces http://arxiv.org/abs/2312.00752
- Yang et al. 2023: Gated Linear Attention Transformers with Hardware-Efficient Training http://arxiv.org/abs/2312.06635
- Beck et al. 2024: xLSTM: Extended Long Short Term Memory http://arxiv.org/abs/2405.04517
State-Tracking:
- Merrill et al. 2024: The Illusion of State in State Space Models https://arxiv.org/abs/2404.08819
MIT License
If you use this work in your research, please cite:
@misc{poppel_plstm_2025,
title = {{pLSTM}: parallelizable {Linear} {Source} {Transition} {Mark} networks},
shorttitle = {{pLSTM}},
url = {http://arxiv.org/abs/2506.11997},
doi = {10.48550/arXiv.2506.11997},
urldate = {2025-06-16},
publisher = {arXiv},
author = {Pöppel, Korbinian and Freinschlag, Richard and Schmied, Thomas and Lin, Wei and Hochreiter, Sepp},
month = jun,
year = {2025},
note = {arXiv:2506.11997 [cs]},
keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},
}