Simple in principle, just engineering. JaxMARL (https://github.com/FLAIROx/JaxMARL) might need some extra work to represent sparse action transitions.
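
One way to picture a sparse action transition representation in JAX is a small table of (state, action, next state) triples, with every unlisted pair treated as a no-op. This is only a sketch under that assumption: the arrays `SRC`, `ACT`, `DST` and the `step` function are hypothetical names for illustration and are not part of JaxMARL's API.

```python
import jax
import jax.numpy as jnp

# Hypothetical sparse transition table (an assumption, not JaxMARL code):
# only the listed (state, action) pairs change the state.
SRC = jnp.array([0, 1, 2])  # source states
ACT = jnp.array([1, 0, 1])  # actions that trigger a transition
DST = jnp.array([1, 2, 0])  # resulting states


@jax.jit
def step(state, action):
    """Look up (state, action) in the table; unlisted pairs are no-ops."""
    # Boolean mask over table rows that match the query pair.
    match = (SRC == state) & (ACT == action)
    # Index of the first matching row (0 if nothing matches).
    idx = jnp.argmax(match)
    # Fixed-shape select keeps the function jit- and vmap-friendly.
    return jnp.where(match.any(), DST[idx], state)


# Vectorize over a batch of agents or environments.
batched_step = jax.vmap(step)

states = jnp.array([0, 0, 2])
actions = jnp.array([1, 0, 1])
next_states = batched_step(states, actions)
print(next_states.tolist())  # -> [1, 0, 0]
```

The linear scan over table rows is fine for small tables; a larger table would likely want a sorted or hashed lookup, which is where the extra engineering would go.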