As an alternative to AWS, I bet we could run managed Ray on a "neo cloud" by using SkyPilot. See these docs:

- https://docs.skypilot.co/en/latest/running-jobs/distributed-jobs.html#executing-a-distributed-ray-program
- https://docs.skypilot.co/en/latest/examples/auto-failover.html (scarce resources could be GPUs/TPUs or machines with a high CPU count).

CC: @Michaelvll, @romilbhardwaj

xref: https://github.com/cubed-dev/cubed/pull/769, https://github.com/cubed-dev/cubed/issues/488.
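
To make the idea a bit more concrete, here is a rough sketch of the per-node logic that the distributed Ray pattern in the first link relies on: rank 0 starts the Ray head, and every other node joins it. It assumes SkyPilot's `SKYPILOT_NODE_RANK` and `SKYPILOT_NODE_IPS` environment variables (with the head node's IP listed first) and Ray's default port 6379. The helper names `ray_start_command` and `command_from_env` are hypothetical, made up for illustration, and none of this has been tried with Cubed.

```python
import os
from typing import Mapping, Sequence

RAY_PORT = 6379  # assumed: Ray's conventional head port


def ray_start_command(node_rank: int, node_ips: Sequence[str], port: int = RAY_PORT) -> str:
    """Return the `ray start` command that a node with this rank should run."""
    head_ip = node_ips[0]  # assumed: the rank-0 (head) node is listed first
    if node_rank == 0:
        # The head node starts the cluster.
        return f"ray start --head --port {port}"
    # Worker nodes join the cluster at the head node's address.
    return f"ray start --address {head_ip}:{port}"


def command_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build the command from the variables SkyPilot sets on each node."""
    rank = int(env["SKYPILOT_NODE_RANK"])
    ips = env["SKYPILOT_NODE_IPS"].split()  # one IP per line
    return ray_start_command(rank, ips)
```

In a SkyPilot task, each node would run the resulting command from the `run` section, and only the head node would then launch the actual workload.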