Description
Discussions on Discourse and Slack have convinced me that we sorely need a roadmap issue for FluxML. This issue will serve as that roadmap, and the hope is that we build it together. This roadmap is the BDFL (and bors).
Some things are technical, some things are organizational. Feel free to suggest more tasks. If you think something should not be on the list, then suggest that too.
Governance
We don't seem to have a clear governance model. Officially, we follow ColPrac, and I think if we lean into it, it can be a sustainable model for the org. The contributing guide seems like the correct place to document this.
- More comprehensive contributing guide (Contributor's Guide draft #1824 seems good)
- List the "core maintainers" on the README so it's clear who to ping on Slack/Discourse?
- Agree on some way to move past blockers (e.g. a simple majority of maintainers in favor of X)
- Actually use the bi-weekly calls to come together and work through issues (a stronger culture of frequent communication would be nice too)
- Organize our issues and PRs so it is easier to contribute
Technical
This isn't meant to be a comprehensive list, but it should detail our top priorities for what to work on when we aren't squashing bugs.
- Make Flux AD-agnostic: see Chris's post if you need convincing, or search for "Zygote" and "bug" on any forum.
- Better CI (e.g. benchmarking Metalhead.jl, downstream package testing, JuliaFormatter): we need to be able to trust the tools so that PRs can be merged quickly and bugs are caught before releases. The model-zoo frequently goes stale; maybe it should be repurposed as a set of examples plus regression tests. This doesn't need to be a fancy solution with a webpage. Just a bot that responds to GitHub PR comments would be a huge improvement.
- Documentation overhaul: we need to go through our docs from cover to cover and check that they make sense and flow. Too many users walk away because the solution to their problem is buried in poorly structured documentation.
- Multi-GPU training: there are a couple of options right now, but we need to think about how to wrap them up in a user-facing API, probably in a separate package from Flux itself.
- Pre-trained models: this will likely require the previous point, but ONNX.jl is another promising option here.
- Make Flux explicit gradient-first: this is a broad goal covering many sub-tasks, like getting Optimisers.jl ready for release, figuring out tied weights, getting ready for Diffractor.jl, etc.
- Think really hard about RNNs.
- Better benchmarking and performance improvements on baseline architectures: this is a large, multi-faceted task, but problems like the very high memory consumption we suffer from in some basic settings (AFAIK) are showstoppers for training large vision and NLP models.
- Gradient correctness: we should track down and fix all situations where Zygote silently yields incorrect gradients.
I deliberately started with a short list because some of these tasks, like CI, need to be dealt with before we can reliably tackle anything else.
I'd also ask that we try to be honest and constructive here. Nothing above is fully solved, so commenting on what's left is more helpful than "oh, that's not an issue because X does 70% of it and the rest is easy." If the rest is easy, then open a linked issue detailing what the rest is so that someone can tackle it.
Lastly, let's limit comments to Flux maintainers for the most part. Anyone is welcome to suggest things we are missing, of course, but it would be good to seed the list with comments from folks who have been working on the packages for some time.
@DhairyaLGandhi @CarloLucibello @mcabbott @ToucheSir @lorenzoh @ChrisRackauckas @logankilpatrick