A curated collection of resources on Multimodal Learning and Multimodal Foundation Models, including applications in healthcare, vision-language modeling, tutorials, and open-source tools.
## Contents

- Tutorials & Workshops
- Research Labs
- Applications in Healthcare
- Code & Tools
- Suggested Papers
- Contributing
## Tutorials & Workshops

- CMU Multimodal Machine Learning Portal

## Research Labs

- CMU MultiComp Lab (Homepage)
- CMU MultiComp Research Projects
- CMU MultiComp GitHub
## Applications in Healthcare

- Google Research – Multimodal Medical AI
- Owkin: A-Z of AI in Healthcare
- Owkin – Multimodal Data in Healthcare (a toy fusion sketch follows this list)
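The resources above discuss combining medical imaging with clinical or molecular data. As a concrete illustration of the basic pattern, here is a minimal late-fusion sketch in PyTorch: each modality is projected into a shared space, concatenated, and classified. Every name and dimension below is a hypothetical placeholder, not code from any linked project.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy late-fusion model: precomputed image embedding + tabular features."""

    def __init__(self, img_dim=512, tab_dim=32, hidden=128, n_classes=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)  # project image embedding
        self.tab_proj = nn.Linear(tab_dim, hidden)  # project tabular clinical features
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden, n_classes),  # classify the fused representation
        )

    def forward(self, img_emb, tab_feats):
        # Late fusion: encode each modality separately, then concatenate.
        fused = torch.cat([self.img_proj(img_emb), self.tab_proj(tab_feats)], dim=-1)
        return self.head(fused)

# Smoke test with random inputs (batch of 4, hypothetical dimensions).
model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 32))
print(logits.shape)  # torch.Size([4, 2])
```

Real systems differ mainly in where fusion happens (early, late, or via cross-attention); the survey papers listed below cover these trade-offs.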
## Code & Tools

- CMU MultiComp GitHub
- OpenFlamingo
- LLaVA (Large Language and Vision Assistant)
- BLIP / BLIP-2 (Salesforce)
- HuggingFace Transformers – Multimodal Support (a captioning sketch follows this list)
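For a quick start with the libraries above, the sketch below uses the BLIP-2 classes that ship with recent versions of HuggingFace Transformers to caption an image. The checkpoint ID and example image URL are illustrative choices, and the model download is several gigabytes.

```python
import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load a BLIP-2 checkpoint from the Hub ("Salesforce/blip2-opt-2.7b" is
# one example choice; smaller or larger variants also exist).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

# Any RGB image works; a COCO validation image is used here as a stand-in.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess, generate, and decode the caption.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip())
```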
## Suggested Papers

| Title | Year | Link |
|---|---|---|
| Multimodal Transformers: A Survey | 2022 | arXiv |
| A Survey on Multimodal Learning | 2023 | arXiv |
| A Survey on Multimodal Foundation Models | 2023 | arXiv |
| Vision-Language Pre-training: A Survey | 2022 | arXiv |
| Self-Supervised Multimodal Learning: A Survey | 2023 | arXiv |
## Contributing

Contributions are welcome! If you know of other great multimodal resources, please open a pull request.
## License

This repository is licensed under the MIT License.