Comprehensive collection of cutting-edge Generative AI resources across Speech, Text, Image, and Multimodal domains
📚 Categories • 🤖 Models • 🦾 Agents • 💡 Contribute
This is a curated collection of Generative AI resources compiled from open-source projects, research papers, and community contributions. Each resource was selected for its quality, relevance, and practical utility in the AI/ML ecosystem.
- Speech Processing: STT, TTS, Voice Cloning, Emotion Recognition
- Computer Vision: Text-to-Image, Talking Head Generation
- Multimodal AI: Transformers, Foundation Models, APIs
- AI Agents: Latest autonomous AI systems and frameworks
- Datasets: High-quality training data for various AI tasks
🎙️ Speech | 🖼️ Vision | 🤖 AI & APIs | 📊 Data |
---|---|---|---|
STT Datasets | Text-to-Image | GenAI APIs | STT Models |
STT Models | Talking Head | Transformers | TTS Models |
TTS Models | Voice Cloning | | |
Voice Cloning | Emotion Recognition | | |
Emotion Recognition | | | |
- Whisper - Multilingual speech recognition
- Moonshine - Automatic speech recognition
- Wav2Vec2 - Keyword spotting
- Moshi - Speech-to-speech generation
- MusicGen - Text-to-audio generation
- Bark - Text-to-speech synthesis
- SAM - Automatic mask generation
- DepthPro - Depth estimation
- DINOv2 - Image classification
- SuperGlue - Keypoint detection & matching
- RT-DETRv2 - Object detection
- ViTPose - Pose estimation
- OneFormer - Universal segmentation
- VideoMAE - Video classification
- Qwen2-Audio - Audio/text to text
- LayoutLMv3 - Document understanding
- Qwen-VL - Image/text to text
- BLIP-2 - Image captioning
- GOT-OCR2 - OCR document understanding
- TAPAS - Table question answering
- Emu3 - Unified multimodal understanding
- LLaVA - Visual question answering
- Kosmos-2 - Visual referring expression
- ModernBERT - Masked word completion
- Gemma - Named entity recognition
- Mixtral - Question answering
- BART - Summarization
- T5 - Translation
- Llama - Text generation
- Qwen - Text classification
Agent | Organization | Description | Links |
---|---|---|---|
DeepResearchAgent | Skywork AI | Hierarchical multi-agent framework | Repo • Paper |
OWL | CAMEL-AI.org | Optimized Workforce Learning | Repo • Paper |
Suna | Kortix | Open-source generalist AI agent | Repo • Release |
OpenManus | MetaGPT | Open alternative to Manus | Repo • Release |
Agent S² | Simular | Compositional generalist-specialist framework | Repo • Paper |
UI-TARS | ByteDance | All-in-one multimodal AI agent stack | Repo • Paper |
- 🧩 Comprehensive Open-Source Projects - Extended collection with detailed descriptions and implementation guides
Found something amazing that should be here? We'd love to include it!
- Open an Issue - Suggest new resources or improvements
- Submit a PR - Add new content or fix existing entries
- Share Feedback - Help us improve the organization and structure
- Ensure resources are open-source or freely accessible
- Include relevant links (GitHub, papers, demos)
- Provide brief but informative descriptions
- Maintain consistent formatting
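The formatting guideline above can be checked mechanically before opening a PR. The following is a minimal sketch, assuming new entries follow the `- Name - Description` shape used by the model lists in this README. The function name `check_entry` and the regular expression are hypothetical illustrations, not tooling that ships with this repository.

```python
import re
from typing import Optional

# Assumed entry shape: "- Name - Description", as in the model lists.
# This pattern is an illustration, not an official rule of the repository.
ENTRY_RE = re.compile(r"^- (?P<name>\S.*?) - (?P<description>\S.*)$")


def check_entry(line: str) -> Optional[dict]:
    """Return the parsed name and description, or None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "description": match.group("description"),
    }


if __name__ == "__main__":
    # A line in the assumed shape parses into its two parts.
    print(check_entry("- Whisper - Multilingual speech recognition"))
    # A line in a different shape is rejected.
    print(check_entry("Whisper: speech recognition"))
```

The name is matched lazily up to the first ` - ` separator, so names containing spaces are kept whole, while a name that itself contains ` - ` would be split early. Table rows such as the agent entries use a different shape and are outside the scope of this sketch.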
This repository is a curated collection of Generative AI and LLM-related projects. All rights and credits belong to their respective authors and organizations. If you're an author and would like to suggest edits or request removal, please open an issue.
⭐ Star this repo if you find it helpful!
🔄 Updated regularly with the latest AI breakthroughs