🚀 Awesome Generative AI Resources

Comprehensive collection of cutting-edge Generative AI resources across Speech, Text, Image, and Multimodal domains

📚 Categories • 🤖 Models • 🦾 Agents • 💡 Contribute

🎯 About This Repository

This is a curated and organized collection of state-of-the-art Generative AI resources, carefully compiled from various open-source projects, research papers, and community contributions. Each resource has been selected for its quality, relevance, and practical utility in the AI/ML ecosystem.

🌟 What You'll Find Here:

Speech Processing: STT, TTS, Voice Cloning, Emotion Recognition
Computer Vision: Text-to-Image, Talking Head Generation
Multimodal AI: Transformers, Foundation Models, APIs
AI Agents: Latest autonomous AI systems and frameworks
Datasets: High-quality training data for various AI tasks

📚 Main Categories

🎙️ Speech	🖼️ Vision	🤖 AI & APIs	📊 Data
STT Datasets	Text-to-Image	GenAI APIs	STT Models
STT Models	Talking Head	Transformers	TTS Models
TTS Models			Voice Cloning
Voice Cloning			Emotion Recognition
Emotion Recognition

🤖 Transformers & Foundation Models

🎵 Audio Processing

Whisper - Multilingual speech recognition
Moonshine - Automatic speech recognition
Wav2Vec2 - Keyword spotting
Moshi - Speech-to-speech generation
MusicGen - Text-to-audio generation
Bark - Text-to-speech synthesis

👁️ Computer Vision

SAM - Automatic mask generation
DepthPro - Depth estimation
DINO v2 - Image classification
SuperGlue - Keypoint detection & matching
RT-DETRv2 - Object detection
VitPose - Pose estimation
OneFormer - Universal segmentation
VideoMAE - Video classification

🔄 Multimodal

Qwen2-Audio - Audio/text to text
LayoutLMv3 - Document understanding
Qwen-VL - Image/text to text
BLIP-2 - Image captioning
GOT-OCR2 - OCR document understanding
TAPAS - Table question answering
Emu3 - Unified multimodal understanding
Llava - Visual question answering
Kosmos-2 - Visual referring expression

📝 Natural Language Processing

ModernBERT - Masked word completion
Gemma - Named entity recognition
Mixtral - Question answering
BART - Summarization
T5 - Translation
Llama - Text generation
Qwen - Text classification
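
The task labels in the lists above read like Hugging Face Transformers `pipeline` tasks, so one way to try an entry is sketched below. This is an assumption, not something the repository states: the lookup table `PIPELINE_TASKS` and the helper `build_pipeline` are hypothetical names, only a subset of labels is mapped, and the model id is left to the caller.

```python
# Hypothetical lookup: task label from the lists above -> Transformers pipeline task id.
# Only a subset of labels is mapped; the mapping is an assumption, not part of this repo.
PIPELINE_TASKS = {
    "Multilingual speech recognition": "automatic-speech-recognition",
    "Depth estimation": "depth-estimation",
    "Object detection": "object-detection",
    "Image classification": "image-classification",
    "Masked word completion": "fill-mask",
    "Question answering": "question-answering",
    "Summarization": "summarization",
    "Translation": "translation",
    "Text generation": "text-generation",
}


def build_pipeline(label: str, model_id: str):
    """Create a Transformers pipeline for a task label and a Hub model id."""
    # Imported lazily so the lookup table stays usable without transformers installed.
    from transformers import pipeline

    return pipeline(task=PIPELINE_TASKS[label], model=model_id)
```

For example, `build_pipeline("Summarization", "facebook/bart-large-cnn")` would download that checkpoint on first use and return a callable summarizer; pick whichever checkpoint suits your hardware and license needs.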

🦾 Super Agents

Agent	Organization	Description	Links
DeepResearchAgent	Skywork AI	Hierarchical multi-agent framework	Repo • Paper
OWL	CAMEL-AI.org	Optimized Workforce Learning	Repo • Paper
Suna	Kortix	Open-source generalist AI agent	Repo • Release
OpenManus	MetaGPT	Open alternative to Manus	Repo • Release
Agent S²	Simular	Compositional generalist-specialist framework	Repo • Paper
UI-TARS	ByteDance	All-in-one multimodal AI agent stack	Repo • Paper

📖 Detailed Resources

🧩 Comprehensive Open-Source Projects - Extended collection with detailed descriptions and implementation guides

💡 Contribution

Found something amazing that should be here? We'd love to include it!

🤝 How to Contribute:

Open an Issue - Suggest new resources or improvements
Submit a PR - Add new content or fix existing entries
Share Feedback - Help us improve the organization and structure

📋 Guidelines:

Ensure resources are open-source or freely accessible
Include relevant links (GitHub, papers, demos)
Provide brief but informative descriptions
Maintain consistent formatting
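
As a rough illustration of the formatting guideline, the sketch below flags list entries that do not follow the `Name - Description` shape used in the model lists. The pattern and the helper name `find_malformed` are hypothetical and not part of this repository; it is a minimal check, not an official contribution tool.

```python
import re

# Assumed entry shape: "Name - Description", as in the model lists of this README.
ENTRY_PATTERN = re.compile(r"^(?P<name>\S.*?) - (?P<description>\S.*)$")


def find_malformed(lines):
    """Return the non-empty lines that do not match 'Name - Description'."""
    malformed = []
    for line in lines:
        text = line.strip()
        # Blank lines separate sections and are not entries, so skip them.
        if text and ENTRY_PATTERN.match(text) is None:
            malformed.append(text)
    return malformed
```

Running it over a draft list before opening a PR gives a quick view of entries that need a separator or a description.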

⚖️ Disclaimer

This repository is a curated collection of Generative AI and LLM-related projects. All rights and credits belong to their respective authors and organizations. If you're an author and would like to suggest edits or request removal, please open an issue.

⭐ Star this repo if you find it helpful!

🔄 Updated regularly with the latest AI breakthroughs


Mrkomiljon/awesome-generative-ai