Mrkomiljon/awesome-generative-ai

🚀 Awesome Generative AI Resources

Comprehensive collection of cutting-edge Generative AI resources across Speech, Text, Image, and Multimodal domains

📚 Categories · 🤖 Models · 🦾 Agents · 💡 Contribute


🎯 About This Repository

This is a curated, organized collection of state-of-the-art Generative AI resources, compiled from open-source projects, research papers, and community contributions. Each resource is selected for its quality, relevance, and practical utility in the AI/ML ecosystem.

🌟 What You'll Find Here:

  • Speech Processing: STT, TTS, Voice Cloning, Emotion Recognition
  • Computer Vision: Text-to-Image, Talking Head Generation
  • Multimodal AI: Transformers, Foundation Models, APIs
  • AI Agents: Latest autonomous AI systems and frameworks
  • Datasets: High-quality training data for various AI tasks

📚 Main Categories


🤖 Transformers & Foundation Models

🎵 Audio Processing

  • Whisper - Multilingual speech recognition
  • Moonshine - Automatic speech recognition
  • Wav2Vec2 - Keyword spotting
  • Moshi - Speech-to-speech generation
  • MusicGen - Text-to-audio generation
  • Bark - Text-to-speech synthesis

👁️ Computer Vision

  • SAM - Automatic mask generation
  • DepthPro - Depth estimation
  • DINOv2 - Image classification
  • SuperGlue - Keypoint detection & matching
  • RT-DETRv2 - Object detection
  • ViTPose - Pose estimation
  • OneFormer - Universal segmentation
  • VideoMAE - Video classification

🔄 Multimodal

  • Qwen2-Audio - Audio/text to text
  • LayoutLMv3 - Document understanding
  • Qwen-VL - Image/text to text
  • BLIP-2 - Image captioning
  • GOT-OCR2 - OCR document understanding
  • TAPAS - Table question answering
  • Emu3 - Unified multimodal understanding
  • LLaVA - Visual question answering
  • Kosmos-2 - Visual referring expression

📝 Natural Language Processing

  • ModernBERT - Masked word completion
  • Gemma - Named entity recognition
  • Mixtral - Question answering
  • BART - Summarization
  • T5 - Translation
  • Llama - Text generation
  • Qwen - Text classification

🦾 Super Agents

| Agent | Organization | Description | Links |
|---|---|---|---|
| DeepResearchAgent | Skywork AI | Hierarchical multi-agent framework | Repo, Paper |
| OWL | CAMEL-AI.org | Optimized Workforce Learning | Repo, Paper |
| Suna | Kortix | Open-source generalist AI agent | Repo, Release |
| OpenManus | MetaGPT | Open alternative to Manus | Repo, Release |
| Agent S² | Simular | Compositional generalist-specialist framework | Repo, Paper |
| UI-TARS | ByteDance | All-in-one multimodal AI agent stack | Repo, Paper |

📖 Detailed Resources


💡 Contribution

Found something amazing that should be here? We'd love to include it!

🤝 How to Contribute:

  1. Open an Issue - Suggest new resources or improvements
  2. Submit a PR - Add new content or fix existing entries
  3. Share Feedback - Help us improve the organization and structure

📋 Guidelines:

  • Ensure resources are open-source or freely accessible
  • Include relevant links (GitHub, papers, demos)
  • Provide brief but informative descriptions
  • Maintain consistent formatting
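
To make "consistent formatting" concrete, here is a minimal sketch of a check a contributor could run on a new entry before opening a PR. It assumes the `Name - Description` shape used in the lists above. The repository does not publish a formal schema, so the pattern, the 12-word limit, and the `check_entry` helper are all hypothetical.

```python
import re

# Hypothetical entry shape inferred from the lists in this README:
# "Name - Short description". This is an assumption, not an official schema.
ENTRY_PATTERN = re.compile(r"^(?P<name>\S.*?) - (?P<description>\S.*)$")


def check_entry(line: str, max_description_words: int = 12) -> list[str]:
    """Return a list of problems found in one list entry (empty if none)."""
    # Drop surrounding whitespace and a leading bullet, if present.
    text = line.strip().lstrip("•").strip()
    match = ENTRY_PATTERN.match(text)
    if match is None:
        return ["expected 'Name - Description'"]
    problems = []
    description = match.group("description")
    # The word limit is an assumed stand-in for "brief but informative".
    if len(description.split()) > max_description_words:
        problems.append("description is longer than the assumed word limit")
    # Entries in the lists above carry no trailing period.
    if description.endswith("."):
        problems.append("existing entries do not end with a period")
    return problems


if __name__ == "__main__":
    for entry in ["  • Whisper - Multilingual speech recognition", "  • Whisper"]:
        print(entry.strip(), "->", check_entry(entry) or "ok")
```

A check like this only covers the shape of an entry. Whether a resource is open-source and correctly linked still needs a manual review.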

⚖️ Disclaimer

This repository is a curated collection of Generative AI and LLM-related projects. All rights and credits belong to their respective authors and organizations. If you're an author and would like to suggest edits or request removal, please open an issue.


⭐ Star this repo if you find it helpful!

🔄 Updated regularly with the latest AI breakthroughs
