AniMind is an advanced AI-powered application that transforms text prompts into complete stories with matching video animations and narration. It combines multiple AI models to create a seamless storytelling experience!
- Generate complete stories from a single-sentence prompt or a longer article
- Create AI-generated video animations for each story segment
- Add sentiment-aware voice narration
- Automatic video-audio synchronization
- Interactive web interface
- API access available
- Go to this link: https://colab.research.google.com/drive/16WmebY0aIJqI2H2MFLFAUxx8pPXedUBZ?usp=sharing
- Navigate to Runtime -> Change runtime type in the menu
- Select T4 GPU, then click Save
- Run the cell and wait until the Gradio interface appears
- Enter a long story or a short prompt and the code will do the rest
- Once complete, a video file called "final_video_with_audio.mp4" will be generated
- Download it
- Enjoy ;)
Text Generation:
- Model: Qwen/Qwen2.5-1.5B-Instruct
- Purpose: Story generation and summarization
- Features:
  - 15-20 sentence story generation
  - Token-length management
  - Narrative flow optimization
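A minimal sketch of how the story generator might call this model through the `transformers` chat-style text-generation pipeline (the system prompt and sampling settings below are illustrative assumptions, not the project's exact code):

```python
from transformers import pipeline

# Load the instruct model used for story generation.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a storyteller. Write a story of 15-20 short sentences."},
    {"role": "user", "content": "A detective discovers a hidden room in an abandoned mansion"},
]

# The pipeline applies the model's chat template; the reply is the last message.
result = generator(messages, max_new_tokens=600, do_sample=True, temperature=0.7)
story = result[0]["generated_text"][-1]["content"]
print(story)
```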
Video Generation:
- Base Model: emilianJR/epiCRealism
- Motion Adapter: ByteDance/AnimateDiff-Lightning
- Features:
  - Frame generation: 256x256 resolution
  - Crossfade transitions
  - 8 FPS output
  - CUDA acceleration support
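A sketch of how the animation pipeline can be assembled with `diffusers`, following the AnimateDiff-Lightning model card (the 4-step distilled checkpoint and the example prompt are assumptions):

```python
import torch
from diffusers import AnimateDiffPipeline, EulerDiscreteScheduler, MotionAdapter
from diffusers.utils import export_to_video
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

device, dtype = "cuda", torch.float16
step = 4  # AnimateDiff-Lightning ships 1/2/4/8-step distilled checkpoints

# Load the Lightning motion adapter on top of the epiCRealism base model.
adapter = MotionAdapter().to(device, dtype)
ckpt = f"animatediff_lightning_{step}step_diffusers.safetensors"
adapter.load_state_dict(load_file(hf_hub_download("ByteDance/AnimateDiff-Lightning", ckpt), device=device))

pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=dtype
).to(device)
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing", beta_schedule="linear"
)

# Render one scene at the project's 256x256 resolution and 8 FPS.
output = pipe(
    prompt="a detective opens a hidden door in an abandoned mansion, cinematic lighting",
    height=256,
    width=256,
    guidance_scale=1.0,
    num_inference_steps=step,
)
export_to_video(output.frames[0], "scene.mp4", fps=8)
```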
Audio Generation:
- Text-to-Speech: edge_tts
- Sentiment Analysis: HuggingFace pipeline
- Features:
  - Dynamic voice selection based on sentiment
  - Adjustable speech rate and pitch
  - MP3 output format
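A sketch of sentiment-aware narration combining `edge_tts` with a HuggingFace sentiment pipeline (the voice names and the rate/pitch offsets are illustrative choices, not the project's exact mapping):

```python
import asyncio

import edge_tts
from transformers import pipeline

# The default sentiment model returns POSITIVE/NEGATIVE labels.
sentiment = pipeline("sentiment-analysis")

async def narrate(text: str, path: str = "narration.mp3") -> str:
    label = sentiment(text[:512])[0]["label"]
    # Match voice and delivery to the detected mood.
    if label == "NEGATIVE":
        voice, rate, pitch = "en-US-GuyNeural", "-10%", "-5Hz"
    else:
        voice, rate, pitch = "en-US-AriaNeural", "+5%", "+2Hz"
    await edge_tts.Communicate(text, voice, rate=rate, pitch=pitch).save(path)
    return path

asyncio.run(narrate("The detective finally found the hidden room."))
```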
Story Generation:
```python
def generate_story(prompt):
    # Generates a 15-20 sentence story
    # Enforces token limits per sentence
    # Returns formatted story text
    ...
```
Video Creation:
```python
def generate_video(summary):
    # Splits story into scenes
    # Generates frames for each scene
    # Applies transitions
    # Returns video path
    ...
```
Audio Generation:
```python
async def generate_audio_with_sentiment(text):
    # Analyzes sentiment
    # Selects appropriate voice
    # Generates audio narration
    # Returns audio path
    ...
```
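The crossfade transitions applied between scenes in `generate_video` can be sketched as a simple per-frame blend (a hypothetical helper, assuming each scene is a list of `numpy` frames):

```python
import numpy as np

def crossfade(scene_a: list, scene_b: list, n: int = 8) -> list:
    # Blend the last frame of scene A into the first frame of scene B
    # over n intermediate frames (one second at 8 FPS).
    blended = []
    for i in range(n):
        alpha = (i + 1) / (n + 1)
        frame = (1 - alpha) * scene_a[-1].astype(np.float32) + alpha * scene_b[0].astype(np.float32)
        blended.append(frame.astype(np.uint8))
    return scene_a + blended + scene_b
```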
API response format:

```json
{
  "story": "Generated story text",
  "video_path": "Path to generated video",
  "audio_summary": "Narration text",
  "download_link": "HTML download link",
  "file_path": "Direct file path"
}
```
Example request:

```python
import requests

api_url = "https://a03fe99a8a99d63578.gradio.live/api/generate"
response = requests.post(api_url, json={
    "prompt": "A detective discovers a hidden room in an abandoned mansion"
})
```
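Gradio share links like the one above are temporary, so replace `api_url` with the link printed by your own session. Assuming the response body matches the format shown earlier, the fields can be read directly:

```python
data = response.json()
print(data["story"])          # generated story text
print(data["download_link"])  # HTML link to the final MP4
```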
- Python 3.8+
- CUDA-capable GPU
- Required packages:
  - torch
  - gradio
  - edge_tts
  - diffusers
  - transformers
  - moviepy
  - imageio
  - opencv-python
```bash
pip install gradio edge_tts torch diffusers transformers moviepy imageio opencv-python
```
- Clone the repository
- Install dependencies
- Run the Jupyter notebook
- Access the interface at http://localhost:7860
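A minimal sketch of how the Gradio interface could be wired up (the wrapper function and components are assumptions; `share=True` is what produces the temporary *.gradio.live URLs used in the API example above):

```python
import gradio as gr

def animind(prompt: str) -> str:
    # Placeholder for the full story -> video -> narration pipeline described above.
    return "final_video_with_audio.mp4"

demo = gr.Interface(
    fn=animind,
    inputs=gr.Textbox(label="Story prompt"),
    outputs=gr.Video(label="Final video with narration"),
)
demo.launch(share=True)  # also served locally at http://localhost:7860
```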
- Video Resolution: 256x256 (configurable)
- Frame Rate: 8 FPS
- Story Length: 15-20 sentences
- Token Limit: 77 tokens per sentence (the CLIP text-encoder cap for scene prompts)
- Audio: Adaptive voice selection based on sentiment
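One way to enforce the per-sentence cap is to truncate each scene prompt with the CLIP tokenizer used by Stable Diffusion text encoders (a hypothetical helper, not necessarily the project's code):

```python
from transformers import CLIPTokenizer

# CLIP text encoders accept at most 77 tokens per prompt.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def clip_to_77_tokens(sentence: str) -> str:
    ids = tokenizer(sentence, truncation=True, max_length=77)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```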
- Input Validation: Checks prompt length and content
- Story Generation: Creates structured narrative
- Video Generation: Processes each sentence into visual scenes
- Audio Creation: Generates narration with sentiment analysis
- Final Compilation: Combines video and audio
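The final compilation step can be reproduced with `moviepy` (1.x API), which is already a dependency; the input file names here are placeholders:

```python
from moviepy.editor import AudioFileClip, VideoFileClip

# Mux the generated narration onto the generated video.
video = VideoFileClip("story_video.mp4")
audio = AudioFileClip("narration.mp3")
final = video.set_audio(audio)
final.write_videofile("final_video_with_audio.mp4", fps=8)
```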
- Processing time scales with story length
- Video generation requires substantial GPU memory
- API availability may be subject to rate limiting
- Video resolution is fixed at 256x256 unless changed in code
This project is licensed under the MIT License.
This project was developed by:
- Ali Kanbar
- Karim Ramadan
- Jameel Zbib
Under the supervision of:
- Eng. Jean-Pierre Fakhry