This project is an advanced AI Video Analyzer and Chat Agent built using Streamlit, powered by Google's Gemini 1.5 Flash and LangChain's DuckDuckGo integration. It provides an interactive platform for users to analyze videos, get AI-powered insights, and perform web searches all in one interface.
- Project Overview
- Features
- Uses and Scope
- File Structure
- Software and Tools Requirements
- Getting Started
- Data Description
- Usage
- Future Enhancements
- Acknowledgments
The AI Video Analyzer & Chat Agent is a powerful web application that combines video analysis capabilities with natural language processing and web search functionality. It uses Agno's Agent framework to integrate Google's Gemini 1.5 Flash model for video understanding and LangChain's DuckDuckGoSearchRun tool for supplementary web searches, providing users with comprehensive insights and information about their uploaded videos.
- AI Agent Architecture: Built using Agno's Agent framework for seamless AI integration
- Video Upload & Processing: Support for multiple video formats (MP4, MOV, AVI, MKV)
- AI-Powered Analysis: Video content analysis using Gemini 1.5 Flash
- Interactive Chat Interface: Real-time conversation with the AI about video content
- LangChain Tools Integration: Web search functionality using LangChain's DuckDuckGoSearchRun tool
- Session Management: Automatic timeout after 1 hour of inactivity
- Responsive UI: Clean and intuitive user interface with auto-scrolling chat
- Multi-Modal Analysis: Combines video understanding with text-based responses
- Temporary File Handling: Secure processing of uploaded videos
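The one-hour inactivity timeout listed above could be implemented along these lines. This is a minimal sketch, not the app's actual code: the `is_session_expired` and `touch` helpers and the `last_activity` key are hypothetical names, and a plain dict stands in for Streamlit's session state.

```python
import time
from typing import Optional

SESSION_TIMEOUT_SECONDS = 60 * 60  # 1 hour, as stated in the feature list


def is_session_expired(last_activity: float, now: Optional[float] = None,
                       timeout: float = SESSION_TIMEOUT_SECONDS) -> bool:
    """Return True when more than `timeout` seconds have passed since last activity."""
    if now is None:
        now = time.time()
    return (now - last_activity) > timeout


def touch(session: dict, now: Optional[float] = None) -> None:
    """Record activity; wipe the session first if the previous one timed out."""
    now = time.time() if now is None else now
    last = session.get("last_activity")  # hypothetical key name
    if last is not None and is_session_expired(last, now):
        session.clear()  # drops chat history and any stored video reference
    session["last_activity"] = now
```

Calling `touch` at the top of each rerun keeps the timestamp fresh while the user is active and resets everything once the gap exceeds the timeout.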
The AI Video Analyzer & Chat Agent serves multiple purposes across different domains:
- Content Analysis: Quickly understand and extract insights from video content
- Research & Education: Analyze educational videos and gather supplementary information
- Content Creation: Help content creators understand and improve their videos
- Information Synthesis: Combine video analysis with web search results for comprehensive understanding
- Interactive Learning: Engage with video content through natural language conversations
- Multi-Modal Processing: Utilize Gemini 1.5 Flash for advanced video understanding
- Tool-Augmented Search: Leverage LangChain tools for enhanced web search capabilities
ai-video-analyzer
│
├── app.py # Main application
├── .env # Environment variables file
├── .env.example # Example environment variables template
├── .gitignore # Git ignore rules
└── requirements.txt # Python dependencies
- Python 3.7 or higher
- pip (Python package manager)
- Google AI Studio API key
- Agno API key
- Clone the repository:
git clone https://github.com/yourusername/ai-video-analyzer.git
cd ai-video-analyzer
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
venv\Scripts\activate # on Windows
source venv/bin/activate # on macOS/Linux
- Install required packages:
pip install -r requirements.txt
- Set up environment variables:
cp .env.example .env
- Edit the .env file and add your API keys:
GOOGLE_API_KEY="your_google_api_key_here"
AGNO_API_KEY="your_agno_api_key_here"
- You'll need to:
- Get a Google API key from Google AI Studio
- Get an Agno API key from Agno
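Before launching the app, it can help to confirm that both keys are actually set. The sketch below is an illustration only: `missing_keys` is a hypothetical helper, and it assumes the values from `.env` have already been loaded into the process environment (how the app loads them is not described here).

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("GOOGLE_API_KEY", "AGNO_API_KEY")  # names taken from the .env example


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """Return the required keys that are absent, blank, or still placeholders."""
    missing = []
    for name in REQUIRED_KEYS:
        value = env.get(name, "").strip()
        # The template values start with "your_", so treat those as unset.
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_keys(os.environ)
    if problems:
        print("Missing or placeholder keys:", ", ".join(problems))
    else:
        print("All required API keys are set.")
```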
- Start the Streamlit application:
streamlit run app.py
- Open your web browser and navigate to the provided local URL (typically http://localhost:8501)
- Upload a video file in a supported format (MP4, MOV, AVI, MKV)
- Wait for the video processing to complete
- Start chatting with the AI about the video content:
- Ask Questions:
- Ask questions about the video content. The AI analyzes the video using Gemini 1.5 Flash and answers based on what it finds.
- e.g. What is the main theme of the video?
- Request Summaries & Analysis:
- Get summaries, key points, or detailed breakdowns of the video content.
- e.g. Analyze the video and give a detailed breakdown of its key points.
- Search for Additional Information:
- Use the integrated LangChain DuckDuckGoSearchRun tool to find related information or expand on the video's topic.
- e.g. Summarize the video, then use web search to verify the information it presents.
- Direct Web Search:
- Alternatively, perform a direct web search for your query using the Web Search 🔍 feature, powered by LangChain's search tools.
- e.g. What is Agno? (If the video was about Agno, you can search for it directly in the same application and get results.)
- The application uses:
- Agno's Agent framework with Gemini 1.5 Flash to process and understand video content
- LangChain's DuckDuckGoSearchRun tool for web search capabilities
- Streamlit for the interactive UI
- Session state for maintaining conversation context
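To illustrate how session state can carry conversation context between questions, here is a minimal sketch. The `add_message` and `build_prompt` helpers, the `messages` key, and the `max_turns` limit are all hypothetical; a plain dict stands in for Streamlit's dict-like `st.session_state`, and the real app may pass context to the agent differently.

```python
from typing import Dict, List, MutableMapping

Message = Dict[str, str]


def add_message(state: MutableMapping, role: str, content: str) -> List[Message]:
    """Append one chat turn to state["messages"], creating the list on first use."""
    history = state.setdefault("messages", [])
    history.append({"role": role, "content": content})
    return history


def build_prompt(state: MutableMapping, question: str, max_turns: int = 6) -> str:
    """Prefix the new question with the most recent turns so the model sees context."""
    recent = state.get("messages", [])[-max_turns:]
    lines = [f"{m['role']}: {m['content']}" for m in recent]
    lines.append(f"user: {question}")
    return "\n".join(lines)
```

Capping the history at a few recent turns keeps the prompt short while still letting follow-up questions such as "Summarize it" refer back to earlier answers.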
The application handles various types of data:
- Video Files: Supports MP4, MOV, AVI, and MKV formats
- Chat History: Stored in session state for the duration of the session
- Processed Video Data: Temporarily stored during analysis
- Web Search Results: Retrieved via LangChain's DuckDuckGoSearchRun tool in real-time
- Agent State: Managed by Agno's framework
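The format check and temporary storage described above might look roughly like the following. This is a sketch using only the standard library; `is_supported_video` and `temporary_video` are hypothetical names, and the app's own upload handling may differ.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}  # formats listed above


def is_supported_video(filename: str) -> bool:
    """Check the file extension case-insensitively against the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


@contextmanager
def temporary_video(data: bytes, filename: str):
    """Write uploaded bytes to a temp file, yield its path, then delete it."""
    if not is_supported_video(filename):
        raise ValueError(f"Unsupported video format: {filename}")
    suffix = Path(filename).suffix.lower()
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    try:
        tmp.write(data)
        tmp.close()  # close so other code can open the path on any OS
        yield tmp.name
    finally:
        tmp.close()
        if os.path.exists(tmp.name):
            os.remove(tmp.name)  # the file is removed even if analysis fails
```

Wrapping the analysis step in `with temporary_video(data, name) as path:` ensures the uploaded file does not outlive the request.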
- Advanced Video Analysis:
- Scene detection and segmentation
- Object and person recognition
- Sentiment analysis of video content
- Enhanced User Experience:
- Custom video player controls
- Timestamp-based questioning
- Export functionality for chat history
- Performance Optimizations:
- Video compression before processing
- Caching of frequent queries
- Batch processing capabilities
- Additional Features:
- Multiple video comparison
- Collaborative analysis sessions
- Integration with additional LangChain tools (e.g., document loaders, memory modules)
- Custom model fine-tuning options
- Advanced LangChain-powered retrieval and search capabilities
- Powered by Google's Gemini 1.5 Flash model
- Built with Agno's Agent framework
- Uses LangChain's DuckDuckGoSearchRun tool for web search capabilities
- Built with Streamlit's powerful web framework
- Inspired by the need for intelligent video analysis tools