This project integrates VideoSDK, the OpenAI Realtime API, and the Gemini Vision API to analyse a screen-share stream in real time.
git clone https://github.com/videosdk-community/videosdk-gemini-vision-agent.git
cd videosdk-gemini-vision-agent
- Navigate to the client dir:
cd client
- Make a copy of the environment configuration file:
cp .env.example .env
- Create a .env file in the client folder with:
VITE_VIDEOSDK_TOKEN=your_videosdk_auth_token_here
Obtain your VideoSDK Auth Token from app.videosdk.live
Create Virtual Environment (from project root):
python -m venv .venv
Install Dependencies:
pip install -r requirements.txt
Create Server Environment File (in project root):
cp .env.example .env
Add these keys to your .env file:
OPENAI_API_KEY=your_openai_key_here
GEMINI_API_KEY=your_gemini_api_key
🔑 Obtaining API Keys
- OpenAI: https://platform.openai.com/api-keys
- Gemini: https://aistudio.google.com/apikey
- VideoSDK Token: https://app.videosdk.live
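Before starting the server, it can help to confirm that both server keys are actually present in the .env file. The following is a minimal sketch, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, it uses only the Python standard library, and it assumes the .env file contains plain KEY=value lines and that the script is run from the project root.

```python
from pathlib import Path

# Keys the server .env is expected to define, per the setup steps above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumption: run from the project root
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

This check only detects absent or empty values; it does not verify that a key is valid or that a placeholder such as your_openai_key_here has been replaced.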
Start the Server (From Project Root):
uvicorn app:app
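When run without --host or --port, as in the command above, uvicorn listens on 127.0.0.1:8000 by default. A quick way to confirm the server came up before starting the client is to try a TCP connection to that address. This is a minimal sketch using only the standard library; the helper name `is_listening` is hypothetical, and the host and port should be adjusted if the server is started with different options.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("server reachable:", is_listening())
```

A successful connection only shows that something is accepting connections on that port; it does not confirm that the application itself is healthy.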
Start the Client (From /client Folder):
npm run dev
For more information, check out docs.videosdk.live.