AI Vision Agent using VideoSDK and Google Gemini Vision API

This project integrates VideoSDK, the OpenAI Realtime API, and the Gemini Vision API to analyse a screen-share stream in real time.


Start with the project

git clone https://github.com/videosdk-community/videosdk-gemini-vision-agent.git
cd videosdk-gemini-vision-agent

Client Setup

  1. Navigate to the client directory:

    cd client
  2. Make a copy of the environment configuration file:

    cp .env.example .env
  3. Set your token in the .env file you just created in the client folder:

    VITE_VIDEOSDK_TOKEN=your_videosdk_auth_token_here

Obtain your VideoSDK Auth Token from app.videosdk.live
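
A common setup mistake is leaving the placeholder value in the client .env file. The following is a minimal sketch, not part of this repository, of a check you could run from the project root. The helper names `parse_env` and `token_is_set` are hypothetical, and the parser only handles simple KEY=VALUE lines.

```python
from pathlib import Path

# Placeholder value shown in the README example above.
PLACEHOLDER = "your_videosdk_auth_token_here"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def token_is_set(env_text: str, key: str = "VITE_VIDEOSDK_TOKEN") -> bool:
    """Return True if the key has a non-empty, non-placeholder value."""
    value = parse_env(env_text).get(key, "")
    return bool(value) and value != PLACEHOLDER


if __name__ == "__main__":
    env_file = Path("client/.env")  # assumed path, relative to the project root
    text = env_file.read_text() if env_file.exists() else ""
    print("token set" if token_is_set(text) else "token missing or placeholder")
```

This only checks that a value is present. It does not verify that the token is valid with VideoSDK.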

Server Setup (Python FastAPI)

Create Virtual Environment (from project root):

python -m venv .venv

Activate Virtual Environment (macOS/Linux shown; on Windows run .venv\Scripts\activate instead):

source .venv/bin/activate

Install Dependencies:

pip install -r requirements.txt

Create Server Environment File (in project root):

cp .env.example .env

Add these keys to your .env file:

OPENAI_API_KEY=your_openai_key_here
GEMINI_API_KEY=your_gemini_api_key
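
If either key is missing, the server is likely to fail only when it first calls the corresponding API. As a hedged sketch (not taken from this repository), a startup check could look like the following. It inspects the process environment, so it assumes the values from .env have already been exported or loaded. How the server itself loads .env is not stated here, and `missing_keys` and `check_or_exit` are hypothetical helper names.

```python
import os
from collections.abc import Mapping

# Key names taken from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def check_or_exit() -> None:
    """Stop with a readable message if any required key is missing."""
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

Calling `check_or_exit()` near the top of the server entry point would surface a configuration problem before any request is handled.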

🔑 Obtaining API Keys

OpenAI API key: create one in the OpenAI platform dashboard (platform.openai.com)
Gemini API key: create one in Google AI Studio (aistudio.google.com)


▶️ Running the Application

Start the Server (From Project Root):

uvicorn app:app
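
In `uvicorn app:app`, the first `app` is the module (app.py) and the second is the application object inside it. During development it can be handy to launch the same command with explicit options. The sketch below assumes uvicorn's standard `--host`, `--port`, and `--reload` flags and its defaults of 127.0.0.1 and 8000. The port the client expects is not stated in this README, and `uvicorn_command` and `run_server` are hypothetical helper names.

```python
import subprocess
import sys


def uvicorn_command(host: str = "127.0.0.1", port: int = 8000,
                    reload: bool = False) -> list[str]:
    """Build the argument list equivalent to `uvicorn app:app` with options."""
    cmd = [sys.executable, "-m", "uvicorn", "app:app",
           "--host", host, "--port", str(port)]
    if reload:
        # Restart the server automatically when source files change.
        cmd.append("--reload")
    return cmd


def run_server(reload: bool = True) -> None:
    """Run the server from the project root using the active virtual environment."""
    subprocess.run(uvicorn_command(reload=reload), check=True)
```

Running through `sys.executable -m uvicorn` uses the uvicorn installed in the active virtual environment rather than whichever one is first on PATH.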

Start the Client (From /client Folder):

npm run dev

For more information, check out docs.videosdk.live.
