Gemini Live Cam is a Python application that streams real-time audio (and optionally video or screen captures) from your device to Google Gemini using the Gemini Live API. It enables interactive conversations with Gemini via both text and voice, demonstrating how to integrate media capture, streaming, and AI-powered responses in Python.
- Real-time audio streaming to Gemini with AI-powered voice responses.
- Optional video or screen frame streaming.
- Interactive text chat with Gemini.
- Audio playback of Gemini's responses.
- Extensible for UI integration (Flask, Streamlit, etc.).
git clone <your-repo-url>
cd gemini-live-cam
Windows:
python -m venv gem-env
gem-env\Scripts\activate
Linux/macOS:
python3 -m venv gem-env
source gem-env/bin/activate
pip install google-genai opencv-python pyaudio pillow mss python-dotenv
- Create a .env file in the project root.
- Add your Google Gemini API key:
GEMINI_API_KEY=your_google_gemini_api_key_here
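As a minimal sketch of how the key in .env might be read at startup: the helper name `get_api_key` and the placeholder check are illustrative assumptions, not taken from the script itself. `load_dotenv` comes from python-dotenv, which is in the install list above. The import is wrapped so the sketch still runs if that package is missing.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # fall back to whatever is already in the environment

# Placeholder text from the example .env line; treated as "not configured".
PLACEHOLDER = "your_google_gemini_api_key_here"


def get_api_key(env=None):
    """Return GEMINI_API_KEY, or raise if it is missing or still the placeholder.

    `env` is a hypothetical hook for passing a mapping directly (useful in tests);
    by default the process environment is used after loading .env.
    """
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # reads variables from a .env file if one is found
        env = os.environ
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the API client.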
python gemini-live-cam.py --mode camera
python gemini-live-cam.py --mode screen
python gemini-live-cam.py --mode none
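The three commands above select the video source with a --mode flag. A hedged sketch of how such a flag could be parsed with argparse follows. The function names and the default value of camera are assumptions for illustration; the actual script may define them differently.

```python
import argparse

# The three modes shown in the usage commands.
MODES = ("camera", "screen", "none")


def build_parser():
    """Build a parser that accepts --mode camera|screen|none."""
    parser = argparse.ArgumentParser(
        description="Stream audio (and optionally video) to Gemini."
    )
    parser.add_argument(
        "--mode",
        choices=MODES,
        default="camera",  # assumed default; not confirmed by the README
        help="video source to stream alongside audio",
    )
    return parser


def parse_mode(argv=None):
    """Return the selected mode; argv=None means read from sys.argv."""
    return build_parser().parse_args(argv).mode
```

Using `choices` means an unsupported value such as `--mode webcam` is rejected with a usage message before any capture device is opened.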
- The script captures audio from your microphone and (optionally) video from your webcam or screen.
- Audio and video/screen frames are streamed to the Gemini model using the Google GenAI Live API.
- You can also interact with Gemini via text input in the console.
- Gemini responds with audio, which is played back in real time.
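The flow described above, capturing media and sending it while the conversation continues, is commonly built from concurrent tasks joined by a queue. The sketch below shows that pattern with asyncio only. It is an assumption about the general shape, not the script's actual code: the in-memory `source` stands in for microphone or frame capture, and appending to `sent` stands in for forwarding data to the Live API session.

```python
import asyncio


async def capture(source, outbox):
    """Push each captured chunk onto the outgoing queue."""
    for chunk in source:
        await outbox.put(chunk)  # waits if the queue is full (backpressure)
        await asyncio.sleep(0)   # yield so the sender task can run
    await outbox.put(None)       # sentinel: no more input


async def send(outbox, sent):
    """Drain the queue; a real sender would forward chunks to the session."""
    while True:
        chunk = await outbox.get()
        if chunk is None:
            break
        sent.append(chunk)


async def run_pipeline(source, maxsize=5):
    """Run capture and send concurrently and return what was sent, in order."""
    outbox = asyncio.Queue(maxsize=maxsize)
    sent = []
    await asyncio.gather(capture(source, outbox), send(outbox, sent))
    return sent


def demo():
    """Run the pipeline over three fake audio chunks."""
    return asyncio.run(run_pipeline([b"a", b"b", b"c"]))
```

A bounded queue keeps memory use flat if the network is slower than capture; a real-time application might instead drop stale frames rather than wait.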
- Python 3.8+
- A working microphone (plus a webcam for camera mode or a display for screen mode)
- Google Gemini API key
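The requirements above can be checked before any device is opened. This is a small sketch using only the standard library. The helper names are hypothetical; the module names are the import names of the packages in the pip install line (for example, opencv-python is imported as cv2 and pillow as PIL).

```python
import importlib.util
import sys

# Import names for the packages listed in the install command.
REQUIRED_MODULES = ["google.genai", "cv2", "pyaudio", "PIL", "mss", "dotenv"]


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter meets the minimum major.minor version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found, without importing them fully."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent or malformed.
            found = False
        if not found:
            missing.append(name)
    return missing
```

A startup script could call `python_ok()` and `missing_modules()` and print the pip install command when the list is not empty.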
This project demonstrates how to:
- Integrate real-time media capture (audio/video/screen) in Python.
- Stream data to a state-of-the-art AI model using the Google GenAI Live API.
- Build the foundation for advanced AI-powered assistants, bots, or interactive applications.
Possible next steps:
- Extend with a web UI (Flask/Streamlit).
- Add more controls and error handling.
- Integrate with other tools or APIs as needed.