A simple web application for real-time AI vision analysis: it streams frames from your live camera feed to a locally running SmolVLM-500M-Instruct model and reads the model's responses aloud with text-to-speech.
- Modern web browser with camera access
- llama-server with SmolVLM running locally
- Install and run SmolVLM server:

  ```bash
  # Install llama-server and SmolVLM
  # Run the server
  llama-server -hf ggml-org/SmolVLM-500M-Instruct-GGUF
  ```
- Serve the application:

  ```bash
  # Using Python
  python -m http.server 8000

  # Or open index.html directly in your browser
  ```
- Access: open http://localhost:8000 in your browser
- Grant camera permissions when prompted
- Configure the API endpoint (default: http://localhost:8080)
- Set your instruction (e.g., "What do you see?")
- Choose a request interval (100 ms to 2 s)
- Click Start to begin real-time analysis (a sketch of the capture-and-request loop follows this list)
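
As referenced above, here is a rough sketch of what the capture-and-request loop could look like: grab the current frame from the camera's `<video>` element, encode it as a JPEG data URI, and POST it together with your instruction to the configured endpoint. It assumes the server accepts OpenAI-style chat completions with an `image_url` content part; the `#camera` element ID, endpoint, instruction, and interval are illustrative values rather than the app's actual identifiers.

```typescript
// Illustrative capture loop; the "#camera" ID, endpoint, instruction, and
// 500 ms interval are assumptions for this sketch, not the app's real values.
const video = document.querySelector<HTMLVideoElement>("#camera")!;
const canvas = document.createElement("canvas");

async function analyzeFrame(endpoint: string, instruction: string): Promise<string> {
  // Draw the current video frame onto an offscreen canvas and encode it as JPEG.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  const imageDataUrl = canvas.toDataURL("image/jpeg", 0.8);

  // OpenAI-style chat completion request with a text part and an image part.
  const res = await fetch(`${endpoint}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      max_tokens: 100,
      messages: [{
        role: "user",
        content: [
          { type: "text", text: instruction },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      }],
    }),
  });
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}

// Poll at the chosen interval and log each answer.
setInterval(async () => {
  const answer = await analyzeFrame("http://localhost:8080", "What do you see?");
  console.log(answer);
}, 500);
```

A production loop would normally wait for each response before scheduling the next request so that slow answers at short intervals don't overlap; keeping the JPEG quality modest (0.8 here) also keeps payloads small enough for sub-second polling.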
- Real-time camera feed processing
- Customizable AI instructions
- Text-to-speech responses (a minimal sketch follows this list)
- Adjustable analysis intervals
- Modern, responsive UI
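
The text-to-speech feature maps naturally onto the browser's built-in Web Speech API. The snippet below is a minimal sketch of that approach rather than the app's actual implementation; the spoken sentence is made up.

```typescript
// Speak a model response aloud with the browser's built-in speech synthesis.
// Cancelling any queued utterances first keeps short analysis intervals from
// piling up overlapping speech.
function speak(text: string): void {
  if (!("speechSynthesis" in window)) return; // TTS not supported in this browser
  window.speechSynthesis.cancel();
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0;
  window.speechSynthesis.speak(utterance);
}

// Example: speak a (made-up) answer from the analysis loop.
speak("I can see a person sitting at a desk with a laptop.");
```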
- Local processing only
- No data storage
- Camera access requires a secure context (HTTPS or localhost)
- User-controlled camera permissions (a sketch of the permission request follows this list)
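
As noted in the item above, camera access is controlled by the browser: `getUserMedia` only works in a secure context (HTTPS or localhost) and always prompts the user. Here is a minimal sketch of how the feed might be attached for local-only preview, again assuming a hypothetical `#camera` video element.

```typescript
// Request camera access; the browser prompts the user and only grants it in a
// secure context (HTTPS or localhost). Nothing is recorded or uploaded here:
// the stream is just attached to a <video> element for local preview.
// The "#camera" selector is an assumption for this sketch.
async function startCamera(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
    audio: false,
  });
  const video = document.querySelector<HTMLVideoElement>("#camera")!;
  video.srcObject = stream;
  await video.play();
}

startCamera().catch((err) => console.error("Camera unavailable or permission denied:", err));
```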
MIT License
Made with ❤️ for the AI community