A next-generation AI assistant powered by LLaVA (Large Language and Vision Assistant) with advanced visual understanding. This project combines LLaVA's multimodal AI with Vy's task automation and SecondMe's training architecture.
- Visual Understanding: Advanced image analysis and understanding using LLaVA
- Chat Interface: Natural language conversation with vision capabilities
- Task Automation: Vy's powerful computer task automation
- Open Source: Built entirely with free, open-source tools
- SecondMe Integration: Leverages SecondMe's AI training infrastructure
- Multimodal AI: Combines text and vision processing
```
Vy-LLaVA-Vision
├── llava_integration/   # LLaVA model integration
├── chat_interface/      # Web-based chat UI
├── vision_processing/   # Image processing pipeline
├── task_automation/     # Vy task execution engine
├── secondme_bridge/     # SecondMe integration
├── api/                 # REST API endpoints
├── config/              # Configuration files
└── docs/                # Documentation
```
- Python 3.8+
- CUDA-compatible GPU (recommended)
- 16GB+ RAM
- Git
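A quick sanity check for the Python requirement can be scripted; the helper name below is hypothetical, and GPU/RAM checks are environment-specific so they are omitted:

```python
import sys

def meets_python_requirement(minimum=(3, 8)):
    """Return True if the running interpreter satisfies the minimum version."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    print("Python OK" if meets_python_requirement() else "Python 3.8+ required")
```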
```bash
# Clone the repository
git clone https://github.com/xirtech/vy-llava-vision.git
cd vy-llava-vision

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download LLaVA model
python scripts/download_model.py

# Start the application
python main.py
```
Edit `config/config.yaml` to customize:

```yaml
llava:
  model_path: "models/llava-v1.5-7b"
  device: "cuda"
  max_tokens: 2048

chat:
  port: 8080
  host: "0.0.0.0"

vision:
  max_image_size: 1024
  supported_formats: ["jpg", "png", "webp"]

secondme:
  api_endpoint: "http://localhost:7865"
  enabled: true
```
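The `vision` settings above imply upload constraints. A minimal sketch of how they might be enforced — the helper names are hypothetical, and the defaults mirror the config values:

```python
from pathlib import Path

# Defaults mirror the vision section of config/config.yaml
SUPPORTED_FORMATS = ("jpg", "png", "webp")
MAX_IMAGE_SIZE = 1024

def is_supported(filename, formats=SUPPORTED_FORMATS):
    """Check an upload's file extension against the configured formats."""
    return Path(filename).suffix.lower().lstrip(".") in formats

def fit_within(width, height, max_size=MAX_IMAGE_SIZE):
    """Scale (width, height) down so neither side exceeds max_size."""
    scale = min(1.0, max_size / max(width, height))
    return round(width * scale), round(height * scale)
```

Note that `"jpeg"` is not in the configured list, so a file named `photo.jpeg` would be rejected as written.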
1. Start the application: `python main.py`
2. Open your browser to `http://localhost:8080`
3. Upload an image or take a screenshot
4. Ask questions about the image or request tasks
```python
import requests

# Send an image for analysis
with open("screenshot.png", "rb") as image_file:
    response = requests.post(
        "http://localhost:8080/api/analyze",
        files={"image": image_file},
        data={"query": "What do you see in this image?"},
    )

print(response.json())
```
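If `requests` is not installed, the same call can be made with the standard library by hand-encoding the multipart body. This is a minimal sketch: the field names match the example above, but the encoder itself is an assumption, not part of the project's API:

```python
import uuid

def encode_multipart(fields, files):
    """Build a multipart/form-data body from text fields and (filename, bytes) files.

    Returns (body_bytes, content_type) suitable for urllib.request.Request.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    for name, (filename, data) in files.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream'
            "\r\n\r\n".encode() + data + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can then be passed to `urllib.request.Request("http://localhost:8080/api/analyze", data=body, headers={"Content-Type": ctype})`.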
- `llava_integration/`: Core LLaVA model integration
- `chat_interface/`: React-based web interface
- `vision_processing/`: Image preprocessing and analysis
- `task_automation/`: Vy's task execution capabilities
- `api/`: FastAPI backend services
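At a high level, these components hand an image and a query through analysis and, optionally, on to task automation. The class and callable names below are hypothetical stand-ins for the real modules, not the project's actual interfaces:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VisionAssistant:
    """Hypothetical glue object wiring vision analysis to task automation."""
    analyze: Callable[[bytes, str], str]  # stands in for llava_integration
    run_task: Callable[[str], str]        # stands in for task_automation

    def handle(self, image: bytes, query: str, automate: bool = False) -> str:
        # Always analyze the image; optionally act on the result via Vy.
        answer = self.analyze(image, query)
        return self.run_task(answer) if automate else answer
```

For example, wiring in stub callables lets the flow be exercised without any model loaded.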
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes
4. Run tests: `pytest tests/`
5. Submit a pull request
- LLaVA - Large Language and Vision Assistant
- SecondMe - AI training platform
- Vy - AI task automation assistant
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- LLaVA team for the amazing multimodal AI model
- SecondMe team for the training infrastructure
- Vercept team for Vy's automation capabilities
For questions and support:
- Create an issue on GitHub
- Visit Vercept for Vy-related questions
- Check the documentation for detailed guides