
🤖 Vy-LLaVA-Vision

A next-generation AI assistant powered by LLaVA (Large Language and Vision Assistant) with advanced visual understanding. This project combines LLaVA's multimodal AI with Vy's task automation capabilities and SecondMe's training architecture.

🌟 Features

  • Visual Understanding: Advanced image analysis and understanding using LLaVA
  • Chat Interface: Natural language conversation with vision capabilities
  • Task Automation: Vy's powerful computer task automation
  • Open Source: Built entirely with free, open-source tools
  • SecondMe Integration: Leverages SecondMe's AI training infrastructure
  • Multimodal AI: Combines text and vision processing

πŸ—οΈ Architecture

```
Vy-LLaVA-Vision
├── llava_integration/     # LLaVA model integration
├── chat_interface/        # Web-based chat UI
├── vision_processing/     # Image processing pipeline
├── task_automation/       # Vy task execution engine
├── secondme_bridge/       # SecondMe integration
├── api/                   # REST API endpoints
├── config/                # Configuration files
└── docs/                  # Documentation
```

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • 16GB+ RAM
  • Git
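
As a quick sanity check, the prerequisites above can be verified from Python. The sketch below is illustrative and not part of the repository; note that probing for `nvidia-smi` only detects NVIDIA driver tooling, not a full CUDA setup:

```python
import shutil
import sys

def preflight() -> dict:
    """Report whether the basic prerequisites appear to be met."""
    return {
        "python_ok": sys.version_info >= (3, 8),             # Python 3.8+
        "git_ok": shutil.which("git") is not None,           # Git on PATH
        "gpu_hint": shutil.which("nvidia-smi") is not None,  # NVIDIA driver tools
    }

if __name__ == "__main__":
    print(preflight())
```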

Installation

```bash
# Clone the repository
git clone https://github.com/xlrdtech/vy-llava-vision.git
cd vy-llava-vision

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download LLaVA model
python scripts/download_model.py

# Start the application
python main.py
```

🔧 Configuration

Edit config/config.yaml to customize:

```yaml
llava:
  model_path: "models/llava-v1.5-7b"
  device: "cuda"
  max_tokens: 2048

chat:
  port: 8080
  host: "0.0.0.0"

vision:
  max_image_size: 1024
  supported_formats: ["jpg", "png", "webp"]

secondme:
  api_endpoint: "http://localhost:7865"
  enabled: true
```
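
The YAML above parses into a plain nested mapping, so each section is reachable by name. A minimal loading sketch, assuming PyYAML is available (the `load_config` helper is illustrative, not the repository's actual API):

```python
import yaml  # PyYAML; assumed to be listed in requirements.txt

def load_config(path: str = "config/config.yaml") -> dict:
    """Load the application configuration into a nested dict."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)

# Values are then reachable by section and key, e.g.:
# cfg = load_config()
# cfg["llava"]["model_path"]  -> "models/llava-v1.5-7b"
# cfg["chat"]["port"]         -> 8080
```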

📖 Usage

Web Interface

  1. Start the application: python main.py
  2. Open your browser to http://localhost:8080
  3. Upload an image or take a screenshot
  4. Ask questions about the image or request tasks

API Usage

```python
import requests

# Send an image for analysis
with open("screenshot.png", "rb") as image_file:
    response = requests.post(
        "http://localhost:8080/api/analyze",
        files={"image": image_file},
        data={"query": "What do you see in this image?"},
    )

response.raise_for_status()
print(response.json())
```

πŸ› οΈ Development

Project Structure

  • llava_integration/: Core LLaVA model integration
  • chat_interface/: React-based web interface
  • vision_processing/: Image preprocessing and analysis
  • task_automation/: Vy's task execution capabilities
  • api/: FastAPI backend services
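
As an illustration of the kind of preprocessing `vision_processing/` performs, a helper that enforces the `supported_formats` setting from `config/config.yaml` might look like the sketch below (the function name and alias handling are hypothetical, not the repository's actual API):

```python
from pathlib import Path

# Mirrors vision.supported_formats in config/config.yaml
SUPPORTED_FORMATS = {"jpg", "png", "webp"}

def is_supported_image(path: str) -> bool:
    """Return True if the file extension is one the pipeline accepts."""
    ext = Path(path).suffix.lstrip(".").lower()
    # Treat .jpeg as an alias for .jpg
    return (ext if ext != "jpeg" else "jpg") in SUPPORTED_FORMATS
```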

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes
  4. Run tests: pytest tests/
  5. Submit a pull request

🔗 Related Projects

  • LLaVA - Large Language and Vision Assistant
  • SecondMe - AI training platform
  • Vy - AI task automation assistant

📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

🤝 Acknowledgments

  • LLaVA team for the amazing multimodal AI model
  • SecondMe team for the training infrastructure
  • Vercept team for Vy's automation capabilities

📞 Support

For questions and support:

  • Create an issue on GitHub
  • Visit Vercept for Vy-related questions
  • Check the documentation for detailed guides
