
BarsAI 🤖


University of Central Asia 🎓

📝 Project Overview

As my final year project for my Bachelor of Science in Computer Science at the University of Central Asia, I developed BarsAI—a desktop application aimed at making human-computer interaction more natural and intuitive. BarsAI enables intelligent conversations, document processing, and gesture control, bringing together cutting-edge AI technologies like conversational AI, Retrieval-Augmented Generation (RAG), and gesture recognition in a single, easy-to-use application. It runs locally using a GGUF-quantized version of the Llama 2 7B Chat model, allowing the chatbot to function entirely offline. For document-based queries, BarsAI taps into the power of Google’s Gemini Pro model, which requires an internet connection to process and retrieve relevant information. This project reflects my passion for AI and my desire to create a tool that blends intelligence with convenience.

⚡ Features

🤖 Conversational AI - Built with the Llama-2-7B-Chat-GGUF model, the assistant can engage in intelligent and context-aware conversations.
📄 Document Query Processing - Powered by Google AI’s Gemini Pro model, the assistant can process and respond to queries based on uploaded documents (PDF, DOCX, CSV, Excel).
🖐️ Gesture Control - Users can control their computer using hand gestures.
📡 Offline Functionality - The conversational AI runs entirely offline; only document-query processing requires an internet connection.
💻 User-Friendly Interface - Developed using PyQt5, the interface is designed for simplicity and ease of use.

🧩 Project Structure

Chatbot Module

Screenshot of the Chatbot Module interface

The Chatbot Module handles user interactions and generates responses with the Llama-2-7B-Chat-GGUF model. It also keeps a history of previous requests, so it can resolve context: in the screenshot, the second request does not specify the country, yet the model still understands what is being asked.
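Multi-turn context with Llama-2 chat models is usually carried by concatenating earlier turns into the prompt using the model's `[INST]` / `<<SYS>>` template. The sketch below illustrates that formatting; the function name and history structure are illustrative, not taken from the project's code:

```python
def build_llama2_prompt(history, user_msg, system="You are a helpful assistant."):
    """Format a multi-turn conversation into the Llama-2 chat template.

    history: list of (user, assistant) tuples from earlier turns.
    """
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for i, (user, assistant) in enumerate(history):
        if i == 0:
            # The first user turn continues the block that holds the system prompt.
            prompt += f"{user} [/INST] {assistant} </s>"
        else:
            prompt += f"<s>[INST] {user} [/INST] {assistant} </s>"
    if history:
        prompt += f"<s>[INST] {user_msg} [/INST]"
    else:
        prompt += f"{user_msg} [/INST]"
    return prompt
```

The resulting string is what gets passed to the GGUF model (e.g. via llama-cpp-python or ctransformers) on each turn, which is why the assistant can answer a follow-up question that omits the country.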

RAG Module

Screenshot of the RAG Module interface

The Retrieval-Augmented Generation (RAG), or Document Processing, Module manages document queries and generates relevant responses using the Gemini Pro model. When a user uploads a document, the module processes and indexes it. After that, the user can ask questions about the contents of the document. If a question is unrelated, a message is displayed saying that it does not relate to the context of the document.
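The core RAG loop is: split the document into chunks, retrieve the chunk most relevant to the question, and pass it to the model as context (or report that nothing relevant was found). The sketch below uses naive word-overlap scoring purely for illustration; the project's actual indexing and the `chunk_text`/`retrieve` names are assumptions, not its real implementation:

```python
import re

def chunk_text(text, size=500):
    """Split a document into roughly fixed-size chunks on sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if len(current) + len(sentence) > size and current:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks

def retrieve(chunks, question, threshold=0.1):
    """Return the chunk with the highest word overlap, or None if nothing is relevant."""
    q_words = set(question.lower().split())

    def score(chunk):
        c_words = set(chunk.lower().split())
        return len(q_words & c_words) / max(len(q_words), 1)

    best = max(chunks, key=score)
    return best if score(best) >= threshold else None
```

When `retrieve` returns `None`, the app can show the "does not relate to the document" message; otherwise the chunk is folded into a prompt such as `f"Context: {best}\n\nQuestion: {question}"` and sent to Gemini Pro (e.g. via the google-generativeai package's `GenerativeModel("gemini-pro").generate_content(...)`).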

Gesture Control

The Gesture Control Module allows users to interact with their system using gestures.

GC Swipe right and left

The swipe-right gesture triggers the forward action, and the swipe-left gesture triggers the back action.
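Conceptually, a swipe can be detected by tracking the hand's horizontal position (e.g. a wrist landmark from a hand-tracking library such as MediaPipe) over a short window of frames and firing when the net displacement crosses a threshold. The class below is an illustrative sketch under those assumptions, not the project's implementation:

```python
from collections import deque

class SwipeDetector:
    """Detect left/right swipes from a stream of normalized hand x-positions (0..1)."""

    def __init__(self, window=8, threshold=0.25):
        self.xs = deque(maxlen=window)
        self.threshold = threshold

    def update(self, x):
        """Feed the latest hand x-position; return 'right', 'left', or None."""
        self.xs.append(x)
        if len(self.xs) < self.xs.maxlen:
            return None
        delta = self.xs[-1] - self.xs[0]
        if delta > self.threshold:
            self.xs.clear()  # reset so one motion does not trigger repeatedly
            return "right"
        if delta < -self.threshold:
            self.xs.clear()
            return "left"
        return None
```

A detected swipe would then be mapped to the forward/back action, for example by sending the corresponding keyboard shortcut with an automation library.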


GC Volume

The volume-control gesture adjusts the computer's volume based on the distance between the thumb and index finger.
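The mapping itself is simple: measure the thumb-index distance from the tracked landmarks and interpolate it onto a 0-100 volume range, clamping at a "closed pinch" minimum and "open pinch" maximum. The function below is a minimal sketch assuming normalized landmark coordinates; the `d_min`/`d_max` values are placeholders to be calibrated, and the actual OS volume call (e.g. via pycaw on Windows) is omitted:

```python
import math

def pinch_to_volume(thumb, index, d_min=0.03, d_max=0.25):
    """Map the thumb-index fingertip distance to a 0-100 volume level.

    thumb, index: (x, y) tuples in normalized landmark coordinates.
    d_min / d_max: distances treated as fully closed / fully open pinch.
    """
    d = math.dist(thumb, index)
    frac = (d - d_min) / (d_max - d_min)
    return round(100 * min(max(frac, 0.0), 1.0))  # clamp to [0, 100]
```

Calling this once per frame and feeding the result to the system mixer gives the continuous volume adjustment described above.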

📥 Installation and Setup

Prerequisites

  • Python 3.10 or above
  • CPU (no GPU required; the quantized model runs on the CPU)
  • 16 GB RAM (if you have less, download a smaller quantized version of Llama-2-7B)

Clone the Repository

To clone the repository, run the following commands:


git clone https://github.com/Shahrom-S/BarsAI.git
cd BarsAI

Install Dependencies

To install the necessary dependencies, run the following command:


pip install -r requirements.txt --default-timeout=100

Environment Setup

Follow the steps below to configure your environment:

  • Download the Llama-2-7B-Chat-GGUF model:

    from huggingface_hub import hf_hub_download

    # GGUF build of the model; Q5_K_M is a 5-bit quantization comparable in
    # size to the q5_1 GGML file. Replace cache_dir with your project path.
    model_name_or_path = "TheBloke/Llama-2-7B-Chat-GGUF"
    model_basename = "llama-2-7b-chat.Q5_K_M.gguf"
    model_path = hf_hub_download(
        repo_id=model_name_or_path,
        filename=model_basename,
        cache_dir="C:/path/to/Project/directory",
    )
      
  • Set Up API Keys for Document Processing:
  • To enable the document processing feature, obtain an API key from Google AI for the Gemini Pro model. Once you have the key, add it to a .env file in the project directory:

    
      GOOGLE_API_KEY="Your_API_key_here"
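At runtime the key is read from this file into the environment. The python-dotenv package's `load_dotenv()` is the usual way to do this; the stdlib-only equivalent below is an illustrative sketch (the `load_env` helper is hypothetical, not the project's code):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: parse KEY=value lines into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"')
```

After loading, the key is available as `os.environ.get("GOOGLE_API_KEY")` for configuring the Gemini client.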
      

Run the Application

Once the environment and dependencies are set up (ideally inside an activated virtual environment), run the following command to start the application:


python interface.py

👨‍💻 Further Development

  • Full Offline Capability: Integrating more powerful open-source models such as Llama-3-8B, which could replace the Gemini Pro API for RAG and eliminate the dependency on cloud-based services.
  • Adding Gestures: Adding more gestures to recognize and perform a wider range of actions.
  • Enhanced UI/UX: Further improving the interface for better usability and aesthetic appeal.

👏 Acknowledgment

Special thanks to the University of Central Asia for providing the facilities necessary for the completion of this project.