Skip to content

Krish0909/Collaborative-Multi-SLM-Chatbot

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 

Repository files navigation

Collaborative Multi-Agent AI Framework using Cross-Attention

Overview

The Collaborative Multi-Agent AI Framework is a state-of-the-art solution that integrates multiple specialized smaller language models (SLMs) to deliver domain-specific expertise through collaborative reasoning. This system surpasses traditional monolithic large-scale language models (LLMs) by employing Multi-Agent Reinforcement Learning (MARL), intelligent query routing, and cross-attention mechanisms. It achieves superior computational efficiency while delivering semantically rich, precise, and robust responses across diverse domains.

This project aims to develop an efficient, domain-aware chatbot system by integrating specialized models like BioGPT (for medical expertise) and Qwen Coder (for technical expertise) into a collaborative framework. The system utilizes GPT-4.0 Mini as a lightweight query analysis and routing agent to decompose user queries and direct them to relevant models. Leveraging cross-attention and multi-agent reinforcement learning (MARL), the architecture ensures knowledge exchange and optimized collaboration between models.

FlowChart
System Architecture


What is Cross Attention in Our Project?

FlowChart
Cross Attention Logic

Cross attention is a pivotal mechanism in our framework that enables collaborative reasoning and information exchange between specialized language models (SLMs), such as BioGPT and Qwen Coder, to produce unified, semantically coherent responses. It operates as the connective tissue between models, aligning their domain-specific insights and ensuring that the final output leverages the strengths of each model effectively.

How Cross Attention Works

  1. Input Tensors from Specialized Models:

    • Each domain-specific model (e.g., BioGPT and Qwen Coder) processes its assigned query fragments and generates two key components:
      • Key (K): Encodes the contextual information of the domain.
      • Value (V): Represents the actual knowledge or output of the model.
      • Additionally, for cross-attention, a Query (Q) tensor is generated from the main routing model (GPT-4.0 Mini) or the primary domain.
  2. Attention Scores Calculation:

    • Cross attention calculates the similarity between the Query tensor (Q) from one model and the Key tensor (K) from another model:

      Attention Scores = Q . K ^ T


    • This step identifies how strongly one model's output (Value tensor) should influence the response, based on its relevance to the query.
  3. Weighting the Outputs:

    • The attention scores are passed through a softmax function to normalize them into probabilities.
    • These probabilities are then applied to the Value tensor (V) of the second model:

      Context=Softmax(Attention Scores)⋅V


    • This creates a refined "context" that integrates knowledge from the second model into the primary response.
  4. Knowledge Exchange:

    • The refined context vectors are reshaped and combined into the originating model's tensor, enabling a bidirectional exchange of knowledge.
    • This ensures that models do not work in isolation but instead collaboratively enrich each other’s outputs.
  5. Unified Response Generation:

    • After cross-attention, the enriched outputs from all participating models are merged, ensuring a comprehensive understanding of the query.

Benefits of Cross Attention in Our Framework

  1. Semantic Alignment:

    • Cross attention ensures that responses from different models are aligned and consistent in context, even if they originate from diverse domains.
  2. Enhanced Collaboration:

    • It allows models to "learn from" and adapt to the strengths of other models during query resolution, resulting in a holistic response.
  3. Contextual Refinement:

    • By weighting the contributions of each model's output, cross attention dynamically adjusts the importance of specific knowledge based on query relevance.
  4. Computational Efficiency:

    • Instead of using a single, large-scale model for all queries, cross attention leverages lightweight specialized models and fuses their outputs effectively.

Example Workflow: Query Splitting and Cross Attention

  • Original Query: "Explain how diabetes impacts cardiovascular health and suggest a technical solution to monitor related risks."
    1. Query Splitting:
      • Sub-Query 1: "How does diabetes impact cardiovascular health?" → Sent to BioGPT.
      • Sub-Query 2: "What are technical solutions to monitor cardiovascular risks for diabetes patients?" → Sent to Qwen Coder.
    2. Domain-Specific Processing:
      • BioGPT: Outputs a detailed explanation of the biological link between diabetes and cardiovascular health.
      • Qwen Coder: Suggests technical solutions like wearable devices or monitoring applications.
    3. Cross Attention Integration:
      • Cross attention enables knowledge sharing between the models, enriching the technical solution with medical context and vice versa.
    4. Unified Response:
      • A final, coherent answer is generated, combining medical insights with actionable technical recommendations.

Key Features

  1. Intelligent Query Splitting and Routing:

    • GPT-4.0 Mini decomposes user queries into domain-specific components.
    • Specialized routing:
      • Technical Queries: Processed by Qwen Coder.
      • Medical Queries: Addressed by BioGPT.
  2. Model Collaboration with Cross-Attention:

    • Integrates responses through a cross-attention mechanism to enhance semantic coherence.
    • Enables knowledge exchange between domain-specific embeddings.
  3. Reinforcement Learning and MARL:

    • Implements multi-agent reinforcement learning (MARL) to foster synergistic interactions.
    • Uses game-theoretic strategies to optimize information exchange and improve response quality dynamically.
  4. Caching and Efficiency:

    • Uses a caching layer (e.g., Redis) for frequent queries to reduce latency and computational overhead.

System Architecture

Input Query Analysis:

  • User queries are analyzed and routed by GPT-4.0 Mini.
  • Pre-generated responses are retrieved from a Redis-based cache if available.

Domain-Specific Processing:

  • Query components are routed to domain-specific models (BioGPT, Qwen Coder), which generate independent responses and embeddings.

Cross-Attention Mechanism:

  • Responses are refined through a cross-attention mechanism, enabling collaborative reasoning.

Response Generation:

  • Outputs are merged and validated by GPT-4.0 Mini, producing a unified, semantically rich response.

Reinforcement Learning:

  • MARL principles iteratively optimize model collaboration for long-term performance.

FlowChart


Tech Stack

  • Models: BioGPT, Qwen Coder (Hugging Face)
  • Frameworks: PyTorch, Hugging Face Transformers
  • Learning Techniques: Reinforcement Learning (RL), Multi-Agent RL (MARL), Game Theory, Cross-Attention

Setup Instructions

  1. Clone the repository:
    git clone https://github.com/YourRepo/CrossAttentionChatbot.git
    cd CrossAttentionChatbot
  2. Install dependencies:
    pip install -r requirements.txt
  3. Configure API keys for GPT-4-mini and Hugging Face models in .env.
  4. Run the application:
    python main.py

Applications

  • Healthcare: Generate medically accurate responses via BioGPT.
  • Coding Assistance: Leverage Qwen Coder for precise technical solutions.
  • Cross-Domain Problem Solving: Seamless collaboration across diverse domains. And many more accordingly!

Results

  • Efficiency: Gurantees more than 50% reduction in compute costs compared to traditional large-scale models.
  • Speed : Average response time of under 2 seconds.
  • Accuracy : Matches or surpasses monolithic LLMs with domain-specific specialization.

Future Enhancements

  • Implementing adaptive reward systems for RL.
  • Expanding domain coverage with additional SLMs.
  • Enhancing collaboration with adversarial training.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%