manasa-26/VoiceAssist-RAG

🎙️ VoiceAssist-RAG

VoiceAssist-RAG is a multimodal voice-enabled assistant that combines speech recognition, retrieval-augmented generation (RAG), and text-to-speech to deliver intelligent, voice-based answers from domain-specific documents.


🚀 Features

  • 🔍 RAG Pipeline: Answers are grounded in passages retrieved from telecom_faq.txt before generation
  • 🎤 Speech-to-Text (STT): Whisper-based transcription of voice input (input.m4a, input.wav)
  • 🧠 Vector Search: Semantic similarity search over the document embeddings using FAISS
  • 💬 Streamlit UI: Clean and interactive frontend for voice-to-answer interactions
  • 🗣️ Text-to-Speech (TTS): Converts model-generated answers to audio output (output.mp3)
  • 🧩 Modular Architecture: Easy to maintain and extend
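
The features above chain into a single flow: audio in, transcription, retrieval, answer generation, audio out. A minimal sketch of that flow follows, assuming each stage is a plain callable that can be swapped independently. The class name `VoicePipeline`, the stage signatures, and the stub stages are all hypothetical and are not taken from this repository; the stubs stand in for Whisper, FAISS, the LLM, and the TTS engine so the example runs without any of them installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the four stages; not the repository's actual code."""
    transcribe: Callable[[str], str]            # audio path -> question text
    retrieve: Callable[[str], List[str]]        # question -> supporting passages
    generate: Callable[[str, List[str]], str]   # question + passages -> answer text
    synthesize: Callable[[str, str], str]       # answer text, output path -> output path

    def run(self, audio_path: str, out_path: str = "output.mp3") -> Dict[str, object]:
        question = self.transcribe(audio_path)
        passages = self.retrieve(question)
        answer = self.generate(question, passages)
        audio = self.synthesize(answer, out_path)
        return {"question": question, "passages": passages,
                "answer": answer, "audio": audio}


# Stub stages with made-up content, used only to exercise the wiring.
def stub_transcribe(audio_path: str) -> str:
    return "How can I pay my bill?"


def stub_retrieve(question: str) -> List[str]:
    return ["You can pay your bill online or at any retail store."]


def stub_generate(question: str, passages: List[str]) -> str:
    if not passages:
        return "I could not find that in the FAQ."
    return "Based on the FAQ: " + passages[0]


def stub_synthesize(answer: str, out_path: str) -> str:
    # A real stage would write audio to out_path; this stub only echoes the path.
    return out_path


pipeline = VoicePipeline(stub_transcribe, stub_retrieve, stub_generate, stub_synthesize)
result = pipeline.run("input.wav")
print(result["answer"])
```

Keeping each stage behind a simple function signature is one way to get the modularity described above: a different STT or TTS backend can be dropped in without touching the other stages.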

Architecture Diagram

[Image: architecture diagram]

Data Flow Diagram

[Image: data flow diagram (Mermaid chart)]

💡 Tech Stack

  • Python, Streamlit
  • OpenAI Whisper for STT
  • FAISS for vector-based semantic retrieval
  • LLM (OpenAI / HuggingFace) for answer generation
  • gTTS / pyttsx3 or similar for TTS
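
To illustrate the retrieval step in this stack, here is a small sketch of top-k passage lookup. It deliberately swaps in a pure-Python bag-of-words cosine similarity instead of a real embedding model and a FAISS index, so it runs with the standard library only; in the actual project the scoring would come from FAISS over dense embeddings. The helper names (`embed`, `cosine`, `top_k`) and the sample FAQ lines are hypothetical and are not the contents of telecom_faq.txt.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a dense vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return up to k passages with a non-zero similarity, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


# Made-up FAQ passages for illustration only.
faq = [
    "To reset your voicemail PIN, open the account settings page.",
    "Roaming charges apply when you use data outside your home network.",
    "You can pay your bill online or at any retail store.",
]

hits = top_k("How can I pay my bill?", faq, k=1)
print(hits)
```

The retrieved passages would then be placed into the LLM prompt so the generated answer stays grounded in the source document.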

🛠️ Installation

git clone https://github.com/yourusername/call-center-rag.git
cd call-center-rag
pip install -r requirements.txt

🗂️ Project Structure

[Image: project directory structure]

📸 Output Screenshot

[Image: output screenshot]


📽️ Output Demo Video

https://drive.google.com/file/d/1ePg8iGRVX87NQ8dEGb70B_8Fm4PGkJf6/view?usp=sharing

About

Multimodal Voice RAG Agent using Speech-to-Text, FAISS Search, and Text-to-Speech
