Introducing Sentry, an AI-powered podcast guest designed to transform how we interact with audio content. This project uses large language models to create an interactive experience in which listeners can ask questions and receive thoughtful, human-like responses in real time.

Features
- Large Language Model: Powered by Google's Gemini large language model (LLM), Sentry generates contextually relevant and informative responses to listener questions.
- Speech Recognition: OpenAI's Whisper speech-to-text model transcribes audio input, enabling voice interactions.
- Contextual Awareness: A FAISS vector index retrieves relevant information from historical interactions and external knowledge sources, giving Sentry's responses more context.
- Natural Language Processing: spaCy, an NLP library, helps parse and process listener queries.
- Expressive Speech Synthesis: SunoAI's BARK text-to-speech model converts Sentry's responses into natural-sounding speech.
- User-Friendly Interface: A Flask-based web application lets listeners interact with Sentry through voice or text.
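
The contextual-awareness feature above can be illustrated with a small retrieval sketch. This is not Sentry's actual code: it swaps FAISS and a real embedding model for a brute-force cosine search over bag-of-words counts, using only the standard library, so that the idea is runnable on its own. The names `embed`, `ToyIndex`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts.

    Assumption: a real deployment would use a dense embedding model here.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyIndex:
    """Brute-force stand-in for a vector index such as FAISS."""

    def __init__(self) -> None:
        self._items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        """Return the k stored passages most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, index: ToyIndex, k: int = 2) -> str:
    """Prepend retrieved passages to the question before it goes to the LLM."""
    context = "\n".join(f"- {passage}" for passage in index.search(question, k))
    return f"Context:\n{context}\n\nListener question: {question}"
```

In use, past interactions or reference passages would be added to the index once, and `build_prompt` would be called for each incoming question so the language model sees the most relevant passages alongside it.
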

Technologies
- Google Gemini: Large Language Model for generating human-like responses.
- OpenAI Whisper: Speech-to-text model for accurate audio transcription.
- FAISS: Vector similarity search library for efficient information retrieval.
- spaCy: Natural Language Processing library for text processing.
- SunoAI BARK: Text-to-speech model for natural-sounding speech synthesis.
- Flask: Web framework for developing the user interface.
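
One way to picture how these components fit together is as a chain of interchangeable stages. The sketch below is an assumption about the overall flow, not the project's implementation: the `Pipeline` class, its field names, and `make_stub_pipeline` are hypothetical, and every stage is a plain-Python stub standing in for the named library rather than a call into it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pipeline:
    """Hypothetical wiring of the stages; each field is an injectable callable."""

    transcribe: Callable[[bytes], str]         # speech-to-text (Whisper's role)
    preprocess: Callable[[str], str]           # query clean-up (spaCy's role)
    retrieve: Callable[[str], List[str]]       # context lookup (FAISS's role)
    generate: Callable[[str, List[str]], str]  # response text (Gemini's role)
    synthesize: Callable[[str], bytes]         # text-to-speech (BARK's role)

    def answer_text(self, question: str) -> str:
        """Text path: clean the question, fetch context, generate a reply."""
        cleaned = self.preprocess(question)
        return self.generate(cleaned, self.retrieve(cleaned))

    def answer_audio(self, audio: bytes) -> bytes:
        """Voice path: transcribe, answer as text, then synthesize speech."""
        return self.synthesize(self.answer_text(self.transcribe(audio)))


def make_stub_pipeline() -> Pipeline:
    """Build a pipeline from stubs so the flow can be exercised without models."""
    return Pipeline(
        transcribe=lambda audio: audio.decode("utf-8"),  # pretend audio is UTF-8 text
        preprocess=str.strip,
        retrieve=lambda question: ["(no stored context)"],
        generate=lambda question, context: f"You asked: {question}",
        synthesize=lambda text: text.encode("utf-8"),    # pretend speech is UTF-8 bytes
    )
```

Keeping each stage behind a callable means a web route (for example, a Flask view) only needs to call `answer_text` or `answer_audio`, and any single model can be swapped or mocked in tests without touching the rest.
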