A voice-based conversational AI agent that conducts cold calls in Hinglish (a mix of Hindi and English) for various business scenarios using Google's Gemini AI.
This project creates an interactive voice assistant that can conduct realistic cold calls in Hinglish for three different business scenarios:
- Demo Scheduling: Sales representative pitching an ERP software product
- Candidate Interviewing: HR representative conducting initial job screening
- Payment Follow-up: Accounts department representative requesting overdue payment
The agent uses speech recognition to understand voice input, processes it through Google's Gemini AI model to generate contextually appropriate responses, and delivers them using text-to-speech.
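The text-generation step of this pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: `make_gemini_model`, `generate_reply` and `CannedModel` are hypothetical names. The Gemini call assumes the `google-generativeai` package (`genai.configure`, `GenerativeModel.generate_content`). Any object with a compatible `generate_content` method can be passed in, so the logic can be exercised offline.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def make_gemini_model(api_key: str, model_name: str = "gemini-1.5-pro") -> Any:
    """Create a Gemini client via the google-generativeai package (network access needed)."""
    import google.generativeai as genai  # lazy import: the rest of this sketch runs offline

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def generate_reply(model: Any, system_prompt: str,
                   history: List[Tuple[str, str]], user_text: str) -> str:
    """Flatten instructions and prior turns into one prompt and return the agent's next line."""
    lines = [system_prompt, ""]
    lines += [f"{speaker}: {text}" for speaker, text in history]
    lines += [f"Customer: {user_text}", "Agent:"]
    response = model.generate_content("\n".join(lines))
    return response.text.strip()


class CannedModel:
    """Hypothetical offline stand-in with the same generate_content(...).text shape."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: List[str] = []

    def generate_content(self, prompt: str) -> SimpleNamespace:
        self.prompts.append(prompt)
        return SimpleNamespace(text=self.reply)
```

For example, `generate_reply(CannedModel("Ji, bilkul!"), "You are a sales rep.", [], "Haan, boliye")` returns `"Ji, bilkul!"` without a network call. Swapping in `make_gemini_model(...)` sends the same prompt to Gemini. Flattening the history into a single prompt is one option; the library also offers multi-turn chat sessions via `start_chat`.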
- 🎙️ Voice Recognition: Captures and transcribes user speech
- 🤖 AI-Powered Responses: Generates contextually relevant responses using Gemini 1.5 Pro
- 🗣️ Text-to-Speech: Converts AI responses to natural-sounding voice output
- 💬 Bilingual Support: Handles Hinglish conversations (Hindi-English mix)
- 🧠 Conversation Memory: Maintains context throughout the interaction
- 📝 Scenario Templates: Pre-configured prompts for different business use cases
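Conversation memory can be as simple as a bounded list of recent turns that is replayed into each prompt. The sketch below is illustrative. `ConversationMemory` is the class name this README uses, but the `add`, `as_transcript` and `clear` methods and the `max_turns` window are assumptions.

```python
from collections import deque
from typing import Deque, Tuple


class ConversationMemory:
    """Keeps the most recent turns of a call so they can be fed back into the prompt.

    Hypothetical sketch: once max_turns is exceeded the oldest turn is dropped,
    which keeps the prompt length bounded on long calls.
    """

    def __init__(self, max_turns: int = 20) -> None:
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        """Record one utterance, e.g. add("Customer", "Haan, boliye")."""
        self.turns.append((speaker, text))

    def as_transcript(self) -> str:
        """Render stored turns as 'Speaker: text' lines for the model prompt."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)

    def clear(self) -> None:
        """Forget everything, e.g. before starting a new call."""
        self.turns.clear()
```

A sliding window trades long-range recall for predictable prompt size. Summarizing dropped turns would be one way to keep older context, in line with the long-term retention item under future improvements.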
- Python 3.7+
- Google API key for Gemini AI
- Internet connection for the speech recognition and text-to-speech services and for Gemini
- Clone the repository:
```bash
git clone https://github.com/yourusername/hinglish-cold-call-agent.git
cd hinglish-cold-call-agent
```
- Install the required packages:
```bash
pip install -r requirements.txt
```
- Set up your Gemini API key:
  - Get your API key from Google AI Studio
  - Replace `GEMINI_API_KEY` in the code with your actual key
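As an alternative to editing the key into the source file, the key can be read from an environment variable so it never ends up in version control. This is a sketch under assumptions: the variable name `GEMINI_API_KEY` and the helper `load_api_key` are hypothetical. The returned value would then be passed to `genai.configure(api_key=...)` from the `google-generativeai` package.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the Gemini API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running, "
            f"e.g. `export {var_name}=your-key` on macOS/Linux."
        )
    return key
```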
Run the main script:
```bash
python cold_call_agent.py
```
Follow the on-screen instructions to select a scenario. The agent will:
- Greet you with an introduction specific to the chosen scenario
- Listen for your voice input
- Respond appropriately in Hinglish
- Continue the conversation until you say "bye", "goodbye", or "end call"
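The end-of-call check on the transcribed speech could look like the sketch below. The three phrases come from this README; matching them as whole words, case-insensitively, is an assumption of this sketch (the project may use a plain substring test), and `is_end_of_call` is a hypothetical name.

```python
import re

# End-of-call phrases listed in this README.
END_PHRASES = ("bye", "goodbye", "end call")

# Match each phrase as whole words, ignoring case, so "bye" does not fire inside "byelaws".
_END_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in END_PHRASES) + r")\b",
    re.IGNORECASE,
)


def is_end_of_call(transcript: str) -> bool:
    """Return True if the recognized speech contains an end-of-call phrase."""
    return bool(_END_PATTERN.search(transcript))
```

For instance, `is_end_of_call("Theek hai, GOODBYE")` is `True`, while `is_end_of_call("Demo kal rakhte hain")` is `False`.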
```
hinglish-cold-call-agent/
├── cold_call_agent.py # Main application file
├── requirements.txt # Required Python packages
└── README.md # Project documentation
```
The application has three main components:
- SpeechHandler: Manages voice input and output using speech recognition and text-to-speech services
- ConversationMemory: Maintains the conversation history and context
- GeminiAgent: Processes the conversation using structured prompts and the Gemini AI model
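A possible shape for the `SpeechHandler` component is sketched below. The class name comes from this README; the constructor parameters, default language codes and temporary-file approach are assumptions. It uses SpeechRecognition (`Recognizer.listen`, `recognize_google`), gTTS (`gTTS(...).save`) and `pygame.mixer.music` for playback. The imports sit inside the methods so the class can be constructed without the audio libraries installed. Note that `sr.Microphone` additionally needs PyAudio.

```python
import os
import tempfile
import time


class SpeechHandler:
    """Sketch of voice I/O: SpeechRecognition for input, gTTS + pygame for output."""

    def __init__(self, recognition_language: str = "en-IN", tts_language: str = "hi") -> None:
        # Assumed defaults: Indian-English recognition and a Hindi TTS voice.
        # "hi-IN" may suit callers who speak mostly Hindi.
        self.recognition_language = recognition_language
        self.tts_language = tts_language

    def listen(self, timeout: float = 5.0) -> str:
        """Capture one utterance from the microphone; return '' if nothing usable was heard."""
        import speech_recognition as sr  # sr.Microphone also requires PyAudio

        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            recognizer.adjust_for_ambient_noise(source, duration=0.5)
            try:
                audio = recognizer.listen(source, timeout=timeout)
            except sr.WaitTimeoutError:
                return ""  # caller stayed silent
        try:
            return recognizer.recognize_google(audio, language=self.recognition_language)
        except (sr.UnknownValueError, sr.RequestError):
            return ""  # speech unintelligible or the web API was unreachable

    def speak(self, text: str) -> None:
        """Synthesize text to a temporary MP3 with gTTS and block until pygame finishes playing it."""
        import pygame
        from gtts import gTTS

        fd, path = tempfile.mkstemp(suffix=".mp3")
        os.close(fd)
        try:
            gTTS(text=text, lang=self.tts_language).save(path)
            pygame.mixer.init()
            pygame.mixer.music.load(path)
            pygame.mixer.music.play()
            while pygame.mixer.music.get_busy():
                time.sleep(0.1)
            pygame.mixer.music.unload()  # pygame 2.0+: release the file before deleting it
        finally:
            os.remove(path)
```

Returning an empty string on recognition failure lets the conversation loop simply listen again instead of handling library-specific exceptions.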
Each conversation follows a three-phase structure:
- Greeting: Introduces the agent and purpose of the call
- Conversation: Handles the main dialogue
- Farewell: Concludes the call with a summary and next steps
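The three phases can be driven by a loop like the one below. This is a hypothetical sketch: `run_call` and its parameters are illustrative. `listen`, `respond` and `speak` are injected callables, which in the real application would presumably be backed by the speech handler and the Gemini agent. The farewell is a fixed string here for simplicity, whereas the project describes a summary with next steps. The end-phrase check is a simple substring test.

```python
from typing import Callable, List, Sequence


def run_call(
    greeting: str,
    listen: Callable[[], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    farewell: str,
    end_phrases: Sequence[str] = ("bye", "goodbye", "end call"),
    max_turns: int = 20,
) -> List[str]:
    """Drive one call through greeting, conversation and farewell; return the agent's lines."""
    spoken: List[str] = []

    def say(line: str) -> None:
        spoken.append(line)
        speak(line)

    # Phase 1: greeting
    say(greeting)

    # Phase 2: conversation, until an end phrase is heard or the turn cap is reached
    for _ in range(max_turns):
        heard = listen().strip()
        if not heard:
            continue  # nothing recognized; listen again
        if any(phrase in heard.lower() for phrase in end_phrases):
            break
        say(respond(heard))

    # Phase 3: farewell
    say(farewell)
    return spoken
```

Injecting the three callables keeps the loop testable with stubs, for example `listen` reading from a list of scripted utterances instead of a microphone.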
- Demo Scheduling: Simulates a sales representative calling to schedule a product demo for an ERP system, highlighting features relevant to the customer's interests.
- Candidate Interviewing: Simulates an HR representative conducting an initial screening interview for a software engineering position, assessing the candidate's qualifications.
- Payment Follow-up: Simulates an accounts department representative following up on an overdue invoice, aiming to secure a payment commitment.
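The scenario templates could be stored as a small mapping that is turned into the model's instructions. The sketch below is illustrative only: the `SCENARIOS` keys, the prompt wording (including the Roman-script and short-reply instructions) and `build_system_prompt` are assumptions, not the project's actual prompts.

```python
from typing import Dict

# Hypothetical templates keyed by scenario; the role/goal text paraphrases this README.
SCENARIOS: Dict[str, Dict[str, str]] = {
    "demo": {
        "role": "a sales representative pitching an ERP software product",
        "goal": "schedule a product demo that fits the customer's interests",
    },
    "interview": {
        "role": "an HR representative screening a candidate for a software engineering position",
        "goal": "assess the candidate's qualifications",
    },
    "payment": {
        "role": "an accounts department representative following up on an overdue invoice",
        "goal": "secure a payment commitment",
    },
}


def build_system_prompt(scenario: str) -> str:
    """Compose the instructions sent to the model for the chosen scenario."""
    try:
        spec = SCENARIOS[scenario]
    except KeyError:
        raise ValueError(f"Unknown scenario {scenario!r}; choose from {sorted(SCENARIOS)}") from None
    return (
        f"You are {spec['role']} making a cold call. "
        f"Your goal is to {spec['goal']}. "
        "Reply in natural Hinglish (Hindi and English mixed, written in Roman script), "
        "keep each reply to one or two short sentences, and stay polite."
    )
```

Keeping the scenarios as data means adding a new business scenario is a matter of adding one dictionary entry.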
- SpeechRecognition: For voice recognition
- gTTS: For text-to-speech conversion
- pygame: For audio playback
- Google Generative AI: For AI response generation
- Add support for more languages and regional accents
- Implement more business scenarios
- Enhance conversation memory with long-term retention
- Add sentiment analysis to adapt tone based on customer mood
- Implement call recording and analytics
- Google for providing the Gemini AI API
- Open source speech recognition and text-to-speech libraries
Contributions are welcome! Please feel free to submit a Pull Request.