Low-Cost AI Voice Assistant with OpenAI, Deepgram & Twilio

Build your own AI voice assistant that can handle inbound calls using OpenAI's GPT for conversation, Deepgram for speech processing, and Twilio for telephony - all for around 1 cent per minute!

Watch the Tutorial Video

Overview

This repository is Part 1 of a series demonstrating how to build production-ready AI voice assistants. In this first part, we focus on handling inbound calls and basic FAQ responses, achieving:

~1 second latency
~$0.01 per minute cost
Natural conversation flow with interruption handling
Scalable architecture for future expansion

Coming in Part 2 (stay tuned!):

Function calling capabilities
Outbound call handling
Enhanced text-to-speech with ElevenLabs
And more!

System Architecture

High-Level Architecture

Key Components:

- Twilio: Handles inbound calls and audio streaming
- Deepgram:
  - Speech-to-Text: Real-time transcription
  - Text-to-Speech: Response generation
- OpenAI GPT: Natural language processing and response generation
- WebSocket Server: Real-time audio streaming and service orchestration
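
As a rough illustration of what the WebSocket server receives from Twilio, the sketch below dispatches on the `event` field of Twilio Media Streams messages (`start`, `media`, `stop`). The function name `handleTwilioMessage` and the shape of the `state` object are assumptions made for this example, not code taken from the repository. In a real server the decoded audio would be forwarded to Deepgram rather than collected in an array.

```javascript
// Minimal sketch: handle one raw JSON message from a Twilio media stream.
// handleTwilioMessage and state are illustrative names, not the repo's API.
function handleTwilioMessage(raw, state) {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case 'start':
      // Sent once per call; carries the stream SID needed to send audio back.
      state.streamSid = msg.start.streamSid;
      break;
    case 'media':
      // Caller audio arrives as base64-encoded payloads.
      state.audioChunks.push(Buffer.from(msg.media.payload, 'base64'));
      break;
    case 'stop':
      state.ended = true;
      break;
    default:
      // 'connected', 'mark', and any other events are ignored in this sketch.
      break;
  }
  return state;
}

// Simulate a short call without any network connection.
const state = { streamSid: null, audioChunks: [], ended: false };
const fakePayload = Buffer.from([0xff, 0x7f]).toString('base64');
handleTwilioMessage(JSON.stringify({ event: 'start', start: { streamSid: 'MZ123' } }), state);
handleTwilioMessage(JSON.stringify({ event: 'media', media: { payload: fakePayload } }), state);
handleTwilioMessage(JSON.stringify({ event: 'stop' }), state);
console.log(state.streamSid, state.audioChunks.length, state.ended);
```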

Code Architecture

The system is built with a modular architecture:

- app.js: Main server and WebSocket handling
- services/:
  - gpt-service.js: OpenAI integration and conversation management
  - stream-service.js: Audio streaming and buffer management
  - transcription-service.js: Speech-to-text processing
  - tts-service.js: Text-to-speech conversion
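
One plausible way for app.js to connect these modules is through events, with each service emitting its result for the next one to consume. The sketch below uses stand-in classes so that it runs offline. The class names, method names, and event names (`transcription`, `gptreply`, `speech`) are assumptions for illustration, and the real services may expose a different interface.

```javascript
const { EventEmitter } = require('node:events');

// Stand-ins for the four services. Each one only echoes its input, so the
// flow can be exercised without Deepgram, OpenAI, or Twilio credentials.
class FakeTranscription extends EventEmitter {
  send(text) { this.emit('transcription', text); }
}
class FakeGpt extends EventEmitter {
  completion(text, index) { this.emit('gptreply', `You said: ${text}`, index); }
}
class FakeTts extends EventEmitter {
  generate(reply, index) {
    // Real TTS would return synthesized audio; here the text is base64-encoded.
    this.emit('speech', index, Buffer.from(reply).toString('base64'));
  }
}
class FakeStream {
  constructor() { this.sent = []; }
  buffer(index, audio) { this.sent.push({ index, audio }); }
}

// Orchestration: transcript -> GPT reply -> speech -> back to the caller.
function wire(transcription, gpt, tts, stream) {
  let interactionCount = 0;
  transcription.on('transcription', (text) => {
    gpt.completion(text, interactionCount);
    interactionCount += 1;
  });
  gpt.on('gptreply', (reply, index) => tts.generate(reply, index));
  tts.on('speech', (index, audio) => stream.buffer(index, audio));
}

const stream = new FakeStream();
const transcription = new FakeTranscription();
wire(transcription, new FakeGpt(), new FakeTts(), stream);
transcription.send('What are your opening hours?');
console.log(stream.sent.length, Buffer.from(stream.sent[0].audio, 'base64').toString());
```

Keeping the wiring in one function means a service can be swapped (for example, a different TTS provider) without touching the others, as long as it emits the same event.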

Setup Guide

Prerequisites

Node.js (v14+)
npm/yarn
Accounts with:
- Twilio
- Deepgram
- OpenAI

Installation

Clone the repository:

git clone https://github.com/Barty-Bart/ai-voice-assistant-openai-deepgram.git
cd ai-voice-assistant-openai-deepgram

Install dependencies:

npm install

Create .env file:

SERVER=your-server-domain
DEEPGRAM_API_KEY=your-deepgram-api-key
VOICE_MODEL=your-preferred-voice-model
OPENAI_API_KEY=your-openai-api-key

Configure Twilio:

Set up a Twilio phone number
Configure the number's voice webhook to point to your /incoming endpoint
Ensure your server is reachable over HTTPS, since Twilio media streams connect over secure WebSockets (wss://)

Start the server:

npm start
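
To make the configuration steps above more concrete, here is a small sketch of two things a server like this typically does at startup and on `/incoming`: check that the `.env` values are present, and answer Twilio with TwiML that opens a media stream. The helper names, the `/connection` WebSocket path, and the assumption that `SERVER` holds a bare domain without a protocol prefix are illustrative guesses, not details confirmed by the repository.

```javascript
// Variable names come from the .env template above.
const REQUIRED_VARS = ['SERVER', 'DEEPGRAM_API_KEY', 'VOICE_MODEL', 'OPENAI_API_KEY'];

// Returns the names of required variables that are missing or blank.
function findMissingEnv(env) {
  return REQUIRED_VARS.filter((name) => !env[name] || String(env[name]).trim() === '');
}

// Builds the TwiML response for /incoming. <Connect><Stream> asks Twilio to
// open a bidirectional WebSocket to the given wss:// URL.
// The /connection path is a hypothetical route chosen for this example.
function buildIncomingTwiml(serverDomain) {
  if (!serverDomain) {
    throw new Error('SERVER must be set to your public domain');
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="wss://${serverDomain}/connection" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}

// Example with placeholder values instead of process.env.
const missing = findMissingEnv({ SERVER: 'example.ngrok.app', DEEPGRAM_API_KEY: 'abc' });
console.log('Missing variables:', missing);
console.log(buildIncomingTwiml('example.ngrok.app'));
```

In a running server, `findMissingEnv(process.env)` could be called before listening, and the TwiML string returned from the `/incoming` handler with a `text/xml` content type.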

How It Works

- Call Initiation:
  - Customer calls Twilio number
  - Twilio establishes WebSocket connection with server
- Real-time Processing:
  - Speech-to-Text: Customer audio → Deepgram → Text
  - Processing: Text → OpenAI GPT → Response
  - Text-to-Speech: Response → Deepgram → Audio
  - Audio streamed back to caller
- Key Features:
  - Real-time transcription and response
  - Natural conversation handling
  - Interruption detection
  - Ordered message queuing
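
Ordered message queuing and interruption handling can be sketched together. Audio chunks may finish synthesis out of order, so the queue below holds early arrivals until the preceding chunks have been sent. When the caller interrupts, it drops anything unsent and sends Twilio a `clear` message to flush audio that is already buffered for playback. The class name `OrderedAudioQueue` and the convention that each response numbers its chunks from 0 are assumptions for this example, and the repository's stream service may work differently.

```javascript
// Sketch of an ordered playback queue with interruption support.
class OrderedAudioQueue {
  constructor(sendToCaller) {
    this.sendToCaller = sendToCaller; // forwards one message object to Twilio
    this.expectedIndex = 0;           // next chunk index allowed to play
    this.pending = new Map();         // early chunks, keyed by index
  }

  // Accept a chunk in any order and release every chunk now in sequence.
  push(index, audioBase64, streamSid) {
    this.pending.set(index, audioBase64);
    while (this.pending.has(this.expectedIndex)) {
      const payload = this.pending.get(this.expectedIndex);
      this.pending.delete(this.expectedIndex);
      this.sendToCaller({ event: 'media', streamSid, media: { payload } });
      this.expectedIndex += 1;
    }
  }

  // Caller started speaking: drop unsent chunks, flush Twilio's buffer,
  // and start counting from 0 again for the next response.
  interrupt(streamSid) {
    this.pending.clear();
    this.expectedIndex = 0;
    this.sendToCaller({ event: 'clear', streamSid });
  }
}

const sent = [];
const queue = new OrderedAudioQueue((msg) => sent.push(msg));
queue.push(1, 'BBBB', 'MZ123'); // arrives early, held back
queue.push(0, 'AAAA', 'MZ123'); // releases chunk 0, then chunk 1
queue.push(3, 'DDDD', 'MZ123'); // held: chunk 2 has not arrived
queue.interrupt('MZ123');       // drops chunk 3 and sends 'clear'
console.log(sent.map((m) => (m.event === 'media' ? m.media.payload : m.event)));
// [ 'AAAA', 'BBBB', 'clear' ]
```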

Future Improvements

Use Deepgram's streaming TTS API to reduce latency
Integrate ElevenLabs for enhanced voice quality
Add outbound calling capabilities
Implement function calling for complex tasks
Add more sophisticated conversation handling

Costs & Performance

Cost: Approximately 1 cent per minute
- Significantly lower than commercial alternatives (5-10 cents/min)
Latency: ~1 second response time
- Can be further optimized with streaming TTS
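
To put the per-minute figures in perspective, the sketch below projects a monthly bill from the rates quoted above. The 3,000-minute call volume is a made-up example, and the function name is illustrative. Actual costs depend on each provider's current pricing and on how much of each call is spent speaking.

```javascript
// Rates from the figures quoted above, in cents per minute.
const THIS_STACK_CENTS_PER_MIN = 1;
const COMMERCIAL_CENTS_PER_MIN = { low: 5, high: 10 };

// Returns estimated costs in dollars for a given number of call minutes.
function estimateMonthlyCost(minutes) {
  const toDollars = (cents) => cents / 100;
  return {
    thisStack: toDollars(minutes * THIS_STACK_CENTS_PER_MIN),
    commercialLow: toDollars(minutes * COMMERCIAL_CENTS_PER_MIN.low),
    commercialHigh: toDollars(minutes * COMMERCIAL_CENTS_PER_MIN.high),
  };
}

// Hypothetical volume: 3,000 call minutes in a month.
const estimate = estimateMonthlyCost(3000);
console.log(estimate);
// { thisStack: 30, commercialLow: 150, commercialHigh: 300 }
```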

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Resources

Source Code

This project is based on Twilio's Call-GPT.

Keywords

ai voice assistant, openai gpt, deepgram, twilio, voice ai, chatbot, conversational ai, speech recognition, text to speech, websocket, nodejs, real-time audio, low-cost ai, inbound calls

Star ⭐ this repository if you find it helpful!
