Link to Demo: Click Here
This project consists of two main parts:
- Backend Server: A Node.js application using Express, WebSockets, Twilio, Azure OpenAI, and Azure Table Storage. It handles the core logic for an AI-driven voice assistant for a record store, including managing conversation state, interacting with the language model, querying data, and streaming responses. It also publishes conversation updates to a Twilio Sync Stream.
- Frontend Web Application: A React application designed to monitor the conversation in real-time by subscribing to the Twilio Sync Stream populated by the backend server. It displays the conversation flow in a chat-like interface.
This Node.js backend server powers the AI assistant. It integrates Twilio for communication (handling WebSocket streams potentially originating from voice calls and using Twilio Sync), Azure OpenAI for natural language processing, and an Azure Table Storage service (`data-table-service`) for data retrieval.
- WebSocket Server: Handles real-time communication with voice clients.
- Express API: Provides HTTP endpoints (`/`, `/GetToken`).
- Azure OpenAI Integration: Uses Azure OpenAI (`o3-mini`) for conversation.
- Function Calling: Queries internal data sources (e.g., `query_stock`).
- Streaming Responses: Streams AI responses to the connected voice client.
- Twilio Sync Publishing: Pushes user prompts and AI responses to a designated Twilio Sync Stream for external monitoring.
- Data Service Integration: Connects to Azure Table Storage (`data-table-service`) for store, customer, order, and stock data.
- Conversation Management: Maintains context via `promptHistory`.
- Interruption Handling: Processes `interrupt` messages from the voice client.
- Preemptive Waiting Messages: Sends "hold on" messages during processing.
- Configuration: Uses a `.env` file for credentials.
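To make the conversation management and interruption handling items more concrete, here is a minimal sketch of how a `promptHistory` array could be updated. The entry shape, the helper names (`appendTurn`, `truncateLastAssistantTurn`), and the idea that the voice client reports the text spoken before the interruption are all assumptions for illustration; the project's actual implementation is not shown in this README.

```typescript
// Hypothetical types and helpers; the real promptHistory shape may differ.
type Role = "system" | "user" | "assistant";

interface HistoryEntry {
  role: Role;
  content: string;
}

// Append a turn without mutating the existing history array.
function appendTurn(history: HistoryEntry[], role: Role, content: string): HistoryEntry[] {
  return [...history, { role, content }];
}

// On interruption, keep only the part of the last assistant reply that was
// actually spoken (assumption: the voice client reports that text).
function truncateLastAssistantTurn(history: HistoryEntry[], spokenText: string): HistoryEntry[] {
  for (let i = history.length - 1; i >= 0; i--) {
    const entry = history[i];
    if (entry && entry.role === "assistant") {
      const copy = history.slice();
      copy[i] = { role: entry.role, content: spokenText };
      return copy;
    }
  }
  return history; // no assistant turn to truncate
}
```

Returning new arrays instead of mutating in place keeps each update easy to test and avoids sharing state between concurrent calls.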
- Node.js and npm (or yarn)
- Azure Account (OpenAI Service, Storage Account)
- Twilio Account (Account SID, Auth Token, API Key SID/Secret, Sync Service SID)
- Git (optional)
- Clone the repository: `git clone <repository-url>`
- Navigate to the backend directory: `cd <backend-project-directory>` (assuming separate directories)
- Install dependencies: `npm install` (or `yarn install`)
- Create `.env` file: Populate with credentials (see below).
- Compile TypeScript: `npm run build` (uses `esbuild` as per `package.json`)
- Run the server: `npm start` (runs `node dist/server.js`) or `yarn dev` for development (uses `nodemon` and `ts-node`).
# Twilio Credentials
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_API_KEY_SID=SKxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_API_KEY_SECRET=your_api_key_secret
TWILIO_SYNC_SERVICE_SID=ISxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # SID of the Sync Service to use
# Twilio Client Credentials (used for sending Sync messages)
ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AUTH_TOKEN=your_twilio_auth_token
# Azure OpenAI Credentials
LLM_API_KEY=your_azure_openai_api_key
LLM_ENDPOINT="https://recordshopaifo3229622076.openai.azure.com/"
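
Since the server depends on every value listed above, it can help to fail fast at startup when one is missing. The following is a minimal sketch: the variable names are taken from the sample `.env`, while the helper `findMissingEnvVars` is a hypothetical name and not part of the project.

```typescript
// Variable names come from the sample .env; the helper itself is hypothetical.
const REQUIRED_ENV_VARS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_API_KEY_SID",
  "TWILIO_API_KEY_SECRET",
  "TWILIO_SYNC_SERVICE_SID",
  "ACCOUNT_SID",
  "AUTH_TOKEN",
  "LLM_API_KEY",
  "LLM_ENDPOINT",
] as const;

type EnvSource = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js entry point this could be called as `findMissingEnvVars(process.env)` after the `.env` file has been loaded, throwing an error that lists the missing names if the result is non-empty.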
- `GET /`: Health check.
- `GET /GetToken?identity=<user_identity>`: Generates a Twilio Access Token with Sync grants. Used by both voice clients (potentially) and the frontend monitor app.
- `ws://YourTunneledAPI.com`: Primary interaction point for the voice client (handles `setup`, `prompt`, `interrupt`, and `end` messages; streams text responses).
  - Consider using a cloudflared tunnel so that Twilio can reach the server.
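
As an illustration of the WebSocket message handling, the sketch below parses an incoming text frame and maps it to an action. Only the four `type` values (`setup`, `prompt`, `interrupt`, `end`) come from this README; the `voicePrompt` field, the action names, and both function names are assumptions, so check them against the payloads your voice client actually sends.

```typescript
// Only the `type` values are taken from the README; all other names are hypothetical.
type ClientMessage =
  | { type: "setup" }
  | { type: "prompt"; voicePrompt: string }
  | { type: "interrupt" }
  | { type: "end" };

type Action = "init-session" | "run-llm" | "cancel-response" | "close-session";

// Parse a raw text frame; return null for anything that is not a known message.
function parseClientMessage(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  switch (fields["type"]) {
    case "setup":
      return { type: "setup" };
    case "interrupt":
      return { type: "interrupt" };
    case "end":
      return { type: "end" };
    case "prompt": {
      const voicePrompt = fields["voicePrompt"];
      return typeof voicePrompt === "string" ? { type: "prompt", voicePrompt } : null;
    }
    default:
      return null;
  }
}

// Decide what the server should do for each message type.
function actionFor(message: ClientMessage): Action {
  switch (message.type) {
    case "setup":
      return "init-session";
    case "prompt":
      return "run-llm";
    case "interrupt":
      return "cancel-response";
    case "end":
      return "close-session";
  }
}
```

Keeping parsing separate from dispatch means malformed frames are rejected in one place before any session state is touched.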
This is a React application built with Create React App and TypeScript. It serves as a visual monitor for the AI conversation happening via the backend. It connects to the same Twilio Sync Stream that the backend publishes messages to and displays the conversation in a chat interface.
- Real-time Conversation Display: Subscribes to the Twilio Sync Stream and displays messages as they are published by the backend.
- Chat Interface: Renders messages with alignment differentiating "System" (AI/Backend) and "User" inputs.
- Typewriter Effect: Uses a custom `TypewriterText` component to display System messages with a typing animation.
- Interrupt Handling (Visual): Listens for special `interrupt: true` signals on the Sync Stream. When received, it flags the last displayed System message to stop its typewriter effect, visually representing the interruption.
- Token Fetching: Retrieves its own Twilio Sync access token from the backend's `/GetToken` endpoint.
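
The typewriter and interrupt behavior can be pictured as a counter of visible characters that stops advancing once the message is flagged. This is a minimal sketch of that idea only; `nextVisibleCount` and `simulateTyping` are hypothetical names, and the real `TypewriterText` component may be implemented differently.

```typescript
// Hypothetical helpers; the real TypewriterText component is not shown here.

// Given how many characters are visible now, return how many should be
// visible after one animation tick.
function nextVisibleCount(visible: number, text: string, interrupted: boolean): number {
  if (interrupted) return visible; // freeze at the current position
  return Math.min(visible + 1, text.length);
}

// Run the animation for a fixed number of ticks and return the text shown.
function simulateTyping(text: string, ticks: number, interruptAtTick?: number): string {
  let visible = 0;
  for (let tick = 0; tick < ticks; tick++) {
    const interrupted = interruptAtTick !== undefined && tick >= interruptAtTick;
    visible = nextVisibleCount(visible, text, interrupted);
  }
  return text.slice(0, visible);
}
```

In a React component the tick would typically come from a timer, with the `interrupt` flag passed in as a prop.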
- Node.js and npm (or yarn)
- A running instance of the backend server (to provide the `/GetToken` endpoint and publish Sync messages).
- Navigate to the frontend directory: `cd <frontend-project-directory>` (assuming separate directories)
- Install dependencies: `npm install` (or `yarn install`)
- Configure Backend URL: Create a `.env` file that includes `REACT_APP_API_BASE_URL=<Your API URL>`.
- Run the development server: `npm start` (or `yarn start`). This will typically open the app in your browser at `http://localhost:3000`.
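
To show how `REACT_APP_API_BASE_URL` relates to the backend's `/GetToken` endpoint, here is a small sketch that builds the token request URL. The helper name `buildTokenUrl` is hypothetical; the endpoint path and the `identity` query parameter are the ones documented for the backend.

```typescript
// Hypothetical helper: join the configured base URL with the /GetToken endpoint.
function buildTokenUrl(apiBaseUrl: string, identity: string): string {
  const base = apiBaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/GetToken?identity=${encodeURIComponent(identity)}`;
}
```

For example, with a base URL of `http://localhost:8000` and the identity `WebApp`, the result is `http://localhost:8000/GetToken?identity=WebApp`. Encoding the identity keeps the URL valid if it ever contains spaces or other reserved characters.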
- Initialization: On mount, the component fetches a Twilio Access Token from the configured backend URL using the hardcoded identity `WebApp`.
- Sync Client: Initializes the `twilio-sync` client with the token and sets up listeners for connection state changes and token expiry (including refresh logic).
- Stream Subscription: Subscribes to the Twilio Sync Stream identified by the `streamNameOrSid` prop.
- Message Listener: Listens for `messagePublished` events on the stream.
- Message Processing:
  - When a message arrives, its data payload (`{ text, author, messageId, interrupt? }`) is extracted.
  - Interrupt Signal: If the incoming message data contains `interrupt: true`, the component finds the last message already in its state and updates that message object by adding an `interrupt: true` flag to it. This signals the `TypewriterText` component rendering that specific message to stop typing.
  - Regular Message: If it's not an interrupt signal, the new message data is appended to the `messages` state array.
- Rendering: The component maps over the `messages` array, rendering each message in a list item styled like a chat bubble. System messages use the `TypewriterText` component, passing the interrupt flag. User messages are displayed directly.
- Cleanup: On unmount, listeners are removed, and the Sync client is shut down.
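
The message processing rules above can be expressed as a single pure state update. The payload shape follows the one given in this README; the function name `applyStreamMessage` is hypothetical, and this is a sketch of the described behavior, not the component's actual code.

```typescript
// Payload shape as described above: { text, author, messageId, interrupt? }.
interface SyncMessage {
  text: string;
  author: string;
  messageId: string;
  interrupt?: boolean;
}

// Pure state update (hypothetical name): returns the next `messages` array.
function applyStreamMessage(messages: SyncMessage[], incoming: SyncMessage): SyncMessage[] {
  if (incoming.interrupt === true) {
    const last = messages[messages.length - 1];
    if (last === undefined) return messages; // nothing to interrupt yet
    // Flag the last displayed message so its typewriter effect stops.
    return [...messages.slice(0, -1), { ...last, interrupt: true }];
  }
  // Regular message: append to the end of the list.
  return [...messages, incoming];
}
```

Inside the `messagePublished` listener this could be used with a functional state update, for example `setMessages((prev) => applyStreamMessage(prev, payload))`, where `payload` is the extracted message data. The functional form avoids stale state when several messages arrive in quick succession.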
1. User (Voice) <---> Twilio Voice Service
|
| (Bi-directional Stream)
v
2. Backend Server (ws://localhost:8000)
- Receives audio/events from Twilio
- Sends audio/commands to Twilio
- Interacts with Azure OpenAI (LLM)
- Interacts with Azure Table Storage (Data)
- Publishes conversation messages ("User:", "System:") to Twilio Sync Stream ---> 3. Twilio Sync Service
|
| (Sync Stream Updates)
v
4. Frontend Monitor App (http://localhost:3000) <------------------------------------
- Fetches Token from Backend (/GetToken)
- Subscribes to Twilio Sync Stream
- Displays conversation messages