This project connects Asterisk to OpenAI's real-time API to enable live voice interactions. It takes incoming audio from Asterisk SIP calls, sends it to OpenAI for processing, and streams the audio responses back to the caller. (Use headphones when testing; using speakers will constantly interrupt the conversation.)
- Asterisk Integration:
  - Connects to Asterisk via the ARI (Asterisk REST Interface) at `http://127.0.0.1:8088` using credentials `asterisk:asterisk`.
  - Listens for SIP channels entering the Stasis application (`stasis_app`).
  - Creates a mixing bridge for each call, answers the channel, and sets up an ExternalMedia channel.
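
The ExternalMedia step above can be sketched as a plain ARI REST request. This is a minimal illustration, not the project's actual code: the helper name `buildExternalMediaRequest` and the external host address `127.0.0.1` are assumptions, while `app`, `external_host`, and `format` are query parameters of ARI's `POST /channels/externalMedia` endpoint.

```javascript
// Build the HTTP request that asks Asterisk to create an ExternalMedia channel.
// Assumed values: ARI at http://127.0.0.1:8088, credentials asterisk:asterisk,
// Stasis app "stasis_app", and an RTP listener on 127.0.0.1:12000.
function buildExternalMediaRequest({
  ariUrl = 'http://127.0.0.1:8088',
  user = 'asterisk',
  password = 'asterisk',
  app = 'stasis_app',
  externalHost = '127.0.0.1:12000',
  format = 'ulaw',
} = {}) {
  const url = new URL('/ari/channels/externalMedia', ariUrl);
  url.searchParams.set('app', app);
  url.searchParams.set('external_host', externalHost);
  url.searchParams.set('format', format);
  // ARI uses HTTP Basic authentication.
  const auth = Buffer.from(`${user}:${password}`).toString('base64');
  return {
    method: 'POST',
    url: url.toString(),
    headers: { Authorization: `Basic ${auth}` },
  };
}

const request = buildExternalMediaRequest();
console.log(request.method, request.url);
```

The returned object can be handed to any HTTP client; the channel that Asterisk creates in response would then be added to the call's mixing bridge.
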
- RTP Audio Handling:
  - Listens for μ-law audio from Asterisk on RTP port `12000`.
  - Receives RTP packets, strips headers, converts μ-law to 24kHz PCM with interpolation, normalizes audio (target RMS 0.15), and buffers it.
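
The receive path above can be sketched roughly as follows. All function names are hypothetical, and the sketch assumes standard G.711 μ-law decoding, linear interpolation for the 8kHz to 24kHz step, and a simple gain-to-target-RMS normalization; the project's own code may differ in each of these details.

```javascript
// Strip the RTP header: 12 fixed bytes plus 4 bytes per CSRC entry.
// (Header extensions are ignored in this sketch.)
function stripRtpHeader(packet) {
  const csrcCount = packet[0] & 0x0f;
  return packet.subarray(12 + 4 * csrcCount);
}

// Decode one G.711 mu-law byte to a 16-bit linear PCM sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Upsample 8 kHz -> 24 kHz by inserting two linearly interpolated samples
// after each input sample (the last sample is held).
function upsample3x(samples) {
  const out = new Int16Array(samples.length * 3);
  for (let i = 0; i < samples.length; i++) {
    const a = samples[i];
    const b = i + 1 < samples.length ? samples[i + 1] : a;
    out[3 * i] = a;
    out[3 * i + 1] = Math.round(a + (b - a) / 3);
    out[3 * i + 2] = Math.round(a + (2 * (b - a)) / 3);
  }
  return out;
}

// RMS of 16-bit samples on a 0..1 scale.
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += (s / 32768) ** 2;
  return Math.sqrt(sumSquares / samples.length);
}

// Scale samples toward the target RMS, clipping to the 16-bit range.
function normalize(samples, targetRms = 0.15) {
  const current = rms(samples);
  if (current === 0) return samples; // silence: nothing to scale
  const gain = targetRms / current;
  return Int16Array.from(samples, (s) =>
    Math.max(-32768, Math.min(32767, Math.round(s * gain))));
}

// Full path for one packet: header strip, decode, upsample, normalize.
function processRtpPacket(packet) {
  const payload = stripRtpHeader(packet);
  const pcm8k = Int16Array.from(payload, (byte) => muLawToPcm16(byte));
  return normalize(upsample3x(pcm8k));
}
```

A 20ms μ-law packet (160 payload bytes) comes out of `processRtpPacket` as 480 samples of 24kHz PCM, ready to be appended to the outgoing buffer.
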
- OpenAI Real-Time API Integration:
  - Establishes a WebSocket connection to OpenAI’s real-time API (`wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17`).
  - Sends normalized PCM audio chunks to OpenAI as base64-encoded data every 200ms.
  - Receives PCM audio responses and transcripts from OpenAI, converting PCM back to μ-law for Asterisk.
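
As an illustration of the sending side, 200ms of 24kHz 16-bit mono PCM is 4800 samples, or 9600 bytes. The sketch below wraps such chunks in the Realtime API's `input_audio_buffer.append` client event. The helper names are hypothetical, and sending only full 200ms chunks is an assumption; the project may instead flush whatever has been buffered at each tick.

```javascript
const SAMPLE_RATE = 24000;   // PCM16 mono, as sent to OpenAI
const CHUNK_MS = 200;        // send interval described above
const BYTES_PER_SAMPLE = 2;
const CHUNK_BYTES = (SAMPLE_RATE * CHUNK_MS / 1000) * BYTES_PER_SAMPLE; // 9600

// Wrap one chunk of PCM16 audio in an input_audio_buffer.append event.
function buildAppendEvent(pcmChunk) {
  return JSON.stringify({
    type: 'input_audio_buffer.append',
    audio: pcmChunk.toString('base64'),
  });
}

// Split a buffer into full 200 ms chunks and return the leftover bytes,
// which would be kept for the next tick.
function drainChunks(buffer) {
  const events = [];
  let offset = 0;
  while (buffer.length - offset >= CHUNK_BYTES) {
    events.push(buildAppendEvent(buffer.subarray(offset, offset + CHUNK_BYTES)));
    offset += CHUNK_BYTES;
  }
  return { events, remainder: buffer.subarray(offset) };
}
```

Each string in `events` would be passed to the WebSocket's send method as-is.
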
- Audio Streaming:
  - Streams OpenAI’s audio responses to Asterisk via RTP with 10ms packet timing (80 samples at 8kHz).
  - Manages buffers to avoid overflow (1MB max) and ensures real-time playback with silence padding.
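
The packet layout implied above can be sketched as follows: each packet carries 80 μ-law bytes (10ms at 8kHz) behind a 12-byte RTP header with payload type 0 (PCMU), and a short final chunk is padded with μ-law silence (`0xFF`). The function names are hypothetical, and padding only the last packet is an assumption about how the silence padding is applied.

```javascript
const SAMPLES_PER_PACKET = 80; // 10 ms at 8 kHz, one byte per mu-law sample
const MU_LAW_SILENCE = 0xff;   // mu-law encoding of a zero sample

// Build one RTP packet (version 2, payload type 0 = PCMU) around a mu-law payload.
function buildRtpPacket(payload, sequence, timestamp, ssrc) {
  const header = Buffer.alloc(12);
  header[0] = 0x80;                            // V=2, no padding/extension/CSRC
  header[1] = 0x00;                            // marker 0, payload type 0
  header.writeUInt16BE(sequence & 0xffff, 2);  // sequence wraps at 16 bits
  header.writeUInt32BE(timestamp >>> 0, 4);    // timestamp wraps at 32 bits
  header.writeUInt32BE(ssrc >>> 0, 8);
  return Buffer.concat([header, payload]);
}

// Split mu-law audio into 80-byte payloads, padding the last one with silence.
function packetize(muLaw, startSequence, startTimestamp, ssrc) {
  const packets = [];
  for (let offset = 0, i = 0; offset < muLaw.length; offset += SAMPLES_PER_PACKET, i++) {
    const payload = Buffer.alloc(SAMPLES_PER_PACKET, MU_LAW_SILENCE);
    muLaw.copy(payload, 0, offset, Math.min(offset + SAMPLES_PER_PACKET, muLaw.length));
    packets.push(buildRtpPacket(
      payload, startSequence + i, startTimestamp + i * SAMPLES_PER_PACKET, ssrc));
  }
  return packets;
}
```

The packets would then be sent over UDP one every 10ms, so that playback on the Asterisk side stays in real time.
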
- Voice Activity Detection (VAD):
  - Configures OpenAI’s server-side VAD with customizable threshold (default 0.1), prefix padding (default 300ms), and silence duration (default 500ms).
  - Stops RTP streaming to Asterisk when speech is detected to avoid overlap.
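
A sketch of how these VAD settings map onto the Realtime API: the values go into the `turn_detection` block of a `session.update` client event, and the server announces detected speech with an `input_audio_buffer.speech_started` event. The helper names below are hypothetical, and treating `speech_started` as the trigger for stopping playback is an assumption about how the project detects caller speech.

```javascript
// Build a session.update event that enables server-side VAD.
// Defaults mirror the ones listed above.
function buildSessionUpdate({
  threshold = 0.1,
  prefixPaddingMs = 300,
  silenceDurationMs = 500,
} = {}) {
  return {
    type: 'session.update',
    session: {
      turn_detection: {
        type: 'server_vad',
        threshold,
        prefix_padding_ms: prefixPaddingMs,
        silence_duration_ms: silenceDurationMs,
      },
    },
  };
}

// True when a server event signals that the caller has started speaking,
// i.e. when RTP playback toward Asterisk should stop.
function shouldStopPlayback(serverEvent) {
  return serverEvent.type === 'input_audio_buffer.speech_started';
}
```

The event object would be serialized with `JSON.stringify` and sent once the WebSocket is open.
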
- Logging:
  - Uses Winston for detailed logging with timestamps and colored output (cyan for client events, yellow for server events, gray for general logs).
  - Logs RTP packet stats, audio processing details (RMS, gain), and OpenAI interactions.
- File Saving (Optional):
  - Saves Asterisk input audio as `.raw` (μ-law) and OpenAI-processed audio as `.wav` (24kHz PCM) if `ENABLE_SENT_TO_OPENAI_RECORDING` is `true`.
  - Files are saved on call end (`StasisEnd`).
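
For the `.wav` output, raw PCM needs a 44-byte RIFF/WAVE header in front of it. The following is a minimal sketch assuming 16-bit mono PCM at 24kHz; the function names are hypothetical and the project may write its files differently (for example with a WAV library).

```javascript
// Build a canonical 44-byte WAV header for uncompressed PCM audio.
function buildWavHeader(dataLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataLength, 4);         // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                     // fmt chunk size
  header.writeUInt16LE(1, 20);                      // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataLength, 40);
  return header;
}

// Prepend the header to the PCM collected during the call.
function toWav(pcm) {
  return Buffer.concat([buildWavHeader(pcm.length), pcm]);
}
```

The μ-law `.raw` file needs no header; the received payload bytes can be written out unchanged.
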
- Cleanup:
  - Handles call termination (`StasisEnd`), closing WebSockets, stopping RTP streams, clearing intervals, and destroying bridges.
  - Cleans up resources on uncaught exceptions or SIGINT (Ctrl+C).
- Configuration:
  - Loads settings from `.env` (e.g., `OPENAI_API_KEY`, `MAX_CALL_DURATION`, VAD settings, logging options).
  - Provides defaults for unspecified values.
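
Loading settings with defaults might look like the sketch below, once the `.env` values are available in `process.env` (for example through a loader such as dotenv). `OPENAI_API_KEY`, `MAX_CALL_DURATION`, and `ENABLE_SENT_TO_OPENAI_RECORDING` are named in this README; the VAD variable names, the `MAX_CALL_DURATION` default and its unit, and the `loadConfig` helper itself are assumptions made for illustration.

```javascript
// Read settings from an env-like object, falling back to defaults.
function loadConfig(env) {
  if (!env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY is required');
  }
  // Parse a numeric setting; use the fallback when it is missing or invalid.
  const num = (value, fallback) => {
    if (value === undefined || value === '') return fallback;
    const parsed = Number(value);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    maxCallDuration: num(env.MAX_CALL_DURATION, 300),               // hypothetical default
    vadThreshold: num(env.VAD_THRESHOLD, 0.1),                      // hypothetical variable name
    vadPrefixPaddingMs: num(env.VAD_PREFIX_PADDING_MS, 300),        // hypothetical variable name
    vadSilenceDurationMs: num(env.VAD_SILENCE_DURATION_MS, 500),    // hypothetical variable name
    enableRecording: env.ENABLE_SENT_TO_OPENAI_RECORDING === 'true',
  };
}
```

Calling `loadConfig(process.env)` at startup fails fast when the API key is missing and otherwise yields a plain object with every setting filled in.
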
- Node.js: Version 16 or higher.
- Asterisk 20: Installed with ARI enabled (default: `http://127.0.0.1:8088`, user: `asterisk`, password: `asterisk`).
- OpenAI API Key: Obtain from OpenAI's platform.
- SIP Client: A SIP client (e.g., a softphone such as Linphone) to make calls to Asterisk.
- Clone the Repository: run `git clone https://github.com/infinitocloud/asterisk_to_openai_rt.git`, then `cd asterisk_to_openai_rt`.
- Rename `.env.sample` to `.env` and add your OpenAI API key.
Asterisk to OpenAI RealTime