A modular VoIP ➞ STT ➞ AI Agent ➞ TTS ➞ VoIP implementation.
Dialog provides a framework and a set of interfaces for building VoIP Agent applications.
- An easy to understand and extensible modular framework
- Event driven architecture
- Facilities for multithreaded deployments
- Talk-over interruption of the agent
- Conversation history
- Agent-driven STT and TTS selection
NB: Dialog is still undergoing active refactoring. Prior to 1.0.0, public interfaces may change on minor version bumps, and commit messages will be minimal.
These instructions describe how to clone the Dialog repository and build the package.
git clone https://github.com/faranalytics/dialog.git
cd dialog
npm install && npm update
You can use the clean:build script to do a clean build.
npm run clean:build
Alternatively, you can use the watch script to watch and build the package. This will build the package each time you make a change to a file in ./src. If you use the watch script, you will need to open a new terminal in order to build and run your application.
npm run watch
npm install <path-to-the-dialog-repository> --save
You should now be able to import Dialog artifacts into your package.
Example applications are provided in the example subpackages.
When a call is initiated, a Controller (e.g., a Twilio or Telnyx Controller) emits an init event. The init handler is called with a VoIP instance as its single argument. The VoIP instance handles the websocket connection that is set on it by the Controller. In the init handler, an instance of a Dialog application is constructed by passing a VoIP, STT, Agent, and TTS implementation into a Dialog constructor and calling its start method. The start method of the Dialog instance connects the component interfaces that comprise the application.
An important characteristic of the architecture is that a new instance of each component of a Dialog application (a VoIP, an STT, a TTS, and an Agent) is created on each call; this means each instance may maintain state relevant to its respective call.
Excerpted from src/main.ts.
controller.on("init", (voip: VoIP) => {
  const stt = new DeepgramSTT({ apiKey: DEEPGRAM_API_KEY });
  const tts = new CartesiaTTS({ apiKey: CARTESIA_API_KEY });
  const agent = new OpenAIAgent({
    apiKey: OPENAI_API_KEY,
    system: OPENAI_SYSTEM_MESSAGE,
    greeting: OPENAI_GREETING_MESSAGE,
    model: OPENAI_MODEL,
  });
  const dialog = new Dialog({ voip, stt, tts, agent });
  dialog.start();
});
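To make the flow above concrete, here is a minimal, self-contained sketch of the same idea: one fresh set of components per call, with a start method that connects each stage's output to the next stage's input. Every name in it (Signal, FakeVoIP, FakeSTT, FakeAgent, FakeTTS, SketchDialog, and their methods) is hypothetical and exists only for illustration; Dialog's actual interfaces and event names may differ.

```typescript
// Hypothetical, simplified component shapes; Dialog's real interfaces may differ.
type Listener<T> = (value: T) => void;

// A tiny single-event emitter so the sketch needs no imports.
class Signal<T> {
  private listeners: Listener<T>[] = [];
  on(listener: Listener<T>): void {
    this.listeners.push(listener);
  }
  emit(value: T): void {
    for (const listener of this.listeners) listener(value);
  }
}

class FakeVoIP {
  readonly audioIn = new Signal<string>(); // caller audio arriving from the call
  readonly played: string[] = []; // audio sent back to the caller
  play(audio: string): void {
    this.played.push(audio);
  }
}

class FakeSTT {
  readonly transcript = new Signal<string>();
  onAudio(audio: string): void {
    this.transcript.emit(`text(${audio})`);
  }
}

class FakeAgent {
  readonly reply = new Signal<string>();
  readonly history: string[] = []; // per-call state lives on the instance
  onTranscript(text: string): void {
    this.history.push(text);
    this.reply.emit(`reply #${this.history.length} to ${text}`);
  }
}

class FakeTTS {
  readonly audioOut = new Signal<string>();
  onText(text: string): void {
    this.audioOut.emit(`audio(${text})`);
  }
}

interface Parts {
  voip: FakeVoIP;
  stt: FakeSTT;
  agent: FakeAgent;
  tts: FakeTTS;
}

class SketchDialog {
  private readonly parts: Parts;
  constructor(parts: Parts) {
    this.parts = parts;
  }
  // Connect each stage's output to the next stage's input.
  start(): void {
    const { voip, stt, agent, tts } = this.parts;
    voip.audioIn.on((audio) => stt.onAudio(audio));
    stt.transcript.on((text) => agent.onTranscript(text));
    agent.reply.on((text) => tts.onText(text));
    tts.audioOut.on((audio) => voip.play(audio));
  }
}

// Stand-in for a Controller's "init" event: fresh components for every call.
const init = new Signal<FakeVoIP>();
const agents: FakeAgent[] = [];
init.on((voip) => {
  const agent = new FakeAgent();
  agents.push(agent);
  new SketchDialog({ voip, stt: new FakeSTT(), agent, tts: new FakeTTS() }).start();
});

// Two simulated calls; each one gets its own pipeline and its own history.
const callA = new FakeVoIP();
const callB = new FakeVoIP();
init.emit(callA);
init.emit(callB);
callA.audioIn.emit("hello");
callA.audioIn.emit("again");
callB.audioIn.emit("hi");

console.log(callA.played);
console.log(callB.played);
```

Because the components are constructed inside the init handler, the second call's agent starts with an empty history even though the first call has already recorded two turns.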
Dialog provides example implementations for each of the artifacts that comprise a VoIP Agent application.
A VoIP implementation is provided that uses the Twilio API.
A VoIP implementation is provided that uses the Telnyx API.
An STT implementation is provided that uses the Deepgram API.
A TTS implementation is provided that uses the Cartesia API.
An Agent implementation is provided that uses the OpenAI API.
Dialog provides VoIP, STT, Agent, and TTS example implementations. You can use a provided implementation as-is, subclass it, or implement your own. If you plan to implement your own VoIP, STT, Agent, or TTS component, interfaces are provided for each component of the VoIP application.
A custom Agent implementation will allow you to manage conversation history, turn of speech, agent interruption, STT and TTS selection, and other nuances.
You can extend the provided OpenAIAgent class, as in the example below, or just implement the Agent interface. The straightforward openai_agent.ts implementation can be used as a guide.
This custom Agent implementation adds a timestamp to each user message.
import { randomUUID } from "node:crypto";
import { log, Agent, OpenAIAgent, OpenAIAgentOptions } from "@farar/dialog";

export class CustomAgent extends OpenAIAgent implements Agent {
  // Tail of a promise chain used to process transcripts one at a time.
  protected mutex: Promise<void>;

  constructor(options: OpenAIAgentOptions) {
    super(options);
    this.mutex = Promise.resolve();
  }

  public onTranscript = (transcript: string): void => {
    this.mutex = (async () => {
      try {
        // this.mutex still refers to the previous turn's promise here,
        // because the assignment above completes only after this function first yields.
        await this.mutex;
        this.uuid = randomUUID();
        log.notice(`User message: ${transcript}`);
        // Prefix the user message with an ISO 8601 timestamp.
        this.history.push({ role: "user", content: `${new Date().toISOString()}\n${transcript}` });
        this.stream = await this.openAI.chat.completions.create({
          model: "gpt-4o-mini",
          messages: this.history,
          temperature: 0,
          stream: true
        });
        await this.dispatchStream(this.uuid, this.stream);
      }
      catch (err) {
        // Log and swallow the error so the chain stays usable for later transcripts.
        log.error(err);
      }
    })();
  };
}
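The mutex field in the example above is a promise chain: each transcript's work waits for the previous one to finish, so turns are handled in arrival order even though the handler itself returns immediately. The following standalone sketch isolates that pattern. SerialHandler, handle, idle, and sleep are hypothetical names used only for illustration and are not part of Dialog.

```typescript
// Simulated asynchronous work, e.g. a streaming completion request.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

class SerialHandler {
  // Tail of the chain: resolves when the most recently queued task finishes.
  private mutex: Promise<void> = Promise.resolve();
  readonly events: string[] = [];

  // Synchronous entry point, like an event handler; the work itself is queued.
  handle(name: string, ms: number): void {
    const previous = this.mutex;
    this.mutex = (async () => {
      try {
        await previous; // wait for the earlier task to finish
        this.events.push(`start ${name}`);
        await sleep(ms);
        this.events.push(`end ${name}`);
      } catch (err) {
        // Catching here keeps the chain from rejecting, so later tasks still run.
        this.events.push(`error ${name}: ${String(err)}`);
      }
    })();
  }

  // Resolves once everything queued so far has finished.
  idle(): Promise<void> {
    return this.mutex;
  }
}

const handler = new SerialHandler();
handler.handle("slow", 30); // queued first, takes longer
handler.handle("fast", 1); // queued second, must still wait for "slow"

const done = handler.idle().then(() => {
  console.log(handler.events.join(", "));
  return handler.events;
});
```

Without the chain, the "fast" task would finish before the "slow" one; with it, each task starts only after the previous task has ended. The try/catch inside the queued function matters for the same reason as in the agent example: an uncaught rejection would poison the chain and every later task would fail at its first await.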
If you have a feature request or run into any issues, feel free to submit an issue or start a discussion. You’re also welcome to reach out directly to one of the authors.