faranalytics/dialog

Dialog

A modular VoIP ➞ STT ➞ AI Agent ➞ TTS ➞ VoIP implementation.

Introduction

Dialog provides a framework and a set of interfaces for building VoIP Agent applications.

Features

  • An easy-to-understand, extensible, modular framework
  • Event driven architecture
  • Facilities for multithreaded deployments
  • Talk-over interruption of the agent
  • Conversation history
  • Agent-driven STT and TTS selection

NB: Dialog is still undergoing active refactoring. Prior to 1.0.0, public interfaces may change with minor version releases, and commit messages will be minimal.

Installation

Development Installation

These instructions describe how to clone the Dialog repository and build the package.

Clone the repository.

git clone https://github.com/faranalytics/dialog.git

Change directory into the Dialog repository.

cd dialog

Install the package dependencies.

npm install && npm update

Build the Dialog package.

You can use the clean:build script in order to do a clean build.

npm run clean:build

Alternatively, you can use the watch script to rebuild the package each time you change a file in ./src. If you use the watch script, you will need to open a new terminal to build and run your application.

npm run watch

Install Dialog into your package

Change directory into your package directory and install the package.

npm install <path-to-the-dialog-repository> --save

You should now be able to import Dialog artifacts into your package.

Usage

Example applications are provided in the example subpackages.

How it works

When a call is initiated, a Controller (e.g., a Twilio or Telnyx Controller) emits an init event. The init handler is called with a VoIP instance as its single argument. The VoIP instance handles the websocket connection that is set on it by the Controller. In the init handler, an instance of a Dialog application is constructed by passing a VoIP, STT, Agent, and TTS implementation into a Dialog constructor and calling its start method. The start method of the Dialog instance connects the component interfaces that comprise the application.

An important characteristic of the architecture is that a new instance of each component of a Dialog application - a VoIP, STT, TTS, and an Agent - is created on each call; this means each instance may maintain state relevant to its respective call.

Excerpted from src/main.ts.

controller.on("init", (voip: VoIP) => {
  const stt = new DeepgramSTT({ apiKey: DEEPGRAM_API_KEY });
  const tts = new CartesiaTTS({ apiKey: CARTESIA_API_KEY });
  const agent = new OpenAIAgent({
    apiKey: OPENAI_API_KEY,
    system: OPENAI_SYSTEM_MESSAGE,
    greeting: OPENAI_GREETING_MESSAGE,
    model: OPENAI_MODEL,
  });
  const dialog = new Dialog({ voip, stt, tts, agent });
  dialog.start();
});
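
The wiring performed by start, and the per-call isolation described above, can be illustrated with a minimal sketch. Everything in it is hypothetical: the Stub classes, the MiniDialog class, and the event names ("audio", "transcript", "reply") are stand-ins built on Node's EventEmitter and are not Dialog's actual interfaces.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-ins; class and event names are illustrative, not Dialog's API.
class StubVoIP extends EventEmitter {
  public sent: string[] = [];
  public send(audio: string): void {
    this.sent.push(audio);
  }
}

class StubSTT extends EventEmitter {
  public onAudio = (audio: string): void => {
    this.emit("transcript", `text(${audio})`);
  };
}

class StubAgent extends EventEmitter {
  public history: string[] = []; // Per-call state lives on the instance.
  public onTranscript = (transcript: string): void => {
    this.history.push(transcript);
    this.emit("reply", `reply(${transcript})`);
  };
}

class StubTTS extends EventEmitter {
  public onReply = (reply: string): void => {
    this.emit("audio", `audio(${reply})`);
  };
}

class MiniDialog {
  private voip: StubVoIP;
  private stt: StubSTT;
  private agent: StubAgent;
  private tts: StubTTS;

  constructor(voip: StubVoIP, stt: StubSTT, agent: StubAgent, tts: StubTTS) {
    this.voip = voip;
    this.stt = stt;
    this.agent = agent;
    this.tts = tts;
  }

  // Connect the components: VoIP -> STT -> Agent -> TTS -> VoIP.
  public start(): void {
    this.voip.on("audio", this.stt.onAudio);
    this.stt.on("transcript", this.agent.onTranscript);
    this.agent.on("reply", this.tts.onReply);
    this.tts.on("audio", (audio: string) => this.voip.send(audio));
  }
}

// Build a fresh set of components for each call, as an init handler would.
function handleCall(): { voip: StubVoIP; agent: StubAgent } {
  const voip = new StubVoIP();
  const agent = new StubAgent();
  new MiniDialog(voip, new StubSTT(), agent, new StubTTS()).start();
  return { voip, agent };
}

const callA = handleCall();
const callB = handleCall();
callA.voip.emit("audio", "hello");
console.log(callA.voip.sent); // One entry: "audio(reply(text(hello)))"
console.log(callB.agent.history.length); // 0: the second call shares no state with the first.
```

Because each call gets its own instances, the history recorded by one call's agent is never visible to another call.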

Implementations

Dialog provides example implementations for each of the artifacts that comprise a VoIP Agent application.

VoIP

A VoIP implementation is provided that uses the Twilio API.

A VoIP implementation is provided that uses the Telnyx API.

Speech to text (STT)

An STT implementation is provided that uses the Deepgram API.

Text to speech (TTS)

A TTS implementation is provided that uses the Cartesia API.

AI agent

An Agent implementation is provided that uses the OpenAI API.

Custom Implementations

Dialog provides VoIP, STT, Agent, and TTS example implementations. You can use a provided implementation as-is, subclass it, or implement your own. If you plan to implement your own VoIP, STT, Agent, or TTS component, interfaces are provided for each component of the VoIP application.
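
The three options (use as-is, subclass, or implement your own) can be sketched as follows. The SketchTTS interface and all three classes are hypothetical and exist only to show the pattern; Dialog's real interfaces will have different members.

```typescript
// Hypothetical interface for illustration; Dialog's real TTS interface may differ.
interface SketchTTS {
  speak(text: string): string;
}

// Option 1: stands in for a provided implementation that is used as-is.
class ProvidedTTS implements SketchTTS {
  public speak(text: string): string {
    return `<audio>${text}</audio>`;
  }
}

// Option 2: subclass a provided implementation and override part of its behavior.
class ShoutingTTS extends ProvidedTTS {
  public speak(text: string): string {
    return super.speak(text.toUpperCase());
  }
}

// Option 3: implement the interface from scratch.
class SilentTTS implements SketchTTS {
  public speak(_text: string): string {
    return "";
  }
}

// The consumer depends only on the interface, so any of the three can be passed in.
function render(tts: SketchTTS, text: string): string {
  return tts.speak(text);
}

const components: SketchTTS[] = [new ProvidedTTS(), new ShoutingTTS(), new SilentTTS()];
const outputs: string[] = components.map((tts) => render(tts, "hi"));
console.log(outputs);
```

Coding against the interface rather than a concrete class is what lets a component be swapped without touching the rest of the application.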

Custom Agents

A custom Agent implementation allows you to manage conversation history, turn-taking, agent interruption, STT and TTS selection, and other nuances.

You can extend the provided OpenAIAgent class, as in the example below, or just implement the Agent interface. The straightforward openai_agent.ts implementation can be used as a guide.

A custom Agent based on openai_agent.ts.

This custom Agent implementation adds a timestamp to each user message.

import { randomUUID } from "node:crypto";
import { log, Agent, OpenAIAgent, OpenAIAgentOptions } from "@farar/dialog";

export class CustomAgent extends OpenAIAgent implements Agent {
  // Tail of a promise chain used to process transcripts one at a time.
  protected mutex: Promise<void>;

  constructor(options: OpenAIAgentOptions) {
    super(options);
    this.mutex = Promise.resolve();
  }
  public onTranscript = (transcript: string): void => {
    // Chain this transcript onto the previous one so they are handled in order.
    this.mutex = (async () => {
      try {
        // Wait for the previous transcript to finish processing.
        await this.mutex;
        this.uuid = randomUUID();
        log.notice(`User message: ${transcript}`);
        // Prefix the user message with an ISO 8601 timestamp.
        this.history.push({ role: "user", content: `${new Date().toISOString()}\n${transcript}` });
        this.stream = await this.openAI.chat.completions.create({
          model: "gpt-4o-mini",
          messages: this.history,
          temperature: 0,
          stream: true
        });
        await this.dispatchStream(this.uuid, this.stream);
      }
      catch (err) {
        // Log the error instead of rejecting, so the chain stays usable.
        log.error(err);
      }
    })();
  };
}
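
The mutex field in the example above is a promise chain that serializes transcript handling. The pattern can be seen in isolation in the sketch below, which has no Dialog or OpenAI dependency. SerialWorker, sleep, and idle are hypothetical names, and the sleep call merely simulates a slow streamed completion.

```typescript
// Hypothetical helper: simulates work that takes `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class SerialWorker {
  // Tail of the promise chain; each new task waits on it.
  protected mutex: Promise<void> = Promise.resolve();
  public log: string[] = [];

  public onTranscript = (transcript: string, ms: number): void => {
    // Capture the current tail before replacing it with this task.
    const previous = this.mutex;
    this.mutex = (async () => {
      try {
        await previous; // Do not start until the earlier task has finished.
        this.log.push(`start ${transcript}`);
        await sleep(ms);
        this.log.push(`end ${transcript}`);
      } catch (err) {
        // Swallow the error so a failed task does not block later ones.
        console.error(err);
      }
    })();
  };

  // Resolves once every task queued so far has completed.
  public idle(): Promise<void> {
    return this.mutex;
  }
}

const worker = new SerialWorker();
worker.onTranscript("first", 30); // Slow task.
worker.onTranscript("second", 0); // Fast task, but it still waits for "first".
const done: Promise<string[]> = worker.idle().then(() => worker.log);
done.then((log) => console.log(log.join(", ")));
```

Without the chain, the second task would begin while the first was still running, and the two responses could interleave in the conversation history.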

Support

If you have a feature request or run into any issues, feel free to submit an issue or start a discussion. You’re also welcome to reach out directly to one of the authors.
