VAP3-957: Adding Inworld docs page and removing Sesame voice page #560

Merged 1 commit on Jul 8, 2025

2 changes: 0 additions & 2 deletions fern/apis/api/openapi-overrides.yml
@@ -1083,8 +1083,6 @@ components:
title: TavusVoice
VapiVoice:
title: VapiVoice
- SesameVoice:
- title: SesameVoice
AIEdgeCondition:
title: AIEdgeCondition
LogicEdgeCondition:
4 changes: 2 additions & 2 deletions fern/docs.yml
@@ -470,8 +470,8 @@ navigation:
path: providers/voice/rimeai.mdx
- page: Deepgram
path: providers/voice/deepgram.mdx
- - page: Sesame
- path: providers/voice/sesame.mdx
+ - page: Inworld
+ path: providers/voice/inworld.mdx
- section: Video models
contents:
- page: Tavus
61 changes: 61 additions & 0 deletions fern/providers/voice/inworld.mdx
@@ -0,0 +1,61 @@
---
title: InworldAI
subtitle: What is Inworld.ai?
slug: providers/voice/inworld
---

**What is Inworld.ai?**

Inworld.ai provides developers with tools to create lifelike voice agents. It supports zero-shot voice cloning, enabling the creation of personalized voices from short audio samples. The system is optimized for low-latency streaming, making it suitable for applications requiring immediate audio responses.

**The Evolution of AI Speech Synthesis:**

Advancements in deep learning and neural networks have significantly improved the quality of AI-generated speech. Inworld.ai leverages these developments to deliver natural-sounding, emotionally expressive voices suitable for various applications, including virtual assistants and interactive games.

**Overview of Inworld.ai's Offerings:**

Inworld.ai's voice synthesis features include the following:

**Real-Time Speech Synthesis:**

Inworld.ai is engineered for low-latency performance, delivering the first two seconds of audio in approximately 200 milliseconds. This responsiveness is critical for real-time applications such as conversational agents and interactive gaming characters.
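
To put the figure above in perspective, the ratio of synthesis time to audio duration (often called the real-time factor) can be worked out directly. The sketch below uses only the two numbers quoted in this section. The function name is illustrative and is not part of any Inworld.ai or Vapi API, and the result describes the first chunk only, not a sustained rate.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if synthesis_seconds < 0 or audio_seconds <= 0:
        raise ValueError("synthesis time must be >= 0 and audio duration > 0")
    return synthesis_seconds / audio_seconds


# Figures quoted above: roughly 200 ms to produce the first 2 s of audio.
rtf = real_time_factor(0.2, 2.0)
print(f"real-time factor: {rtf:.2f}")
print(f"speed relative to playback: {1 / rtf:.0f}x")
```

A factor well below 1 means the first chunk is ready long before it finishes playing, which is what lets a conversational agent start speaking without a noticeable pause.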

**Zero-Shot Voice Cloning:**

The platform offers zero-shot voice cloning, allowing developers to create custom voices from as little as 5 seconds of audio input. This feature facilitates the development of unique voice identities for various applications.
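
As a rough illustration of the 5-second minimum mentioned above, a sample can be checked locally before it is submitted for cloning. This is a sketch under assumptions: it handles WAV files only (this page does not say which formats are accepted), it uses Python's standard `wave` module, and the helper names are hypothetical.

```python
import wave

# Minimum sample length quoted above; treat it as a lower bound, not a target.
MIN_SAMPLE_SECONDS = 5.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source, minimum: float = MIN_SAMPLE_SECONDS) -> bool:
    """Check that a WAV sample meets the minimum duration."""
    return wav_duration_seconds(source) >= minimum
```

For example, `is_long_enough("sample.wav")` returns `False` for a 3-second recording, so the recording can be rejected before any upload is attempted.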

**Multilingual Support:**

Inworld.ai supports 11 languages, including English, Spanish, French, Korean, and Chinese. This multilingual capability enables developers to build applications for diverse global audiences.

**Audio Markup Controls:**

Developers can use audio markup tags such as `[happy]`, `[whispering]`, or `[sigh]` to control the emotional tone and style of the synthesized speech, which makes voice agents more expressive.
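
A minimal sketch of how such tags might be added to text before it is sent for synthesis. The three tags are the ones named above. The helper names, the allow-list, and the choice to place tags at the start of the text are assumptions for illustration, not documented Inworld.ai behavior.

```python
import re

# Tags named in the text above; the full supported set is not listed here,
# so this allow-list is an assumption to extend as needed.
KNOWN_TAGS = {"happy", "whispering", "sigh"}


def with_markup(text: str, *tags: str) -> str:
    """Prefix `text` with bracketed audio markup tags."""
    for tag in tags:
        if tag not in KNOWN_TAGS:
            raise ValueError(f"unknown audio markup tag: {tag!r}")
    prefix = " ".join(f"[{tag}]" for tag in tags)
    return f"{prefix} {text}".strip()


def strip_markup(text: str) -> str:
    """Remove bracketed tags, for example to produce a plain-text transcript."""
    return re.sub(r"\s*\[[a-z]+\]\s*", " ", text).strip()


print(with_markup("Welcome back!", "happy"))
print(strip_markup("[sigh] Fine, [whispering] I'll tell you."))
```

Validating tags before synthesis catches typos such as `[hapy]` early, instead of having the tag read aloud or silently ignored.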

**Developer API:**

Inworld.ai provides a documented API for integrating speech synthesis into applications. The API supports real-time streaming and exposes voice parameters that can be customized for specific use cases.

**Use Cases for Inworld.ai:**

Inworld.ai supports a range of applications:

**Interactive Applications:**

Developers can create responsive voice agents for customer service, virtual assistants, and interactive gaming characters, enhancing user engagement through natural-sounding speech.

**Content Creation:**

Content creators can use Inworld.ai to generate high-quality voiceovers for videos, podcasts, and other media, streamlining the production process.

**Education and Training:**

Educational platforms can employ Inworld.ai to provide clear and expressive narration for e-learning materials, improving the learning experience for users.

**Integration with Vapi:**

Inworld.ai is available as a voice provider in Vapi, so developers can use its features through the Vapi platform. Vapi also provides tools for testing and optimizing voice agents before they go to production.
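
As a hedged sketch, selecting Inworld.ai for an assistant might come down to a voice block in the assistant configuration. The `provider` / `voiceId` shape mirrors how Vapi assistant configurations describe other voice providers. The provider value `"inworld"`, the placeholder voice ID, and the helper names are assumptions that should be checked against the Vapi API reference.

```python
import json

# ASSUMPTION: provider identifier for Inworld.ai in Vapi; verify before use.
INWORLD_PROVIDER = "inworld"


def build_voice_config(voice_id: str) -> dict:
    """Build the voice block of an assistant configuration."""
    if not voice_id or not voice_id.strip():
        raise ValueError("voice_id must be a non-empty string")
    return {"provider": INWORLD_PROVIDER, "voiceId": voice_id.strip()}


def build_assistant_config(name: str, voice_id: str) -> dict:
    """Build a minimal assistant configuration that uses the voice block."""
    return {"name": name, "voice": build_voice_config(voice_id)}


# "example-voice-id" is a placeholder, not a real Inworld.ai voice.
config = build_assistant_config("Support agent", "example-voice-id")
print(json.dumps(config, indent=2))
```

Keeping the voice block in its own helper makes it straightforward to swap providers while testing, since the rest of the assistant configuration stays unchanged.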

**Conclusion:**

Inworld.ai combines expressive voice synthesis, low-latency performance, and multilingual support, making it a practical option for developers who want to add natural-sounding speech to their applications.
41 changes: 0 additions & 41 deletions fern/providers/voice/sesame.mdx

This file was deleted.
