Commit 8880121

feat: pronunciation dictionaries docs (#567)
1 parent 2772d84 commit 8880121

4 files changed: +235 -0 lines changed

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
---
title: Pronunciation dictionaries
subtitle: Control how your AI assistant pronounces specific words and phrases
slug: assistants/pronunciation-dictionaries
---

## Overview

Pronunciation dictionaries let you customize how your AI assistant pronounces specific words, names, acronyms, and technical terms. They are particularly useful for keeping brand names, proper nouns, and industry-specific terminology consistent when the default pronunciation is wrong.

**Note:** Pronunciation dictionaries work only with ElevenLabs voices, and phoneme rules additionally require a compatible ElevenLabs model (see Model Compatibility below).

## How Pronunciation Dictionaries Work

<Steps>
<Step title="Create Pronunciation Rules">
Define the words or phrases to match and how each should be pronounced, using either phonetic notation or word substitutions.
</Step>

<Step title="Upload Dictionary to Vapi">
Create a pronunciation dictionary through Vapi's API with your custom rules.
</Step>

<Step title="Configure Your Assistant">
Associate the pronunciation dictionary with your assistant's voice configuration.
</Step>

<Step title="Automatic Application">
When your assistant encounters the specified words during a conversation, it uses your custom pronunciations automatically.
</Step>
</Steps>

## Sample Audio Examples

The recordings below compare the same phrases spoken without and with a pronunciation dictionary.

Corrected pronunciations:
- "Nginx" → "Engine-X" (using alias rule)
- "Kubernetes" → "/ˌkuːbərˈneɪtiːz/" (using phoneme rule)

**Without Pronunciation Dictionary:**
<audio controls src="/static/audio/without-pronunciation-dictionary.wav">Your browser does not support the audio element.</audio>

**With Pronunciation Dictionary:**
<audio controls src="/static/audio/with-pronunciation-dictionary.wav">Your browser does not support the audio element.</audio>


## Prerequisites

- A Vapi assistant configured with an ElevenLabs voice
- Familiarity with phonetic notation (IPA or CMU Arpabet) if you plan to write phoneme rules
- Access to Vapi's API to create dictionaries

## Types of Pronunciation Rules

### Phoneme Rules

Phoneme rules specify the exact pronunciation in a phonetic alphabet. They give the most precise control over pronunciation.

**Supported Alphabets:**
- **IPA (International Phonetic Alphabet)**: More universal, uses symbols like `/tə'meɪtoʊ/`
- **CMU Arpabet**: ASCII-based format, uses notation like `T AH M EY T OW`

**Model Compatibility:**
Phoneme rules only work with the following ElevenLabs models:
- `eleven_turbo_v2`
- `eleven_flash_v2`
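
As a rough illustration of the compatibility constraint above, the Python sketch below builds a phoneme rule in the JSON shape used by the request examples in this guide and checks a model name against the two listed models. The helper names `phoneme_rule` and `supports_phoneme_rules` and the constants `PHONEME_MODELS` and `ALPHABETS` are hypothetical; they are not part of any Vapi or ElevenLabs SDK.

```python
# Hypothetical helpers; field names mirror the request examples in this guide.
PHONEME_MODELS = {"eleven_turbo_v2", "eleven_flash_v2"}  # models listed above
ALPHABETS = {"ipa", "cmu-arpabet"}  # alphabet values used in this guide's examples


def phoneme_rule(string_to_replace: str, phoneme: str, alphabet: str) -> dict:
    """Build one phoneme rule, rejecting alphabets this guide does not list."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    return {
        "stringToReplace": string_to_replace,
        "type": "phoneme",
        "phoneme": phoneme,
        "alphabet": alphabet,
    }


def supports_phoneme_rules(model: str) -> bool:
    """Return True only for the models this guide lists as phoneme-compatible."""
    return model in PHONEME_MODELS


rule = phoneme_rule("tomato", "T AH M EY T OW", "cmu-arpabet")
print(rule["alphabet"], supports_phoneme_rules("eleven_turbo_v2"))
```

A check like this could run before a dictionary is attached to an assistant, so that phoneme rules are not silently ignored by a model that does not support them.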

### Alias Rules

Alias rules replace a word with an alternative spelling or phrase. They work with all ElevenLabs models and are useful for:
- Converting acronyms to full phrases (e.g., "UN" → "United Nations")
- Providing phonetic spellings for difficult words
- Standardizing pronunciation across different contexts
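
To preview what alias rules would do to a piece of text before uploading them, a local approximation such as the Python sketch below may help. It assumes case-sensitive, whole-word matching with the first rule for a given string taking precedence; the actual matching performed by ElevenLabs may differ, and `preview_aliases` is a hypothetical helper, not an API function.

```python
import re


def preview_aliases(text: str, rules: list[dict]) -> str:
    """Approximate alias substitution locally (assumed: case-sensitive, whole words)."""
    aliases: dict[str, str] = {}
    for rule in rules:
        if rule.get("type") == "alias":
            # Keep the first rule seen for each string; later duplicates are ignored.
            aliases.setdefault(rule["stringToReplace"], rule["alias"])
    if not aliases:
        return text
    # Try longer strings first so a short key never pre-empts a longer one.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: aliases[match.group(0)], text)


rules = [
    {"stringToReplace": "UN", "type": "alias", "alias": "United Nations"},
    {"stringToReplace": "Nginx", "type": "alias", "alias": "Engine-X"},
]
print(preview_aliases("The UN runs Nginx; the un does not.", rules))
```

Because the match is case-sensitive, the lowercase "un" in the sample sentence is left untouched.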

## Implementation

<Steps>
<Step title="Create a Pronunciation Dictionary">
Use Vapi's API to create a pronunciation dictionary with your custom rules.

```bash
# Assumes the JSON body shown below is saved as dictionary.json (a hypothetical filename).
curl -X POST https://api.vapi.ai/provider/11labs/pronunciation-dictionary \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d @dictionary.json
```

```json
{
  "name": "My Custom Dictionary",
  "rules": [
    {
      "stringToReplace": "tomato",
      "type": "phoneme",
      "phoneme": "/tə'meɪtoʊ/",
      "alphabet": "ipa"
    },
    {
      "stringToReplace": "Vapi",
      "type": "phoneme",
      "phoneme": "V AE P IY",
      "alphabet": "cmu-arpabet"
    },
    {
      "stringToReplace": "UN",
      "type": "alias",
      "alias": "United Nations"
    }
  ]
}
```

The API responds with:
```json
{
  "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
  "versionId": "xJl0ImZzi3cYp61T0UQG",
  "name": "My Custom Dictionary",
  "rules": [...],
  "createdAt": "2024-01-15T10:30:00Z"
}
```
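
The same request can be assembled from a script. The sketch below uses only Python's standard library and stops short of sending anything, so it can be inspected offline; `build_create_request` is a hypothetical helper, while the URL, headers, and body fields are taken from the example above.

```python
import json
import urllib.request

API_URL = "https://api.vapi.ai/provider/11labs/pronunciation-dictionary"


def build_create_request(api_key: str, name: str, rules: list[dict]) -> urllib.request.Request:
    """Assemble (but do not send) the POST request that creates a dictionary."""
    body = json.dumps({"name": name, "rules": rules}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_create_request(
    "YOUR_API_KEY",
    "My Custom Dictionary",
    [{"stringToReplace": "UN", "type": "alias", "alias": "United Nations"}],
)
print(request.get_method(), request.full_url)

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```

Keeping request construction separate from sending makes it easy to log or review the payload before any call is made.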
</Step>

<Step title="Configure Your Assistant's Voice">
Update your assistant configuration to reference the pronunciation dictionary, using the `pronunciationDictionaryId` and `versionId` returned when you created it. Because the example dictionary contains phoneme rules, the voice uses a phoneme-compatible model.

```json
{
  "voice": {
    "model": "eleven_turbo_v2",
    "voiceId": "sarah",
    "provider": "11labs",
    "stability": 0.5,
    "similarityBoost": 0.75,
    "pronunciationDictionaryLocators": [
      {
        "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
        "versionId": "xJl0ImZzi3cYp61T0UQG"
      }
    ]
  }
}
```

<Note>
Adding a pronunciation dictionary automatically enables SSML parsing for your assistant.
</Note>
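
If the voice configuration is generated in code, the locator can be derived from the creation response rather than copied by hand. The following is a minimal Python sketch under that assumption; `attach_dictionary` is a hypothetical helper, and the field names come from the JSON examples above.

```python
import copy


def attach_dictionary(voice: dict, created: dict) -> dict:
    """Return a copy of the voice config that references the created dictionary."""
    updated = copy.deepcopy(voice)  # leave the caller's config untouched
    locator = {
        "pronunciationDictionaryId": created["pronunciationDictionaryId"],
        "versionId": created["versionId"],
    }
    locators = updated.setdefault("pronunciationDictionaryLocators", [])
    if locator not in locators:  # avoid adding the same locator twice
        locators.append(locator)
    return updated


created = {
    "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
    "versionId": "xJl0ImZzi3cYp61T0UQG",
}
voice = {"provider": "11labs", "voiceId": "sarah", "model": "eleven_turbo_v2"}
configured = attach_dictionary(voice, created)
print(configured["pronunciationDictionaryLocators"])
```

Returning a copy keeps the original configuration available for comparison when testing with and without the dictionary.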
</Step>

<Step title="Test Your Pronunciation">
Create a test call or use the Vapi playground to verify that your custom pronunciations are applied.
</Step>
</Steps>

## Using Your Own ElevenLabs Account (BYOK)

If you use your own ElevenLabs API key (Bring Your Own Key), you can create pronunciation dictionaries directly in your ElevenLabs account and reference them from Vapi:

1. Create a pronunciation dictionary in your ElevenLabs account
2. Note the `pronunciationDictionaryId` and `versionId` from ElevenLabs
3. Use these IDs in your Vapi assistant configuration:

```json
{
  "voice": {
    "model": "eleven_turbo_v2_5",
    "voiceId": "your-voice-id",
    "provider": "11labs",
    "pronunciationDictionaryLocators": [
      {
        "pronunciationDictionaryId": "your-elevenlabs-dict-id",
        "versionId": "your-elevenlabs-version-id"
      }
    ]
  }
}
```

## Managing Pronunciation Dictionaries

### List Your Dictionaries

```bash
curl https://api.vapi.ai/provider/11labs/pronunciation-dictionary \
  -H "Authorization: Bearer YOUR_API_KEY"
```

### Update Dictionary Rules

```bash
# Replace {dictionaryId} with your dictionary's ID; rules.json (hypothetical) holds the JSON body below.
curl -X PATCH "https://api.vapi.ai/provider/11labs/pronunciation-dictionary/{dictionaryId}" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d @rules.json
```

```json
{
  "rules": [
    {
      "stringToReplace": "tomato",
      "type": "phoneme",
      "phoneme": "/tə'mɑːtoʊ/",
      "alphabet": "ipa"
    }
  ]
}
```

## Best Practices

<Note>
- **Case Sensitivity**: Dictionary matching is case-sensitive. Create separate entries for each capitalization you expect (for example, "Nginx" and "nginx").
- **Order Matters**: Rules are applied in the order they appear in the dictionary. The first matching rule is used.
- **Testing**: Always test pronunciation changes with your specific voice and model combination.
- **Phoneme Accuracy**: Mark stress correctly in multi-syllable words when using phoneme rules.
- **Model Compatibility**: Phoneme rules only work with specific ElevenLabs models.
</Note>
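
Because matching is case-sensitive, one way to cover several capitalizations is to generate the extra entries programmatically. The Python sketch below is one possible approach; `with_case_variants` is a hypothetical helper, and the set of variants it produces (original, lowercase, uppercase, capitalized) is an arbitrary choice.

```python
def with_case_variants(rule: dict) -> list[dict]:
    """Expand one rule into entries for common capitalizations of its match string."""
    original = rule["stringToReplace"]
    candidates = [original, original.lower(), original.upper(), original.capitalize()]
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep first-seen order
    return [{**rule, "stringToReplace": variant} for variant in unique]


base = {"stringToReplace": "Nginx", "type": "alias", "alias": "Engine-X"}
for entry in with_case_variants(base):
    print(entry["stringToReplace"], "->", entry["alias"])
```

Review the generated variants before uploading: for an acronym such as "UN", the lowercase form "un" could match text you did not intend to change.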

## Common Issues

**Pronunciation Not Applied**
- Verify that you're using a compatible ElevenLabs model if the dictionary contains phoneme rules
- Check that `stringToReplace` exactly matches the text in your content (matching is case-sensitive)
- Ensure the pronunciation dictionary is referenced in your voice configuration
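
The checks in this list can be run mechanically against a configuration before placing a test call. The sketch below is a hypothetical local helper (`diagnose`) that inspects a voice configuration, a list of rules, and a sample of the text to be spoken; it does not call any API, and the compatible-model set is taken from the Model Compatibility list in this guide.

```python
PHONEME_MODELS = {"eleven_turbo_v2", "eleven_flash_v2"}  # as listed in this guide


def diagnose(voice: dict, rules: list[dict], sample_text: str) -> list[str]:
    """Return human-readable reasons a rule might not be applied (empty if none found)."""
    problems = []
    if voice.get("provider") != "11labs":
        problems.append("voice provider is not 11labs")
    if not voice.get("pronunciationDictionaryLocators"):
        problems.append("voice configuration has no pronunciationDictionaryLocators")
    uses_phonemes = any(rule.get("type") == "phoneme" for rule in rules)
    if uses_phonemes and voice.get("model") not in PHONEME_MODELS:
        problems.append(f"model {voice.get('model')!r} is not listed as phoneme-compatible")
    for rule in rules:
        # Plain substring test, case-sensitive like the dictionary itself.
        if rule["stringToReplace"] not in sample_text:
            problems.append(f"{rule['stringToReplace']!r} not found in sample text (case-sensitive)")
    return problems


voice = {
    "provider": "11labs",
    "model": "eleven_turbo_v2_5",
    "pronunciationDictionaryLocators": [{"pronunciationDictionaryId": "id", "versionId": "v"}],
}
rules = [{"stringToReplace": "tomato", "type": "phoneme", "phoneme": "/tə'meɪtoʊ/", "alphabet": "ipa"}]
for problem in diagnose(voice, rules, "I like Tomato soup"):
    print(problem)
```

An empty result does not guarantee the pronunciation is applied; it only rules out the three causes listed above.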

**SSML Conflicts**
- Enabling a pronunciation dictionary automatically activates SSML parsing
- Ensure any existing SSML tags in your content are well-formed

**Performance Impact**
- Large dictionaries may slightly increase processing time
- Consider placing the most frequently used rules first

fern/docs.yml

Lines changed: 2 additions & 0 deletions
@@ -146,6 +146,8 @@ navigation:
 path: assistants/assistant-hooks.mdx
 - page: Background speech denoising
 path: assistants/background-speech-denoising.mdx
+- page: Pronunciation dictionaries
+path: assistants/pronunciation-dictionaries.mdx
 - section: Model configurations
 icon: fa-light fa-waveform-lines
 contents:
Binary file not shown.
Binary file not shown.
