A pure-trait fluent builder API for Text-to-Speech (TTS) and Speech-to-Text (STT) engines in Rust.
Fluent Voice follows a simple, elegant pattern for all voice operations:
One fluent chain → One matcher closure → One `.await?`
This design eliminates the complexity of multiple awaits, nested async calls, and scattered error handling that plague many voice APIs.
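The shape is the same for every operation. A schematic sketch (the `builder` here stands in for any configured TTS chain; STT uses `listen` in the same position, and concrete examples follow below):

```rust
// Schematic only: one chain, one matcher, one await.
let stream = builder
    .synthesize(|result| match result {
        Ok(conv) => Ok(conv.into_stream()), // success arm picks the output
        Err(e) => Err(e),                   // error arm propagates
    })
    .await?; // the single await point
```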
Key features:

- **Unified API**: Single interface for both TTS and STT operations
- **Single Await**: All operations complete with exactly one `.await?`
- **Multi-Speaker**: Built-in support for conversations with multiple speakers
- **Engine Agnostic**: Works with any TTS/STT engine through trait implementations
- **Rich Configuration**: Comprehensive settings for voice control, audio processing, and recognition
- **Streaming**: Real-time audio streams and transcript processing
- **Type Safe**: Leverages Rust's type system for compile-time correctness
- **Well Documented**: Extensive documentation with practical examples
Add this to your `Cargo.toml`:

```toml
[dependencies]
fluent_voice = "0.1.0"
```
For async runtime support, also add:

```toml
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"
```
A complete STT chain at a glance:

```rust
// Complete fluent chain with microphone capture and voice activity detection
let _transcript = FluentVoice::stt()
    .with_source(SpeechSource::Microphone {
        backend: MicBackend::Default,
        format: AudioFormat::Pcm16Khz,
        sample_rate: 16_000,
    })
    .vad_mode(VadMode::Accurate)
    .language_hint(Language("en-US"))
    .diarization(Diarization::On)
    .word_timestamps(WordTimestamps::On)
    .punctuation(Punctuation::On)
    .listen(|segment| match segment {
        Ok(seg) => Ok(seg.text().to_owned()), // streaming chunks
        Err(e) => Err(e),
    })
    .collect(); // transcript is now the end-state string
```
Text-to-speech with multiple speakers:

```rust
use fluent_voice::prelude::*;
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), VoiceError> {
    // Note: Requires an engine implementation (see Engine Integration below)
    let mut audio_stream = FluentVoice::tts().conversation()
        .with_speaker(
            Speaker::speaker("Narrator")
                .voice_id(VoiceId::new("voice-uuid"))
                .with_speed_modifier(VocalSpeedMod(0.9))
                .speak("Hello, world!")
                .build()
        )
        .with_speaker(
            Speaker::speaker("Bob")
                .with_speed_modifier(VocalSpeedMod(1.1))
                .speak("Hi Alice! How are you today?")
                .build()
        )
        .synthesize(|conversation| match conversation {
            Ok(conv) => Ok(conv.into_stream()), // Returns audio stream
            Err(e) => Err(e),
        })
        .await?; // Single await point

    // Process audio samples
    while let Some(sample) = audio_stream.next().await {
        // Play sample or save to file
        println!("Audio sample: {}", sample);
    }

    Ok(())
}
```
Speech-to-text from the microphone:

```rust
use fluent_voice::prelude::*;
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), VoiceError> {
    let mut transcript_stream = FluentVoice::stt().conversation()
        .with_source(SpeechSource::Microphone {
            backend: MicBackend::Default,
            format: AudioFormat::Pcm16Khz,
            sample_rate: 16_000,
        })
        .vad_mode(VadMode::Accurate)
        .language_hint(Language("en-US"))
        .diarization(Diarization::On) // Speaker identification
        .word_timestamps(WordTimestamps::On)
        .punctuation(Punctuation::On)
        .listen(|conversation| match conversation {
            Ok(conv) => Ok(conv.into_stream()), // Returns transcript stream
            Err(e) => Err(e),
        })
        .await?; // Single await point

    // Process transcript segments
    while let Some(result) = transcript_stream.next().await {
        match result {
            Ok(segment) => {
                println!("[{:.2}s] {}: {}",
                    segment.start_ms() as f32 / 1000.0,
                    segment.speaker_id().unwrap_or("Unknown"),
                    segment.text()
                );
            },
            Err(e) => eprintln!("Recognition error: {}", e),
        }
    }

    Ok(())
}
```
Fluent Voice is built around a pure-trait architecture:
```text
┌──────────────────┐      ┌────────────────────┐      ┌──────────────────┐
│    User Code     │      │    Fluent Voice    │      │   Engine Impls   │
│                  │      │      (Traits)      │      │    (Concrete)    │
├──────────────────┤      ├────────────────────┤      ├──────────────────┤
│ .conversation()  │─────▶│ TtsConversation    │◀─────│ ElevenLabsImpl   │
│ .with_speaker()  │      │ Builder            │      │ OpenAIImpl       │
│ .synthesize()    │      │                    │      │ AzureImpl        │
│ .await?          │      │ SttConversation    │      │ GoogleImpl       │
└──────────────────┘      │ Builder            │      │ WhisperImpl      │
                          └────────────────────┘      └──────────────────┘
```
- `TtsEngine` / `SttEngine`: Engine registration and initialization
- `TtsConversationBuilder` / `SttConversationBuilder`: Fluent configuration API
- `TtsConversation` / `SttConversation`: Runtime session objects
- `Speaker` / `SpeakerBuilder`: Voice and speaker configuration
- `TranscriptSegment` / `TranscriptStream`: STT result handling
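As a rough orientation, the TTS side of this trait set might be declared along the following lines. This is a simplified sketch with assumed signatures, inferred from the examples in this README and mirroring the implementation walkthrough below, not the crate's exact API:

```rust
use std::future::Future;

// Simplified sketch; signatures are assumptions based on the usage examples.
pub trait TtsEngine {
    type Conv: TtsConversationBuilder;
    fn conversation(&self) -> Self::Conv;
}

pub trait TtsConversationBuilder: Sized {
    type Conversation: TtsConversation;
    fn language(self, lang: Language) -> Self;
    fn synthesize<F, R>(self, matcher: F) -> impl Future<Output = R> + Send
    where
        F: FnOnce(Result<Self::Conversation, VoiceError>) -> R + Send + 'static;
}

pub trait TtsConversation {
    type AudioStream;
    fn into_stream(self) -> Self::AudioStream;
}
```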
Configuring speakers and voices for TTS:

```rust
let conversation = engine.conversation()
    .with_speaker(
        Speaker::speaker("Narrator")
            .voice_id(VoiceId::new("narrator-voice"))
            .language(Language("en-US"))
            .with_speed_modifier(VocalSpeedMod(0.8))      // Slower speech
            .with_pitch_range(PitchRange::new(80.0, 200.0)) // Pitch control
            .speak("Your text here")
            .build()
    )
    .language(Language("en-US")) // Global language setting
    .synthesize(/* matcher */)
    .await?;
```
Configuring recognition for STT:

```rust
let conversation = engine.conversation()
    .with_source(SpeechSource::File {
        path: "audio.wav".to_string(),
        format: AudioFormat::Pcm16Khz,
    })
    .vad_mode(VadMode::Accurate)                          // Voice activity detection
    .noise_reduction(NoiseReduction::High)                // Background noise filtering
    .language_hint(Language("en-US"))                     // Language optimization
    .diarization(Diarization::On)                         // Speaker identification
    .timestamps_granularity(TimestampsGranularity::Word)  // Timing precision
    .punctuation(Punctuation::On)                         // Auto-punctuation
    .listen(|conversation| match conversation {
        Ok(conv) => Ok(conv.into_stream()),
        Err(e) => Err(e),
    })
    .await?;
```
Fluent Voice is designed to work with any TTS/STT service. Engine implementations provide concrete types that implement the core traits.
- ElevenLabs: `elevenlabs-fluent-voice` (planned)
- OpenAI: `openai-fluent-voice` (planned)
- Azure Cognitive Services: `azure-fluent-voice` (planned)
- Google Cloud: `google-fluent-voice` (planned)
- Local Whisper: `whisper-fluent-voice` (planned)
To integrate a new engine, implement the core traits for your own types:

```rust
use fluent_voice::prelude::*;
use futures_util::Stream;
use std::future::Future;

// 1. Define your engine struct
pub struct MyEngine {
    api_key: String,
}

// 2. Implement the engine trait
impl TtsEngine for MyEngine {
    type Conv = MyConversationBuilder;

    fn conversation(&self) -> Self::Conv {
        MyConversationBuilder::new(self.api_key.clone())
    }
}

// 3. Implement the conversation builder
pub struct MyConversationBuilder { /* ... */ }

impl TtsConversationBuilder for MyConversationBuilder {
    type Conversation = MyConversation;

    fn with_speaker<S: Speaker>(self, speaker: S) -> Self { /* ... */ }
    fn language(self, lang: Language) -> Self { /* ... */ }

    fn synthesize<F, R>(self, matcher: F) -> impl Future<Output = R> + Send
    where
        F: FnOnce(Result<Self::Conversation, VoiceError>) -> R + Send + 'static,
    {
        async move {
            // Perform synthesis and call matcher with the result
            let result = self.do_synthesis().await;
            matcher(result)
        }
    }
}

// 4. Implement the conversation object
impl TtsConversation for MyConversation {
    // `impl Trait` in an associated type requires nightly; on stable,
    // a boxed stream (Pin<Box<dyn Stream<Item = i16> + Send>>) works too.
    type AudioStream = impl Stream<Item = i16> + Send + Unpin;

    fn into_stream(self) -> Self::AudioStream {
        // Convert to audio stream
    }
}
```
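Once the traits are implemented, the custom engine drops into the same fluent pattern as any built-in. A brief usage sketch, assuming the `MyEngine` outline above and a `speaker` built as in the earlier examples:

```rust
// Hypothetical usage of the custom engine defined above.
let engine = MyEngine { api_key: "my-secret-key".to_string() };

let mut audio_stream = engine.conversation()
    .with_speaker(speaker)
    .synthesize(|conv| match conv {
        Ok(c) => Ok(c.into_stream()),
        Err(e) => Err(e),
    })
    .await?;
```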
Falling back to a secondary engine when the primary fails:

```rust
let audio = match primary_engine.conversation()
    .with_speaker(speaker.clone()) // assumes the speaker config is cloneable
    .synthesize(|conversation| match conversation {
        Ok(conv) => Ok(conv.into_stream()),
        Err(primary_error) => {
            eprintln!("Primary engine failed: {}", primary_error);
            Err(primary_error)
        }
    })
    .await
{
    Ok(stream) => stream,
    // Fall back to a different engine or settings
    Err(_) => fallback_engine.conversation()
        .with_speaker(speaker)
        .synthesize(|conv| match conv {
            Ok(c) => Ok(c.into_stream()),
            Err(e) => Err(e),
        })
        .await?,
};
```
Applying real-time effects to synthesized audio:

```rust
let mut audio_stream = engine.conversation()
    .with_speaker(speaker)
    .synthesize(|conv| match conv {
        Ok(c) => Ok(c.into_stream()),
        Err(e) => Err(e),
    })
    .await?;

// Apply real-time effects
while let Some(sample) = audio_stream.next().await {
    let processed_sample = apply_effects(sample);
    audio_output.play(processed_sample)?;
}
```
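The `apply_effects` helper above is application code, not part of Fluent Voice. A minimal sketch, assuming plain `i16` PCM samples and a fixed gain stage:

```rust
// Hypothetical effect chain: a simple gain stage on each i16 sample.
fn apply_effects(sample: i16) -> i16 {
    let gain = 0.8_f32; // attenuate to 80% volume
    (f32::from(sample) * gain)
        .clamp(f32::from(i16::MIN), f32::from(i16::MAX)) as i16
}
```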
Transcribing a recorded meeting with speaker diarization:

```rust
let mut transcript_stream = engine.conversation()
    .with_source(SpeechSource::from_file("meeting.wav", AudioFormat::Pcm16Khz))
    .diarization(Diarization::On)
    .listen(|conv| match conv {
        Ok(c) => Ok(c.into_stream()),
        Err(e) => Err(e),
    })
    .await?;

// Collect and format transcript
let mut segments = Vec::new();
while let Some(result) = transcript_stream.next().await {
    if let Ok(segment) = result {
        segments.push(segment);
    }
}

// Generate formatted output
generate_transcript_document(segments)?;
```
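The `generate_transcript_document` call is likewise application code. A minimal sketch, assuming `TranscriptSegment` is a trait exposing the `start_ms()`, `speaker_id()`, and `text()` accessors used in the streaming example above:

```rust
// Hypothetical formatter: renders collected segments as a readable transcript.
fn generate_transcript_document<S: TranscriptSegment>(
    segments: Vec<S>,
) -> Result<(), VoiceError> {
    for seg in &segments {
        println!(
            "[{:.2}s] {}: {}",
            seg.start_ms() as f32 / 1000.0,
            seg.speaker_id().unwrap_or("Unknown"),
            seg.text()
        );
    }
    Ok(())
}
```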
Run the test suite:

```bash
cargo test
```

Run examples:

```bash
cargo run --example api_usage
```
We welcome contributions! Please see our Contributing Guide for details.
- Clone the repository
- Install Rust (latest stable)
- Run tests: `cargo test`
- Run examples: `cargo run --example api_usage`
- Follow Rust conventions and `cargo fmt`
- Add tests for new functionality
- Document public APIs with examples
- Use `cargo clippy` for linting
This project is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
- Documentation
- Issue Tracker
- Discussions
Made with ❤️ for the Rust community