Important
This project is currently in the research phase and focuses on exploring how the OpenAI Realtime API can be applied to real-time speech translation. Code and features may change at any time, so it is not recommended for production use. Issues and Pull Requests are welcome to help improve the project.
VoxAudio is a real-time speech translation tool built on the OpenAI Realtime API. It captures audio input in real time, translates it through OpenAI's API, and outputs the translated speech.
Goals:
- Explore applications of the OpenAI Realtime API in real-time speech translation
- Research best practices for low-latency speech processing
- Test the impact of different audio devices and sample rates on translation quality
- Capture local audio stream input, transmit to OpenAI servers via WebRTC, and perform real-time translation
- Optimize real-time speech processing performance
Features:
- Real-time audio capture and processing
- Real-time speech translation using OpenAI Realtime API
- Support for multiple language pairs
- Low-latency real-time speech processing
- Custom audio device selection
- Audio loopback testing functionality
- WAV file export for verification
- WebRTC for real-time communication
- Support for multiple audio formats and sample rates
- Audio device management and selection
- Comprehensive test suite
- Audio data smoothing and noise reduction
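Two of the features listed above, audio smoothing and WAV export, can be illustrated with a short sketch. This is not VoxAudio's actual implementation: `smooth` and `encodeWAV` are hypothetical helpers, a plain moving average stands in for whatever smoothing the project uses, and the audio is assumed to be mono 16-bit PCM.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// smooth applies a centered moving average to 16-bit PCM samples.
// Each output sample is the mean of the inputs within radius positions
// of it; the window is truncated at the edges of the slice.
func smooth(samples []int16, radius int) []int16 {
	out := make([]int16, len(samples))
	if radius < 1 {
		copy(out, samples)
		return out
	}
	for i := range samples {
		lo, hi := i-radius, i+radius
		if lo < 0 {
			lo = 0
		}
		if hi > len(samples)-1 {
			hi = len(samples) - 1
		}
		sum := 0
		for _, s := range samples[lo : hi+1] {
			sum += int(s)
		}
		out[i] = int16(sum / (hi - lo + 1))
	}
	return out
}

// encodeWAV wraps mono 16-bit PCM samples in a 44-byte RIFF/WAVE header.
func encodeWAV(samples []int16, sampleRate uint32) []byte {
	const (
		channels      = 1
		bitsPerSample = 16
		blockAlign    = channels * bitsPerSample / 8
	)
	dataSize := uint32(len(samples) * blockAlign)

	var buf bytes.Buffer
	// Writes of fixed-size values into a bytes.Buffer do not fail,
	// so the error is deliberately discarded.
	put := func(v interface{}) { _ = binary.Write(&buf, binary.LittleEndian, v) }

	buf.WriteString("RIFF")
	put(36 + dataSize) // size of everything after this field
	buf.WriteString("WAVEfmt ")
	put(uint32(16)) // size of the fmt chunk
	put(uint16(1))  // audio format 1 = uncompressed PCM
	put(uint16(channels))
	put(sampleRate)
	put(sampleRate * blockAlign) // bytes per second
	put(uint16(blockAlign))
	put(uint16(bitsPerSample))
	buf.WriteString("data")
	put(dataSize)
	put(samples)
	return buf.Bytes()
}

func main() {
	raw := []int16{0, 3000, 0, -3000, 0}
	// 24000 Hz is only an example rate for this sketch.
	wav := encodeWAV(smooth(raw, 1), 24000)
	fmt.Println(len(wav), "bytes")
}
```

The resulting byte slice can be written to disk with `os.WriteFile` (available since Go 1.16) and opened in any audio player to check a captured stream by ear.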
Requirements:
- Go 1.16 or higher
- OpenAI API key
- Audio input device (microphone)
Setup:
- Set environment variables:
    export OPENAI_API_KEY="your-api-key"
    export OPENAI_MODEL="gpt-4o-mini-realtime-preview"
    export TEST_AUDIO_DEVICE="your-audio-device-name" # for testing
- Install dependencies:
    go mod download
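As a rough illustration of how a program might consume the environment variables above, the sketch below reads them into a struct and fails early when the API key is missing. The `Config` type and `loadConfig` function are hypothetical names, not part of VoxAudio, and falling back to the model shown above when `OPENAI_MODEL` is unset is an assumption made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the settings read from the environment.
type Config struct {
	APIKey string
	Model  string
	Device string // optional; only relevant for device-dependent tests
}

// defaultModel is an assumed fallback, taken from the example export above.
const defaultModel = "gpt-4o-mini-realtime-preview"

// loadConfig reads the variables through getenv so that it can be
// exercised in tests without touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		APIKey: getenv("OPENAI_API_KEY"),
		Model:  getenv("OPENAI_MODEL"),
		Device: getenv("TEST_AUDIO_DEVICE"),
	}
	if cfg.APIKey == "" {
		return Config{}, errors.New("OPENAI_API_KEY is not set")
	}
	if cfg.Model == "" {
		cfg.Model = defaultModel
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("model:", cfg.Model)
}
```

Passing `os.Getenv` as a function keeps the validation logic easy to test with a map-backed lookup.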
Usage:
    // apiKey, model, and deviceName are supplied by the caller.
    // Create a new session
    session, err := NewSession(apiKey, model, "English", "alloy")
    if err != nil {
        log.Fatal(err)
    }
    defer session.Stop()

    // Establish the WebRTC connection
    if err := session.Conn(); err != nil {
        log.Fatal(err)
    }

    // Register the audio track
    session.RegisterLocalTrack()

    // Start audio capture
    if err := session.Start(deviceName); err != nil {
        log.Fatal(err)
    }
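The connection step above may fail transiently on an unstable network. One common way to handle that is to retry with exponential backoff, sketched below. `retry` and `errTransient` are hypothetical helpers and not part of VoxAudio, and whether `Conn` can safely be called more than once on the same session is an assumption that should be checked against the code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a recoverable connection error in this sketch.
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times, doubling the delay after each
// failure, and returns nil as soon as fn succeeds.
func retry(attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Do not sleep after the final failed attempt.
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // stand-in for a failed session.Conn()
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

In a real program the closure would wrap the connection call, for example `retry(3, time.Second, session.Conn)`, provided the method has the `func() error` shape used in the usage snippet.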
The project includes several test cases:
- TestIntegratedRealtime: End-to-end integration test
- TestRealtimeConnection: WebRTC connection test
- TestLoopbackRecorder: Audio loopback test
Run tests:
    go test -v
This project is currently in the research phase, focusing on:
- Real-time speech translation accuracy and latency optimization
- Translation quality comparison between different language pairs
- Audio processing algorithm improvements
- WebRTC connection stability enhancement
Issues and Pull Requests are welcome, especially for:
- New language pair support
- Audio processing algorithm optimization
- Performance improvement suggestions
- User experience feedback
Contribution steps:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request