Update more realtime spec #397

codesoda · 2025-06-30T04:59:14Z

🚀 Summary

Syncs async-openai realtime types with the latest OpenAI Realtime API (June 2025).
Adds richer request/response configs, new client & server events, extra enums for models / voices / modalities, plus tracing & noise-reduction support.

✨ What’s new

Client events
- Added ResponseConfig, OutputAudioBufferClearEvent, ConversationItemRetrieveEvent.
- ResponseCancelEvent gains response_id.
- ResponseCreateEvent now uses ResponseConfig instead of SessionResource.
Server events
- Added output_audio_buffer.cleared, conversation.item.input_audio_transcription.delta, conversation.item.retrieved.
- Fixed typo: InputAudioBufferCommitedEvent → InputAudioBufferCommittedEvent.
Response resource
- New fields: finish_reason, created_at.
- New finish reasons: TokenLimit, FunctionCall.
Session resource
- New enums: RealtimeModel, Modality, NoiseReductionType.
- Added fields: speed, input_audio_noise_reduction, tracing.
- model is now RealtimeModel; modalities is Vec<Modality>.
Turn detection
- Introduced semantic_vad mode with create_response and interrupt_response flags.
Audio
- Unified enum names (g711_ulaw, g711_alaw).
- Added InputAudioNoiseReduction.
Tooling
- Wired ToolChoice & ToolDefinition into ResponseConfig.
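
For reviewers who want to picture the new turn-detection shape, here is a minimal, dependency-free sketch. The `TurnDetection` and `Eagerness` types below are hypothetical stand-ins, not the definitions added in this PR. The `eagerness` values (`low`, `medium`, `high`, `auto`) and the hand-rolled `to_json` helper are assumptions made for illustration; the crate itself presumably serializes through serde.

```rust
// Hypothetical stand-ins for illustration only; not the async-openai definitions.

/// How eagerly the model should respond (value names are an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Eagerness {
    Low,
    Medium,
    High,
    Auto,
}

impl Eagerness {
    /// Lowercase name as it would appear in the JSON payload.
    pub fn wire_name(self) -> &'static str {
        match self {
            Eagerness::Low => "low",
            Eagerness::Medium => "medium",
            Eagerness::High => "high",
            Eagerness::Auto => "auto",
        }
    }
}

/// Turn detection modes, both carrying the auto-response flags.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TurnDetection {
    ServerVad {
        create_response: bool,
        interrupt_response: bool,
    },
    SemanticVad {
        eagerness: Eagerness,
        create_response: bool,
        interrupt_response: bool,
    },
}

impl TurnDetection {
    /// Hand-rolled JSON so the sketch needs no external crates.
    pub fn to_json(&self) -> String {
        match self {
            TurnDetection::ServerVad { create_response, interrupt_response } => format!(
                r#"{{"type":"server_vad","create_response":{},"interrupt_response":{}}}"#,
                create_response, interrupt_response
            ),
            TurnDetection::SemanticVad { eagerness, create_response, interrupt_response } => format!(
                r#"{{"type":"semantic_vad","eagerness":"{}","create_response":{},"interrupt_response":{}}}"#,
                eagerness.wire_name(),
                create_response,
                interrupt_response
            ),
        }
    }
}
```

Modelling the mode as an enum with per-variant fields keeps `eagerness` out of the `server_vad` case, where it would have no meaning.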

⚠️ Breaking changes

ResponseCreateEvent: response now expects ResponseConfig, not SessionResource.
Enum casing: g711-ulaw / g711-alaw → g711_ulaw / g711_alaw.
Event rename: InputAudioBufferCommitedEvent → InputAudioBufferCommittedEvent.
Typed model field: SessionResource.model is now RealtimeModel (no longer a free-form String).
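
Downstream code that still refers to the misspelled event name could bridge the rename with a deprecated type alias. This is only a sketch of one possible migration shim and is not part of this PR; the struct below is a placeholder with a single illustrative field, and `describe` is a hypothetical caller.

```rust
/// Placeholder for the renamed event type; the single field is illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InputAudioBufferCommittedEvent {
    pub item_id: String,
}

/// Old, misspelled name kept as a deprecated alias (hypothetical shim, not in the PR).
#[deprecated(note = "renamed to InputAudioBufferCommittedEvent")]
pub type InputAudioBufferCommitedEvent = InputAudioBufferCommittedEvent;

/// Hypothetical handler written against the old name; it keeps compiling,
/// and removing the `allow` would surface the deprecation warning.
#[allow(deprecated)]
pub fn describe(event: &InputAudioBufferCommitedEvent) -> String {
    format!("committed item {}", event.item_id)
}
```

Because the alias and the new name refer to the same type, values built with the corrected name can be passed to code that has not been migrated yet.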

- Added `Cancelled` variant to `ResponseStatusDetail` enum for better handling of cancelled responses.
- Introduced `LogProb` struct to capture log probability information for transcribed tokens.
- Updated `ConversationItemInputAudioTranscriptionCompletedEvent` and `ConversationItemInputAudioTranscriptionDeltaEvent` to include optional `logprobs` for per-token log probability data.
- Enhanced `AudioTranscription` struct with optional fields for `language`, `model`, and `prompt` to improve transcription accuracy and customization.
- Added new `SemanticVAD` option in the `TurnDetection` enum to control model response eagerness.
- Expanded `RealtimeVoice` enum with additional voice options for more variety in audio responses.

- Changed enum variants for `AudioFormat` to use underscores instead of hyphens in their serialized names.
- Updated `G711ULAW` from `g711-ulaw` to `g711_ulaw` and `G711ALAW` from `g711-alaw` to `g711_alaw`, matching the naming convention used elsewhere in the spec.
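
To make the serialized-name change concrete, here is a small standard-library-only sketch. The `G711ULAW` and `G711ALAW` variant names come from the commit message above; the `PCM16` variant, the `wire_name` helper, and the lenient `FromStr` that still accepts the legacy hyphenated spellings are assumptions added for illustration and do not describe how the crate itself behaves.

```rust
use std::str::FromStr;

/// Illustrative stand-in for the crate's audio format enum.
/// `PCM16` is assumed here; only the two G.711 variants appear in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioFormat {
    PCM16,
    G711ULAW,
    G711ALAW,
}

impl AudioFormat {
    /// Serialized name after this change: underscores, not hyphens.
    pub fn wire_name(self) -> &'static str {
        match self {
            AudioFormat::PCM16 => "pcm16",
            AudioFormat::G711ULAW => "g711_ulaw",
            AudioFormat::G711ALAW => "g711_alaw",
        }
    }
}

impl FromStr for AudioFormat {
    type Err = String;

    /// Hypothetical lenient parser: accepts the new names and, for stored
    /// configs written before the change, the old hyphenated ones.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pcm16" => Ok(AudioFormat::PCM16),
            "g711_ulaw" | "g711-ulaw" => Ok(AudioFormat::G711ULAW),
            "g711_alaw" | "g711-alaw" => Ok(AudioFormat::G711ALAW),
            other => Err(format!("unknown audio format: {}", other)),
        }
    }
}
```

Callers that persisted the old hyphenated strings would otherwise need a one-off data migration; whether a lenient parser like this is desirable is a judgment call for each application.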


codesoda and others added 5 commits June 23, 2025 16:28

feat: add auto-response options to VAD configurations

2bb05e3

feat: add realtime API types and event handling for audio, tracing, and response management

479bf1e

Merge branch 'main' into chore/update-realtime-spec

edcae25
