Update more realtime spec #397


Draft
wants to merge 5 commits into
base: main
from

Conversation

codesoda
Contributor

@codesoda codesoda commented Jun 30, 2025

🚀 Summary

Syncs async-openai realtime types with the latest OpenAI Realtime API (June 2025).
Adds richer request/response configs, new client & server events, extra enums for models / voices / modalities, plus tracing & noise-reduction support.


✨ What’s new

  • Client events

    • Added ResponseConfig, OutputAudioBufferClearEvent, ConversationItemRetrieveEvent.
    • ResponseCancelEvent gains response_id.
    • ResponseCreateEvent now uses ResponseConfig instead of SessionResource.
  • Server events

    • Added output_audio_buffer.cleared, conversation.item.input_audio_transcription.delta, conversation.item.retrieved.
    • Fixed typo: InputAudioBufferCommitedEvent → InputAudioBufferCommittedEvent.
  • Response resource

    • New fields: finish_reason, created_at.
    • New finish reasons: TokenLimit, FunctionCall.
  • Session resource

    • New enums: RealtimeModel, Modality, NoiseReductionType.
    • Added fields: speed, input_audio_noise_reduction, tracing.
    • model is now RealtimeModel; modalities is Vec<Modality>.
  • Turn detection

    • Introduced semantic_vad mode with create_response and interrupt_response flags.
  • Audio

    • Unified enum names (g711_ulaw, g711_alaw).
    • Added InputAudioNoiseReduction.
  • Tooling

    • Wired ToolChoice & ToolDefinition into ResponseConfig.
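
To make the new turn-detection mode concrete, here is a minimal sketch of how a semantic_vad variant with its create_response and interrupt_response flags could be modelled. The `TurnDetection` enum below is a simplified, hypothetical stand-in rather than the crate's actual definition (which derives serde traits), and the `type_name` and `to_json` helpers are illustration-only names. The `server_vad` variant is assumed from the public Realtime API and is not described in this PR.

```rust
// Simplified, hypothetical stand-in for the crate's turn-detection type.
// The real type derives serde traits; here the JSON is built by hand so the
// sketch compiles with the standard library alone.
#[derive(Debug, Clone, PartialEq)]
pub enum TurnDetection {
    /// Silence-based detection (assumed from the public API, not from this PR).
    ServerVad,
    /// Semantic detection with the two flags named in this PR.
    SemanticVad {
        create_response: bool,
        interrupt_response: bool,
    },
}

impl TurnDetection {
    /// Value of the `type` discriminator on the wire.
    pub fn type_name(&self) -> &'static str {
        match self {
            TurnDetection::ServerVad => "server_vad",
            TurnDetection::SemanticVad { .. } => "semantic_vad",
        }
    }

    /// Minimal JSON rendering of the variant (optional tuning fields omitted).
    pub fn to_json(&self) -> String {
        match self {
            TurnDetection::ServerVad => {
                format!("{{\"type\":\"{}\"}}", self.type_name())
            }
            TurnDetection::SemanticVad {
                create_response,
                interrupt_response,
            } => format!(
                "{{\"type\":\"{}\",\"create_response\":{},\"interrupt_response\":{}}}",
                self.type_name(),
                create_response,
                interrupt_response
            ),
        }
    }
}
```

Modelling the mode as an enum variant keeps the two flags attached to semantic_vad only, so a caller cannot set them on a mode that does not carry them in this sketch.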

⚠️ Breaking changes

  • ResponseCreateEvent: response now expects ResponseConfig, not SessionResource.
  • Enum serialization: g711-ulaw / g711-alaw → g711_ulaw / g711_alaw.
  • Event rename: InputAudioBufferCommitedEvent → InputAudioBufferCommittedEvent.
  • Typed model field: SessionResource.model is now RealtimeModel (no longer a free-form String).
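
For callers that store audio format names in their own configuration, the rename above could be absorbed with a small compatibility shim. The sketch below is an assumption-laden illustration, not code from this PR: `AudioFormat` here lists only the two variants named in the PR, `as_wire` is a hypothetical helper, and accepting the legacy hyphenated spelling on input is a design choice of the sketch rather than behaviour of the crate.

```rust
use std::str::FromStr;

// Hypothetical, cut-down mirror of the audio format enum; only the two
// variants named in this PR are included.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioFormat {
    G711ULAW,
    G711ALAW,
}

impl AudioFormat {
    /// New serialized name, using underscores.
    pub fn as_wire(self) -> &'static str {
        match self {
            AudioFormat::G711ULAW => "g711_ulaw",
            AudioFormat::G711ALAW => "g711_alaw",
        }
    }
}

impl FromStr for AudioFormat {
    type Err = String;

    /// Accepts the new underscore names and, as a migration aid,
    /// the legacy hyphenated names as well.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "g711_ulaw" | "g711-ulaw" => Ok(AudioFormat::G711ULAW),
            "g711_alaw" | "g711-alaw" => Ok(AudioFormat::G711ALAW),
            other => Err(format!("unknown audio format: {}", other)),
        }
    }
}
```

With this in place, a stored value such as "g711-alaw" parses to `AudioFormat::G711ALAW`, and `as_wire` always emits the underscore form.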

codesoda and others added 5 commits June 23, 2025 16:28
- Added `Cancelled` variant to `ResponseStatusDetail` enum for better handling of cancelled responses.
- Introduced `LogProb` struct to capture log probability information for transcribed tokens.
- Updated `ConversationItemInputAudioTranscriptionCompletedEvent` and `ConversationItemInputAudioTranscriptionDeltaEvent` to include optional `logprobs` for per-token log probability data.
- Enhanced `AudioTranscription` struct with optional fields for `language`, `model`, and `prompt` to improve transcription accuracy and customization.
- Added new `SemanticVAD` option in the `TurnDetection` enum to control model response eagerness.
- Expanded `RealtimeVoice` enum with additional voice options for more variety in audio responses.
- Changed enum variants for `AudioFormat` to use underscores instead of hyphens in their serialized names.
- Updated the serialized name of `G711ULAW` from `g711-ulaw` to `g711_ulaw` and of `G711ALAW` from `g711-alaw` to `g711_alaw` for improved clarity and adherence to naming conventions.
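
As an illustration of how the optional per-token `logprobs` might be consumed, the sketch below averages token probabilities for a transcription. The field names `token` and `logprob` are assumptions, since the commit messages do not list the fields of `LogProb`, and `mean_token_probability` is a hypothetical helper that is not part of the crate.

```rust
/// Hypothetical shape of a per-token log probability entry.
/// Field names are assumed; the PR does not list them.
#[derive(Debug, Clone, PartialEq)]
pub struct LogProb {
    pub token: String,
    pub logprob: f64,
}

/// Mean of the per-token probabilities, exp(logprob), over a transcript.
/// Returns `None` when the event carried no logprobs or an empty list.
pub fn mean_token_probability(logprobs: Option<&[LogProb]>) -> Option<f64> {
    let entries = logprobs?;
    if entries.is_empty() {
        return None;
    }
    let sum: f64 = entries.iter().map(|lp| lp.logprob.exp()).sum();
    Some(sum / entries.len() as f64)
}
```

Taking `Option<&[LogProb]>` mirrors the fact that the field is optional on the events, so callers can pass it through without unwrapping first.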