iOS app that records audio, transcribes it using a backend service, and manages recording sessions using SwiftData.


bazinga94/AudioTranscriber

AudioTranscriber

Summary

AudioTranscriber is an iOS app built with SwiftUI and Swift Concurrency that lets users record audio, transcribe it using the Whisper API (with Apple Speech as a fallback), and manage recording sessions with SwiftData.

Requirements

  • iOS 17.6+

API Key Setup

To use the Whisper API transcription feature, you must add your OpenAI API Key to the app’s configuration:

  1. Open Info.plist
  2. Add an API key entry:

Key: OpenAIAPIKey, Value: your OpenAI API key
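At runtime the key can then be read from the app bundle. A minimal sketch (the helper name `whisperAPIKey` is illustrative, not the app's actual API):

```swift
import Foundation

// Reads the key added in step 2 above. The string "OpenAIAPIKey"
// must match the Info.plist key exactly.
func whisperAPIKey() -> String? {
    guard let key = Bundle.main.object(forInfoDictionaryKey: "OpenAIAPIKey") as? String,
          !key.isEmpty else {
        return nil  // caller can fall back to Apple Speech
    }
    return key
}
```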

Screenshots

Architecture Overview

  • Follows MVVM
  • AudioRecorder: manages AVAudioEngine
  • AudioSegmentWriter: writes segmented audio (30-second chunks) to disk
  • RecordingControlsViewModel: handles recording state and permission flow
  • TranscriptionQueueManager: an actor responsible for concurrent transcription with retry logic
  • WhisperTranscriptionService: handles up to 5 concurrent transcriptions via Whisper API
  • AppleTranscriptionService: Apple Speech-to-Text, used as a fallback when the Whisper API fails
  • SwiftData models: RecordingSession and AudioSegment with a cascading relationship

Audio System Design

  • Audio is saved in 30-second segments as .m4a files
  • Monitors:
    • AVAudioSession.routeChangeNotification to detect headphone/Bluetooth connection changes
    • AVAudioSession.interruptionNotification to handle phone calls, Siri, etc.
  • Automatically pauses/resumes recording based on hardware or system events
  • Supports background recording using the audio background mode
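The route-change and interruption monitoring above can be sketched as follows (a minimal illustration, assuming closure-based observers; class and handler names are hypothetical, not the app's actual code):

```swift
import AVFoundation

// Observes the two AVAudioSession notifications listed above and
// reacts to hardware/system events.
final class AudioSessionMonitor {
    private var tokens: [NSObjectProtocol] = []

    init() {
        let center = NotificationCenter.default
        // Headphone/Bluetooth connection changes
        tokens.append(center.addObserver(
            forName: AVAudioSession.routeChangeNotification,
            object: nil, queue: .main) { note in
                guard let raw = note.userInfo?[AVAudioSessionRouteChangeReasonKey] as? UInt,
                      let reason = AVAudioSession.RouteChangeReason(rawValue: raw) else { return }
                if reason == .oldDeviceUnavailable {
                    // Headphones/Bluetooth disconnected → pause recording
                }
            })
        // Phone calls, Siri, etc.
        tokens.append(center.addObserver(
            forName: AVAudioSession.interruptionNotification,
            object: nil, queue: .main) { note in
                guard let raw = note.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
                      let type = AVAudioSession.InterruptionType(rawValue: raw) else { return }
                switch type {
                case .began: break   // interruption started → pause recording
                case .ended: break   // interruption ended → resume if appropriate
                @unknown default: break
                }
            })
    }

    deinit {
        tokens.forEach { NotificationCenter.default.removeObserver($0) }
    }
}
```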

Data Model Design (SwiftData)

  • RecordingSession has a one-to-many relationship with AudioSegment (with cascade delete)
  • Each AudioSegment stores:
    • fileURL
    • createdAt
    • transcriptionText
  • fullTranscription is dynamically generated by combining all segment texts in order
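The model design above can be sketched in SwiftData roughly as follows (property names beyond fileURL, createdAt, and transcriptionText are assumptions, as is the exact shape of the initializers):

```swift
import Foundation
import SwiftData

@Model
final class RecordingSession {
    var createdAt: Date
    // Cascade delete: removing a session removes all of its segments.
    @Relationship(deleteRule: .cascade, inverse: \AudioSegment.session)
    var segments: [AudioSegment] = []

    init(createdAt: Date = .now) {
        self.createdAt = createdAt
    }

    // Generated dynamically by combining segment texts in order.
    var fullTranscription: String {
        segments
            .sorted { $0.createdAt < $1.createdAt }
            .compactMap { $0.transcriptionText }
            .joined(separator: " ")
    }
}

@Model
final class AudioSegment {
    var fileURL: URL
    var createdAt: Date
    var transcriptionText: String?
    var session: RecordingSession?

    init(fileURL: URL, createdAt: Date = .now) {
        self.fileURL = fileURL
        self.createdAt = createdAt
    }
}
```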

Concurrency Handling

  • TranscriptionQueueManager is implemented as an actor with:
    • a task queue and a maxConcurrentTasks limit
    • retry and fallback logic for transcription
  • Uses TaskGroup for concurrent transcription of segments
  • Applies @MainActor where needed
  • Marks values passed between tasks as Sendable so the compiler can enforce data-race safety
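The TaskGroup pattern above, with a cap on in-flight work, can be sketched like this (a simplified stand-in for TranscriptionQueueManager; the function name and signature are illustrative, and retry/fallback logic is omitted for brevity):

```swift
import Foundation

// Transcribes segments concurrently, keeping at most
// maxConcurrentTasks transcriptions in flight at once.
func transcribeAll(
    _ segments: [URL],
    maxConcurrentTasks: Int = 5,
    transcribe: @escaping @Sendable (URL) async throws -> String
) async throws -> [URL: String] {
    try await withThrowingTaskGroup(of: (URL, String).self) { group -> [URL: String] in
        var results: [URL: String] = [:]
        var iterator = segments.makeIterator()

        // Start up to maxConcurrentTasks tasks…
        for _ in 0..<maxConcurrentTasks {
            guard let url = iterator.next() else { break }
            group.addTask {
                let text = try await transcribe(url)
                return (url, text)
            }
        }

        // …and start a new one each time a task finishes.
        while let (url, text) = try await group.next() {
            results[url] = text
            if let next = iterator.next() {
                group.addTask {
                    let text = try await transcribe(next)
                    return (next, text)
                }
            }
        }
        return results
    }
}
```

Implementing the limit as an actor (as the README's TranscriptionQueueManager does) additionally serializes access to the queue state, which is what makes the retry bookkeeping safe under concurrent callers.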

Known Issues & Future Improvements

  • Whisper API was not fully tested due to credit limitations. If the API key is not properly set, transcription falls back to Speech-to-Text after 5 retries.
  • Testing was skipped due to time constraints.
