iOS app that records audio, transcribes it using a backend service, and manages recording sessions using SwiftData.


bazinga94/AudioTranscriber

AudioTranscriber

Summary

AudioTranscriber is an iOS app built with SwiftUI and Swift Concurrency that lets users record audio, transcribe it using the Whisper API (with Apple Speech as a fallback), and manage recording sessions with SwiftData.

Requirements

  • iOS 17.6+

API Key Setup

To use the Whisper API transcription feature, you must add your OpenAI API Key to the app’s configuration:

  1. Open Info.plist
  2. Add an API key entry:

Key: OpenAIAPIKey, Value: your OpenAI API key
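At runtime the key can then be read from the app bundle. A minimal sketch (the helper name `whisperAPIKey` is illustrative, not the app's actual API):

```swift
import Foundation

// Reads the key added in step 2 above. The string "OpenAIAPIKey"
// must match the Info.plist key exactly.
func whisperAPIKey() -> String? {
    guard let key = Bundle.main.object(forInfoDictionaryKey: "OpenAIAPIKey") as? String,
          !key.isEmpty else {
        return nil  // caller can fall back to Apple Speech
    }
    return key
}
```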

Screenshots

Architecture Overview

  • Follows MVVM
  • AudioRecorder: manages AVAudioEngine
  • AudioSegmentWriter: writes segmented audio (30-second chunks) to disk
  • RecordingControlsViewModel: handles recording state and permission flow
  • TranscriptionQueueManager: an actor responsible for concurrent transcription with retry logic
  • WhisperTranscriptionService: handles up to 5 concurrent transcriptions via Whisper API
  • AppleTranscriptionService: Apple Speech-to-Text, used as a fallback when the Whisper API fails
  • SwiftData models: RecordingSession and AudioSegment with a cascading relationship

Audio System Design

  • Audio is saved in 30-second segments as .m4a files
  • Monitors:
    • AVAudioSession.routeChangeNotification to detect headphone/Bluetooth connection changes
    • AVAudioSession.interruptionNotification to handle phone calls, Siri, etc.
  • Automatically pauses/resumes recording based on hardware or system events
  • Supports background recording using the audio background mode
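The route-change and interruption monitoring above can be sketched as follows (a minimal illustration, assuming closure-based observers; class and handler names are hypothetical, not the app's actual code):

```swift
import AVFoundation

// Observes the two AVAudioSession notifications listed above and
// reacts to hardware/system events.
final class AudioSessionMonitor {
    private var tokens: [NSObjectProtocol] = []

    init() {
        let center = NotificationCenter.default
        // Headphone/Bluetooth connection changes
        tokens.append(center.addObserver(
            forName: AVAudioSession.routeChangeNotification,
            object: nil, queue: .main) { note in
                guard let raw = note.userInfo?[AVAudioSessionRouteChangeReasonKey] as? UInt,
                      let reason = AVAudioSession.RouteChangeReason(rawValue: raw) else { return }
                if reason == .oldDeviceUnavailable {
                    // Headphones/Bluetooth disconnected → pause recording
                }
            })
        // Phone calls, Siri, etc.
        tokens.append(center.addObserver(
            forName: AVAudioSession.interruptionNotification,
            object: nil, queue: .main) { note in
                guard let raw = note.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
                      let type = AVAudioSession.InterruptionType(rawValue: raw) else { return }
                switch type {
                case .began: break   // interruption started → pause recording
                case .ended: break   // interruption ended → resume if appropriate
                @unknown default: break
                }
            })
    }

    deinit {
        tokens.forEach { NotificationCenter.default.removeObserver($0) }
    }
}
```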

Data Model Design (SwiftData)

  • RecordingSession has a one-to-many relationship with AudioSegment (with cascade delete)
  • Each AudioSegment stores:
    • fileURL
    • createdAt
    • transcriptionText
  • fullTranscription is dynamically generated by combining all segment texts in order
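The model design above can be sketched in SwiftData roughly as follows (property names beyond fileURL, createdAt, and transcriptionText are assumptions, as is the exact shape of the initializers):

```swift
import Foundation
import SwiftData

@Model
final class RecordingSession {
    var createdAt: Date
    // Cascade delete: removing a session removes all of its segments.
    @Relationship(deleteRule: .cascade, inverse: \AudioSegment.session)
    var segments: [AudioSegment] = []

    init(createdAt: Date = .now) {
        self.createdAt = createdAt
    }

    // Generated dynamically by combining segment texts in order.
    var fullTranscription: String {
        segments
            .sorted { $0.createdAt < $1.createdAt }
            .compactMap { $0.transcriptionText }
            .joined(separator: " ")
    }
}

@Model
final class AudioSegment {
    var fileURL: URL
    var createdAt: Date
    var transcriptionText: String?
    var session: RecordingSession?

    init(fileURL: URL, createdAt: Date = .now) {
        self.fileURL = fileURL
        self.createdAt = createdAt
    }
}
```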

Concurrency Handling

  • TranscriptionQueueManager is implemented as an actor with:
    • a task queue and a maxConcurrentTasks limit
    • retry and fallback logic for transcription
  • Uses TaskGroup for concurrent transcription of segments
  • Applies @MainActor where needed
  • Marks values passed between tasks as Sendable so the compiler can enforce data-race safety
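The TaskGroup pattern above, with a cap on in-flight work, can be sketched like this (a simplified stand-in for TranscriptionQueueManager; the function name and signature are illustrative, and retry/fallback logic is omitted for brevity):

```swift
import Foundation

// Transcribes segments concurrently, keeping at most
// maxConcurrentTasks transcriptions in flight at once.
func transcribeAll(
    _ segments: [URL],
    maxConcurrentTasks: Int = 5,
    transcribe: @escaping @Sendable (URL) async throws -> String
) async throws -> [URL: String] {
    try await withThrowingTaskGroup(of: (URL, String).self) { group -> [URL: String] in
        var results: [URL: String] = [:]
        var iterator = segments.makeIterator()

        // Start up to maxConcurrentTasks tasks…
        for _ in 0..<maxConcurrentTasks {
            guard let url = iterator.next() else { break }
            group.addTask {
                let text = try await transcribe(url)
                return (url, text)
            }
        }

        // …and start a new one each time a task finishes.
        while let (url, text) = try await group.next() {
            results[url] = text
            if let next = iterator.next() {
                group.addTask {
                    let text = try await transcribe(next)
                    return (next, text)
                }
            }
        }
        return results
    }
}
```

Implementing the limit as an actor (as the README's TranscriptionQueueManager does) additionally serializes access to the queue state, which is what makes the retry bookkeeping safe under concurrent callers.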

Known Issues & Future Improvements

  • Whisper API was not fully tested due to credit limitations. If the API key is not properly set, transcription falls back to Speech-to-Text after 5 retries.
  • Testing was skipped due to time constraints.
