Live Audio Transcript is an application that combines real-time audio streaming with AI-powered transcription.

Project structure:
- backend/: Contains the FastAPI backend implementation.
  - src/: Main application code.
    - main.py: Entry point for the FastAPI application.
    - api/: WebSocket routes for real-time communication.
    - services/: Integration logic for the Whisper AI service.
    - models/: Data models for the application.
  - requirements.txt: Lists dependencies for the backend.
  - README.md: Documentation for the backend.
- frontend/: Contains the React frontend implementation.
  - public/: Static files for the React application.
    - index.html: Main HTML file for the React app.
  - src/: Source code for the React application.
    - App.tsx: Main component of the React application.
    - index.tsx: Entry point for the React application.
    - components/: Reusable components for the application.
      - Recorder.tsx: Component for capturing audio input.
      - Transcript.tsx: Component for displaying the live transcription.
    - services/: Logic for managing WebSocket connections.
      - websocket.ts: Functions to connect and communicate with the backend.
  - package.json: Configuration file for npm.
  - tsconfig.json: TypeScript configuration file.
  - README.md: Documentation for the frontend.
- ai/: Contains the integration logic for the Whisper AI service.
  - whisper_integration.py: Functions to interact with the Whisper API (see the sketch below).
  - README.md: Documentation for the AI integration.
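whisper_integration.py is the thinnest layer over speech-to-text. A minimal sketch, assuming the open-source openai-whisper package is used locally (the actual module may call a hosted Whisper API instead):

```python
# ai/whisper_integration.py -- illustrative sketch, not the project's exact code.
# Assumes the open-source `openai-whisper` package (pip install openai-whisper).
import whisper

# Load the model once; "base" trades some accuracy for speed.
_model = whisper.load_model("base")

def get_transcription(temp_audio: str) -> list[dict]:
    """Transcribe the audio file at `temp_audio` and return timestamped segments."""
    result = _model.transcribe(temp_audio)
    # Each Whisper segment carries start/end times (in seconds) and the recognized text.
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result["segments"]
    ]
```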
You can try it here.
A demonstration of the application:
- Speak into the microphone to see live transcription.
- Save transcripts with timestamps.
- Combines AI transcription with real-time audio streaming.
- Optimized for low latency (e.g., audio is chunked roughly every 2 seconds; see the sizing sketch below).
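The 2-second chunking figure translates directly into a per-chunk payload size. A back-of-the-envelope sketch, assuming 16 kHz, 16-bit mono PCM (the real capture format depends on the recorder settings in the frontend):

```python
# Rough chunk sizing -- assumed audio format, not project constants.
SAMPLE_RATE = 16_000   # samples per second
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHANNELS = 1           # mono
CHUNK_SECONDS = 2

chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * CHUNK_SECONDS
print(chunk_bytes)  # 64000 bytes (~62.5 KiB) streamed to the backend every 2 seconds
```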
To run the application locally:
- Clone the repository.
- Navigate to the `backend` directory and install dependencies: `pip install -r requirements.txt`
- Navigate to the `frontend` directory and install dependencies: `npm install`
- Start the backend server: `uvicorn app.main:app --reload`
- Start the frontend application: `npm start`
You can run the entire application stack (frontend, backend, AI, and Redis) using Docker Compose. This ensures all services are started and networked correctly for real-time WebSocket communication and AI integration.
- Build and start all services: from the root directory (phonic-ai/), run `docker-compose up --build`. This will:
  - Build the frontend and backend images.
  - Start the backend (FastAPI + Redis), frontend (React), and Redis services.
  - Expose the frontend on http://localhost:3000 and the backend API on http://localhost:8000.
- Access the application:
  - Open your browser and go to http://localhost:3000 to use the app.
  - The frontend will communicate with the backend via WebSockets, and the backend will use Redis for rate limiting and session management (see the rate-limiting sketch after these steps).
- Stopping the services: to stop all running containers, press Ctrl+C in the terminal where Docker Compose is running, then run `docker-compose down`.
- Access the containers:
  - `docker exec -it phonic-ai-frontend bash`
  - `docker exec -it phonic-ai-backend bash`
  - `docker exec -it phonic-ai-redis redis-cli`
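The Redis-backed rate limiting mentioned above can be as simple as a fixed-window counter per client. A sketch using the redis-py client; the key names, limit, and window are illustrative, not the project's actual values:

```python
# Fixed-window rate limiter -- illustrative sketch, assuming redis-py.
import redis

r = redis.Redis(host="redis", port=6379)  # "redis" is the Compose service name

def allow_message(client_id: str, limit: int = 60, window_s: int = 60) -> bool:
    """Return True if this client may send another message in the current window."""
    key = f"rate:{client_id}"          # hypothetical per-client key
    count = r.incr(key)                # atomically count this message
    if count == 1:
        r.expire(key, window_s)        # first message starts the window
    return count <= limit
```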
This project leverages modern web technologies to provide a seamless experience for live transcription using AI.
The following sequence diagram illustrates the flow of audio transcription in the application; a condensed code sketch follows the steps:
- User Interaction: The process begins when the User initiates a recording session in the frontend application.
- Frontend to Backend: The Frontend captures the audio stream and sends it to the Backend using a WebSocket connection, transmitting audio chunks in real time.
- Backend to Whisper Service: The Backend receives the audio data and forwards it to the WhisperService for transcription by calling the `transcribe(audio_data)` method.
- Whisper Service to AI Integration: The WhisperService interacts with the WhisperIntegration (AI) component, requesting transcription by invoking `get_transcription(temp_audio)`.
- AI Integration Response: The WhisperIntegration (AI) component processes the audio and returns the transcription segments, including text and timestamp information, to the WhisperService.
- Transcription Propagation: The WhisperService sends the transcription segments to the Backend, which then emits a WebSocket "transcription" event (or an HTTP response) to the Frontend.
- Frontend Display: Finally, the Frontend displays the live transcript, including timestamps, to the User.
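Putting the steps above into code, a condensed sketch of the backend side of this flow might look like the following. The route path, import paths, and temporary-file handling are assumptions for illustration; only the `transcribe(audio_data)` and `get_transcription(temp_audio)` names come from the flow above:

```python
# Condensed backend-side sketch of the flow -- illustrative, not the project's exact code.
import tempfile
from fastapi import FastAPI, WebSocket

from ai.whisper_integration import get_transcription  # import path assumed

app = FastAPI()

class WhisperService:
    def transcribe(self, audio_data: bytes) -> list[dict]:
        # Persist the chunk (assumed to be a self-contained audio blob) so the
        # AI layer can read it from disk; real code would also clean the file up.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as temp_audio:
            temp_audio.write(audio_data)
        return get_transcription(temp_audio.name)

whisper_service = WhisperService()

@app.websocket("/ws/transcribe")  # hypothetical route path
async def transcribe_ws(websocket: WebSocket):
    await websocket.accept()
    while True:
        # Frontend to Backend: the frontend streams audio chunks over the WebSocket.
        audio_data = await websocket.receive_bytes()
        # Backend -> WhisperService -> WhisperIntegration: get timestamped segments.
        segments = whisper_service.transcribe(audio_data)
        # Transcription Propagation: emit the "transcription" event to the frontend.
        await websocket.send_json({"event": "transcription", "segments": segments})
```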