Simple Voice Assistant App

A FastAPI-based voice assistant that processes audio input, transcribes it using OpenAI's Whisper, generates responses using GPT-4, and converts the response back to speech using OpenAI's TTS.

Features

  • Audio file upload and processing
  • Speech-to-text transcription using OpenAI Whisper
  • AI-powered responses using GPT-4
  • Text-to-speech conversion using OpenAI TTS
  • CORS-enabled for frontend integration
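The pipeline behind these features can be sketched as a single function. The three stage functions below are placeholders passed in as parameters, because the real helper names in main.py are assumptions, not verified code:

```python
def process_audio(audio_bytes, transcribe, respond, speak):
    """speech -> text (Whisper) -> reply (GPT-4) -> speech (TTS)."""
    transcript = transcribe(audio_bytes)  # Whisper: audio in, text out
    reply = respond(transcript)           # GPT-4: text in, text out
    return speak(reply)                   # TTS: text in, MP3 bytes out
```

In the real app each stage is an OpenAI API call; keeping them injectable like this makes the control flow visible and easy to test with fakes.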

Prerequisites

  • Python 3.7 or higher
  • OpenAI API key
  • pip (Python package installer)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd simple-voice-assistant-app
  2. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Set up environment variables by exporting your OpenAI API key in your terminal:

    export OPEN_AI_KEY=your_openai_api_key_here
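The app needs this variable at startup. A defensive loader like the sketch below (a hypothetical helper, not code from main.py) fails fast with a clear message when the key is missing:

```python
import os

def load_api_key() -> str:
    """Return the OPEN_AI_KEY value, failing loudly if it is unset."""
    key = os.environ.get("OPEN_AI_KEY")
    if not key:
        raise RuntimeError(
            "OPEN_AI_KEY is not set; export it before starting the server"
        )
    return key
```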

Running the Application

  1. Start the FastAPI server

    uvicorn main:app --reload --host 0.0.0.0 --port 8000
  2. The API will be available at:

    • Local: http://localhost:8000
    • API Documentation: http://localhost:8000/docs (Swagger UI)
    • Alternative docs: http://localhost:8000/redoc

API Endpoints

POST /api/process-audio

Processes an audio file and returns a speech response.

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body: audio file (WAV format recommended)

Response:

  • Content-Type: audio/mp3
  • Body: MP3 audio file containing the AI's speech response

Example using curl:

curl -X POST "http://localhost:8000/api/process-audio" \
     -H "accept: audio/mp3" \
     -H "Content-Type: multipart/form-data" \
     -F "audio=@your_audio_file.wav"

Usage Example

  1. Record an audio file (WAV format) with your question or request
  2. Send a POST request to /api/process-audio with the audio file
  3. Receive an MP3 file with the AI's spoken response

Frontend Integration

The API is configured with CORS to allow requests from http://localhost:3000. To integrate with a frontend:

  1. Set up your frontend to run on port 3000
  2. Send audio files to http://localhost:8000/api/process-audio
  3. Handle the returned MP3 audio response

Configuration

  • CORS Origins: Currently set to http://localhost:3000. Modify line 14 in main.py to add additional origins.
  • TTS Voice: Currently using "alloy" voice. Available options: alloy, echo, fable, onyx, nova, shimmer
  • AI Model: Using GPT-4 for responses. You can modify line 33 in main.py to use different models.
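The CORS setup in main.py presumably follows FastAPI's standard middleware pattern. A sketch of what adding a second origin might look like (the exact variable names and origin list in main.py may differ):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    # Add any frontend origins that need access alongside the default.
    allow_origins=["http://localhost:3000", "http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```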

Troubleshooting

  1. OpenAI API Key Issues

    • Ensure your API key is correctly set in the environment variable
    • Verify you have sufficient credits in your OpenAI account
  2. Audio Format Issues

    • The app expects audio files in WAV format
    • Ensure your audio file is not corrupted
  3. CORS Issues

    • If you're running the frontend on a different port, update the CORS configuration in main.py

Dependencies

  • FastAPI: Web framework for building APIs
  • OpenAI: Python client for OpenAI API
  • Uvicorn: ASGI server for running FastAPI (included in requirements)

License

This project is open source and available under the MIT License.
