A FastAPI-based voice assistant that processes audio input, transcribes it using OpenAI's Whisper, generates responses using GPT-4, and converts the response back to speech using OpenAI's TTS.
- Audio file upload and processing
- Speech-to-text transcription using OpenAI Whisper
- AI-powered responses using GPT-4
- Text-to-speech conversion using OpenAI TTS
- CORS-enabled for frontend integration
- Python 3.7 or higher
- OpenAI API key
- pip (Python package installer)
- Clone the repository

  ```bash
  git clone <repository-url>
  cd simple-voice-assistant-app
  ```
- Create a virtual environment (recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables by exporting your OpenAI API key in your terminal:

  ```bash
  export OPEN_AI_KEY=your_openai_api_key_here
  ```
- Start the FastAPI server

  ```bash
  uvicorn main:app --reload --host 0.0.0.0 --port 8000
  ```
- The API will be available at:
  - Local: http://localhost:8000
  - API documentation (Swagger UI): http://localhost:8000/docs
  - Alternative docs (ReDoc): http://localhost:8000/redoc
Processes an audio file and returns a speech response.

Request:
- Method: POST
- Endpoint: /api/process-audio
- Content-Type: multipart/form-data
- Body: audio file (WAV format recommended)

Response:
- Content-Type: audio/mp3
- Body: MP3 audio file containing the AI's speech response
Example using curl:

```bash
curl -X POST "http://localhost:8000/api/process-audio" \
  -H "accept: audio/mp3" \
  -F "audio=@your_audio_file.wav" \
  --output response.mp3
```

Note that no explicit `Content-Type` header is set: when `-F` is used, curl sends `multipart/form-data` with the correct boundary automatically, and setting the header by hand would omit the boundary and break the request.
- Record an audio file (WAV format) with your question or request
- Send a POST request to /api/process-audio with the audio file
- Receive an MP3 file with the AI's spoken response
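The steps above can be driven from a short Python client. This is a sketch using the third-party `requests` package; the function name and file paths are placeholders:

```python
import requests  # third-party: pip install requests

API_URL = "http://localhost:8000/api/process-audio"

def ask_assistant(wav_path: str, out_path: str = "reply.mp3") -> str:
    """POST a WAV recording and save the MP3 reply returned by the server."""
    with open(wav_path, "rb") as f:
        # requests sets the multipart/form-data boundary itself
        response = requests.post(API_URL, files={"audio": f})
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)  # MP3 bytes from the TTS step
    return out_path

# With the server running:
# ask_assistant("your_audio_file.wav")  # writes reply.mp3
```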
The API is configured with CORS to allow requests from http://localhost:3000. To integrate with a frontend:

- Set up your frontend to run on port 3000
- Send audio files to http://localhost:8000/api/process-audio
- Handle the returned MP3 audio response
- CORS Origins: Currently set to http://localhost:3000. Modify line 14 in main.py to add additional origins.
- TTS Voice: Currently using the "alloy" voice. Available options: alloy, echo, fable, onyx, nova, shimmer.
- AI Model: Using GPT-4 for responses. Modify line 33 in main.py to use a different model.
- OpenAI API Key Issues
  - Ensure your API key is correctly set in the environment variable
  - Verify you have sufficient credits in your OpenAI account
- Audio Format Issues
  - The app expects audio files in WAV format
  - Ensure your audio file is not corrupted
- CORS Issues
  - If you're running the frontend on a different port, update the CORS configuration in main.py
- FastAPI: Web framework for building APIs
- OpenAI: Python client for OpenAI API
- Uvicorn: ASGI server for running FastAPI (included in requirements)
This project is open source and available under the MIT License.