This project implements a real-time audio translation system: users speak in one language and hear the result as speech in another. It uses the Google Cloud Speech-to-Text, Translation, and Text-to-Speech (TTS) APIs to provide translation and speech output in over 70 languages.
- Python 3.10 🐍
- Uvicorn for running the FastAPI server 🚀
- Google Cloud Speech-to-Text API for converting speech into text 🔊➡️📝
- Google Cloud Translation API for translating text between languages 🌐
- Google Cloud Text-to-Speech API for converting text into speech 🎧
- FastAPI for the backend framework ⚙️
- HTML/CSS for the frontend design 🎨
- JavaScript for handling user interactions 💻
- Real-Time Translation: Users provide audio in one language, and it is transcribed and translated into the target language 🌍🔄
- Text-to-Speech: Translated text is converted into speech using Google Cloud's TTS API 🗣️
- Multiple Language Support: Supports a variety of languages for translation and speech output 🌏
- Web Interface: Provides an interactive interface where users can submit audio and hear the translated speech 💬
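
The features above chain three steps: speech-to-text, translation, and text-to-speech. The following is a minimal sketch of that flow, not this project's actual code. The names `translate_speech`, `TranslationResult`, and the `google_*` helpers are hypothetical. The helpers assume the `google-cloud-speech`, `google-cloud-translate`, and `google-cloud-texttospeech` client libraries, short WAV or FLAC input, and valid credentials. Because each step is passed in as a callable, the pipeline can be exercised with stand-in functions and no network access. Note that the speech APIs take BCP-47 codes such as `ja-JP`, while the Translation API takes codes such as `ja`, so a real backend may need to map between them.

```python
from typing import Callable, NamedTuple


class TranslationResult(NamedTuple):
    transcript: str   # text recognized from the input audio
    translation: str  # transcript translated into the target language
    audio: bytes      # synthesized speech for the translation


def translate_speech(
    audio: bytes,
    source_lang: str,
    target_lang: str,
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TranslationResult:
    """Run speech-to-text, translation, and text-to-speech in sequence."""
    transcript = transcribe(audio, source_lang)
    translation = translate(transcript, target_lang)
    speech = synthesize(translation, target_lang)
    return TranslationResult(transcript, translation, speech)


# Hypothetical Google Cloud backends. Imports are deferred so this module
# loads even where the client libraries are not installed.
def google_transcribe(audio: bytes, language_code: str) -> str:
    from google.cloud import speech

    client = speech.SpeechClient()
    # Assumption: WAV/FLAC input, so encoding and sample rate come from the header.
    config = speech.RecognitionConfig(language_code=language_code)
    response = client.recognize(
        config=config, audio=speech.RecognitionAudio(content=audio)
    )
    return " ".join(r.alternatives[0].transcript for r in response.results)


def google_translate(text: str, target_lang: str) -> str:
    from google.cloud import translate_v2

    result = translate_v2.Client().translate(
        text, target_language=target_lang, format_="text"
    )
    return result["translatedText"]


def google_synthesize(text: str, language_code: str) -> bytes:
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(language_code=language_code),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    return response.audio_content
```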
- Create a Google Cloud Account: Create a Google Cloud account and enable the Speech-to-Text, Translation, and Text-to-Speech APIs in your Google Cloud Console.
- Install Required Python Packages: Install the necessary Python packages using `conda` or `pip`. For example:

      conda create --name translation-env python=3.10
      conda activate translation-env
      pip install "fastapi[all]" uvicorn websockets python-multipart pydub google-cloud-speech google-cloud-translate google-cloud-texttospeech

Set up the environment variable for Google Cloud authentication:
```bash
export GOOGLE_APPLICATION_CREDENTIALS="path_to_your_google_cloud_credentials.json"
```
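
A missing or mistyped credentials path typically only surfaces when the first Google Cloud call fails. As a hedged sketch, a startup check along these lines could report the problem earlier. The helper name `credentials_path` is hypothetical and not part of this project or of any Google library; it only inspects the environment variable and the file system.

```python
import os
from pathlib import Path


def credentials_path() -> Path:
    """Return the credentials file path, or raise if it is unset or missing."""
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```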
Start the FastAPI server with Uvicorn:
```bash
uvicorn app:app --reload
```
Navigate to http://127.0.0.1:8000 in your web browser to access the translation system.
- GET /: Serves the frontend HTML page, including all the necessary files (JavaScript, CSS, etc.)
- POST /process-audio: Accepts an uploaded audio file and processes it for speech-to-text conversion, translation, and speech synthesis.
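
The `POST /process-audio` endpoint can also be called without the web page. The sketch below uses only the Python standard library to upload a file as `multipart/form-data`. The form field names `file` and `target_language`, the helper names, and the `audio/wav` content type are assumptions; the real field names depend on how the endpoint is declared in `app.py`, and the response is returned as raw bytes because its format is not documented here.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes,
                    content_type="audio/wav"):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        chunks.append(part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def process_audio(path, target_language, base_url="http://127.0.0.1:8000"):
    """Upload an audio file to /process-audio and return the raw response body."""
    with open(path, "rb") as f:
        body, header = build_multipart(
            {"target_language": target_language},  # assumed field name
            "file",                                # assumed field name
            os.path.basename(path),
            f.read(),
        )
    request = urllib.request.Request(
        f"{base_url}/process-audio",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```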
- Web Interface
- Language Selection Menu
- A person speaks in English; the speech is translated to Japanese text, with the translated audio playing in the background
- A person speaks in Japanese; the speech is translated to English text, with the translated audio playing in the background
- Offline Mode: Implement functionality to work offline by caching translations and speech 📴
- Transcription Logging: Save transcriptions of conversations for review and analysis later 🗃️
- Enhanced UI/UX: Improve the user interface and user experience for better usability and accessibility 🌟
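
For the offline mode idea, one possible starting point is a small on-disk cache that returns stored translations without calling the API. This is only a sketch under stated assumptions: `TranslationCache` is a hypothetical class, the JSON file layout is invented for illustration, and the `translate` callable stands in for whatever function performs the real translation. Caching synthesized speech would need a similar store for audio bytes.

```python
import json
from pathlib import Path
from typing import Callable


class TranslationCache:
    """Persist translations in a JSON file keyed by (target language, text)."""

    def __init__(self, path: Path, translate: Callable[[str, str], str]) -> None:
        self._path = Path(path)
        self._translate = translate
        # Load earlier entries so they remain available without network access.
        if self._path.exists():
            self._entries = json.loads(self._path.read_text(encoding="utf-8"))
        else:
            self._entries = {}

    @staticmethod
    def _key(text: str, target_lang: str) -> str:
        # JSON object keys must be strings, so encode the pair as one string.
        return json.dumps([target_lang, text], ensure_ascii=False)

    def get(self, text: str, target_lang: str) -> str:
        """Return a cached translation, calling the backend only on a miss."""
        key = self._key(text, target_lang)
        if key not in self._entries:
            self._entries[key] = self._translate(text, target_lang)
            self._path.write_text(
                json.dumps(self._entries, ensure_ascii=False), encoding="utf-8"
            )
        return self._entries[key]
```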