A microservice-based system that performs Automatic Speech Recognition (ASR) on English audio files and translates the text to Persian. The system is built with Django and uses an Event-Driven Architecture (EDA) with RabbitMQ for communication between services.
- System Architecture
- Prerequisites
- Installation
- Running the System
- Testing
- Usage
- Features
- Performance Tuning
- Docker Deployment
- Dependencies
## System Architecture

The system consists of three main components:
- API Gateway (Django): Handles file uploads and translation status requests
- ASR Service: Performs speech-to-text conversion using VOSK
- Translation Service: Translates English text to Persian using Argostranslate
All components communicate asynchronously through RabbitMQ events.
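As a rough illustration of this event flow, here is a minimal pika sketch of one service publishing an event for another to consume; the queue name and event fields are placeholders, not the project's actual message contract:

```python
import json

import pika

# Connect to a local RabbitMQ broker; queue and event names here are
# illustrative placeholders, not the project's actual contract.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="asr_requests", durable=True)

# Publish an "audio uploaded" event for the ASR service to pick up
event = {"event": "audio_uploaded", "file_id": "unique-identifier"}
channel.basic_publish(
    exchange="",
    routing_key="asr_requests",
    body=json.dumps(event),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```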
## Prerequisites

- Python 3.11+
- RabbitMQ Server
- VOSK English model (vosk-model-small-en-us-0.15)
- Docker
- Prometheus & Grafana (for monitoring)
- PostgreSQL (recommended) or SQLite
- Redis (for caching)
## Installation

- Clone the repository:

```bash
git clone https://github.com/arfa79/ASR-Translator-as-Microservice.git
cd ASR-Translator-as-Microservice
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
# Generate a .env file with secure settings
python generate_env.py
# Or create .env manually with necessary settings:
# SECRET_KEY, DB_* settings, etc.
```

- Set up PostgreSQL:

```bash
# Install PostgreSQL if not already installed
# On Ubuntu/Debian:
sudo apt install postgresql postgresql-contrib
# Create database
sudo -u postgres createdb asr_translator
# Or use the database settings you specified in the .env file
```

- Download VOSK model:
  - Download vosk-model-small-en-us-0.15
  - Extract it to the project root directory (a quick load check is sketched below)
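To confirm the model extracted where the services expect it, a quick sanity check with the vosk package (run from the project root):

```python
from vosk import KaldiRecognizer, Model

# Load the model from the directory extracted above and create a
# recognizer for 16 kHz audio; failure here usually means a wrong path.
model = Model("vosk-model-small-en-us-0.15")
recognizer = KaldiRecognizer(model, 16000)
print("VOSK model loaded successfully")
```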
- Set up RabbitMQ:
  - Install Erlang
  - Install RabbitMQ Server
  - Start RabbitMQ service
- Set up Redis (optional, but recommended for caching):

```bash
# Install Redis if not already installed
# On Ubuntu/Debian:
sudo apt install redis-server
# Start Redis
sudo service redis-server start
```

- Initialize Django:

```bash
python manage.py migrate
python manage.py createsuperuser  # Optional, for admin access
```

Alternatively, configure the environment from the example template:

- Clone the repository:

```bash
git clone https://github.com/yourusername/ASR-Translator-as-Microservice.git
cd ASR-Translator-as-Microservice
```

- Create a `.env` file from the example template:

```bash
cp env.example .env
```

- Edit the `.env` file to configure your environment:
  - Update `SECRET_KEY` with a secure key
  - Set database credentials
  - Configure other settings as needed
## Running the System

You need to run three components in separate terminals:

- Django Server:

```bash
python manage.py runserver
```

- ASR Service:

```bash
python asr_system.py
```

- Translation Service:

```bash
python translator_agent.py
```

- (Optional) Run with metrics collection and autoscaling:

```bash
# Verify dependencies and configure autoscaling
./setup_autoscaling.sh
# Run the integrated system
python -m asr_translator.main
```

Alternatively, build and run everything with Docker Compose:

- Build and start all services:

```bash
docker-compose up -d
```

- Check service status:

```bash
docker-compose ps
```

- Access the application:
- Web API: http://localhost:8000/
- RabbitMQ Management: http://localhost:15672/ (username/password from .env)
- Prometheus: http://localhost:9090/
- Grafana: http://localhost:3000/ (default admin/admin)
## Testing

The system includes a comprehensive testing suite in the tests/ directory.

To run all tests using pytest:

```bash
# Run all tests
cd tests
pytest
```

- `test_vosk_model.py`: Tests for the VOSK speech recognition functionality
  - `test_model_loading`: Verifies VOSK model loading
  - `test_recognizer_creation`: Tests KaldiRecognizer creation
  - `test_audio_processing`: Tests standard audio file processing
  - `test_8k_audio_processing`: Tests 8kHz audio file processing
- Integration Tests: Tests for API endpoints and service communication
- Performance Tests: Tests for system performance under load
The conftest.py file contains fixtures used across tests:

- `vosk_model`: Loads the VOSK model for testing
- `data_dir`: Creates and manages the test data directory
- `sample_audio_file`: Provides sample audio for testing
- `service_url`: Configures the service URL for testing
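For orientation, a sketch of what such fixtures could look like in tests/conftest.py; the real implementations in the repository may differ:

```python
import os

import pytest
from vosk import Model


@pytest.fixture(scope="session")
def vosk_model():
    """Load the VOSK model once for the whole test session."""
    return Model("vosk-model-small-en-us-0.15")


@pytest.fixture
def data_dir(tmp_path):
    """Create a per-test data directory that pytest cleans up."""
    d = tmp_path / "data"
    d.mkdir()
    return d


@pytest.fixture
def service_url():
    """Service URL, overridable through an environment variable."""
    return os.environ.get("SERVICE_URL", "http://localhost:8000")
```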
## Usage

- Upload Audio File:

```
POST http://localhost:8000/upload/
Content-Type: multipart/form-data
Body: audio=@your-file.wav
```

Response:

```
{
  "status": "accepted",
  "file_id": "unique-identifier",
  "message": "File uploaded successfully and processing has begun"
}
```

- Check Translation Status:
```
GET http://localhost:8000/translation/
```

Response:

```
{
  "file_id": "unique-identifier",
  "translation": "Persian translation"  # If completed
}
```

or

```
{
  "file_id": "unique-identifier",
  "status": "transcribing|translating"  # If in progress
}
```
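Putting the two endpoints together, a small client sketch using requests; passing file_id as a query parameter to /translation/ is an assumption here, so adjust it to match the actual API:

```python
import time

import requests

BASE = "http://localhost:8000"

# Upload a WAV file for processing
with open("your-file.wav", "rb") as f:
    response = requests.post(f"{BASE}/upload/", files={"audio": f})
file_id = response.json()["file_id"]

# Poll until the Persian translation is ready (file_id as a query
# parameter is an assumption; the interval is arbitrary)
while True:
    status = requests.get(f"{BASE}/translation/", params={"file_id": file_id}).json()
    if "translation" in status:
        print(status["translation"])
        break
    time.sleep(2)
```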
## Features

- Asynchronous processing using event-driven architecture
- Automatic file cleanup after processing
- Health monitoring for both services
- Rate limiting for API endpoints
- Comprehensive error handling and logging
- Support for WAV audio files
- Automatic retry logic for service connections (see the sketch below)
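As a rough sketch of what the connection retry logic can look like (the retry count, delay, and backoff here are illustrative, not the services' actual settings):

```python
import time

import pika
from pika.exceptions import AMQPConnectionError


def connect_with_retry(host="localhost", retries=5, delay=2.0):
    """Try to connect to RabbitMQ, backing off between attempts."""
    for attempt in range(1, retries + 1):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters(host=host))
        except AMQPConnectionError:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay * attempt)  # linear backoff
```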
## Performance Tuning

- Streaming Processing: ASR processing in chunks for immediate feedback
- Parallel Processing: Large audio files split into segments and processed concurrently
- Model Caching: VOSK models loaded once and kept in memory
- Translation Caching: Redis-based caching for translations to avoid redundant work
- Message Priorities: RabbitMQ message priorities based on file size
- CPU Affinity Settings: Services assigned to specific CPU cores
- Message Compression: zlib compression for RabbitMQ messages (see the sketch after this list)
- HTTP Streaming Responses: Real-time updates to clients
- PostgreSQL Database: High-performance database for production use
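To make the priority and compression ideas concrete, a hedged pika sketch; the queue name and the size-to-priority mapping are placeholders, not the project's actual values:

```python
import json
import zlib

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
# x-max-priority enables per-message priorities on the queue
channel.queue_declare(queue="asr_requests", durable=True,
                      arguments={"x-max-priority": 10})

payload = json.dumps({"file_id": "unique-identifier"}).encode()
body = zlib.compress(payload)  # compress before publishing

channel.basic_publish(
    exchange="",
    routing_key="asr_requests",
    body=body,
    properties=pika.BasicProperties(
        priority=5,                  # e.g. derived from file size
        content_encoding="deflate",  # consumers must zlib.decompress()
        delivery_mode=2,
    ),
)
connection.close()
```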
### Monitoring

The system includes a built-in metrics collection system using Prometheus:

- Setup Monitoring Stack:

```bash
./monitoring/setup_monitoring.sh
cd monitoring
docker-compose up -d
```

- Available Metrics:
  - Request Rates: Audio uploads, ASR requests, and translations
  - Processing Times: Duration measurements for each step
  - Resource Usage: Memory and CPU monitoring
  - Queue Sizes: RabbitMQ queue monitoring
  - Cache Hit Ratio: Translation cache performance
- Access Dashboards:
  - Prometheus: http://localhost:9090
  - Grafana: http://localhost:3000 (login with admin/admin)
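For a sense of how such metrics are exported with prometheus-client (pinned in requirements.txt), a minimal sketch; the metric names are illustrative, not necessarily the ones the services register:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; the services' real names may differ
UPLOADS = Counter("audio_uploads_total", "Audio files uploaded")
ASR_SECONDS = Histogram("asr_processing_seconds", "Time spent on ASR")

start_http_server(8001)  # expose /metrics for Prometheus to scrape


@ASR_SECONDS.time()      # record the duration of each call
def run_asr(path):
    ...                  # speech-to-text work goes here


def handle_upload():
    UPLOADS.inc()        # count each upload
```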
### Autoscaling

The system can dynamically scale based on workload metrics:

- Setup Autoscaling:

```bash
# Verify dependencies and configure autoscaling
./setup_autoscaling.sh
# Enable autoscaling
export ENABLE_AUTOSCALING=True
```

- Scaling Logic:
  - Scales up when queue sizes exceed thresholds
  - Scales up when CPU usage is too high
  - Scales up when processing times are too long
  - Scales down during low load periods
- Configuration: Customize thresholds via environment variables:

```bash
export QUEUE_HIGH_THRESHOLD=10
export CPU_HIGH_THRESHOLD=70.0
export PROCESSING_TIME_THRESHOLD=30.0
```
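A simplified sketch of a scaling decision based on these thresholds; the project's actual controller may weigh the metrics differently:

```python
import os

# Read the same thresholds configured above, with matching defaults
QUEUE_HIGH = int(os.environ.get("QUEUE_HIGH_THRESHOLD", 10))
CPU_HIGH = float(os.environ.get("CPU_HIGH_THRESHOLD", 70.0))
TIME_HIGH = float(os.environ.get("PROCESSING_TIME_THRESHOLD", 30.0))


def scaling_decision(queue_size, cpu_percent, avg_processing_time):
    """Return +1 to scale up, -1 to scale down, 0 to hold steady."""
    if (queue_size > QUEUE_HIGH
            or cpu_percent > CPU_HIGH
            or avg_processing_time > TIME_HIGH):
        return 1
    if queue_size == 0 and cpu_percent < CPU_HIGH / 2:
        return -1  # low load: release a worker
    return 0
```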
## Docker Deployment

The Docker setup includes the following services:

- web: Django API server for handling HTTP requests
- asr_worker: Speech recognition worker service using VOSK
- translator_worker: Text translation worker service using Argostranslate
- db: PostgreSQL database for persistent data storage
- redis: Redis cache for improved performance
- rabbitmq: Message broker for communication between services
- prometheus: Metrics collection for monitoring
- grafana: Visualization dashboard for metrics
The Docker Compose configuration is designed to work with the application's internal CPU affinity and resource management:
- CPU Limits: Set to zero (`cpus: '0'`) to allow the application to manage its own CPU allocation through CPU affinity settings.
- CPU Reservations: Set minimum CPU resources that containers should have access to.
- Memory Limits: Set higher than required to accommodate peak usage and prevent OOM kills.
This approach allows the ASR and Translator services to:
- Run their internal CPU affinity optimizations without container interference
- Dynamically scale CPU usage based on workload
- Properly handle parallel processing of audio files
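For reference, CPU affinity of the kind described above can be set from inside a worker process with psutil; psutil is an assumed extra dependency here (it is not in the requirements.txt list below), and cpu_affinity is available on Linux and Windows but not macOS:

```python
import psutil

# Pin the current worker process to CPUs 0 and 1 (illustrative cores)
proc = psutil.Process()
proc.cpu_affinity([0, 1])
print(proc.cpu_affinity())  # -> [0, 1]
```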
If you observe resource-related issues:
- Check application logs for affinity or resource errors
- Adjust the container settings in `docker-compose.yml` as follows:
  - Increase memory limits if you see OOM errors
  - Adjust CPU reservations based on host capacity
  - Consider setting a `cpus` limit if the application consumes too many resources
For production deployments on multi-CPU systems, you may want to pin specific containers to specific CPUs to match the application's internal CPU affinity settings:
```bash
# Example: Run containers with specific CPU pinning (Docker run example)
docker run --cpuset-cpus="0,1" --name asr_worker_1 your-asr-image
```

This ensures the application's internal CPU affinity matches the container's CPU allocation.
To scale the worker services:

```bash
# Scale ASR workers to 3 instances
docker-compose up -d --scale asr_worker=3
# Scale translator workers to 2 instances
docker-compose up -d --scale translator_worker=2
```

To view logs:

```bash
# View logs from all services
docker-compose logs
# View logs from a specific service
docker-compose logs web
# Follow logs in real-time
docker-compose logs -f asr_worker
```

To run database migrations:

```bash
docker-compose exec web python manage.py makemigrations
docker-compose exec web python manage.py migrate
```

To create an admin user:

```bash
docker-compose exec web python manage.py createsuperuser
```

To back up and restore the database:

```bash
docker-compose exec db pg_dump -U postgres asr_translator > backup.sql
cat backup.sql | docker-compose exec -T db psql -U postgres asr_translator
```

To stop the services:

```bash
# Stop services but keep volumes and networks
docker-compose down
# Stop services and remove volumes (WARNING: This will delete all data)
docker-compose down -v
```

### Troubleshooting

Check logs for the failing container:
```bash
docker-compose logs [service-name]
```

Ensure PostgreSQL is running and the connection details in .env are correct:

```bash
docker-compose exec db psql -U postgres -c "SELECT 1"
```

Check if models are correctly mounted in the volumes:

```bash
docker-compose exec asr_worker ls -la /app/models/vosk
```

If you're experiencing issues related to CPU or memory:

```bash
# Check container resource usage
docker stats
# View container details including resource limits
docker inspect asr_worker_1 | grep -A 20 "HostConfig"
```

For production deployments:
- Use proper SSL/TLS termination with a reverse proxy like Nginx
- Set `DEBUG=False` in the .env file
- Use strong, unique passwords for all services
- Consider using Docker Swarm or Kubernetes for advanced orchestration
- Set up regular backups of the database and media files
- Use proper monitoring and alerting
- The Docker Compose setup exposes several ports to the host. In production, consider restricting access using a proper network configuration.
- Default credentials are included in the env.example file. Always change these for production deployments.
- Secret management: Consider using Docker secrets or a dedicated solution like HashiCorp Vault for managing sensitive information.
Database optimizations:

- Proper indexes have been added to commonly queried fields
- Custom QuerySets and Managers optimize database access patterns (see the sketch after this list)
- Bulk operations are used for efficiency with large datasets
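A sketch of the custom QuerySet/Manager pattern in Django terms; the model and field names are hypothetical, not the project's actual schema:

```python
from django.db import models


class TranslationQuerySet(models.QuerySet):
    def completed(self):
        return self.filter(status="completed")

    def pending(self):
        return self.filter(status__in=["transcribing", "translating"])


class Translation(models.Model):
    # db_index=True adds the indexes on commonly queried fields
    file_id = models.CharField(max_length=64, db_index=True)
    status = models.CharField(max_length=20, db_index=True)
    text = models.TextField(blank=True)

    objects = TranslationQuerySet.as_manager()

# Bulk insert issues one query for many rows instead of one per row:
# Translation.objects.bulk_create(rows, batch_size=500)
```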
The deploy section in the compose file includes resource reservations and limits for the worker containers. The default configuration is designed to work with the application's internal CPU affinity and resource management features, but you may need to adjust based on your server capacity and workload requirements.
## Dependencies

The project uses dependencies with specific versions, as defined in requirements.txt:
- Web Framework: Django==5.2, djangorestframework==3.14.0
- Speech Recognition: vosk==0.3.45, SoundFile==0.10.3.post1
- Translation: argostranslate==1.8.0
- Messaging: pika==1.3.2
- Caching: redis==4.5.5, django-redis==5.2.0
- Database: psycopg2-binary==2.9.6
- Monitoring: prometheus-client==0.16.0
- HTTP: requests==2.28.2
- Utils: python-dotenv==1.0.0, numpy==1.24.3