Assystant/mt_providers_microsoft

Microsoft Translator Provider

Microsoft Azure Translator integration for the MT Providers framework.

Overview

This provider connects the MT Providers framework to the Microsoft Azure Translator service. It supports synchronous and asynchronous translation, automatic retries with exponential backoff, rate limiting, and dedicated exception types for error handling.

Installation

Prerequisites

  • Python 3.8 or higher
  • Azure Translator subscription key
  • Azure region identifier

Install from PyPI

pip install mt_provider_microsoft

Install for Development

git clone https://github.com/assystant/mt-provider-microsoft.git
cd mt-provider-microsoft
pip install -e ".[test,docs]"

Features

  • Single and Batch Translations: Translate individual texts or process multiple texts efficiently
  • Async Support: Full async/await support with aiohttp for non-blocking operations
  • Rate Limiting: Built-in rate limiting to respect API quotas
  • Automatic Retries: Configurable retry logic with exponential backoff
  • Error Handling: Comprehensive error handling with detailed error messages
  • Region Support: Multi-region deployment support
  • Response Metadata: Includes detected language and confidence scores
  • Type Safety: Full type annotations with mypy support
  • Framework Integration: Seamless integration with MT Providers ecosystem

Configuration

Basic Configuration

from mt_providers.types import TranslationConfig

config = TranslationConfig(
    api_key="your-azure-translator-key",
    region="westus2",  # Your Azure region
    timeout=30,        # Optional: request timeout in seconds
    rate_limit=10,     # Optional: requests per second
)

Environment Variables

import os
from mt_providers.types import TranslationConfig

config = TranslationConfig(
    api_key=os.getenv("AZURE_TRANSLATOR_KEY"),
    region=os.getenv("AZURE_TRANSLATOR_REGION", "westus2"),
    timeout=int(os.getenv("AZURE_TRANSLATOR_TIMEOUT", "30")),
)

Configuration Options

| Option | Type | Required | Default | Description |
|--------|------|----------|---------|-------------|
| api_key | str | Yes | - | Azure Translator subscription key |
| region | str | Yes | - | Azure region (e.g., "westus2", "eastus") |
| endpoint | str | No | Microsoft default | Custom API endpoint URL |
| timeout | int | No | 30 | Request timeout in seconds |
| rate_limit | int | No | None | Maximum requests per second |
| retry_attempts | int | No | 3 | Number of retry attempts |
| retry_backoff | float | No | 1.0 | Retry backoff multiplier |
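
The required options above can be checked before a provider is constructed. The following is a minimal sketch, not part of the library: `load_translator_settings` is a hypothetical helper, and the environment variable names are the ones used in the Environment Variables example.

```python
import os
from typing import Dict, Mapping, Optional, Union


def load_translator_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Union[str, int]]:
    """Collect keyword arguments for TranslationConfig from environment variables.

    Hypothetical helper (not part of mt_providers). Fails early with a clear
    message when a required option is missing.
    """
    env = os.environ if env is None else env
    api_key = env.get("AZURE_TRANSLATOR_KEY", "").strip()
    region = env.get("AZURE_TRANSLATOR_REGION", "").strip()

    required = (("AZURE_TRANSLATOR_KEY", api_key), ("AZURE_TRANSLATOR_REGION", region))
    missing = [name for name, value in required if not value]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    timeout = int(env.get("AZURE_TRANSLATOR_TIMEOUT", "30"))
    if timeout <= 0:
        raise ValueError("AZURE_TRANSLATOR_TIMEOUT must be a positive integer")

    return {"api_key": api_key, "region": region, "timeout": timeout}
```

The returned dictionary can then be unpacked into the config, for example `TranslationConfig(**load_translator_settings())`.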

Usage Examples

Basic Translation

from mt_providers import get_provider
from mt_providers.types import TranslationConfig

# Configure the provider
config = TranslationConfig(
    api_key="your-azure-translator-key",
    region="westus2"
)

# Get the Microsoft provider
translator = get_provider("microsoft")(config)

# Translate a single text
result = translator.translate("Hello world", "en", "es")
print(f"Translation: {result['translated_text']}")  # "¡Hola mundo!"
print(f"Detected language: {result['metadata']['detected_language']}")

Batch Translation

# Translate multiple texts efficiently
texts = [
    "Hello world",
    "How are you?", 
    "Good morning",
    "Thank you very much"
]

results = translator.bulk_translate(texts, "en", "es")

for i, result in enumerate(results):
    print(f"{texts[i]} → {result['translated_text']}")
    # Hello world → ¡Hola mundo!
    # How are you? → ¿Cómo estás?
    # Good morning → Buenos días
    # Thank you very much → Muchas gracias

Async Translation

import asyncio

async def async_translate_example():
    # Single async translation
    result = await translator.translate_async("Hello world", "en", "fr")
    print(f"Async result: {result['translated_text']}")  # "Bonjour le monde"
    
    # Batch async translation
    texts = ["Hello", "World", "Python"]
    results = await translator.bulk_translate_async(texts, "en", "de")
    
    for text, result in zip(texts, results):
        print(f"{text} → {result['translated_text']}")

# Run async function
asyncio.run(async_translate_example())
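
When many independent texts are translated with `translate_async`, it can help to cap the number of requests in flight. The sketch below shows one way to do that with `asyncio.Semaphore` and `asyncio.gather`. `translate_many` and `fake_translate_async` are hypothetical names; the fake function stands in for `translator.translate_async` so the example runs without credentials.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

TranslateFn = Callable[[str, str, str], Awaitable[Dict]]


async def translate_many(
    translate_async: TranslateFn,
    texts: List[str],
    source: str,
    target: str,
    max_concurrency: int = 5,
) -> List[Dict]:
    """Translate texts concurrently, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text: str) -> Dict:
        async with semaphore:
            return await translate_async(text, source, target)

    # gather returns results in the same order as the input texts
    return await asyncio.gather(*(worker(text) for text in texts))


async def fake_translate_async(text: str, source: str, target: str) -> Dict:
    """Stand-in for translator.translate_async (assumption, for illustration only)."""
    await asyncio.sleep(0)
    return {"translated_text": f"[{target}] {text}"}


results = asyncio.run(translate_many(fake_translate_async, ["Hello", "World"], "en", "de"))
```

With a real provider, `translator.translate_async` would be passed in place of the stand-in.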

Error Handling

from mt_providers.exceptions import (
    ConfigurationError,
    TranslationError, 
    ProviderError
)
from mt_providers.types import TranslationStatus

try:
    result = translator.translate("Hello", "en", "es")
    
    if result['status'] == TranslationStatus.SUCCESS:
        print(f"Success: {result['translated_text']}")
    else:
        print(f"Translation failed: {result['error']}")
        
except ConfigurationError as e:
    print(f"Configuration error: {e}")
except TranslationError as e:
    print(f"Translation error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Advanced Configuration

# Custom endpoint and advanced settings
config = TranslationConfig(
    api_key="your-key",
    region="westus2",
    endpoint="https://custom-endpoint.cognitiveservices.azure.com/translator/text/v3.0/translate",
    timeout=60,
    rate_limit=50,  # 50 requests per second
    retry_attempts=5,
    retry_backoff=2.0
)

translator = get_provider("microsoft")(config)

API Reference

MicrosoftTranslator Class

The main translator class that implements the MT Providers interface.

Methods

translate(text: str, source_lang: str, target_lang: str) -> TranslationResult

Translates a single text synchronously.

Parameters:

  • text (str): Text to translate (max 5000 characters)
  • source_lang (str): Source language code (ISO 639-1, e.g., "en", "es")
  • target_lang (str): Target language code (ISO 639-1, e.g., "fr", "de")

Returns:

  • TranslationResult: Dictionary with translation results and metadata

Example:

result = translator.translate("Hello", "en", "es")
# Returns: {
#     'translated_text': '¡Hola!',
#     'status': TranslationStatus.SUCCESS,
#     'metadata': {
#         'detected_language': 'en',
#         'confidence': 0.95,
#         'provider': 'microsoft',
#         'model': 'azure-translator-3.0'
#     }
# }

bulk_translate(texts: List[str], source_lang: str, target_lang: str) -> List[TranslationResult]

Translates multiple texts in a single batch request.

Parameters:

  • texts (List[str]): List of texts to translate (max 100 texts)
  • source_lang (str): Source language code
  • target_lang (str): Target language code

Returns:

  • List[TranslationResult]: List of translation results

translate_async(text: str, source_lang: str, target_lang: str) -> TranslationResult

Asynchronous version of translate().

bulk_translate_async(texts: List[str], source_lang: str, target_lang: str) -> List[TranslationResult]

Asynchronous version of bulk_translate().

Supported Languages

Microsoft Translator supports 100+ languages. Common language codes include:

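| Language | Code | Language | Code |
|----------|------|----------|------|
| English | en | Spanish | es |
| French | fr | German | de |
| Italian | it | Portuguese | pt |
| Russian | ru | Chinese (Simplified) | zh |
| Japanese | ja | Korean | ko |
| Arabic | ar | Hindi | hi |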

For the complete list, see Microsoft's language support documentation.

Error Handling

Exception Types

The provider raises specific exceptions for different error scenarios:

from mt_providers.exceptions import (
    ConfigurationError,     # Invalid configuration
    TranslationError,       # Translation-specific errors
    ProviderError,          # Provider-specific errors
    RateLimitError,         # Rate limit exceeded
    TimeoutError           # Request timeout
)

Error Response Handling

try:
    result = translator.translate("Hello", "en", "invalid-lang")
# Catch the more specific exceptions first, in case they subclass TranslationError
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after: {e.retry_after} seconds")
except TimeoutError as e:
    print(f"Request timed out after {e.timeout} seconds")
except TranslationError as e:
    print(f"Translation failed: {e}")
    print(f"Error code: {e.error_code}")
    print(f"Provider: {e.provider}")

Status Codes

Translation results include status information:

from mt_providers.types import TranslationStatus

result = translator.translate("Hello", "en", "es")

if result['status'] == TranslationStatus.SUCCESS:
    print("Translation successful")
elif result['status'] == TranslationStatus.ERROR:
    print(f"Translation failed: {result['error']}")
elif result['status'] == TranslationStatus.PARTIAL:
    print("Partial success (some texts failed in batch)")
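
For batch results, the per-item status can be used to separate translated texts from failed ones. This is a minimal sketch under stated assumptions: `Status` is a stand-in enum so the example runs on its own (real code would compare against `TranslationStatus`), and `split_by_status` is a hypothetical helper. The result keys are the ones shown in the examples above.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    """Stand-in for mt_providers.types.TranslationStatus (assumption)."""
    SUCCESS = "success"
    ERROR = "error"


def split_by_status(
    texts: List[str],
    results: List[Dict],
    success=Status.SUCCESS,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (translated, failed) as lists of (source_text, detail) pairs."""
    translated, failed = [], []
    for text, result in zip(texts, results):
        if result["status"] == success:
            translated.append((text, result["translated_text"]))
        else:
            # Keep the source text so it can be retried or reported
            failed.append((text, result.get("error", "unknown error")))
    return translated, failed
```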

Limits and Quotas

Azure Translator Limits

  • Character limit: 5,000 characters per request
  • Batch size: Maximum 100 texts per batch request
  • Rate limits: Varies by subscription tier
    • Free tier: 2M characters/month
    • Standard tier: Configurable quotas
  • Text length: Bounded by the per-request character limit above; performance is best under 1,000 characters
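
The limits above can be applied before calling `bulk_translate`. The sketch below groups texts into batches that respect both the item count and the character budget. `iter_batches` is a hypothetical helper, and its defaults simply mirror the figures listed in this section.

```python
from typing import Iterator, List


def iter_batches(
    texts: List[str],
    max_items: int = 100,
    max_chars: int = 5000,
) -> Iterator[List[str]]:
    """Yield batches with at most max_items texts and max_chars total characters."""
    batch: List[str] = []
    chars = 0
    for text in texts:
        if len(text) > max_chars:
            raise ValueError(f"Single text exceeds {max_chars} characters; split it first")
        # Start a new batch when adding this text would break either limit
        if batch and (len(batch) >= max_items or chars + len(text) > max_chars):
            yield batch
            batch, chars = [], 0
        batch.append(text)
        chars += len(text)
    if batch:
        yield batch
```

Each yielded batch can then be passed to `translator.bulk_translate(batch, source_lang, target_lang)`.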

Provider Limits

  • Timeout: Default 30 seconds (configurable)
  • Retries: Default 3 attempts with exponential backoff
  • Rate limiting: Configurable requests per second

# Example: High-throughput configuration
config = TranslationConfig(
    api_key="your-key",
    region="westus2",
    timeout=60,
    rate_limit=100,     # 100 requests/second
    retry_attempts=5,
    retry_backoff=1.5
)
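
To get a feel for how `retry_attempts` and `retry_backoff` interact, the sketch below computes a delay schedule. It assumes a common scheme where the delay doubles on each attempt and is scaled by the backoff multiplier. The exact formula the provider uses is not documented here, so treat `backoff_delays` (a hypothetical helper) as an illustration only.

```python
from typing import List


def backoff_delays(
    retry_attempts: int = 3,
    retry_backoff: float = 1.0,
    max_delay: float = 60.0,
) -> List[float]:
    """Return the wait in seconds before each retry: retry_backoff * 2**attempt, capped."""
    return [min(retry_backoff * (2 ** attempt), max_delay) for attempt in range(retry_attempts)]


default_schedule = backoff_delays()          # [1.0, 2.0, 4.0]
high_throughput = backoff_delays(5, 1.5)     # [1.5, 3.0, 6.0, 12.0, 24.0]
```

Under this assumption, the high-throughput configuration above would spend at most 46.5 seconds waiting across its five retries.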

Troubleshooting

Common Issues

1. Authentication Errors

# Error: Invalid subscription key
# ConfigurationError: Invalid subscription key. Check your API key and region.

# Solution: Verify your credentials
config = TranslationConfig(
    api_key="your-valid-key",  # Check Azure portal
    region="westus2"           # Match your resource region
)

2. Rate Limiting

# Error: Too Many Requests
# RateLimitError: Rate limit exceeded. Retry after 60 seconds.

# Solution: Implement backoff or reduce rate
config = TranslationConfig(
    api_key="your-key",
    region="westus2",
    rate_limit=10  # Reduce requests per second
)
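
If calls are issued from your own loop, a small client-side throttle can also keep the request rate down. The following is a sketch of a minimum-interval limiter, not the provider's built-in rate limiting. `MinIntervalLimiter` is a hypothetical class, and the clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Space calls at least 1 / rate_per_second seconds apart."""

    def __init__(
        self,
        rate_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # no call made yet

    def wait(self) -> float:
        """Block until the next call is allowed; return the time slept in seconds."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return delay
```

Calling `limiter.wait()` immediately before each `translator.translate(...)` call spaces the requests out.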

3. Language Code Issues

# Error: Unsupported language
# TranslationError: Language 'xyz' is not supported.

# Solution: Use valid ISO 639-1 codes
result = translator.translate("Hello", "en", "es")  # ✓ Valid
result = translator.translate("Hello", "english", "spanish")  # ✗ Invalid
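
A quick format check can catch mistakes such as full language names before a request is sent. The sketch below is a hypothetical helper that checks shape only, not whether the service supports the language. The pattern also accepts tags with a script or region suffix, such as zh-Hans, because Azure's language list includes them. Whether this provider accepts those tags unchanged is an assumption to verify.

```python
import re

# Two or three lowercase letters, optionally followed by -Script or -Region parts
_LANG_CODE = re.compile(r"^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def looks_like_language_code(code: str) -> bool:
    """Return True if code has the shape of a language tag (format check only)."""
    return bool(_LANG_CODE.match(code))
```

A code such as "xyz" passes this check even though the service rejects it, so the `TranslationError` handler is still needed.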

4. Text Length Issues

# Error: Text too long
# TranslationError: Text exceeds maximum length of 5000 characters.

# Solution: Split long texts
def translate_long_text(text, source, target):
    max_length = 4500  # Leave buffer for safety
    if len(text) <= max_length:
        return translator.translate(text, source, target)
    
    # Split and translate in chunks
    chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
    results = translator.bulk_translate(chunks, source, target)
    
    return {
        'translated_text': ''.join(r['translated_text'] for r in results),
        'status': results[0]['status'],
        'metadata': results[0]['metadata']
    }
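
The fixed-width slicing above can cut a chunk in the middle of a word or sentence, which tends to hurt translation quality. A possible refinement is to split on sentence boundaries and fall back to hard slicing only for a single over-long sentence. This is a sketch; `split_on_sentences` is a hypothetical helper, and its sentence detection is a simple punctuation heuristic.

```python
import re
from typing import List

# A piece is a run of text ending in sentence punctuation plus whitespace, or the tail
_SENTENCE = re.compile(r".+?(?:[.!?]+\s+|\Z)", re.DOTALL)


def split_on_sentences(text: str, max_length: int = 4500) -> List[str]:
    """Split text into chunks of at most max_length, preferring sentence boundaries."""
    chunks: List[str] = []
    current = ""
    for piece in _SENTENCE.findall(text):
        # A single sentence longer than the limit is sliced as a last resort
        while len(piece) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_length])
            piece = piece[max_length:]
        if len(current) + len(piece) > max_length:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

The chunks concatenate back to the original text, so they can replace the `chunks` list in `translate_long_text`.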

Debug Mode

Enable debug logging for troubleshooting:

import logging
from mt_providers import configure_logging

# Enable debug logging
configure_logging(level=logging.DEBUG)

# Now all API calls will be logged
translator = get_provider("microsoft")(config)
result = translator.translate("Hello", "en", "es")

Integration Examples

Web Application Integration

import os

from flask import Flask, request, jsonify
from mt_providers import get_provider
from mt_providers.types import TranslationConfig

app = Flask(__name__)

# Initialize translator
config = TranslationConfig(
    api_key=os.getenv("AZURE_TRANSLATOR_KEY"),
    region=os.getenv("AZURE_TRANSLATOR_REGION")
)
translator = get_provider("microsoft")(config)

@app.route('/translate', methods=['POST'])
def translate_text():
    data = request.json
    
    try:
        result = translator.translate(
            data['text'],
            data['source_lang'],
            data['target_lang']
        )
        return jsonify(result)
    except Exception as e:
        return jsonify({'error': str(e)}), 400

if __name__ == '__main__':
    app.run(debug=True)

Async Web Framework (FastAPI)

import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from mt_providers import get_provider
from mt_providers.types import TranslationConfig

app = FastAPI()

class TranslationRequest(BaseModel):
    text: str
    source_lang: str
    target_lang: str

# Initialize async translator
config = TranslationConfig(
    api_key=os.getenv("AZURE_TRANSLATOR_KEY"),
    region=os.getenv("AZURE_TRANSLATOR_REGION")
)
translator = get_provider("microsoft")(config)

@app.post("/translate")
async def translate_text(request: TranslationRequest):
    try:
        result = await translator.translate_async(
            request.text,
            request.source_lang,
            request.target_lang
        )
        return result
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

Batch Processing Pipeline

import asyncio
from typing import List
import pandas as pd

async def translate_dataframe(df: pd.DataFrame, text_column: str, 
                            source_lang: str, target_lang: str) -> pd.DataFrame:
    """Translate a column in a pandas DataFrame."""
    
    # Batch translate all texts
    texts = df[text_column].tolist()
    batch_size = 100  # Microsoft's batch limit
    
    all_results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        results = await translator.bulk_translate_async(batch, source_lang, target_lang)
        all_results.extend(results)
    
    # Add translated column
    df[f'{text_column}_translated'] = [r['translated_text'] for r in all_results]
    df[f'{text_column}_confidence'] = [r['metadata']['confidence'] for r in all_results]
    
    return df

# Usage
df = pd.read_csv('multilingual_data.csv')
df_translated = asyncio.run(translate_dataframe(df, 'content', 'auto', 'en'))
df_translated.to_csv('translated_data.csv', index=False)

Best Practices

1. Configuration Management

# Use environment variables for sensitive data
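import os
from dataclasses import dataclass, field

from mt_providers.types import TranslationConfig

@dataclass
class Config:
    # default_factory defers the environment lookup until Config() is created,
    # instead of freezing the values when the module is imported
    azure_key: str = field(default_factory=lambda: os.getenv("AZURE_TRANSLATOR_KEY", ""))
    azure_region: str = field(default_factory=lambda: os.getenv("AZURE_TRANSLATOR_REGION", "westus2"))
    timeout: int = field(default_factory=lambda: int(os.getenv("TRANSLATION_TIMEOUT", "30")))
    rate_limit: int = field(default_factory=lambda: int(os.getenv("TRANSLATION_RATE_LIMIT", "10")))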

config = Config()
translation_config = TranslationConfig(
    api_key=config.azure_key,
    region=config.azure_region,
    timeout=config.timeout,
    rate_limit=config.rate_limit
)

2. Error Handling Strategy

from mt_providers.exceptions import TranslationError, RateLimitError
import time

def robust_translate(text, source, target, max_retries=3):
    """Translate with robust error handling."""
    
    for attempt in range(max_retries):
        try:
            return translator.translate(text, source, target)
            
        except RateLimitError as e:
            if attempt < max_retries - 1:
                time.sleep(e.retry_after or 60)
                continue
            raise
            
        except TranslationError as e:
            if e.error_code == "TEMPORARY_ERROR" and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            raise

3. Performance Optimization

# Use batch translation for multiple texts
texts = ["Hello", "World", "Python", "Translation"]

# ✗ Inefficient: Multiple API calls
results = []
for text in texts:
    result = translator.translate(text, "en", "es")
    results.append(result)

# ✓ Efficient: Single batch API call
results = translator.bulk_translate(texts, "en", "es")

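# ✓ Even better: Async batch translation (await is only valid inside a coroutine)
import asyncio

async def translate_batch():
    return await translator.bulk_translate_async(texts, "en", "es")

results = asyncio.run(translate_batch())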

4. Caching Strategy

import hashlib

class CachedTranslator:
    def __init__(self, translator):
        self.translator = translator
        self._cache = {}
    
    def _cache_key(self, text, source, target):
        """Generate cache key for translation."""
        content = f"{text}:{source}:{target}"
        return hashlib.md5(content.encode()).hexdigest()
    
    def translate(self, text, source, target):
        """Translate with caching."""
        cache_key = self._cache_key(text, source, target)
        
        if cache_key in self._cache:
            return self._cache[cache_key]
        
        result = self.translator.translate(text, source, target)
        self._cache[cache_key] = result
        return result

# Usage
cached_translator = CachedTranslator(translator)
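
The `_cache` dictionary above grows without bound in a long-running process. One option is to evict the least recently used entries once a size limit is reached. The sketch below shows the idea with `collections.OrderedDict`. `BoundedCache` is a hypothetical class that could replace the plain dictionary inside `CachedTranslator`.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedCache:
    """Least-recently-used cache that holds at most max_entries items."""

    def __init__(self, max_entries: int = 1024) -> None:
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._max_entries = max_entries

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None on a miss."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Store a value, evicting the least recently used entry if full."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)
```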

Development

Setting Up Development Environment

# Clone the repository
git clone https://github.com/assystant/mt-provider-microsoft.git
cd mt-provider-microsoft

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
pip install -e ".[test,docs,dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=mt_provider_microsoft --cov-report=html

# Run only async tests
pytest -k "async"

# Run with verbose output
pytest -v
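
Code that calls the provider can be unit tested without network access by substituting a mock translator. The sketch below uses `unittest.mock.Mock` from the standard library. `translate_or_fallback` is a hypothetical application function, shown only to give the tests something to exercise.

```python
from unittest.mock import Mock


def translate_or_fallback(translator, text: str, source: str, target: str) -> str:
    """Return the translated text, or the original text if translation fails."""
    try:
        return translator.translate(text, source, target)["translated_text"]
    except Exception:
        return text


def test_returns_translation():
    translator = Mock()
    translator.translate.return_value = {"translated_text": "Hola"}
    assert translate_or_fallback(translator, "Hello", "en", "es") == "Hola"
    translator.translate.assert_called_once_with("Hello", "en", "es")


def test_falls_back_on_error():
    translator = Mock()
    translator.translate.side_effect = RuntimeError("network down")
    assert translate_or_fallback(translator, "Hello", "en", "es") == "Hello"
```

Saved in a file under tests/, these functions are collected by the `pytest` commands above.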

Code Quality

# Format code
black mt_provider_microsoft/ tests/

# Sort imports
isort mt_provider_microsoft/ tests/

# Lint code
flake8 mt_provider_microsoft/ tests/

# Type checking
mypy mt_provider_microsoft/

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start for Contributors

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Ensure all tests pass: pytest
  5. Ensure code quality: black . && isort . && flake8
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.


Made with ❤️ by the MT Providers team
