A powerful Python framework for deploying ML models on Modal with production-ready features
While Modal provides excellent serverless infrastructure, Modalkit adds a complete ML deployment framework:
- Structured Inference Pipeline: Enforced `preprocess()` → `predict()` → `postprocess()` pattern
- Consistent API Endpoints: `/predict_sync`, `/predict_batch`, `/predict_async` across all deployments
- Type-Safe Interfaces: Pydantic models ensure data validation at API boundaries
- YAML Configuration: Version-controlled deployment settings instead of scattered code
- Environment Management: Easy dev/staging/prod configs with override capabilities
- Reproducible Builds: Declarative infrastructure removes deployment inconsistencies
- Shared Standards: All team members deploy models the same way
- Code Separation: Model logic decoupled from Modal deployment boilerplate
- Collaboration: Config files in git enable infrastructure review and collaboration
- Authentication Middleware: Built-in API key or Modal proxy auth
- Queue Integration: Async processing with multiple backend support
- Cloud Storage: Direct S3/GCS/R2 mounting without manual setup
- Batch Processing: Intelligent request batching for GPU efficiency
- Error Handling: Comprehensive error responses and logging
- Less Boilerplate: Focus on model code, not FastAPI/Modal setup
- Modern Tooling: Pre-configured with ruff, mypy, pre-commit hooks
- Testing Framework: Built-in patterns for testing ML deployments
In short: Modalkit transforms Modal from infrastructure primitives into a complete ML platform, letting teams deploy models consistently while maintaining Modal's performance and scalability.
- 🚀 Native Modal Integration: Seamless deployment on Modal's serverless infrastructure
- 🔐 Flexible Authentication: Modal proxy auth or custom API keys with AWS SSM support
- ☁️ Cloud Storage Support: Direct mounting of S3, GCS, and R2 buckets
- 🔄 Flexible Queue Integration: Optional queue backends with dependency injection - use TaskIQ, SQS, or any custom queue system
- 📦 Batch Inference: Efficient batch processing with configurable batch sizes
- 🎯 Type Safety: Full Pydantic integration for request/response validation
- 🛠️ Developer Friendly: Pre-configured with modern Python tooling (ruff, pre-commit)
- 📊 Production Ready: Comprehensive error handling and logging
# Using pip (recommended)
pip install modalkit
# Using uv
uv pip install modalkit
# Development/latest version from GitHub
pip install git+https://github.com/prassanna-ravishankar/modalkit.git
Working examples are available in the documentation:
- Queue Backend Patterns - Queue backend patterns and dependency injection
- TaskIQ Integration - Full TaskIQ integration tutorial
Follow the step-by-step tutorials to build complete working examples with your own ML models.
Create an inference class that inherits from `InferencePipeline`:
from modalkit.inference_pipeline import InferencePipeline
from pydantic import BaseModel
from typing import List
# Define input/output schemas with Pydantic
class TextInput(BaseModel):
text: str
language: str = "en"
class TextOutput(BaseModel):
translated_text: str
confidence: float
# Implement your model logic
class TranslationModel(InferencePipeline):
def __init__(self, model_name: str, all_model_data_folder: str, common_settings: dict, *args, **kwargs):
super().__init__(model_name, all_model_data_folder, common_settings)
# Load your model here
# self.model = load_model(...)
def preprocess(self, input_list: List[TextInput]) -> dict:
"""Prepare inputs for the model"""
texts = [item.text for item in input_list]
return {"texts": texts, "languages": [item.language for item in input_list]}
def predict(self, input_list: List[TextInput], preprocessed_data: dict) -> dict:
"""Run model inference"""
# Your model prediction logic
translations = [text.upper() for text in preprocessed_data["texts"]] # Example
return {"translations": translations, "scores": [0.95] * len(translations)}
def postprocess(self, input_list: List[TextInput], raw_output: dict) -> List[TextOutput]:
"""Format model outputs"""
return [
TextOutput(translated_text=text, confidence=score)
for text, score in zip(raw_output["translations"], raw_output["scores"])
]
import modal
from modalkit.modal_service import ModalService, create_web_endpoints
from modalkit.modal_config import ModalConfig
# Initialize with your config
modal_config = ModalConfig()
app = modal.App(name=modal_config.app_name)
# Define your Modal app class
@app.cls(**modal_config.get_app_cls_settings())
class TranslationApp(ModalService):
inference_implementation = TranslationModel
model_name: str = modal.parameter(default="translation_model")
modal_utils: ModalConfig = modal_config
# Optional: Inject custom queue backend
# def __init__(self, queue_backend=None):
# super().__init__(queue_backend=queue_backend)
# Create API endpoints
@app.function(**modal_config.get_handler_settings())
@modal.asgi_app(**modal_config.get_asgi_app_settings())
def web_endpoints():
return create_web_endpoints(
app_cls=TranslationApp,
input_model=TextInput,
output_model=TextOutput
)
💡 Queue backends are optional - your service works perfectly without any queue configuration. Add TaskIQ or custom queues when you need async processing. See the documentation examples for working implementations.
Create a `modalkit.yaml` configuration file:
# modalkit.yaml
app_settings:
app_prefix: "translation-service"
# Authentication configuration
auth_config:
# Option 1: Use API key from AWS SSM
ssm_key: "/translation/api-key"
auth_header: "x-api-key"
# Option 2: Use hardcoded API key (not recommended for production)
# api_key: "your-api-key-here"
# auth_header: "x-api-key"
# Container configuration
build_config:
image: "python:3.11-slim" # or your custom image
tag: "latest"
workdir: "/app"
env:
MODEL_VERSION: "v1.0"
# Deployment settings
deployment_config:
gpu: "T4" # Options: T4, A10G, A100, or null for CPU
concurrency_limit: 10
container_idle_timeout: 300
secure: false # Set to true for Modal proxy auth
# Cloud storage mounts (optional)
cloud_bucket_mounts:
- mount_point: "/mnt/models"
bucket_name: "my-model-bucket"
secret: "aws-credentials"
read_only: true
key_prefix: "models/"
# Batch processing settings
batch_config:
max_batch_size: 32
wait_ms: 100 # Wait up to 100ms to fill batch
# Queue configuration (optional - for async endpoints)
# Leave empty to disable queues, or configure fallback backend
queue_config:
backend: "memory" # Options: "sqs", "memory", or omit for no queues
# broker_url: "redis://localhost:6379" # For TaskIQ via dependency injection
# Model configuration
model_settings:
local_model_repository_folder: "./models"
common:
cache_dir: "./cache"
device: "cuda" # or "cpu"
model_entries:
translation_model:
model_path: "path/to/model.pt"
vocab_size: 50000
# Test locally
modal serve app.py
# Deploy to production
modal deploy app.py
# View logs
modal logs -f
import requests
import asyncio
# For standard API key auth
headers = {"x-api-key": "your-api-key"}
# Synchronous endpoint
response = requests.post(
"https://your-org--translation-service.modal.run/predict_sync",
json={"text": "Hello world", "language": "en"},
headers=headers
)
print(response.json())
# {"translated_text": "HELLO WORLD", "confidence": 0.95}
# Asynchronous endpoint (returns immediately)
response = requests.post(
"https://your-org--translation-service.modal.run/predict_async",
json={"text": "Hello world", "language": "en"},
headers=headers
)
print(response.json())
# {"message_id": "550e8400-e29b-41d4-a716-446655440000"}
# Batch endpoint
response = requests.post(
"https://your-org--translation-service.modal.run/predict_batch",
json=[
{"text": "Hello", "language": "en"},
{"text": "World", "language": "en"}
],
headers=headers
)
print(response.json())
# [{"translated_text": "HELLO", "confidence": 0.95}, {"translated_text": "WORLD", "confidence": 0.95}]
Modalkit provides flexible authentication options:
Configure with `secure: false` in your deployment config.
# modalkit.yaml
deployment_config:
secure: false
auth_config:
# Store in AWS SSM (recommended)
ssm_key: "/myapp/api-key"
# OR hardcode (not recommended)
# api_key: "sk-1234567890"
auth_header: "x-api-key"
# Client usage
headers = {"x-api-key": "your-api-key"}
response = requests.post(url, json=data, headers=headers)
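If you go the SSM route, the parameter referenced by `ssm_key` has to exist before you deploy. A minimal sketch using boto3 (the parameter name matches the config above; the region and value are placeholders):

```python
# Sketch: create the SSM parameter that auth_config.ssm_key points to.
# Assumes AWS credentials are already configured for boto3.
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")  # region is a placeholder
ssm.put_parameter(
    Name="/myapp/api-key",   # must match auth_config.ssm_key
    Value="your-api-key",    # the value clients will send in the x-api-key header
    Type="SecureString",     # store encrypted
    Overwrite=True,
)
```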
Configure with `secure: true` for Modal's built-in auth:
# modalkit.yaml
deployment_config:
secure: true # Enables Modal proxy auth
# Client usage
headers = {
"Modal-Key": "your-modal-key",
"Modal-Secret": "your-modal-secret"
}
response = requests.post(url, json=data, headers=headers)
💡 Tip: Modal proxy auth is recommended for production as it's managed by Modal and requires no additional setup.
Modalkit uses YAML configuration with two main sections:
# modalkit.yaml
app_settings: # Application deployment settings
app_prefix: str # Prefix for your Modal app name
auth_config: # Authentication configuration
build_config: # Container build settings
deployment_config: # Runtime deployment settings
batch_config: # Batch processing settings
queue_config: # Async queue settings
model_settings: # Model-specific settings
local_model_repository_folder: str
common: dict # Shared settings across models
model_entries: # Model-specific configurations
model_name: dict
Set configuration file location:
# Default location
export MODALKIT_CONFIG="modalkit.yaml"
# Multiple configs (later files override earlier ones)
export MODALKIT_CONFIG="base.yaml,prod.yaml"
# Other environment variables
export MODALKIT_APP_POSTFIX="-prod" # Appended to app name
deployment_config:
# GPU configuration
gpu: "T4" # T4, A10G, A100, H100, or null
# Resource limits
concurrency_limit: 10
container_idle_timeout: 300
retries: 3
# Memory/CPU (when gpu is null)
memory: 8192 # MB
cpu: 4.0 # cores
# Volumes and mounts
volumes:
"/mnt/cache": "model-cache-vol"
mounts:
- local_path: "configs/prod.json"
remote_path: "/app/config.json"
type: "file"
Modalkit seamlessly integrates with cloud storage providers through Modal's CloudBucketMount:
| Provider | Configuration |
|---|---|
| AWS S3 | Native support with IAM credentials |
| Google Cloud Storage | Service account authentication |
| Cloudflare R2 | S3-compatible API |
| MinIO/Others | Any S3-compatible endpoint |
AWS S3 Configuration
cloud_bucket_mounts:
- mount_point: "/mnt/models"
bucket_name: "my-ml-models"
secret: "aws-credentials" # Modal secret name
key_prefix: "production/" # Only mount this prefix
read_only: true
First, create the Modal secret:
modal secret create aws-credentials \
AWS_ACCESS_KEY_ID=xxx \
AWS_SECRET_ACCESS_KEY=yyy \
AWS_DEFAULT_REGION=us-east-1
Google Cloud Storage
cloud_bucket_mounts:
- mount_point: "/mnt/datasets"
bucket_name: "my-datasets"
bucket_endpoint_url: "https://storage.googleapis.com"
secret: "gcp-credentials"
Create secret from service account:
modal secret create gcp-credentials \
--from-gcp-service-account path/to/key.json
Cloudflare R2
cloud_bucket_mounts:
- mount_point: "/mnt/artifacts"
bucket_name: "ml-artifacts"
bucket_endpoint_url: "https://accountid.r2.cloudflarestorage.com"
secret: "r2-credentials"
import json
import torch

class MyInference(InferencePipeline):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Load model from mounted bucket
        model_path = "/mnt/models/my_model.pt"
        self.model = torch.load(model_path)
        # Load dataset
        with open("/mnt/datasets/vocab.json") as f:
            self.vocab = json.load(f)
- ✅ Use read-only mounts for model artifacts
- ✅ Mount only required prefixes with `key_prefix`
- ✅ Use separate buckets for models vs. data
- ✅ Cache frequently accessed files locally (see the sketch after this list)
- ❌ Avoid writing logs to mounted buckets
- ❌ Don't mount entire buckets if you only need specific files
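The local-caching recommendation can be as simple as copying hot files from the read-only mount to container-local disk once at startup. A minimal sketch (paths and filenames are illustrative):

```python
# Sketch: copy frequently accessed artifacts from the bucket mount to local disk once
import shutil
from pathlib import Path

LOCAL_CACHE = Path("/tmp/model_cache")  # container-local, fast disk

def cache_locally(mounted_path: str) -> Path:
    """Copy a file from the cloud bucket mount to local disk and return the local path."""
    src = Path(mounted_path)
    dst = LOCAL_CACHE / src.name
    if not dst.exists():
        LOCAL_CACHE.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dst)  # later reads hit local disk instead of the bucket
    return dst

# e.g. in your pipeline's __init__:
# self.model = torch.load(cache_locally("/mnt/models/my_model.pt"))
```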
Modalkit offers optional queue processing with multiple approaches:
Perfect for sync-only APIs:
class MyService(ModalService):
inference_implementation = MyModel
# No queue backend - async requests process but don't queue responses
service = MyService()
Use dependency injection for full TaskIQ support:
from taskiq_redis import AsyncRedisTaskiqBroker
class TaskIQBackend:
def __init__(self):
self.broker = AsyncRedisTaskiqBroker("redis://localhost:6379")
async def send_message(self, queue_name: str, message: str) -> bool:
@self.broker.task(task_name=f"process_{queue_name}")
async def process_result(msg: str) -> str:
# Your custom processing logic
return f"Processed: {msg}"
await process_result.kiq(message)
return True
# Inject TaskIQ backend
service = MyService(queue_backend=TaskIQBackend())
Use YAML configuration for simple setups:
queue_config:
backend: "sqs" # or "memory"
# Additional backend-specific settings
Implement any queue system:
class MyCustomQueue:
async def send_message(self, queue_name: str, message: str) -> bool:
# Your custom queue implementation (RabbitMQ, Kafka, etc.)
return True
service = MyService(queue_backend=MyCustomQueue())
See complete tutorials in the documentation:
- Queue Backend Patterns
- TaskIQ Integration
# Async endpoint usage
response = requests.post("/predict_async", json={
"message": {"text": "Process this"},
"success_queue": "results",
"failure_queue": "errors"
})
# {"job_id": "uuid"}
Configure intelligent batching for better GPU utilization:
batch_config:
max_batch_size: 32
wait_ms: 100 # Max time to wait for batch to fill
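Because `preprocess()`, `predict()`, and `postprocess()` each receive the full list of inputs, a batch assembled by Modalkit arrives in a single call; writing `predict()` in a vectorized style is what turns these settings into a GPU win. A hedged sketch reusing the `TextInput` schema from the Quick Start (`generate_batch` is a hypothetical model method):

```python
# Sketch: one forward pass per batch instead of a Python loop per request
from typing import List

from modalkit.inference_pipeline import InferencePipeline

class BatchFriendlyTranslation(InferencePipeline):
    def predict(self, input_list: List[TextInput], preprocessed_data: dict) -> dict:
        texts = preprocessed_data["texts"]
        translations = self.model.generate_batch(texts)  # hypothetical: whole batch in one call
        return {"translations": translations, "scores": [1.0] * len(translations)}
```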
Auto-reload Modal volumes for model updates:
deployment_config:
volumes:
"/mnt/models": "model-volume"
volume_reload_interval_seconds: 300 # Reload every 5 minutes
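Volume reloads refresh what is on disk; if your pipeline caches weights in memory, you may still want to re-read them when the file changes. A hedged sketch based on file modification time (this is not a Modalkit API, and the path is illustrative):

```python
# Sketch: re-read in-memory weights when the file on the reloaded volume changes
import os

class HotReloadingWeights:
    def __init__(self, weights_path: str = "/mnt/models/model.pt"):
        self.weights_path = weights_path
        self._loaded_mtime = 0.0
        self.model = None

    def maybe_reload(self) -> None:
        mtime = os.path.getmtime(self.weights_path)
        if mtime > self._loaded_mtime:
            import torch  # illustrative loader; use whatever your framework provides
            self.model = torch.load(self.weights_path)
            self._loaded_mtime = mtime
```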
# Clone repository
git clone https://github.com/prassanna-ravishankar/modalkit.git
cd modalkit
# Install with uv (recommended)
uv sync
# Install pre-commit hooks
uv run pre-commit install
# Run all tests
uv run pytest --cov --cov-config=pyproject.toml --cov-report=xml
# Run specific tests
uv run pytest tests/test_modal_service.py -v
# Run with HTML coverage report
uv run pytest --cov=modalkit --cov-report=html
# Run all checks
uv run pre-commit run -a
# Run type checking
uv run mypy modalkit/
# Format code
uv run ruff format modalkit/ tests/
# Lint code
uv run ruff check modalkit/ tests/
| Endpoint | Method | Description | Returns |
|---|---|---|---|
| `/predict_sync` | POST | Synchronous inference | Model output |
| `/predict_async` | POST | Async inference (queued) | Message ID |
| `/predict_batch` | POST | Batch inference | List of outputs |
| `/health` | GET | Health check | Status |
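For example, a simple readiness probe can poll the health endpoint before routing traffic (the URL shape follows the earlier client examples; add auth headers if your deployment requires them):

```python
# Sketch: poll the health endpoint (URL follows the earlier examples)
import requests

resp = requests.get("https://your-org--translation-service.modal.run/health", timeout=5)
print(resp.status_code, resp.json())
```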
Your model class must implement:
def preprocess(self, input_list: List[InputModel]) -> dict
def predict(self, input_list: List[InputModel], preprocessed_data: dict) -> dict
def postprocess(self, input_list: List[InputModel], raw_output: dict) -> List[OutputModel]
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests and linting (`uv run pytest && uv run pre-commit run -a`)
- Commit your changes (pre-commit hooks will run automatically)
- Push to your fork and open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ using:
- Modal - Serverless infrastructure for ML
- FastAPI - Modern web framework
- Pydantic - Data validation
- Taskiq - Async task processing