Clustrix

Clustrix is a Python package that enables seamless distributed computing on clusters. With a simple decorator, you can execute any Python function remotely on cluster resources while automatically handling dependency management, environment setup, and result collection.

Features

  • Simple Decorator Interface: Just add @cluster to any function
  • Interactive Jupyter Widget: %%clusterfy magic command with GUI configuration manager
  • Multiple Cluster Support: SLURM, PBS, SGE, Kubernetes, SSH, and major cloud providers
  • Cloud Provider Integration: Native support for AWS (EC2/EKS), Google Cloud (GCE/GKE), Azure (VM/AKS), Lambda Cloud, and HuggingFace Spaces
  • Automatic Dependency Management: Captures and replicates your exact Python environment
  • Native Cost Monitoring: Built-in cost tracking for all major cloud providers
  • Kubernetes Support: Deploy to EKS, GKE, AKS, or any Kubernetes cluster
  • Loop Parallelization: Automatically distributes loops across cluster nodes
  • Flexible Configuration: Easy setup with config files, environment variables, or interactive widget
  • Dynamic Instance Selection: Auto-populated dropdowns for cloud instance types and regions
  • Error Handling: Comprehensive error reporting and job monitoring

Quick Start

Installation

pip install clustrix

Basic Configuration

import clustrix

# Configure your cluster
clustrix.configure(
    cluster_type='slurm',
    cluster_host='your-cluster.example.com',
    username='your-username',
    default_cores=4,
    default_memory='8GB'
)

Using the Decorator

from clustrix import cluster

@cluster(cores=8, memory='16GB', time='02:00:00')
def expensive_computation(data, iterations=1000):
    import numpy as np
    data = np.asarray(data)  # plain lists don't support element-wise **, so convert first
    result = 0.0
    for i in range(iterations):
        result += np.sum(data ** 2)
    return result

# This function executes on the cluster
data = [1, 2, 3, 4, 5]
result = expensive_computation(data, iterations=10000)
print(f"Result: {result}")

Jupyter Notebook Integration

Clustrix provides seamless integration with Jupyter notebooks through an interactive widget:

import clustrix  # importing clustrix auto-loads the magic command

# In a separate notebook cell, run the magic to open the configuration widget:
%%clusterfy
# The interactive widget provides:
# - A dropdown to select configurations
# - Forms to create/edit cluster setups
# - One-click configuration application
# - Save/load of configurations to files

Interactive Configuration Widget

The Clustrix widget provides a comprehensive GUI for managing cluster configurations directly in Jupyter notebooks. Here's what you'll see when you use the %%clusterfy magic command:

Default View

When the widget first loads, it displays the "Local Single-core" configuration for quick testing:

[Screenshot: default widget view showing the Local Single-core configuration]

Configuration Dropdown

The dropdown menu includes pre-built templates for various cluster types and cloud providers:

[Screenshot: configuration dropdown showing the available templates]

The available templates are:

  • Local Development:
    • Local Single-core: Run jobs on one CPU core
    • Local Multi-core: Utilize all available CPU cores
  • HPC Clusters:
    • SLURM: University and research cluster support
    • PBS/SGE: Traditional HPC schedulers
  • Cloud Providers:
    • AWS: EC2 instances and EKS Kubernetes clusters
    • Google Cloud: Compute Engine VMs and GKE clusters
    • Azure: Virtual Machines and AKS Kubernetes clusters
    • Lambda Cloud: GPU-optimized instances for ML/AI
    • HuggingFace Spaces: Deploy to HF Spaces infrastructure
  • Kubernetes: Native container orchestration support

Cluster Configuration Examples

SLURM Cluster Configuration

For traditional HPC clusters, the widget provides all essential fields:

[Screenshot: SLURM configuration with basic settings]

The advanced settings accordion reveals additional options:

[Screenshot: SLURM advanced configuration options]

Advanced options include (a programmatic sketch follows this list):

  • Module loads (e.g., python/3.9, cuda/11.2)
  • Environment variables
  • Pre-execution commands
  • Custom SSH key paths
  • Cost monitoring toggles
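
These options can also be set programmatically. Here is a minimal sketch; the keyword names mirror the clustrix.yml keys shown in the Configuration File section below, so treat them as illustrative rather than a definitive API reference:

import clustrix

# Illustrative sketch: keyword names follow the configuration-file keys
# (key_file, module_loads, environment_variables) documented below.
clustrix.configure(
    cluster_type='slurm',
    cluster_host='cluster.example.com',
    username='myuser',
    key_file='~/.ssh/id_rsa',
    module_loads=['python/3.9', 'cuda/11.2'],
    environment_variables={'CUDA_VISIBLE_DEVICES': '0,1'},
)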

Cloud Provider Configuration

Google Cloud Platform: When selecting a cloud provider, only relevant fields are displayed:

[Screenshot: GCP VM configuration interface]

Lambda Cloud GPU Instances: The widget dynamically populates instance type dropdowns based on the selected provider:

[Screenshot: Lambda Cloud with GPU instance dropdown]

Key Widget Features

  1. Dynamic Field Visibility: Only shows fields relevant to the selected cluster type
  2. Provider-Specific Options:
    • AWS: Region selection, instance types, EKS cluster options
    • Azure: Resource groups, VM sizes, AKS configuration
    • GCP: Projects, zones, machine types, GKE options
    • Lambda Cloud: GPU instance selection with live pricing
  3. Input Validation: Real-time validation for hostnames, IP addresses, and configuration values
  4. Tooltips: Hover over any field label to see detailed help text
  5. Configuration Management:
    • Save configurations to YAML/JSON files
    • Load existing configurations
    • Test configurations before applying
    • Add/delete custom configurations

Using the Widget

  1. Select a Configuration: Choose from the dropdown or create a new one
  2. Edit Settings: Modify cluster connection details and resource requirements
  3. Advanced Options: Expand the accordion for environment setup and additional settings
  4. Apply Configuration: Click "Apply Configuration" to use these settings for subsequent @cluster decorated functions
  5. Save for Later: Use "Save Configuration" to persist settings to a file

Configuration File

Create a clustrix.yml file in your project directory:

cluster_type: slurm
cluster_host: cluster.example.com
username: myuser
key_file: ~/.ssh/id_rsa

default_cores: 4
default_memory: 8GB
default_time: "01:00:00"
default_partition: gpu

remote_work_dir: /scratch/myuser/clustrix
conda_env_name: myproject

auto_parallel: true
max_parallel_jobs: 50
cleanup_on_success: true

module_loads:
  - python/3.9
  - cuda/11.2

environment_variables:
  CUDA_VISIBLE_DEVICES: "0,1"

Advanced Usage

Cost Monitoring

Clustrix includes built-in cost monitoring for cloud providers:

from clustrix import cost_tracking_decorator, get_cost_monitor

# Automatic cost tracking with decorator
@cost_tracking_decorator('aws', 'p3.2xlarge')  
@cluster(cores=8, memory='60GB')
def expensive_training():
    # Your training code here
    pass

# Manual cost monitoring
monitor = get_cost_monitor('gcp')
cost_estimate = monitor.estimate_cost('n2-standard-4', hours_used=2.0)
print(f"Estimated cost: ${cost_estimate.estimated_cost:.2f}")

# Get pricing information
pricing = monitor.get_pricing_info()
recommendations = monitor.get_cost_optimization_recommendations()

Supported cloud providers: AWS, Google Cloud, Azure, Lambda Cloud

Custom Resource Requirements

@cluster(
    cores=16,
    memory='32GB',
    time='04:00:00',
    partition='gpu',
    environment='tensorflow-env'
)
def train_model(data, epochs=100):
    # Your machine learning code here
    pass

Manual Parallelization Control

@cluster(parallel=False)  # Disable automatic loop parallelization
def sequential_computation(data):
    result = []
    for item in data:
        result.append(process_item(item))
    return result

@cluster(parallel=True)   # Enable automatic loop parallelization
def parallel_computation(data):
    results = []
    for item in data:  # This loop will be automatically distributed
        results.append(expensive_operation(item))
    return results

Different Cluster Types

# SLURM cluster
clustrix.configure(cluster_type='slurm', cluster_host='slurm.example.com')

# PBS cluster  
clustrix.configure(cluster_type='pbs', cluster_host='pbs.example.com')

# Kubernetes cluster
clustrix.configure(cluster_type='kubernetes')

# Simple SSH execution (no scheduler)
clustrix.configure(cluster_type='ssh', cluster_host='server.example.com')

Cloud Provider Integration

Clustrix provides native integration with major cloud providers for both VM and Kubernetes deployments:

AWS Integration

# Configure AWS credentials and region
clustrix.configure(
    cluster_type='aws',
    access_key_id='YOUR_ACCESS_KEY',
    secret_access_key='YOUR_SECRET_KEY',
    region='us-west-2'
)

# Run on EC2 instance
@cluster(provider='aws', instance_type='p3.2xlarge', cores=8, memory='61GB')
def train_on_aws():
    # GPU-accelerated training on AWS
    pass

# Run on EKS Kubernetes cluster
@cluster(provider='aws', cluster_type='kubernetes', cluster_name='my-eks-cluster')
def distributed_training():
    # Runs on Amazon EKS
    pass

Google Cloud Integration

# Configure GCP with service account
clustrix.configure(
    cluster_type='gcp',
    project_id='your-project-id',
    service_account_key='path/to/service-account-key.json',
    region='us-central1'
)

# Run on Compute Engine
@cluster(provider='gcp', machine_type='n1-highmem-8', cores=8, memory='52GB')
def analyze_data():
    # High-memory computation on GCP
    pass

# Run on GKE cluster
@cluster(provider='gcp', cluster_type='kubernetes', cluster_name='my-gke-cluster')
def kubernetes_job():
    # Runs on Google Kubernetes Engine
    pass

Azure Integration

# Configure Azure with service principal
clustrix.configure(
    cluster_type='azure',
    subscription_id='YOUR_SUBSCRIPTION_ID',
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    tenant_id='YOUR_TENANT_ID',
    region='eastus'
)

# Run on Azure VM
@cluster(provider='azure', vm_size='Standard_NC6', cores=6, memory='56GB')
def gpu_workload():
    # GPU computation on Azure
    pass

# Run on AKS cluster
@cluster(provider='azure', cluster_type='kubernetes', cluster_name='my-aks-cluster')
def container_workload():
    # Runs on Azure Kubernetes Service
    pass

Lambda Cloud Integration

# Configure Lambda Cloud for GPU workloads
clustrix.configure(
    cluster_type='lambda_cloud',
    api_key='YOUR_LAMBDA_API_KEY'
)

# Run on Lambda GPU instance
@cluster(provider='lambda_cloud', instance_type='gpu_1x_a100', cores=30, memory='200GB')
def train_large_model():
    # A100 GPU training on Lambda Cloud
    pass

HuggingFace Spaces Integration

# Configure HuggingFace Spaces
clustrix.configure(
    cluster_type='huggingface_spaces',
    token='YOUR_HF_TOKEN'
)

# Deploy to HuggingFace Spaces
@cluster(provider='huggingface_spaces', space_hardware='gpu-t4-medium')
def inference_endpoint():
    # Runs on HuggingFace infrastructure
    pass

Command Line Interface

# Configure Clustrix
clustrix config --cluster-type slurm --cluster-host cluster.example.com --cores 8

# Check current configuration
clustrix config

# Load configuration from file
clustrix load my-config.yml

# Check cluster status
clustrix status

How It Works

  1. Function Serialization: Clustrix captures your function, arguments, and dependencies using advanced serialization
  2. Environment Replication: Creates an identical Python environment on the cluster with all required packages
  3. Job Submission: Submits your function as a job to the cluster scheduler
  4. Execution: Runs your function on cluster resources with specified requirements
  5. Result Collection: Automatically retrieves results once execution completes
  6. Cleanup: Optionally cleans up temporary files and environments
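
As a rough mental model of this cycle (an illustrative sketch only, not Clustrix's actual internals), the ship-execute-collect round trip can be pictured with cloudpickle, a common serializer for this kind of workflow:

import cloudpickle

def run_remotely(func, *args, **kwargs):
    payload = cloudpickle.dumps((func, args, kwargs))  # 1. serialize
    # Steps 2-4: in a real system the payload is shipped to the cluster,
    # queued by the scheduler, and executed there; here we round-trip
    # in-process just to illustrate the mechanics.
    fn, a, kw = cloudpickle.loads(payload)
    result = fn(*a, **kw)
    return cloudpickle.loads(cloudpickle.dumps(result))  # 5. collect result

def square(x):
    return x * x

print(run_remotely(square, 7))  # 49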

Supported Cluster Types

  • SLURM: Full support for Slurm Workload Manager
  • PBS/Torque: Support for PBS Professional and Torque
  • SGE: Sun Grid Engine support
  • Kubernetes: Execute jobs as Kubernetes pods
  • SSH: Direct execution via SSH (no scheduler)

Dependencies

Clustrix automatically handles dependency management by:

  • Capturing your current Python environment with pip freeze (see the sketch after this list)
  • Creating virtual environments on cluster nodes
  • Installing exact package versions to match your local environment
  • Supporting conda environments for complex scientific software stacks
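
Conceptually, the environment-capture step resembles the following (an illustrative sketch, not Clustrix's actual code):

import subprocess

# Snapshot the local environment so the remote side can recreate it,
# e.g. with `pip install -r requirements.txt`.
frozen = subprocess.run(
    ["pip", "freeze"], capture_output=True, text=True, check=True
).stdout
print(frozen.splitlines()[:5])  # first few pinned requirements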

Error Handling and Monitoring

from clustrix import ClusterExecutor

# Monitor job status
executor = ClusterExecutor(clustrix.get_config())
job_id = "12345"
status = executor._check_job_status(job_id)  # note: the leading underscore marks an internal helper

# Cancel jobs if needed
executor.cancel_job(job_id)

Examples

Machine Learning Training

@cluster(cores=8, memory='32GB', time='12:00:00', partition='gpu')
def train_neural_network(training_data, model_config):
    import tensorflow as tf
    
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    model.fit(training_data, epochs=model_config['epochs'])
    
    return model.get_weights()

# Execute training on cluster
weights = train_neural_network(my_data, {'epochs': 50})

Scientific Computing

@cluster(cores=16, memory='64GB')
def monte_carlo_simulation(n_samples=1000000):
    import numpy as np
    
    # This loop will be automatically parallelized
    results = []
    for i in range(n_samples):
        x, y = np.random.random(2)
        if x*x + y*y <= 1:
            results.append(1)
        else:
            results.append(0)
    
    pi_estimate = 4 * sum(results) / len(results)
    return pi_estimate

pi_value = monte_carlo_simulation(10000000)

Data Processing Pipeline

@cluster(cores=8, memory='16GB')
def process_large_dataset(file_path, chunk_size=10000):
    import pandas as pd
    
    results = []
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        # Process each chunk
        processed = chunk.groupby('category').sum()
        results.append(processed)
    
    return pd.concat(results)

# Process data on cluster
processed_data = process_large_dataset('/path/to/large_file.csv')

Code Quality

Clustrix maintains high code quality standards:

  • Testing: Comprehensive test suite with pytest
  • Code Style: Enforced with Black formatter
  • Linting: Checked with flake8
  • Type Checking: Validated with mypy

To check code quality locally:

# Run all quality checks
python scripts/check_quality.py

# Run individual checks
pytest --cov=clustrix --cov-report=term
black --check clustrix/
flake8 clustrix/
mypy clustrix/

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

Clustrix is released under the MIT License. See LICENSE for details.

Support