A comprehensive web-based management interface for OpenStack GPU compute resources, providing unified control over host aggregates, VM deployments, and multi-cloud resource management.
The OpenStack Spot Manager is a Flask-based application that enables efficient management of GPU hosts across different resource pools (on-demand, spot, contract), with integrated support for RunPod deployments via Hyperstack API. The system provides real-time monitoring, drag-and-drop host migration, and automated resource optimization.
- Multi-Pool Host Management: Manage hosts across on-demand, spot, and contract aggregates
- Real-time GPU Monitoring: Track GPU utilization and VM counts across all hosts
- Drag-and-Drop Operations: Intuitive interface for host migrations between pools
- Modular Column System: Clean, uniform column layout with consistent spacing and headers
- Contract Management: Dedicated interface with filtering for multi-tenant contract aggregate management
- Background Data Loading: Automatic preloading of GPU data with intelligent caching
- Intelligent Cache Updates: Instant UI feedback; VM launches and host migrations appear immediately without waiting for cache expiry
- RunPod Integration: Deploy VMs directly to RunPod platform via Hyperstack API
- NetBox Integration: Automatic tenant and owner group classification
- Parallel Data Collection: 4-agent concurrent system reduces load times from ~300s to ~30s
- Smart Caching: Multi-level TTL-based caching with targeted updates
- Bulk Operations: Concurrent processing for large-scale operations
- Command Logging: Complete audit trail of all operations
- Responsive Design: Bootstrap-based UI that adapts to desktop and mobile screen sizes
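The NetBox tenant lookup behind the owner-group classification can be pictured with a short standard-library sketch. It assumes NetBox's REST endpoint `/api/dcim/devices/` filtered by `name` and token authentication through the `Authorization: Token ...` header. The `INTERNAL_TENANTS` set and the two-way "Internal"/"External" split are hypothetical stand-ins for whatever grouping rules the application really applies, and all function names are illustrative rather than the project's own.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical: tenants treated as "internal"; replace with your own NetBox data.
INTERNAL_TENANTS = {"Internal Ops"}


def build_device_request(netbox_url: str, api_key: str, hostname: str) -> urllib.request.Request:
    """Build a GET request for NetBox's /api/dcim/devices/ endpoint filtered by device name."""
    query = urllib.parse.urlencode({"name": hostname})
    url = f"{netbox_url.rstrip('/')}/api/dcim/devices/?{query}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {api_key}", "Accept": "application/json"},
    )


def tenant_from_response(payload: dict) -> str:
    """Return the tenant name of the first matching device, or 'Unknown'."""
    results = payload.get("results") or []
    if not results or not results[0].get("tenant"):
        return "Unknown"
    return results[0]["tenant"]["name"]


def owner_group(tenant: str) -> str:
    """Hypothetical two-way classification based on INTERNAL_TENANTS."""
    return "Internal" if tenant in INTERNAL_TENANTS else "External"


def lookup_owner_group(netbox_url: str, api_key: str, hostname: str) -> str:
    """Fetch the device from NetBox and classify its tenant (performs network I/O)."""
    request = build_device_request(netbox_url, api_key, hostname)
    with urllib.request.urlopen(request, timeout=10) as response:
        return owner_group(tenant_from_response(json.load(response)))
```

Splitting request construction and response parsing from the network call keeps the classification logic testable without a live NetBox instance.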
Supported GPU types:
- L40: High-performance compute GPUs
- RTX-A6000: Professional workstation GPUs
- A100: Data center AI/ML GPUs
- H100: Next-generation AI training GPUs
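Each host's GPU type and pool (on-demand, spot, contract) follow from the aggregate it belongs to. As a sketch only, assuming a hypothetical naming convention where the aggregate name contains the GPU model and carries `spot` or `contract` as a hyphen-separated token, classification could look like this; `classify_aggregate` and `AggregateInfo` are illustrative names, not the project's API.

```python
from typing import NamedTuple

# Hypothetical naming convention: the aggregate name contains one of these GPU
# models, plus an optional "spot" or "contract" token (e.g. "L40-n3-spot").
KNOWN_GPUS = ("RTX-A6000", "H100", "A100", "L40")


class AggregateInfo(NamedTuple):
    gpu_type: str
    pool: str  # "on-demand", "spot", or "contract"


def classify_aggregate(name: str) -> AggregateInfo:
    """Derive (GPU type, pool) from an aggregate name under the assumed convention."""
    upper = name.upper()
    gpu_type = next((gpu for gpu in KNOWN_GPUS if gpu in upper), None)
    if gpu_type is None:
        raise ValueError(f"no known GPU type in aggregate name {name!r}")
    tokens = name.lower().split("-")
    if "contract" in tokens:
        pool = "contract"
    elif "spot" in tokens:
        pool = "spot"
    else:
        pool = "on-demand"  # no pool token: treat as the default on-demand pool
    return AggregateInfo(gpu_type, pool)
```

Matching whole tokens rather than substrings avoids misreading a name that merely contains "spot" inside another word.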
- Parallel Data Collection: 4-agent concurrent system processes 100+ hosts in ~30s vs previous ~300s
- Smart Cache Updates: VM launches and host migrations update the cache immediately instead of waiting up to 10 minutes for it to expire
- Multi-Level TTL Caching: NetBox (30min), Aggregates (1hr), Parallel data (10min) with targeted invalidation
- Real-Time Feedback: UI shows changes immediately without manual refresh
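The multi-level TTL caching and targeted updates described above can be sketched as a single in-process cache keyed by `(level, key)`. This is an illustrative design, not the application's implementation: the class name, the `update`/`invalidate` methods, and the injectable `clock` are assumptions; only the TTL values come from the list above.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

# TTLs from the list above, in seconds.
TTL_SECONDS = {"netbox": 30 * 60, "aggregates": 60 * 60, "parallel": 10 * 60}


class MultiLevelCache:
    """Sketch: in-process TTL cache keyed by (level, key), with targeted in-place updates."""

    def __init__(self, ttls: Dict[str, float], clock: Callable[[], float] = time.monotonic):
        self._ttls = dict(ttls)
        self._clock = clock
        self._entries: Dict[Tuple[str, Hashable], Tuple[float, Any]] = {}

    def get(self, level: str, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or call loader() and cache its result."""
        now = self._clock()
        entry = self._entries.get((level, key))
        if entry is not None and now - entry[0] < self._ttls[level]:
            return entry[1]
        value = loader()
        self._entries[(level, key)] = (now, value)
        return value

    def update(self, level: str, key: Hashable, mutate: Callable[[Any], Any]) -> bool:
        """Patch a cached value (e.g. after a VM launch) without reloading it.

        The original timestamp is kept, so the entry still expires on schedule
        and is eventually reconciled with the source of truth.
        """
        entry = self._entries.get((level, key))
        if entry is None:
            return False
        stored_at, value = entry
        self._entries[(level, key)] = (stored_at, mutate(value))
        return True

    def invalidate(self, level: str, key: Optional[Hashable] = None) -> None:
        """Drop one entry, or every entry in a level when key is None."""
        if key is not None:
            self._entries.pop((level, key), None)
            return
        for cache_key in [k for k in self._entries if k[0] == level]:
            del self._entries[cache_key]
```

Patching rather than invalidating is what lets the UI show a migration at once without paying for a full reload, while the unchanged timestamp still bounds how long a patched value can drift from OpenStack.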
- 10x Faster: Data collection reduced from ~5 minutes (~300s) to ~30 seconds
- Instant Updates: Operations appear in the UI immediately instead of after a cache refresh of up to 10 minutes
- Reduced API Load: Intelligent caching minimizes redundant OpenStack API calls
- Better UX: No more "refresh and wait"; changes appear immediately
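The parallel data collection can be approximated with a four-worker thread pool. This is a sketch under the assumption that per-host collection is dominated by network I/O (OpenStack and NetBox API calls), where Python threads overlap waiting time; `collect_parallel` and the `fetch` callback are hypothetical names. On its own, a four-worker pool overlaps at most four requests at a time.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def collect_parallel(
    hosts: Iterable[str],
    fetch: Callable[[str], Any],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run fetch(host) for every host on a pool of worker threads.

    Returns (results, errors) so one failing host does not abort the whole refresh.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # record the failure and keep collecting
                errors[host] = exc
    return results, errors
```

Returning errors separately lets the UI render the hosts that did respond and flag the rest, instead of failing the whole page on one slow or broken API call.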
- OpenStack environment with properly configured aggregates
- Python 3.8+ and pip
- Network access to OpenStack APIs
# Clone the repository
git clone https://github.com/your-org/openstack-spot-manager.git
cd openstack-spot-manager
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your OpenStack, NetBox, and Hyperstack credentials
# Run the application
python app.py
Navigate to http://localhost:6969 to access the web interface.
- Deployment Guide - Complete setup and deployment instructions
- API Documentation - REST API endpoints and examples
- Architecture Overview - System design and component interactions
- Frontend Guide - JavaScript modules and UI components
# OpenStack Authentication
OS_AUTH_URL=https://your-openstack.com:5000/v3
OS_USERNAME=your-username
OS_PASSWORD=your-password
OS_PROJECT_NAME=your-project
OS_USER_DOMAIN_NAME=Default
OS_PROJECT_DOMAIN_NAME=Default
# NetBox DCIM Integration
NETBOX_URL=https://your-netbox.com
NETBOX_API_KEY=your-netbox-token
# RunPod/Hyperstack Integration
HYPERSTACK_API_KEY=your-hyperstack-key
RUNPOD_API_KEY=your-runpod-key
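A quick preflight check can catch an incomplete configuration before the application starts. The sketch below checks the variable names from the example `.env` above; treating the NetBox and RunPod/Hyperstack groups as optional is an assumption, and `check_config` is a hypothetical helper, not part of the project.

```python
import os
from typing import List, Mapping, Tuple

# Variable names from the example .env above.
REQUIRED_VARS = (
    "OS_AUTH_URL", "OS_USERNAME", "OS_PASSWORD",
    "OS_PROJECT_NAME", "OS_USER_DOMAIN_NAME", "OS_PROJECT_DOMAIN_NAME",
)
# Assumption: these integrations are optional and can be skipped when unset.
OPTIONAL_GROUPS = {
    "NetBox": ("NETBOX_URL", "NETBOX_API_KEY"),
    "RunPod/Hyperstack": ("HYPERSTACK_API_KEY", "RUNPOD_API_KEY"),
}


def check_config(env: Mapping[str, str] = os.environ) -> Tuple[List[str], List[str]]:
    """Return (missing required variables, optional integrations not fully configured)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    incomplete = [
        group for group, names in OPTIONAL_GROUPS.items()
        if not all(env.get(name) for name in names)
    ]
    return missing, incomplete
```

Call `check_config()` after the variables have been exported or loaded into the process environment; an empty value is treated the same as a missing one.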
Screenshots:
- Real-time GPU utilization across all resource pools
- Drag-and-drop host migration between aggregates
- Dedicated contract aggregate management interface
The system consists of:
- Flask Backend: REST API server with OpenStack integration
- JavaScript Frontend: Responsive web interface with real-time updates
- External Integrations: OpenStack, NetBox, Hyperstack APIs
- Background Processing: Concurrent data loading and caching
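On the backend, a drag-and-drop migration amounts to moving a host from one OpenStack aggregate to another. The sketch assumes a compute client exposing `remove_host_from_aggregate(aggregate, host)` and `add_host_to_aggregate(aggregate, host)`, intended to mirror the openstacksdk compute proxy methods of the same names (verify against your SDK version). `migrate_host` and its rollback policy are hypothetical. It removes before adding because Nova may refuse to place a host in two aggregates with conflicting availability-zone metadata.

```python
from typing import Any, Callable, Protocol


class ComputeAPI(Protocol):
    """Assumed to mirror openstacksdk's conn.compute methods of the same names."""

    def add_host_to_aggregate(self, aggregate: Any, host: str) -> Any: ...
    def remove_host_from_aggregate(self, aggregate: Any, host: str) -> Any: ...


def migrate_host(
    compute: ComputeAPI,
    host: str,
    source: Any,
    target: Any,
    log: Callable[[str], None] = print,
) -> None:
    """Move a host between aggregates, re-adding it to the source if the second step fails."""
    compute.remove_host_from_aggregate(source, host)
    log(f"remove_host_from_aggregate({source}, {host})")
    try:
        compute.add_host_to_aggregate(target, host)
    except Exception:
        compute.add_host_to_aggregate(source, host)  # roll back to the original pool
        log(f"rollback: add_host_to_aggregate({source}, {host})")
        raise
    log(f"add_host_to_aggregate({target}, {host})")
```

The `log` callback is where each executed command could be written to the audit trail; passing a fake compute object makes the rollback path easy to exercise in tests.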
# Run in development mode with auto-reload
export FLASK_DEBUG=1
python app.py
# Run test suite
python -m pytest tests/
# Test specific components
python -m pytest tests/test_api.py
# Install Gunicorn
pip install gunicorn
# Run with Gunicorn
gunicorn -w 4 -b 0.0.0.0:6969 app:app
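The same Gunicorn settings can live in a config file, which is itself Python. The file below is a hypothetical `gunicorn.conf.py` using standard Gunicorn settings (`bind`, `workers`, `timeout`, `accesslog`, `errorlog`); the raised `timeout` is a suggestion, since Gunicorn's default worker timeout is 30 seconds, roughly the duration of an uncached data refresh.

```python
# gunicorn.conf.py: hypothetical config equivalent to the command above.
bind = "0.0.0.0:6969"
workers = 4
# Gunicorn's default worker timeout is 30 s; an uncached data refresh (~30 s)
# served inside a request could exceed it, so allow extra headroom.
timeout = 120
accesslog = "-"  # access log to stdout
errorlog = "-"   # error log to stderr
```

Start it with `gunicorn -c gunicorn.conf.py app:app`. If the cache is held in process memory, each of the four workers keeps its own copy, so an instant update applied in one worker is not visible to requests served by another until that worker's entry expires.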
# Build container
docker build -t openstack-spot-manager .
# Run container
docker run -p 6969:6969 --env-file .env openstack-spot-manager
See Deployment Guide for complete production setup instructions.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- All credentials are managed via environment variables
- API keys are masked in logs and UI
- Comprehensive input validation on all endpoints
- Secure session management
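Masking API keys before a command reaches the log can be done with a small redaction pass. This is a sketch under a hypothetical rule: any upper-case variable whose name contains `KEY`, `PASSWORD`, `TOKEN`, or `SECRET` and appears as `NAME=value` is treated as a secret. The helper names are illustrative, not the application's.

```python
import re

# Hypothetical rule: UPPER_CASE names containing KEY, PASSWORD, TOKEN, or SECRET
# are secrets when they appear as NAME=value in a command line.
SECRET_ASSIGNMENT = re.compile(r"\b([A-Z0-9_]*(?:KEY|PASSWORD|TOKEN|SECRET)[A-Z0-9_]*)=(\S+)")


def mask_secret(value: str, visible: int = 4) -> str:
    """Reveal the last `visible` characters only for values at least 3x that long."""
    if len(value) < 3 * visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact(line: str) -> str:
    """Mask secret assignments in a command line before it is logged."""
    return SECRET_ASSIGNMENT.sub(lambda m: f"{m.group(1)}={mask_secret(m.group(2))}", line)
```

For example, `redact("export OS_PASSWORD=hunter22 OS_USERNAME=bob")` masks the password completely because it is short, while leaving the non-secret username readable for the audit trail.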
- Issues: Report issues on GitHub Issues
- Discussions: Community discussions on GitHub Discussions
- Documentation: Complete docs in the `/docs` directory
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenStack SDK for cloud integration
- Bootstrap for responsive UI framework
- Font Awesome for icons and visual elements