Welcome to the Model Server and Metrics demo! This repository showcases key capabilities of the Red Hat OpenShift AI platform—specifically around deploying and monitoring machine learning (ML) models at scale using KServe.
This demo highlights the work of the Red Hat OpenShift AI Model Server and Metrics team. Our goal is to:
- Deploy and monitor containerized ML models across hybrid cloud environments.
- Enable real-time observability and ensure performance and reliability of models.
- Provide end-to-end ML lifecycle management via scalable and modular serving infrastructure.
- Show real-world use cases of serving and monitoring models on OpenShift AI.
- Demonstrate specialized features like transformers, autoscaling, and multi-node LLM support.
- Highlight tools that improve observability, performance, and reliability.
Using OpenShift AI and KServe, we deploy a lightweight image classification model that identifies dog breeds from images. The demo covers:
- Basic inference service
- Transformer Integration: Clean and normalize input data before inference.
- Autoscaling (Scale to 0): Efficient resource use with demand-based scaling.
- Stop/Resume Annotations: Pause/resume models for experimental workflows.
- Metrics & Dashboards: Real-time insight into model health and performance.
- See: Basic demo steps
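As a rough illustration of how the transformer and scale-to-zero features fit together, the sketch below builds a KServe `InferenceService` manifest as a Python dictionary and prints it as JSON. The service name, storage URI, model format, and transformer image are hypothetical placeholders, not values taken from this repository; the demo's own manifests may differ.

```python
import json


def build_inference_service(name, storage_uri, model_format,
                            transformer_image=None, min_replicas=0):
    """Return a KServe v1beta1 InferenceService manifest as a dict.

    min_replicas=0 asks KServe to scale the predictor to zero when idle.
    """
    spec = {
        "predictor": {
            "minReplicas": min_replicas,
            "model": {
                "modelFormat": {"name": model_format},
                "storageUri": storage_uri,
            },
        }
    }
    if transformer_image is not None:
        # Optional pre/post-processing container placed in front of the predictor.
        spec["transformer"] = {
            "containers": [
                {"name": "kserve-container", "image": transformer_image}
            ]
        }
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": spec,
    }


# All values below are hypothetical and for illustration only.
manifest = build_inference_service(
    name="dog-breed-classifier",
    storage_uri="s3://example-bucket/dog-breed-model",
    model_format="onnx",
    transformer_image="quay.io/example/dog-breed-transformer:latest",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f <file>`, since the API server accepts JSON as well as YAML.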
Large language models (LLMs) often exceed the memory limits of a single node.
Multi-node support allows you to split inference for a single model across several nodes in the cluster.
- Horizontally scales LLM deployments across the cluster.
- No longer constrained by single-node GPU memory.
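To make the memory constraint concrete, here is a back-of-the-envelope estimate of how many GPUs and nodes are needed just to hold a model's weights. The parameter count, precision, GPU memory size, and GPUs per node are all assumed values chosen for illustration; real deployments also need memory for the KV cache and activations, which this sketch ignores.

```python
import math


def estimate_nodes(num_params, bytes_per_param, gpu_memory_gib, gpus_per_node):
    """Estimate the GPUs and nodes needed to hold model weights only."""
    weight_gib = num_params * bytes_per_param / 2**30
    gpus = math.ceil(weight_gib / gpu_memory_gib)
    nodes = math.ceil(gpus / gpus_per_node)
    return weight_gib, gpus, nodes


# Hypothetical example: 405 billion parameters stored as 16-bit floats,
# on nodes that each carry eight GPUs with 80 GiB of memory.
weight_gib, gpus, nodes = estimate_nodes(
    num_params=405 * 10**9,
    bytes_per_param=2,
    gpu_memory_gib=80,
    gpus_per_node=8,
)
print(f"weights: {weight_gib:.0f} GiB, GPUs: {gpus}, nodes: {nodes}")
```

Under these assumptions the weights alone need more GPUs than one eight-GPU node provides, which is the situation multi-node serving addresses.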
Chain multiple models together so the output of one becomes the input of the next—all behind a single endpoint.
Example:
Dog breed → LLM poem generation about that dog → Combined response
See: Inference Graph demo steps
- Simplifies application logic with a declarative graph managed by KServe.
- Provides a unified API endpoint for complex model workflows.
- Efficient resource use: each `InferenceService` can scale independently.
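The chaining described above can be sketched as a KServe `InferenceGraph` with a `Sequence` router, where each step after the first receives the previous step's response. The graph name and the two service names below are hypothetical, and this sketch only covers passing one model's output to the next; producing a combined response may need extra logic not shown here.

```python
import json


def build_sequence_graph(name, service_names):
    """Return a KServe v1alpha1 InferenceGraph that calls services in order."""
    steps = []
    for index, service in enumerate(service_names):
        step = {"name": service, "serviceName": service}
        if index > 0:
            # Feed the previous step's response into this step.
            step["data"] = "$response"
        steps.append(step)
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "InferenceGraph",
        "metadata": {"name": name},
        "spec": {"nodes": {"root": {"routerType": "Sequence", "steps": steps}}},
    }


# Hypothetical names: a breed classifier followed by an LLM that writes a poem.
graph = build_sequence_graph(
    "dog-breed-poem-graph",
    ["dog-breed-classifier", "poem-generator"],
)
print(json.dumps(graph, indent=2))
```

Each name in the list refers to a separately deployed `InferenceService`, so the services keep their own scaling settings while clients call only the graph's endpoint.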
- Hannah DeFazio – Red Hat OpenShift AI
- Mariah Holder – Red Hat OpenShift AI