A Kubernetes serving manager for machine-learning inference systems with NVIDIA MIG/MPS GPU-sharing support.

SSIS-Dispatcher

About

The SSIS-Dispatcher project is a subproject branched from the SSIS (Scalable Serving Inference System for Language Models with NVIDIA MIG) project, where it serves as the serving-manager component. SSIS-Dispatcher receives model inference requests and launches inference pods under the Knative framework while leveraging the GPU-sharing features provided by NVIDIA Multi-Instance GPU (MIG) or Multi-Process Service (MPS), which allow fine-grained utilization of GPU resources and improve overall system efficiency.

  • Check out the K-SSIS Repository for additional autoscaler and performance-monitor support.

Getting Started

Prerequisites

  • Requires a Kubernetes cluster with version > 1.28
  • By default, this demo project runs all Knative services and pods in the nthulab namespace
  • MIG or MPS Kubernetes resources must be registered on your cluster (a quick check is shown below)
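
For a quick check, the GPU resources advertised by each node can be inspected with kubectl; the grep pattern below matches both MIG- and MPS-style resource names:

    # List the nvidia.com/* resources that the nodes advertise; MIG or
    # MPS resources must appear here before the dispatcher can use them.
    kubectl describe nodes | grep -i "nvidia.com/"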

1. Set Up Knative and the Kourier Ingress/Load Balancer

  • Run make setup_knative
  • Run kubectl get po -n kourier-system to check that the Kourier gateway is running
  • Run kubectl get svc -n kourier-system to check that the kourier and kourier-internal services are established
  • You can use curl <kourier service external ip> to test the Kourier external gateway, or run a pod on the cluster that executes curl http://kourier-internal.kourier-system.svc.cluster.local to check that the in-cluster gateway is operating (see the sketch below)
  • Use kn service list to find the URL for the dispatcher, e.g. http://dispatcher.nthulab.192.168.1.10.sslip.io
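
For example, once the dispatcher is deployed (step 3), the in-cluster gateway can be exercised from a throwaway pod. The Host header must match the URL reported by kn service list; the value below is only an example:

    # Run a temporary curl pod and hit the in-cluster Kourier gateway.
    # Replace the Host header with your own dispatcher URL's hostname.
    kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
      curl -H "Host: dispatcher.nthulab.192.168.1.10.sslip.io" \
      http://kourier-internal.kourier-system.svc.cluster.local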

2. Build Your Own Dispatcher Image (Optional)

  • If you want to build your own dispatcher image, run make build

3. Deploy the Dispatcher

  • Run make deploy to deploy your own dispatcher image, or run kubectl apply -f https://raw.githubusercontent.com/deeeelin/SSIS-Dispatcher/main-deployment/configuration.yaml to deploy the prebuilt image from the main branch (a verification sketch follows)
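
A quick way to confirm the deployment, assuming the dispatcher service lands in the nthulab namespace as configured in the prerequisites:

    # The Knative service should report Ready=True and expose a URL.
    kubectl get ksvc dispatcher -n nthulab
    # The dispatcher pod should be Running (it may scale to zero when idle).
    kubectl get pods -n nthulab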

4. Configure the Dispatcher and Restart the Pod

  • Run kubectl edit configmap dispatcher-config

  • Edit the data section to set the service namespace, inference image, and GPU resource names that apply to your system environment

    • The MIG resources defined on a node may have resource names in the following example format:
    nvidia.com/mig-1g.5gb
    nvidia.com/mig-2g.10gb
    nvidia.com/mig-3g.20gb
    nvidia.com/mig-4g.20gb
    nvidia.com/mig-7g.40gb
    
    • The nebuly MPS resources defined on a node may have resource names in the following example format:
    nvidia.com/gpu-1gb
    nvidia.com/gpu-2gb
    nvidia.com/gpu-3gb
    nvidia.com/gpu-4gb
    ...
    nvidia.com/gpu-30gb
    nvidia.com/gpu-31gb
    nvidia.com/gpu-32gb
    
  • Restart the dispatcher pod to reload the configuration (by deleting it), as sketched below
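
A minimal sketch of that restart, assuming the Knative service is named dispatcher and runs in the nthulab namespace (Knative attaches the serving.knative.dev/service label to the pods it creates):

    # Delete the dispatcher pod; Knative recreates it, and the new pod
    # picks up the edited dispatcher-config configmap.
    kubectl delete pod -n nthulab -l serving.knative.dev/service=dispatcher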

5. Forward the Kourier In-Cluster Gateway

  • Assuming the cluster's external IP is unavailable, we run our tests against the in-cluster IP, which is available in most cases

  • Open another terminal window, then run make forward (a sketch of the equivalent command follows)
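
The exact command lives in the project's Makefile. As an assumption for illustration, a typical equivalent would port-forward the kourier-internal service to localhost:

    # Assumption: expose the in-cluster Kourier gateway on localhost.
    # Check the Makefile for the actual command and port `make forward` uses.
    kubectl port-forward -n kourier-system svc/kourier-internal 8080:80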

6. Send a Test API Request to the Dispatcher

  • Export your Hugging Face token: export HF_TOKEN="<Your token>"
  • Change directory to /test and install the required Python packages with pip install -r requirements.txt
  • Run python test.py to send a sample request to the dispatcher

(OPTIONAL) Send a Custom Request to the Dispatcher

  • Make sure you have completed all the steps above.
  • You can set a custom request by modifying /test/payload.json, for example:
{
    "token": "What is Deep Learning?",
    "par": {
        "max_new_tokens": "20"
    },
    "env": {
        "MODEL_ID": "openai-community/gpt2",
        "HF_TOKEN": ""
    }
}
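
As a sketch, the payload can also be posted directly with curl through the gateway forwarded in step 5. The local port, the Host header value, and the root request path below are all assumptions for illustration; /test/test.py is the authoritative client:

    # Illustrative only: port 8080 and the root path are assumptions;
    # see /test/test.py for the exact endpoint and headers it uses.
    curl -X POST http://localhost:8080/ \
      -H "Host: dispatcher.nthulab.192.168.1.10.sslip.io" \
      -H "Content-Type: application/json" \
      -d @test/payload.json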

Uninstall Project

  • Delete all running services (e.g., as sketched below)
  • Run make clean to remove the dispatcher
  • Run make remove_knative to remove Knative
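
For example, inference services launched through the dispatcher can be listed and removed with kn (the service name below is a placeholder; use whatever kn service list reports):

    # List remaining Knative services, then delete each one by name.
    kn service list -n nthulab
    kn service delete <service-name> -n nthulab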
