Pipeline Invocation Response is Different From Model Invocation Response #6357

Open
charleschangdp opened this issue Mar 28, 2025 · 2 comments

@charleschangdp
Describe the bug

My team is deploying this HuggingFace model on Seldon v2.8.5. I'm running into an interesting issue where the results from invoking the model and invoking the pipeline are materially different. When calling the model directly via http://localhost:1234/v2/models/cc-hf-test/infer, the tensor data in the response is a vector, as expected (e.g. "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]"). But when calling the pipeline, the tensor data is a base64-encoded string (e.g. "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg.... "). The Kafka messages contain the expected response "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]", so my best guess is that modelgateway does this when converting the response from the Kafka message back into a reply for Envoy.
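A quick check with standard-library Python (using only the visible prefix of the base64 string above) confirms the encoded payload is just the JSON-serialized vector:

```python
import base64

# Prefix of the base64 tensor data returned by the pipeline endpoint
encoded_prefix = "W1stMC4xMDAyNjA4OTg0NzA4Nzg2"

# Decoding it recovers the start of the expected JSON vector
decoded = base64.b64decode(encoded_prefix).decode("utf-8")
print(decoded)  # [[-0.1002608984708786
```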

To reproduce

model-setting.json

{
 "implementation": "mlserver_huggingface.HuggingFaceRuntime",
 "parameters": {
  "extra": {
   "optimum_model": "true",
   "pretrained_model": "optimum/all-MiniLM-L6-v2",
   "task": "feature-extraction"
  }
 }
}

manifests

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cc-hf-test
  namespace: seldon
spec:
  requirements:
    - huggingface
  secretName: seldon-rclone-s3-secret
  storageUri: s3://seldonbucket/cc-hf-test/1/
  replicas: 2
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  annotations:
  name: cc-hf-test
  namespace: seldon
spec:
  capabilities:
  - huggingface
  podSpec:
    containers:
    - image: seldonio/mlserver:1.6.1-huggingface
      name: mlserver
      resources:
        requests:
          cpu:     "2"
          memory:  4Gi
        limits:
          cpu:     "2"
          memory:  4Gi
    serviceAccountName: sa
  replicas: 2
  serverConfig: mlserver
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: cc-hf-test-pipeline
  namespace: seldon
spec:
  output:
    steps:
    - cc-hf-test
    stepsJoin: inner
  steps:
  - inputsJoinType: inner
    name: cc-hf-test

Invocation

curl --location 'http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": [
        {
            "name": "args",
            "shape": [
                2
            ],
            "datatype": "BYTES",
            "data": [
                "interesting issue"
            ]
        }      
    ]
}'
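For reference, the same request body can be built programmatically. This is a sketch using only the standard library; the endpoint URL is the one from the curl call above:

```python
import json

# Open Inference Protocol (V2) request body, mirroring the curl call above
payload = {
    "inputs": [
        {
            "name": "args",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["interesting issue"],
        }
    ]
}

body = json.dumps(payload)
# POST `body` with Content-Type: application/json to
# http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer
print(body)
```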

Result

{
  "model_name": "",
  "outputs": [
    {
      "data": [
        "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OT.... TRUNCATED"
      ],
      "name": "output",
      "shape": [
        1,
        1
      ],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "hg_jsonlist"
      }
    }
  ]
}

Expected behaviour

{
    "model_name": "cc-hf-test_1",
    "model_version": "1",
    "id": "261005c3-0721-4406-b7f5-1cab60744815",
    "parameters": {},
    "outputs": [
        {
            "name": "output",
            "shape": [
                1,
                1
            ],
            "datatype": "BYTES",
            "parameters": {
                "content_type": "hg_jsonlist"
            },
            "data": [
                "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628,...TRUNCATED]]"
            ]
        }
    ]
}

Environment

Model Details

  • Images of your model: seldonio/mlserver:1.6.1-huggingface
  • Logs of your model:
mlserver 2025-03-28 20:17:51,652 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer                                                                                                      
mlserver Ignoring args : ('',)                                                                                                                                                                          
agent time="2025-03-28T20:17:51Z" level=debug msg="Extracted model name seldon-internal-model:cc-hf-test_1 seldon-model:cc-hf-test" Source=GRPCProxy                                                    
agent time="2025-03-28T20:17:51Z" level=debug msg="Ensure that model cc-hf-test_1 is loaded in memory" Source=StateManager                                                                              
agent time="2025-03-28T20:17:51Z" level=debug msg="Model exists in cache cc-hf-test_1" Source=StateManager                                                                                              
agent time="2025-03-28T20:17:51Z" level=debug msg="Request ids from incoming meta [cvjg7rr51kgc73dm4prg]" Source=GRPCProxy   
@lc525 lc525 added the v2 label Mar 31, 2025
lc525 (Member) commented Mar 31, 2025

This is indeed a bug, and we'll look into it after the 2.9 release.
Would you be able to also try the inference with

--data '{
    "inputs": [
        {
            "name": "args",
            "shape": [
                2
            ],
            "datatype": "BYTES",
            "data": [
                "interesting issue"
            ],
            "parameters": {
                "content_type": "str"
            }
        }      
    ]
}'

Internally, all pipeline requests get converted into gRPC inference requests (the Kafka topics also contain serialized gRPC), and that format is slightly stricter with respect to datatypes than making the inference request directly against the model.
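The base64 output is consistent with how raw bytes survive a JSON round-trip: when a BYTES tensor that travelled the gRPC path is rendered back as JSON, its byte contents get base64-encoded (the proto JSON mapping encodes `bytes` fields this way). A minimal sketch of that effect, with no protobuf dependency and an illustrative payload:

```python
import base64

# The model's actual output, as seen on the Kafka topic (illustrative prefix)
raw_output = b"[[-0.1002608984708786, 0.13002870976924896]]"

# Rendering the bytes tensor contents in a JSON response base64-encodes them,
# matching the pipeline response reported in this issue
as_json_value = base64.b64encode(raw_output).decode("ascii")
print(as_json_value)

# The original payload remains recoverable by decoding
assert base64.b64decode(as_json_value) == raw_output
```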

charleschangdp (Author) commented:

Response was the same as without the parameters.
