Pipeline Invocation Response is Different From Model Invocation Response #6357

Open
charleschangdp opened this issue Mar 28, 2025 · 2 comments

@charleschangdp
Describe the bug

My team is deploying this HuggingFace model on Seldon v2.8.5. I'm running into an interesting issue where the results from invoking the model and invoking the pipeline are materially different. When calling the model directly via http://localhost:1234/v2/models/cc-hf-test/infer, the tensor data in the response is a vector, as expected (e.g. "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]"). But when calling the pipeline, the tensor data is a base64-encoded string (e.g. "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OTI0ODk2LCAwLjAxOTk1NTk1OTE3MTA1Njc0Nywg.... "). The Kafka messages contain the expected response "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628, -0.1323239356 ...]]", so my best guess is that modelgateway does this when converting the response from the Kafka message back into a reply for Envoy.
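A quick check with standard-library Python (using only the visible prefix of the base64 string above) confirms the encoded payload is just the JSON-serialized vector:

```python
import base64

# Prefix of the base64 tensor data returned by the pipeline endpoint
encoded_prefix = "W1stMC4xMDAyNjA4OTg0NzA4Nzg2"

# Decoding it recovers the start of the expected JSON vector
decoded = base64.b64decode(encoded_prefix).decode("utf-8")
print(decoded)  # [[-0.1002608984708786
```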

To reproduce

model-setting.json

{
 "implementation": "mlserver_huggingface.HuggingFaceRuntime",
 "parameters": {
  "extra": {
   "optimum_model": "true",
   "pretrained_model": "optimum/all-MiniLM-L6-v2",
   "task": "feature-extraction"
  }
 }
}

manifests

apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: cc-hf-test
  namespace: seldon
spec:
  requirements:
    - huggingface
  secretName: seldon-rclone-s3-secret
  storageUri: s3://seldonbucket/cc-hf-test/1/
  replicas: 2
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  annotations:
  name: cc-hf-test
  namespace: seldon
spec:
  capabilities:
  - huggingface
  podSpec:
    containers:
    - image: seldonio/mlserver:1.6.1-huggingface
      name: mlserver
      resources:
        requests:
          cpu:     "2"
          memory:  4Gi
        limits:
          cpu:     "2"
          memory:  4Gi
    serviceAccountName: sa
  replicas: 2
  serverConfig: mlserver
---
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: cc-hf-test-pipeline
  namespace: seldon
spec:
  output:
    steps:
    - cc-hf-test
    stepsJoin: inner
  steps:
  - inputsJoinType: inner
    name: cc-hf-test

Invocation

curl --location 'http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer' \
--header 'Content-Type: application/json' \
--data '{
    "inputs": [
        {
            "name": "args",
            "shape": [
                2
            ],
            "datatype": "BYTES",
            "data": [
                "interesting issue"
            ]
        }      
    ]
}'
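For reference, the same request body can be built programmatically. This is a sketch using only the standard library; the endpoint URL is the one from the curl call above:

```python
import json

# Open Inference Protocol (V2) request body, mirroring the curl call above
payload = {
    "inputs": [
        {
            "name": "args",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["interesting issue"],
        }
    ]
}

body = json.dumps(payload)
# POST `body` with Content-Type: application/json to
# http://localhost:1234/v2/pipelines/cc-hf-test-pipeline/infer
print(body)
```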

Result

{
  "model_name": "",
  "outputs": [
    {
      "data": [
        "W1stMC4xMDAyNjA4OTg0NzA4Nzg2LCAwLjEzMDAyODcwOTc2OT.... TRUNCATED"
      ],
      "name": "output",
      "shape": [
        1,
        1
      ],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "hg_jsonlist"
      }
    }
  ]
}

Expected behaviour

{
    "model_name": "cc-hf-test_1",
    "model_version": "1",
    "id": "261005c3-0721-4406-b7f5-1cab60744815",
    "parameters": {},
    "outputs": [
        {
            "name": "output",
            "shape": [
                1,
                1
            ],
            "datatype": "BYTES",
            "parameters": {
                "content_type": "hg_jsonlist"
            },
            "data": [
                "[[-0.19405314326286316, -0.12709179520606995, 0.06382999569177628,...TRUNCATED]]"
            ]
        }
    ]
}

Environment

Model Details

  • Images of your model: seldonio/mlserver:1.6.1-huggingface
  • Logs of your model:
mlserver 2025-03-28 20:17:51,652 [mlserver.grpc] INFO - /inference.GRPCInferenceService/ModelInfer                                                                                                      
mlserver Ignoring args : ('',)                                                                                                                                                                          
agent time="2025-03-28T20:17:51Z" level=debug msg="Extracted model name seldon-internal-model:cc-hf-test_1 seldon-model:cc-hf-test" Source=GRPCProxy                                                    
agent time="2025-03-28T20:17:51Z" level=debug msg="Ensure that model cc-hf-test_1 is loaded in memory" Source=StateManager                                                                              
agent time="2025-03-28T20:17:51Z" level=debug msg="Model exists in cache cc-hf-test_1" Source=StateManager                                                                                              
agent time="2025-03-28T20:17:51Z" level=debug msg="Request ids from incoming meta [cvjg7rr51kgc73dm4prg]" Source=GRPCProxy   
@lc525 lc525 added the v2 label Mar 31, 2025
lc525 (Member) commented Mar 31, 2025

This is indeed a bug, and we'll look into it after the 2.9 release.
Would you be able to also try the inference with

--data '{
    "inputs": [
        {
            "name": "args",
            "shape": [
                2
            ],
            "datatype": "BYTES",
            "data": [
                "interesting issue"
            ],
            "parameters": {
                "content_type": "str"
            }
        }      
    ]
}'

Internally, all pipeline requests get converted into gRPC inference requests (the Kafka topics also contain serialized gRPC), and that format is slightly stricter with respect to datatypes than making the inference request directly against the model.
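The base64 output is consistent with how raw bytes survive a JSON round-trip: when a BYTES tensor that travelled the gRPC path is rendered back as JSON, its byte contents get base64-encoded (the proto JSON mapping encodes `bytes` fields this way). A minimal sketch of that effect, with no protobuf dependency and an illustrative payload:

```python
import base64

# The model's actual output, as seen on the Kafka topic (illustrative prefix)
raw_output = b"[[-0.1002608984708786, 0.13002870976924896]]"

# Rendering the bytes tensor contents in a JSON response base64-encodes them,
# matching the pipeline response reported in this issue
as_json_value = base64.b64encode(raw_output).decode("ascii")
print(as_json_value)

# The original payload remains recoverable by decoding
assert base64.b64decode(as_json_value) == raw_output
```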

charleschangdp (Author) commented:

Response was the same as without the parameters.
