
Open source the actual production configuration #11

@p-bizouard

Description

```yaml
models:
  - name: llama3 # must be lowercase
    model: "casperhansen/llama-3-70b-instruct-awq"
    servedModelName: ""
    quantization: "awq"
    dtype: ""
    gpuMemoryUtilization: "0.96"
    huggingface_token: ""
    ropeScaling:
      enabled: true
      jsonConfiguration: '{"type":"dynamic","factor":4.0}'
      theta: "500000"
    replicaCount: 1
    pvc:
      enabled: true
      storageSize: 60Gi

sender:
  image:
    tag: v1.1.1
consumer:
  image:
    tag: v1.1.1
inferenceserver:
  image:
    repository: vllm/vllm-openai
    tag: v0.5.0
```
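For context, the model values above roughly correspond to the following vLLM server invocation. This is a sketch only: it assumes the chart forwards these values unchanged to the `vllm/vllm-openai` entrypoint, and the flag names are taken from the vLLM OpenAI-compatible server CLI rather than from this chart.

```shell
# Hypothetical mapping of the values file to vLLM server flags —
# the actual command line is built by the chart's deployment template.
python -m vllm.entrypoints.openai.api_server \
  --model casperhansen/llama-3-70b-instruct-awq \
  --quantization awq \
  --gpu-memory-utilization 0.96 \
  --rope-scaling '{"type":"dynamic","factor":4.0}' \
  --rope-theta 500000
```

With `servedModelName` left empty, clients would address the model by its full Hugging Face path in API requests.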
