Commit 3253079

Merge pull request #345 from vbedida79/patch-191224-2
tests_gaudi: Update vllm workload and readme
2 parents 51d0fa9 + 3fd0fe7

File tree

2 files changed: +41 -1 lines changed


tests/gaudi/l2/README.md

Lines changed: 26 additions & 1 deletion
@@ -79,6 +79,10 @@ Welcome to HCCL demo
 ## vLLM
 vLLM is a serving engine for LLMs. The following workload deploys a vLLM server with an LLM using Intel Gaudi. Refer to [Intel Gaudi vLLM fork](https://github.com/HabanaAI/vllm-fork.git) for more details.

+Use the gaudi-validation project:
+```
+$ oc project gaudi-validation
+```
 Build the workload container image:
 ```
 git clone https://github.com/HabanaAI/vllm-fork.git --branch v1.18.0
@@ -104,6 +108,7 @@ $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-
 ```
 The meta-llama/Llama-3.1-8B model is used in this deployment, and the Hugging Face token is used to access such gated models.
 * For the PV setup with NFS, refer to the [documentation](https://docs.openshift.com/container-platform/4.17/storage/persistent_storage/persistent-storage-nfs.html).
+* The vLLM pod needs access to the host's shared memory for tensor-parallel inference, which is mounted as a volume.
 ```
 $ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/vllm_deployment.yaml
 ```
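The shared-memory mount noted above is commonly implemented as a memory-backed emptyDir volume mounted at /dev/shm. The fragment below is an illustrative sketch only; the volume and container names are hypothetical, and the actual vllm_deployment.yaml in the repository is authoritative:

```
# Illustrative sketch: mount a memory-backed emptyDir at /dev/shm so
# tensor-parallel workers can exchange data; names are hypothetical.
spec:
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
  containers:
    - name: vllm-workload
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
```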
@@ -160,7 +165,27 @@ Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:10<00:03, 3.59s/i
 Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:11<00:00, 2.49s/it]
 Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:11<00:00, 2.93s/it]
 ```
-Run inference requests using the service url.
+
+* The internal service URL is used to send inference requests to the vLLM server. The service is accessible only from pods running in the same namespace, i.e. gaudi-validation. Run the commands below to create a sample pod and send requests.
+
+```
+$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/tests/gaudi/l2/test-pod.yaml
+```
+
+Check for the pod:
+
+```
+$ oc get pods
+NAME   READY   STATUS    RESTARTS   AGE
+test   1/1     Running   0          2s
+```
+
+Use the command below to enter the pod's terminal and run curl requests:
+
+```
+$ oc debug pod/test
+```
+
 ```
 sh-5.1# curl "http://vllm-workload.gaudi-validation.svc.cluster.local:8000/v1/models"
 {"object":"list","data":[{"id":"meta-llama/Llama-3.1-8B","object":"model","created":1730317412,"owned_by":"vllm","root":"meta-llama/Llama-3.1-8B","parent":null,"max_model_len":131072,"permission":[{"id":"modelperm-452b2bd990834aa5a9416d083fcc4c9e","object":"model_permission","created":1730317412,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
 ```
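Beyond listing models, a completion request can be sent from the same test pod. The sketch below assumes the OpenAI-compatible `/v1/completions` endpoint that vLLM serves; the prompt and max_tokens values are illustrative, not from the original workload:

```shell
# Build the request payload for the vLLM server's OpenAI-compatible API.
# Model name matches the deployment above; prompt/max_tokens are illustrative.
payload='{"model": "meta-llama/Llama-3.1-8B", "prompt": "Intel Gaudi is", "max_tokens": 32}'

# Send the request from inside the test pod; the service hostname is only
# resolvable within the cluster, so tolerate failure when run elsewhere.
curl -s "http://vllm-workload.gaudi-validation.svc.cluster.local:8000/v1/completions" \
  -H "Content-Type: application/json" \
  -d "$payload" || true
```

The response is a JSON object whose `choices[0].text` field holds the generated continuation.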

tests/gaudi/l2/test-pod.yaml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+# Copyright (c) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+apiVersion: v1
+kind: Pod
+metadata:
+  name: test
+  labels:
+    app: test
+  namespace: gaudi-validation
+spec:
+  containers:
+    - name: test
+      command: [ "/bin/bash", "-c", "--" ]
+      args: [ "while true; do sleep 30; done;" ]
+      image: registry.access.redhat.com/ubi9-minimal:latest
