
Commit c3649e4

[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
1 parent 53243e5 commit c3649e4

53 files changed (+220, -220 lines)
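Every hunk in this commit makes the same one-line change: the opening fence of a shell snippet is switched from `console` to `bash` so that the commands are highlighted as shell code. The pattern, repeated in each file below (example lines taken from the dify.md hunk):

-```console
+```bash
 vllm serve Qwen/Qwen1.5-7B-Chat
 ```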

.buildkite/nightly-benchmarks/nightly-annotation.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Please download the visualization scripts in the post
 - Download `nightly-benchmarks.zip`.
 - In the same folder, run the following code:
 
-```console
+```bash
 export HF_TOKEN=<your HF token>
 apt update
 apt install -y git

docs/deployment/docker.md

Lines changed: 6 additions & 6 deletions
@@ -10,7 +10,7 @@ title: Using Docker
 vLLM offers an official Docker image for deployment.
 The image can be used to run OpenAI compatible server and is available on Docker Hub as [vllm/vllm-openai](https://hub.docker.com/r/vllm/vllm-openai/tags).
 
-```console
+```bash
 docker run --runtime nvidia --gpus all \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 --env "HUGGING_FACE_HUB_TOKEN=<secret>" \

@@ -22,7 +22,7 @@ docker run --runtime nvidia --gpus all \
 
 This image can also be used with other container engines such as [Podman](https://podman.io/).
 
-```console
+```bash
 podman run --gpus all \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \

@@ -71,7 +71,7 @@ You can add any other [engine-args][engine-args] you need after the image tag (`
 
 You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:
 
-```console
+```bash
 # optionally specifies: --build-arg max_jobs=8 --build-arg nvcc_threads=2
 DOCKER_BUILDKIT=1 docker build . \
 --target vllm-openai \

@@ -99,7 +99,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
 
 ??? Command
 
-```console
+```bash
 # Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
 python3 use_existing_torch.py
 DOCKER_BUILDKIT=1 docker build . \

@@ -118,7 +118,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
 
 Run the following command on your host machine to register QEMU user static handlers:
 
-```console
+```bash
 docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
 ```
 

@@ -128,7 +128,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
 
 To run vLLM with the custom-built Docker image:
 
-```console
+```bash
 docker run --runtime nvidia --gpus all \
 -v ~/.cache/huggingface:/root/.cache/huggingface \
 -p 8000:8000 \

docs/deployment/frameworks/anything-llm.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 
 - Start the vLLM server with the supported chat completion model, e.g.
 
-```console
+```bash
 vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
 ```

docs/deployment/frameworks/autogen.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ title: AutoGen
 
 - Setup [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment
 
-```console
+```bash
 pip install vllm
 
 # Install AgentChat and OpenAI client from Extensions

@@ -23,7 +23,7 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"
 
 - Start the vLLM server with the supported chat completion model, e.g.
 
-```console
+```bash
 python -m vllm.entrypoints.openai.api_server \
 --model mistralai/Mistral-7B-Instruct-v0.2
 ```

docs/deployment/frameworks/cerebrium.md

Lines changed: 3 additions & 3 deletions
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr
 
 To install the Cerebrium client, run:
 
-```console
+```bash
 pip install cerebrium
 cerebrium login
 ```
 
 Next, create your Cerebrium project, run:
 
-```console
+```bash
 cerebrium init vllm-project
 ```
 

@@ -58,7 +58,7 @@ Next, let us add our code to handle inference for the LLM of your choice (`mistr
 
 Then, run the following code to deploy it to the cloud:
 
-```console
+```bash
 cerebrium deploy
 ```

docs/deployment/frameworks/chatbox.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 
 - Start the vLLM server with the supported chat completion model, e.g.
 
-```console
+```bash
 vllm serve qwen/Qwen1.5-0.5B-Chat
 ```

docs/deployment/frameworks/dify.md

Lines changed: 2 additions & 2 deletions
@@ -18,13 +18,13 @@ This guide walks you through deploying Dify using a vLLM backend.
 
 - Start the vLLM server with the supported chat completion model, e.g.
 
-```console
+```bash
 vllm serve Qwen/Qwen1.5-7B-Chat
 ```
 
 - Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):
 
-```console
+```bash
 git clone https://github.com/langgenius/dify.git
 cd dify
 cd docker

docs/deployment/frameworks/dstack.md

Lines changed: 2 additions & 2 deletions
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),
 
 To install dstack client, run:
 
-```console
+```bash
 pip install "dstack[all]"
 dstack server
 ```
 
 Next, to configure your dstack project, run:
 
-```console
+```bash
 mkdir -p vllm-dstack
 cd vllm-dstack
 dstack init

docs/deployment/frameworks/haystack.md

Lines changed: 2 additions & 2 deletions
@@ -13,15 +13,15 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
 
 - Setup vLLM and Haystack environment
 
-```console
+```bash
 pip install vllm haystack-ai
 ```
 
 ## Deploy
 
 - Start the vLLM server with the supported chat completion model, e.g.
 
-```console
+```bash
 vllm serve mistralai/Mistral-7B-Instruct-v0.1
 ```

docs/deployment/frameworks/helm.md

Lines changed: 2 additions & 2 deletions
@@ -22,15 +22,15 @@ Before you begin, ensure that you have the following:
 
 To install the chart with the release name `test-vllm`:
 
-```console
+```bash
 helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values.yaml --set secrets.s3endpoint=$ACCESS_POINT --set secrets.s3bucketname=$BUCKET --set secrets.s3accesskeyid=$ACCESS_KEY --set secrets.s3accesskey=$SECRET_KEY
 ```
 
 ## Uninstalling the Chart
 
 To uninstall the `test-vllm` deployment:
 
-```console
+```bash
 helm uninstall test-vllm --namespace=ns-vllm
 ```
