A vLLM API server running on Red Hat Enterprise Linux 9 in a Docker container.
See the vLLM documentation for the full list of engine arguments: https://docs.vllm.ai/en/latest/models/engine_args.html
MODEL='TheBloke/dolphin-2.2.1-mistral-7B-AWQ'   # Hugging Face model ID to serve
QUANTIZATION='awq'                              # model is AWQ-quantized, so tell vLLM to use the AWQ kernels
DTYPE='auto'                                    # let vLLM choose the tensor dtype for the hardware
MAX_MODEL_LEN='4096'                            # maximum context length in tokens
API_SERVER='openai.api_server'                  # serve an OpenAI-compatible API (vllm.entrypoints.openai.api_server)
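The variables above are consumed by the compose file. A minimal sketch of what that file might look like (the service name, image tag, and port are assumptions, not taken from this repo) is:

```yaml
# docker-compose.yml -- hypothetical sketch; adjust image, ports, and GPU
# settings to match your environment.
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: >
      --model ${MODEL}
      --quantization ${QUANTIZATION}
      --dtype ${DTYPE}
      --max-model-len ${MAX_MODEL_LEN}
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Put the variables in a `.env` file next to `docker-compose.yml` so compose can substitute them.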
docker compose up -d    # create and start the server in the background
docker compose stop     # stop the running container without removing it
docker compose start    # restart a previously stopped container
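Once the container is up, the OpenAI-compatible endpoint can be exercised with curl. This assumes the server is published on port 8000 (an assumption from the sketch above, not stated in the original notes):

```shell
# Query the OpenAI-compatible completions endpoint of the local vLLM server.
# Host/port are assumptions; adjust to your compose configuration.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "TheBloke/dolphin-2.2.1-mistral-7B-AWQ",
        "prompt": "Hello, my name is",
        "max_tokens": 32
      }'
```

The same endpoint also works with any OpenAI client library by pointing its base URL at `http://localhost:8000/v1`.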