Loading microsoft/Phi-3-medium-128k-instruct with vLLM. #5106

AkshataDM · 2024-05-29T17:42:06Z


I am using an NVIDIA A100 80GB MIG 3g.40gb slice to deploy microsoft/Phi-3-medium-128k-instruct (~26 GB) with vLLM. However, I keep running into OOM issues. Here is how I am initializing the model:
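import torch
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model="microsoft/Phi-3-medium-128k-instruct",
    gpu_memory_utilization=0.8,  # fraction of GPU memory vLLM may claim
    dtype=torch.float16,         # 2 bytes per weight
    enforce_eager=True,          # skip CUDA graph capture
    trust_remote_code=True,
)
loaded_llm = AsyncLLMEngine.from_engine_args(engine_args)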

This is the error:
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":844, please report a bug to PyTorch.

Any suggestions on which parameters to tweak to make this model fit in my 40 GB MIG slice?
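As a rough way to see where the 40 GB goes, the following Python sketch estimates the weight footprint and how many tokens of KV cache remain inside the `gpu_memory_utilization=0.8` budget. The model figures in it (about 14B parameters, 40 layers, 10 KV heads, head dimension 128) are assumptions, not values taken from this thread, and should be checked against the model's config.json. Activation memory, the CUDA context and other runtime overhead are ignored (the hypothetical `overhead` argument defaults to zero), so the real headroom will be smaller.

```python
# Back-of-the-envelope memory budget for serving a model with vLLM on a
# 40 GB MIG slice. All model numbers below are ASSUMPTIONS (roughly
# Phi-3-medium-128k); verify them against the model's config.json.

GB = 1e9  # decimal gigabytes


def weight_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone (fp16/bf16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """One key and one value vector per layer per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(total_gpu_bytes: float, utilization: float,
                      weights: float, per_token: int,
                      overhead: float = 0.0) -> int:
    """Tokens of KV cache that fit in the budget after weights and overhead.

    `overhead` is a hypothetical knob for activations/CUDA context; it
    defaults to 0, so the result is an upper bound.
    """
    free = total_gpu_bytes * utilization - weights - overhead
    return max(0, int(free // per_token))


# Assumed model values (hypothetical; check config.json):
NUM_PARAMS = 14e9   # ~14B parameters
NUM_LAYERS = 40
NUM_KV_HEADS = 10
HEAD_DIM = 128

weights = weight_bytes(NUM_PARAMS)
per_token = kv_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM)
tokens = max_cached_tokens(40 * GB, 0.8, weights, per_token)

print(f"weights: {weights / GB:.1f} GB")
print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"tokens that fit at gpu_memory_utilization=0.8: {tokens}")
print(f"KV cache for one 131072-token context: {131072 * per_token / GB:.1f} GB")
```

With these assumed numbers the weights come to about 28 GB (roughly 26 GiB, in line with the ~26 GB quoted above), leaving about 4 GB, or roughly 19.5k tokens, of KV cache, while a single full 131072-token context would need about 26.8 GB. If the estimate holds, lowering vLLM's `max_model_len` engine argument to a length that fits, or raising `gpu_memory_utilization`, would be the first knobs to experiment with; whether this accounts for the specific NVML assertion above is not established here.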

AkshataDM · 2024-06-11T19:43:15Z


bump
