-
On Twitter it was said that 2x A6000 is the minimal configuration that should work to fine-tune Qwen-2.5-1.5B in about an hour for most experiments. I run vf-vllm like this:
Then I run nvidia-smi and see that nearly all the memory is taken.
Finally, I run my application, which starts the GRPOTrainer, and see this:
which is expected given that there is no memory left for backpropagation. What am I doing wrong? Thanks
-
It seems that vLLM defaults to using 90% of VRAM for the KV cache, and should use less here.
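A minimal sketch of how to cap that, assuming vf-vllm forwards vLLM's standard --gpu-memory-utilization flag and that Qwen/Qwen2.5-1.5B-Instruct is the intended model id (both assumptions, not taken from the thread):
# Let vLLM preallocate at most ~60% of the visible GPU instead of the 90% default,
# leaving headroom for other processes on the same card.
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct --gpu-memory-utilization 0.6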
-
If you have 2 GPUs, make sure only 1 is used for training and 1 is used for inference, so:
CUDA_VISIBLE_DEVICES=0 vf-vllm ....
and
CUDA_VISIBLE_DEVICES=1 accelerate launch ...
vLLM will take ~all (up to 90%) of the memory you allow it to see, which leaves no space for the trainer on the same GPUs.
This should work with a 1.5B model if you tune your batch size / context length.
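For concreteness, a sketch of that split (the model id and training script name are illustrative placeholders, not taken from the thread):
# Terminal 1: inference server pinned to GPU 0
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct
# Terminal 2: GRPO trainer pinned to GPU 1
CUDA_VISIBLE_DEVICES=1 accelerate launch train_grpo.py
With this split, nvidia-smi should show vLLM's large preallocation only on GPU 0, leaving GPU 1 free for the trainer's weights, gradients, and optimizer state.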