If you have 2 GPUs, make sure only one is used for training and the other is used for inference.

So: CUDA_VISIBLE_DEVICES=0 vf-vllm .... for the inference server, and CUDA_VISIBLE_DEVICES=1 accelerate launch ... for the trainer (sketch below).
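A minimal sketch of that two-terminal setup; the model name, the --model flag, and the training script name (train_grpo.py) are placeholders/assumptions, not from this thread:

```bash
# Terminal 1: vLLM inference server pinned to GPU 0
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct

# Terminal 2: trainer pinned to GPU 1, single process
CUDA_VISIBLE_DEVICES=1 accelerate launch --num_processes 1 train_grpo.py
```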

vLLM will pre-allocate nearly all (up to 90% by default) of the memory on every GPU it can see, which leaves no room for the trainer on the same GPUs.
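The knob behind that ~90% figure is vLLM's gpu_memory_utilization setting. Whether vf-vllm forwards it as a CLI flag is an assumption here, so check its --help first; a sketch of lowering the cap:

```bash
# Lower vLLM's pre-allocation cap to leave headroom on the GPU it runs on.
# Assumption: vf-vllm passes standard vLLM engine args through unchanged.
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct --gpu-memory-utilization 0.6
```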

This should work with a 1.5B model if you tune your batch size / context length.
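While tuning those settings, it helps to watch memory on both GPUs: the vLLM server should sit near its cap, while the trainer needs headroom for activation and optimizer spikes.

```bash
# Refresh nvidia-smi every second to watch memory usage on both GPUs while tuning
watch -n 1 nvidia-smi
```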
