How to Offload Models and Load in Other Ones #1904
jmsalvador2395 asked this question in Q&A
Replies: 2 comments · 4 replies
-
I have some example code here. I want to be able to load in another model and run inference using the same prompt. Is there a way for me to offload the current model safely?
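Roughly, the setup looks like this (a minimal sketch rather than my exact code; the model name and prompt below are placeholders):

```python
from vllm import LLM, SamplingParams

prompt = "Write a haiku about the ocean."  # placeholder prompt
params = SamplingParams(temperature=0.8, max_tokens=64)

# Load the first model and run inference on the prompt.
llm = LLM(model="facebook/opt-125m")  # placeholder model
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)

# Goal: free the GPU memory held by `llm`, then load a second
# model in the same process and run the same prompt through it.
```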
For context, I've tried calling `del llm` and `del llm.llm_engine`, along with calling `torch.cuda.empty_cache()`, and saw that it causes issues. When trying to create another llm object I get the message:

2023-12-03 03:37:22,346 INFO worker.py:1507 -- Calling ray.init() again after it has already been called

Then, after calling `ray.shutdown()` and trying to define a new llm object, I get the warning that tokenizers parallelism was disabled. Any help is appreciated.
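Put together, the teardown sequence I'm attempting looks like this (again a sketch, not a known fix; the `gc.collect()` call and the ordering of the two deletes are my additions, and the model names are placeholders):

```python
import gc

import ray
import torch
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
llm.generate(["same prompt"], SamplingParams(max_tokens=16))

# Attempted cleanup: drop the engine before the wrapper (deleting
# `llm` first would make `llm.llm_engine` unreachable by name),
# then release PyTorch's cached GPU memory.
del llm.llm_engine
del llm
gc.collect()
torch.cuda.empty_cache()

# Creating a new LLM at this point logs:
#   Calling ray.init() again after it has already been called
# so the Ray workers from the first model are apparently still up.
ray.shutdown()

# After ray.shutdown(), defining a new LLM produces a warning that
# tokenizers parallelism was disabled.
llm = LLM(model="mistralai/Mistral-7B-v0.1")  # placeholder second model
```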
-
I am also looking for an answer.