-
On my 2.2b model, my M3 Max laptop takes ~16 hours to produce the imatrix for a 96MB dataset (1k samples per language). For the 7.7b version I'll do next, it will be 5k samples per language (464MB), and I'm guesstimating about 320 hours on my laptop (~4x the model size times 5x the total tokens). I'm considering using a cloud service for this.

The thing is, I've only set threads on my MacBook so far, not batch size or ubatch. Before I migrate this task into a Docker environment for a cloud service (or however that ends up working), I want to know how to choose the batch/ubatch sizes to minimize the run time. Since the MacBook is busy running this imatrix one last time, I haven't experimented with it much there, but I do have a Mac mini. On the M1 mini, which can't fit the model it's building the imatrix for into memory (16GB total, 7.7b model), changing the batch size makes little difference: dropping from the default to 1k saved only about 15 minutes out of 60 hours. How can I optimize the run time?
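To make the question concrete, here's the kind of sweep I have in mind: a sketch, assuming a recent llama.cpp build where `llama-imatrix` accepts the common `-b`/`-ub` flags and `--chunks` to cap a short trial run. The model/dataset paths and the chunk count are placeholders for my setup.

```bash
#!/usr/bin/env bash
# Sketch: time short llama-imatrix trials across a grid of batch/ubatch sizes
# before committing to a multi-day run. Paths and --chunks value are placeholders.
MODEL=./model-7.7b-f16.gguf   # assumed model path
DATA=./calibration.txt        # assumed calibration text

for b in 2048 1024 512; do
  for ub in 512 256 128; do
    [ "$ub" -gt "$b" ] && continue  # ubatch can't exceed batch
    echo "=== -b $b -ub $ub ==="
    # --chunks caps the run to a small slice so each trial finishes quickly;
    # the relative timings should carry over to the full dataset.
    /usr/bin/time -p ./llama-imatrix -m "$MODEL" -f "$DATA" \
      -b "$b" -ub "$ub" --chunks 20 -o /tmp/imatrix-trial.dat
  done
done
```

The idea being that a handful of short trials would show where throughput plateaus, so I'd know whether tuning these is even worth it before renting cloud hardware.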
-
@bartowski1182 @TheBloke please forgive this cold call-out -- would either of you know?
-
FWIW: https://gist.github.com/robbiemu/4f53fd8d02eabbecbeb164ee0957e01b