Replies: 1 comment
-
I guess GGUF will need to support it, given Hugging Face's per-file upload size limit and ever-growing model sizes. A temporary solution is a simple splitting/recombination utility, so you only ever work with a single GGUF file locally.
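For reference, a byte-level split works for any file, GGUF included, since recombination restores the exact original bytes. Below is a minimal sketch of such a utility in Python; the part size, block size, and `.partNNN` naming are illustrative choices of mine, not a llama.cpp or Hugging Face convention.

```python
# Minimal sketch of byte-level splitting and recombination for one GGUF file.
# Part size, block size, and ".partNNN" naming are illustrative, not standard.
import os

PART_SIZE = 40 * 1024**3   # stay under Hugging Face's per-file upload limit
BLOCK = 1 << 20            # copy in 1 MiB blocks to keep memory usage flat

def split_file(path: str, part_size: int = PART_SIZE) -> list[str]:
    """Split `path` into numbered parts; returns the part filenames in order."""
    parts = []
    with open(path, "rb") as src:
        idx = 0
        while True:
            part = f"{path}.part{idx:03d}"
            written = 0
            with open(part, "wb") as dst:
                while written < part_size:
                    chunk = src.read(min(BLOCK, part_size - written))
                    if not chunk:
                        break
                    dst.write(chunk)
                    written += len(chunk)
            if written == 0:          # source exhausted; drop the empty part
                os.remove(part)
                break
            parts.append(part)
            idx += 1
    return parts

def recombine(parts: list[str], out_path: str) -> None:
    """Concatenate the parts back into a single local GGUF file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                while chunk := src.read(BLOCK):
                    dst.write(chunk)

# Usage: split before uploading, recombine after downloading.
# parts = split_file("llama-70b-q4_0.gguf")
# recombine(parts, "llama-70b-q4_0-rejoined.gguf")
```

On Unix, GNU `split -b 40G model.gguf model.gguf.part` and `cat model.gguf.part* > model.gguf` do the same job without any code.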
-
If we want to distribute shards of a larger model (say Llama 70B or bigger) across several machines, we can cut the architecture and weights at the end of a specified transformer block, output the intermediate activations, and feed them into the next shard.
How easy or hard would it be to generate per-shard .gguf files without losing performance? Is there any work being done on that?
If not, I would love to help get this working; I already have it running on tinygrad.
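In case it helps the discussion, here is a toy sketch of the handoff in plain NumPy, nothing llama.cpp-specific: "machine A" runs blocks up to the cut point, serializes the activations, and "machine B" resumes from there. The block function, sizes, and file-based handoff are placeholders for real transformer blocks and a network transport.

```python
# Toy sketch of pipeline sharding at a transformer-block boundary.
# All names, shapes, and the cut point are hypothetical placeholders.
import numpy as np

D_MODEL, N_BLOCKS, SPLIT_AT = 64, 8, 4       # hypothetical sizes and cut point

rng = np.random.default_rng(0)
# Stand-in per-block weights; a real shard would load only its own blocks
# from its own .gguf file.
weights = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02 for _ in range(N_BLOCKS)]

def block(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stand-in for one transformer block (residual + a nonlinearity)."""
    return x + np.tanh(x @ w)

def run_shard(x: np.ndarray, first: int, last: int) -> np.ndarray:
    """Run blocks [first, last) on this machine's shard."""
    for i in range(first, last):
        x = block(x, weights[i])
    return x

tokens = rng.standard_normal((1, 16, D_MODEL))   # fake embedded input batch

# "Machine A": run the first shard and ship the intermediate activations.
acts = run_shard(tokens, 0, SPLIT_AT)
np.save("activations.npy", acts)                 # stand-in for a network send

# "Machine B": receive the activations and finish the forward pass.
out = run_shard(np.load("activations.npy"), SPLIT_AT, N_BLOCKS)

# Sanity check: sharded execution matches the single-machine forward pass.
assert np.allclose(out, run_shard(tokens, 0, N_BLOCKS))
```

The interesting GGUF-side question then seems to be how each shard's metadata would record the cut point and the activation dtype/shape, so that shards stay self-describing.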