Replies: 2 comments 5 replies
This isn't exactly the same, but I've recently been thinking that an unpacked GGUF format could be really useful. By that I mean the various GGUF metadata keys could just be text files in a simple format, and the actual weights could be separate binary files; to load it, you'd just read the directory instead of parsing one monolithic file.
The way it relates to this discussion is that with that approach you could easily swap out layers/weights, change metadata, etc. without having to rewrite huge files or deal with file-format organization issues. Or include extra weights, of course.
It could also be really useful for testing stuff like different approaches to quantizing different layers or weights. Currently that kind of testing (especially on bigger models) is really awkward and time consuming. If you could just swap tensors with different versions, testing a bunch of permutations would be way easier. I actually don't think this would be too hard to implement, either.
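To make the idea concrete, here is a minimal sketch of what an unpacked layout could look like. Everything here is an assumption for illustration (the file names, the `key = value` metadata format, the `.bin`/`.shape` sidecar convention); it is not a real llama.cpp or GGUF layout:

```python
import os
import struct

# Hypothetical "unpacked" model layout (all names invented for this sketch):
#   model_dir/metadata.txt        one "key = value" per line
#   model_dir/tensors/NAME.bin    raw little-endian float32 data
#   model_dir/tensors/NAME.shape  space-separated dims as text

def save_tensor(model_dir, name, shape, values):
    """Write one tensor as a raw .bin file plus a .shape sidecar."""
    tdir = os.path.join(model_dir, "tensors")
    os.makedirs(tdir, exist_ok=True)
    with open(os.path.join(tdir, name + ".bin"), "wb") as f:
        f.write(struct.pack("<%df" % len(values), *values))
    with open(os.path.join(tdir, name + ".shape"), "w") as f:
        f.write(" ".join(str(d) for d in shape))

def load_model(model_dir):
    """Read metadata and every tensor back into plain Python objects."""
    meta = {}
    with open(os.path.join(model_dir, "metadata.txt")) as f:
        for line in f:
            if "=" in line:
                key, _, val = line.partition("=")
                meta[key.strip()] = val.strip()
    tensors = {}
    tdir = os.path.join(model_dir, "tensors")
    for fname in sorted(os.listdir(tdir)):
        if not fname.endswith(".bin"):
            continue
        name = fname[: -len(".bin")]
        with open(os.path.join(tdir, name + ".shape")) as f:
            shape = tuple(int(d) for d in f.read().split())
        with open(os.path.join(tdir, fname), "rb") as f:
            data = f.read()
        values = list(struct.unpack("<%df" % (len(data) // 4), data))
        tensors[name] = (shape, values)
    return meta, tensors
```

The point of the layout is that swapping one tensor is just overwriting one small file (another `save_tensor` call, or even `cp` from the shell); nothing else in the model directory has to be rewritten, which is exactly what makes permutation testing cheap.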
I don't think there's going to be much interest in this idea, but I thought I'd put it out there for any feedback.
I've been working on a weight-layer block system. The motivation is that lately a number of merged models have shipped in several versions that differ only in the order of the weight layers taken from the merged parent models, or in which blocks of layers come from each parent. So I was thinking about a system where a single model file could contain more layers than would normally be used at one time, with separate definition files specifying which layers from that larger file are used, and how.
I wrote a (rough work in progress) overview of the idea here: https://pastebin.com/6VT3SUy9
That also includes a link to the (extremely rough) work-in-progress code. That code is very much just a proof of concept and needs a lot of work. However, I don't know whether there's any interest in or use for this idea, and with the parallel decoding + continuous batching commits coming, which I believe will break what I've done, I'm not sure whether it's worth continuing.
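If it helps the discussion, the definition-file concept can be sketched in a few lines. The layer names, the list-of-names definition format, and the `resolve` helper below are all my own illustrative assumptions, not anything from the linked pastebin or code:

```python
# Sketch of the "superset model + definition file" idea (names invented).
# A single stored file holds layer weight sets from several merged parents,
# i.e. more layers than any one configuration uses at a time.
stored_layers = {
    "parentA.blk.0": "weights-A0",  # strings stand in for real tensor data
    "parentA.blk.1": "weights-A1",
    "parentB.blk.0": "weights-B0",
    "parentB.blk.1": "weights-B1",
}

# A definition file lists, in order, which stored layer fills each runtime
# slot. Two definitions can reorder or mix the same stored data differently.
definition_v1 = ["parentA.blk.0", "parentA.blk.1", "parentB.blk.1"]
definition_v2 = ["parentB.blk.0", "parentA.blk.0", "parentA.blk.1"]

def resolve(definition, stored):
    """Build the runtime layer list a loader would actually execute."""
    missing = [name for name in definition if name not in stored]
    if missing:
        raise KeyError("definition references unknown layers: %r" % missing)
    return [stored[name] for name in definition]
```

Under this scheme, switching model variants means pointing the loader at a different definition file; the large tensor data is never rewritten, which is the same property that makes the merge-permutation experiments described above much cheaper.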