With the introduction of the GGML backend API, we're taking a big step towards portability of GGML-based binaries. One remaining obstacle to full portability is the lack of CPU dispatching: today we have to compile separate binaries to support different SIMD instruction sets. Instead, we could compile code paths for all SIMD sets into a single binary and pick the one to use at runtime. This may slightly increase the binary size, but I think that's negligible compared to the gigabytes of model weights. It would make GGML-based binaries far easier to distribute while still getting the most out of every machine.
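To make the idea concrete, here is a minimal sketch of runtime dispatch in C. It is not how GGML does or should implement it. All names (`dot_fn`, `dot_scalar`, `dot_avx`, `select_dot`) are hypothetical, and it assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available. On any other compiler or architecture it simply falls back to the scalar path.

```c
#include <stddef.h>

/* Hypothetical kernel signature: dot product of two float arrays. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable fallback that runs on any CPU. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1

/* Compiled with AVX enabled for this one function only, so the rest
 * of the binary still runs on CPUs without AVX. */
__attribute__((target("avx")))
static float dot_avx(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    /* Horizontal sum of the 8 lanes, then the scalar tail. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) {
        sum += lanes[k];
    }
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
#endif

/* Pick the best available implementation once, at runtime. */
static dot_fn select_dot(void) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        return dot_avx;
    }
#endif
    return dot_scalar;
}
```

A caller would resolve the pointer once at startup, e.g. `dot_fn dot = select_dot();`, and then call `dot(a, b, n)` in the hot loop. The cost is one indirect call per kernel invocation plus the extra code for each SIMD variant in the binary.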
I opened this topic as a discussion because it's a philosophical one and maintainers may not like it, but it can be converted to an issue if it's regarded as a valid point.
To make sure that we're on the same page, here's a good read about CPU dispatching in the sense that I understand and/or mean it.