Multiplication Ops for int16 #1601
pablogranolabar started this conversation in Ideas
I am in the process of releasing a research paper on representing int32 and float32 values within int16 variable space. The method is in a similar vein to a time/memory tradeoff attack: instead of the traditional addition-based bitwise operators, multiplication operators are used within the same int16 memory footprint to provide a continuous representation beyond the int32/float32 range, at the expense of front-end compute. That extra compute should matter less going forward, given AMD's decision to expand AVX-512 acceleration primitives while Intel is shelving them, so in theory the method could be CPU-accelerated at the tensor level and even plugged into PyTorch via an ATen subclass.
The core idea is that an int16 variable encodes a sequence of flags which, combined with multiplication operators, represent a continuous space larger than int32/float32. The POC library will be released in a similar fashion to GNU MP Bignum, the multiprecision library used to work with 2048+ bit numbers for things like cryptographic key material generation.
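Since the paper and POC library aren't out yet, the exact scheme isn't public; purely to make the idea concrete, here is a toy sketch of one possible reading of "int16 flags + multiplication ops". A 16-bit word is split into a sign bit, a 5-bit scale-flag field and a 10-bit mantissa, and decoding multiplies the mantissa by a flag-selected scale factor, so the representable magnitude reaches well beyond the int32 range at the cost of a multiply per element. The field layout and base are made up for illustration, not the author's actual format.

```c
/* Toy sketch only: one hypothetical multiplicative int16 encoding.
 * Layout: [sign:1][scale flags:5][mantissa:10], decoded value =
 * sign * mantissa * 4^(flags - 8). */
#include <stdint.h>
#include <stdio.h>
#include <math.h>

static inline float decode16(uint16_t bits) {
    float    sign  = (bits & 0x8000) ? -1.0f : 1.0f;
    uint16_t flags = (bits >> 10) & 0x1F;   /* 5 scale flags           */
    uint16_t mant  =  bits        & 0x3FF;  /* 10-bit mantissa         */
    /* multiplicative scale 4^(flags-8) covers ~2^62 of dynamic range */
    return sign * (float)mant * powf(4.0f, (float)flags - 8.0f);
}

int main(void) {
    /* largest positive code: flags = 31, mantissa = 1023 */
    uint16_t biggest = (uint16_t)((31u << 10) | 1023u);
    printf("decoded max ~ %e (well beyond the int32 range)\n",
           (double)decode16(biggest));
    return 0;
}
```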
The thought would be to first refactor the ggml/llama weight conversion scripts to accommodate the smaller int16 representation, then integrate the float32 decode functions into llama/ggml inference, and explore the AVX-512 acceleration idea from there.
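For the inference side, the integration point would presumably look like a dequantize-row helper that expands the packed 16-bit codes to float32 on the fly. The sketch below is standalone and not part of ggml's real API; the function name, scale table, and bit layout are all hypothetical and mirror the toy encoding above. The scalar loop is the part that could later be rewritten with AVX-512 gather/multiply intrinsics.

```c
/* Hypothetical dequantize-row helper for the toy packed16 format above.
 * A precomputed scale table replaces powf() in the hot path, so decoding
 * is one table lookup and two multiplies per weight. */
#include <stdint.h>
#include <stddef.h>

static float k_scale[32];   /* one multiplicative scale per 5-bit flag value */

void init_scale_table(void) {
    float s = 1.0f / 65536.0f;          /* 4^-8 */
    for (int i = 0; i < 32; ++i) {
        k_scale[i] = s;
        s *= 4.0f;
    }
}

void dequantize_row_packed16(const uint16_t * src, float * dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        uint16_t b    = src[i];
        float    sign = (b & 0x8000) ? -1.0f : 1.0f;
        float    mant = (float)(b & 0x3FF);
        dst[i] = sign * mant * k_scale[(b >> 10) & 0x1F];
    }
}
```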
Thoughts?