Feature Request: Integrate HiP Attention for Extended Context Length #11910
MubarakHAlketbi started this conversation in Ideas
Description:
This issue requests the integration of HiP (Hierarchically Pruned) Attention into Ollama to enable significantly extended context lengths for supported models. HiP Attention is a training-free method that reduces the attention cost of Transformer models to sub-quadratic in sequence length, making it possible to handle very long contexts with far fewer computational resources.
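For intuition, here is a minimal, self-contained sketch of block-sparse top-k attention in PyTorch. It is not HiP's actual algorithm (HiP selects keys with a hierarchical coarse-to-fine search inside fused CUDA kernels, and handles causal masking), but it shows why pruning the key set per query block brings the cost below quadratic. All names and parameters here are illustrative.

```python
import torch

def topk_block_sparse_attention(q, k, v, block=64, keep=8):
    """Toy block-sparse attention: each query block attends only to its
    top-`keep` key blocks, scored via mean-pooled block summaries.
    Illustrative only; omits causal masking and HiP's hierarchical refinement."""
    T, d = q.shape
    nb = T // block
    qb = q[: nb * block].reshape(nb, block, d)
    kb = k[: nb * block].reshape(nb, block, d)
    vb = v[: nb * block].reshape(nb, block, d)

    # Coarse scores between block summaries: O(nb^2), not O(T^2).
    q_rep, k_rep = qb.mean(1), kb.mean(1)                # (nb, d)
    coarse = q_rep @ k_rep.T                             # (nb, nb)
    topk = coarse.topk(min(keep, nb), dim=-1).indices    # (nb, keep)

    out = torch.zeros_like(qb)
    scale = d ** -0.5
    for i in range(nb):
        ksel = kb[topk[i]].reshape(-1, d)                # (keep*block, d)
        vsel = vb[topk[i]].reshape(-1, d)
        attn = torch.softmax(qb[i] @ ksel.T * scale, dim=-1)
        out[i] = attn @ vsel
    return out.reshape(nb * block, d)

q = k = v = torch.randn(4096, 64)
print(topk_block_sparse_attention(q, k, v).shape)  # torch.Size([4096, 64])
```

With `block=64` and `keep=8`, each query block touches a fixed 512 keys regardless of total sequence length, which is the general shape of the saving HiP targets.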
Motivation:
Ollama users frequently need to process long documents, codebases, or conversations, and current context-length limits can hinder the effectiveness of LLMs in these scenarios. Integrating HiP Attention would provide the following benefits:
- Significantly longer usable context windows for long documents, codebases, and extended conversations.
- Sub-quadratic attention cost, lowering the compute and memory needed at long context lengths.
- No retraining or fine-tuning of models, since the method is training-free.
Proposed Implementation (Suggestions):
Integrate the HiP Attention Library: The core implementation is available at DeepAuto-AI/hip-attention, which provides Python bindings and CUDA kernels for HiP Attention. Installation instructions (building from source or using Docker) are provided in the repository.
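As a sketch of what the integration point might look like from Python, the following assumes an entry point named `hip_attention` with a FlashAttention-style call; the actual module path, tensor layout, and parameter names must be taken from the repository's README, so treat everything below as an assumption rather than the library's documented API:

```python
import torch

# Hypothetical import and signature -- the real entry point is defined by
# DeepAuto-AI/hip-attention; check its README before depending on this.
from hip import hip_attention

# (batch * heads, sequence length, head dim), fp16 on GPU -- layout assumed.
q = torch.randn(32, 65536, 128, device="cuda", dtype=torch.float16)
k = torch.randn(32, 65536, 128, device="cuda", dtype=torch.float16)
v = torch.randn(32, 65536, 128, device="cuda", dtype=torch.float16)

# mask_k (name assumed) would bound how many keys each query attends to,
# which is where the sub-quadratic cost comes from.
out = hip_attention(q, k, v, mask_k=512)
```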
Model Compatibility: Determine which models within Ollama's supported model set are compatible with HiP Attention. The initial focus could be on models commonly used for long-context tasks.
Configuration Options: Expose configuration options to users, allowing them to:
- Enable or disable HiP Attention per model or per request.
- Select variants such as KV-cache offloading (e.g., the `ainl-hip-offload` version, or future versions with this feature).
Performance Testing: Thoroughly test the integration to confirm the expected performance gains and stability, especially with very long contexts. Compare latency, throughput, and memory use with and without HiP Attention enabled.
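To make that comparison concrete, a benchmark could drive Ollama's existing `/api/generate` endpoint and toggle the new behavior through the request options. The `hip_attention` option below is hypothetical (it does not exist in Ollama today and is shown only to illustrate the proposal); the endpoint and `num_ctx` are real, and the model name can be any model pulled locally:

```python
import time
import requests

def time_generation(options):
    """Time one non-streaming generation against a local Ollama server."""
    t0 = time.time()
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",                            # any locally pulled model
            "prompt": "Summarize: " + "lorem ipsum " * 2000,  # long-ish prompt
            "stream": False,
            "options": options,
        },
        timeout=600,
    )
    r.raise_for_status()
    return time.time() - t0

baseline = time_generation({"num_ctx": 32768})
# "hip_attention" is a hypothetical option name for this proposal.
with_hip = time_generation({"num_ctx": 32768, "hip_attention": True})
print(f"baseline: {baseline:.1f}s, with HiP: {with_hip:.1f}s")
```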
Licensing Considerations: HiP Attention is currently under the FSL-1.1-MIT license, which is free for non-commercial use but transitions to MIT after two years. Ensure that Ollama's usage complies with this license. This is a crucial point to address.
Relevant Links:
- HiP Attention implementation: https://github.com/DeepAuto-AI/hip-attention
Conclusion:
Adding support for HiP Attention would be a significant enhancement to Ollama, enabling users to work with much longer contexts more efficiently. This would open up new possibilities for using LLMs in various applications that require processing large amounts of text. I believe this feature would be highly valuable to the Ollama community.