This repo contains the codebase for the paper "Exploiting Sparsity for Long Context Inference".
> [!NOTE]
> This codebase is currently under construction, and the API is subject to large changes.
To run RULER, first download the necessary data:
cd benchmark/ruler
cd data/synthetic/json/ && python -u download_paulgraham_essay.py && bash download_qa_dataset.sh && cd ../../../
Then execute the script:
bash run.sh llama-3-8b-1048k ivf 32768 niah_single_1 128 3
The script takes the following positional parameters, in this order:
- model_name
- index_type
- context_length
- task
- k
- num_samples
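
To make the mapping between the parameter names and the example invocation explicit, here is a minimal Python sketch that assembles the `run.sh` command from named values. The helper `build_ruler_command` and the constant `PARAM_ORDER` are hypothetical and not part of this codebase; the sketch assumes the arguments are positional and follow the order listed above.

```python
import shlex

# Assumed positional order, taken from the parameter list in this README.
PARAM_ORDER = ("model_name", "index_type", "context_length", "task", "k", "num_samples")


def build_ruler_command(**params):
    """Return the argv list for `bash run.sh ...` with arguments in positional order.

    Hypothetical helper for illustration only; it is not provided by the repo.
    """
    missing = [name for name in PARAM_ORDER if name not in params]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    unknown = sorted(set(params) - set(PARAM_ORDER))
    if unknown:
        raise ValueError(f"unknown parameters: {', '.join(unknown)}")
    return ["bash", "run.sh", *(str(params[name]) for name in PARAM_ORDER)]


command = build_ruler_command(
    model_name="llama-3-8b-1048k",
    index_type="ivf",
    context_length=32768,
    task="niah_single_1",
    k=128,
    num_samples=3,
)
# Print the command as a single shell-safe string.
print(shlex.join(command))
```

With these values the sketch prints the same command as the example invocation. To actually launch it, the list could be passed to `subprocess.run(command, check=True)` from inside `benchmark/ruler`.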
To run the tests, make sure pytest is installed (`pip install pytest`). Once pytest and the package itself are installed, run `pytest` from the top-level directory.
- Allow AutoTopk to be created from an already-instantiated model
- Import TopkCache directly, with multiple constructors
- Default k to the full context length, with a warning, if no k is given
- Make tests better and more isolated
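
As an illustration of the planned default for k in the list above, the following sketch shows one way such a fallback could behave. It is not the repo's implementation: the function name `resolve_k`, its signature, and the validation of non-positive values are all assumptions made for this example.

```python
import warnings


def resolve_k(k, context_length):
    """Return the effective k, falling back to the full context when k is None.

    Hypothetical helper: the name, signature, and validation are assumptions.
    """
    if k is None:
        # No k supplied: attend over the whole context and tell the user.
        warnings.warn(
            f"No k given; defaulting to the full context length ({context_length}).",
            UserWarning,
            stacklevel=2,
        )
        return context_length
    if k < 1:
        # Assumed validation; the source does not specify this behavior.
        raise ValueError("k must be a positive integer")
    return k


explicit_k = resolve_k(128, 32768)  # an explicit k is returned unchanged
```

Calling `resolve_k(None, 32768)` would return 32768 and emit a `UserWarning`, so a missing k stays visible to the user rather than silently disabling sparsity.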