
A Comprehensive Survey on Long Context Language Modeling 💡


This repository provides a collection of papers and resources focused on Long Context Language Modeling. For a clear taxonomy and deeper insights into the methodology, please refer to our survey: A Comprehensive Survey on Long Context Language Modeling, with an overview shown below.

We welcome suggestions from peers for improving this paper list or the survey, and we are committed to updating the repository regularly.

If you would like your paper included in this survey and repository, or have any corrections, please feel free to raise an issue or send an email to dwzhu@pku.edu.cn, liujiaheng@nju.edu.cn, or liaohuanxuan2023@ia.ac.cn. We sincerely appreciate your collaboration!

We extend our sincere gratitude to Awesome-LLM-Long-Context-Modeling, which served as a valuable reference in expanding this project and developing the survey.

We would also like to mention Thus Spake Long-Context Large Language Model (Github), a concurrent survey that details the development history of long-context LLMs. Its authors have created an introductory video on LCLM-related work, set to the symphonic poem Thus Spake Zarathustra.

If you find our survey useful for your research, please consider citing the following paper:

@article{liu2025comprehensive,
  title={A Comprehensive Survey on Long Context Language Modeling},
  author={Liu, Jiaheng and Zhu, Dawei and Bai, Zhiqi and He, Yancheng and Liao, Huanxuan and Que, Haoran and Wang, Zekun and Zhang, Chenchen and Zhang, Ge and Zhang, Jiebin and others},
  journal={arXiv preprint arXiv:2503.17407},
  year={2025}
}

Updates

  • [2025.03.25] Our paper is now available on arXiv.
  • [2025.03.13] We had a productive exchange with the authors of the concurrent survey, and both parties will promote each other's work going forward.
  • [2025.03.11] We released the first version of the survey on Long Context Language Modeling [lclm-survey.pdf] and open-sourced this repository.

Table of Contents

Paper List

Data

Pretraining

  1. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. J. Mach. Learn. Res. 2020

  2. Scaling Language Models: Methods, Analysis & Insights from Training Gopher. Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, H. Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant M. Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew J. Johnson, Blake A. Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Edward Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving. Arxiv 2021

  3. Structured Packing in LLM Training Improves Long Context Utilization. Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś. Arxiv 2024.

  4. SemDeDup: Data-efficient learning at web-scale through semantic deduplication. Amro Abbas, Kushal Tirumala, Daniel Simig, Surya Ganguli, Ari S. Morcos. Arxiv 2023

  5. SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. Daria Soboleva, Faisal Al-Khateeb, Robert Myers, Jacob R Steeves, Joel Hestness, Nolan Dey. Arxiv 2023

  6. In-Context Pretraining: Language Modeling Beyond Document Boundaries. Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Wen-tau Yih, Mike Lewis. ICLR 2024 Spotlight.         GitHub Repo stars

  7. Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance. Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu. Arxiv 2024

  8. Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models. Longze Chen, Ziqiang Liu, Wanwei He, Yunshui Li, Run Luo, Min Yang. Arxiv 2024.         GitHub Repo stars

  9. LongWanjuan: Towards Systematic Measurement for Long Text Quality. Xiaoran Liu, Kai Lv, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin. ACL 2024

  10. MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series. Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, et al. Arxiv 2024

  11. Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model. Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu. Arxiv 2024.

  12. Data Engineering for Scaling Language Models to 128K Context. Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng. Arxiv 2024.         GitHub Repo stars

  13. RegMix: Data Mixture as Regression for Language Model Pre-training. Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin. Arxiv 2024

  14. How to Train Long-Context Language Models (Effectively). Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen. Arxiv 2024.         GitHub Repo stars

  15. LongAttn: Selecting Long-context Training Data via Token-level Attention. Longyun Wu, Dawei Zhu, Guangxiang Zhao, Zhuocheng Yu, Junfeng Ran, Xiangyu Wong, Lin Sun, Sujian Li. Arxiv 2025.         GitHub Repo stars

  16. Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models. Junfeng Tian, Da Zheng, Yang Cheng, Rui Wang, Colin Zhang, Debing Zhang. Arxiv 2024.         GitHub Repo stars

Posttraining

  1. The NarrativeQA Reading Comprehension Challenge. Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette. ACL 2018

  2. Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Francis Christiano, Jan Leike, Ryan J. Lowe. Arxiv 2022

  3. SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. Daria Soboleva, Faisal Al-Khateeb, Robert Myers, Jacob R Steeves, Joel Hestness, Nolan Dey. Arxiv 2023

  4. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn. Arxiv 2023

  5. WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models. Conghui He, Zhenjiang Jin, Chaoxi Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Da Lin. Arxiv 2023

  6. LongWanjuan: Towards Systematic Measurement for Long Text Quality. Xiaoran Liu, Kai Lv, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin. ACL 2024

  7. LOGO--Long cOntext aliGnment via efficient preference Optimization. Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang. Arxiv 2024

  8. LongAlign: A Recipe for Long Context Alignment of Large Language Models. Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li. ACL 2024

  9. What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices. Zhi Chen, Qiguang Chen, Libo Qin, Qipeng Guo, Haijun Lv, Yicheng Zou, Wanxiang Che, Hang Yan, Kai Chen, Dahua Lin. Arxiv 2024.         GitHub Repo stars

  10. Weaver: Foundation Models for Creative Writing. Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, Huilin Wang, Zhaowei Gao, Chunzhao Xie, Chuou Xu, Jihong Dai, Yibin Liu, Jialong Wu, Shengwei Ding, Long Li, Zhiwei Huang, Xinle Deng, Teng Yu, Gangan Ma, Han Xiao, Zixin Chen, Danjun Xiang, Yunxia Wang, Yuanyuan Zhu, Yi Xiao, Jing Wang, Yiru Wang, Siran Ding, Jiayang Huang, Jiayi Xu, Yilihamu Tayier, Zhenyu Hu, Yuan Gao, Chengfeng Zheng, Yueshu Ye, Yihang Li, Lei Wan, Xinyue Jiang, Yujie Wang, Siyu Cheng, Zhule Song, Xiangru Tang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang, Wangchunshu Zhou. Arxiv 2024

  11. LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs. Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li. Arxiv 2024.         GitHub Repo stars

  12. LongReward: Improving Long-context Large Language Models with AI Feedback. Jiajie Zhang, Zhongni Hou, Xin Lv, Shulin Cao, Zhenyu Hou, Yilin Niu, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li. Arxiv 2024

  13. ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities. Peng Xu, Wei Ping, Xianchao Wu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro. Arxiv 2024.

  14. LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models. Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia. ICLR 2024 Oral.         GitHub Repo stars

  15. ORPO: Monolithic Preference Optimization without Reference Model. Jiwoo Hong, Noah Lee, James Thorne. EMNLP 2024

  16. Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training. Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, LiuYiBo LiuYiBo, Qianguosun Qianguosun, Yuxin Liang, Hao Wang, Enming Zhang, Jiaxing Zhang. ACL 2024

  17. Make Your LLM Fully Utilize the Context. Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen. NeurIPS 2024

  18. LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information. Bowen Ping, Jiali Zeng, Fandong Meng, Shuo Wang, Jie Zhou, Shanghang Zhang. Arxiv 2025.

  19. LongAttn: Selecting Long-context Training Data via Token-level Attention. Longyun Wu, Dawei Zhu, Guangxiang Zhao, Zhuocheng Yu, Junfeng Ran, Xiangyu Wong, Lin Sun, Sujian Li. Arxiv 2025.         GitHub Repo stars

  20. LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data. Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Shengjie Ma, Aofan Liu, Hui Xiong, Jian Guo. Arxiv 2025.

Model

Position Embeddings

  1. An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding. Tong Wu, Yanpeng Zhao, Zilong Zheng. NeurIPS 2024. GitHub Repo stars

  2. PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training. Dawei Zhu,Nan Yang,Liang Wang,Yifan Song,Wenhao Wu,Furu Wei,Sujian Li. Arxiv 2023. GitHub Repo stars

  3. Contextual Position Encoding: Learning to Count What's Important. Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. Arxiv 2024.

  4. Why Does the Effective Context Length of LLMs Fall Short?. Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong. Arxiv 2024.

  5. HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation. Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu. Arxiv 2024.

  6. DAPE: Data-Adaptive Positional Encoding for Length Extrapolation. Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li. NeurIPS 2024. GitHub Repo stars

  7. Convolutional sequence to sequence learning. Jonas Gehring and Michael Auli and David Grangier and Denis Yarats and Yann N. Dauphin. Arxiv 2017

  8. Self-attention with relative position representations. Peter Shaw and Jakob Uszkoreit and Ashish Vaswani. Arxiv 2018

  9. Encoding word order in complex embeddings. Benyou Wang and Donghao Zhao and Christina Lioma and Qiuchi Li and Peng Zhang and Jakob Grue Simonsen. Arxiv 2020

  10. Train short, test long: Attention with linear biases enables input length extrapolation. Ofir Press and Noah A. Smith and Mike Lewis. Arxiv 2022

  11. Kerple: Kernelized relative positional embedding for length extrapolation. Ta-Chung Chi and Ting-Han Fan and Peter J. Ramadge and Alexander I. Rudnicky. Arxiv 2022

  12. Dissecting transformer length extrapolation via the lens of receptive field analysis. Ta-Chung Chi and Ting-Han Fan and Alexander I. Rudnicky and Peter J. Ramadge. Arxiv 2023

  13. A length-extrapolatable transformer. Yutao Sun and Li Dong and Barun Patra and Shuming Ma and Shaohan Huang and Alon Benhaim and Vishrav Chaudhary and Xia Song and Furu Wei. Arxiv 2022

  14. Functional interpolation for relative positions improves long context transformers. Shanda Li and Chong You and Guru Guruganesh and Joshua Ainslie and Santiago Ontanon and Manzil Zaheer and Sumit Sanghai and Yiming Yang and Sanjiv Kumar and Srinadh Bhojanapalli. Arxiv 2024

  15. Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings. Arxiv 2023

  16. Extending context window of large language models via positional interpolation. Shouyuan Chen and Sherman Wong and Liangjian Chen and Yuandong Tian. Arxiv 2023

  17. Randomized positional encodings boost length generalization of transformers. Anian Ruoss and Grégoire Delétang and Tim Genewein and Jordi Grau-Moya and Róbert Csordás and Mehdi Bennani and Shane Legg and Joel Veness. Arxiv 2023

  18. Yarn: Efficient context window extension of large language models. Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole. Arxiv 2023

  19. Clex: Continuous length extrapolation for large language models. Guanzheng Chen and Xin Li and Zaiqiao Meng and Shangsong Liang and Lidong Bing. Arxiv 2024

  20. Effective long-context scaling of foundation models. Wenhan Xiong and Jingyu Liu and Igor Molybog and Hejia Zhang and Prajjwal Bhargava and Rui Hou and Louis Martin and Rashi Rungta and Karthik Abinav Sankararaman and Barlas Oguz and Madian Khabsa and Han Fang and Yashar Mehdad and Sharan Narang and Kshitiz Malik and Angela Fan and Shruti Bhosale and Sergey Edunov and Mike Lewis and Sinong Wang and Hao Ma. Arxiv 2023

  21. Giraffe: Adventures in expanding context lengths in llms. Arka Pal and Deep Karkhanis and Manley Roberts and Samuel Dooley and Arvind Sundararajan and Siddartha Naidu. Arxiv 2023

  22. Resonance rope: Improving context length generalization of large language models. Suyuchen Wang and Ivan Kobyzev and Peng Lu and Mehdi Rezagholizadeh and Bang Liu. Arxiv 2024

  23. Long context alignment with short instructions and synthesized positions. Wenhao Wu and Yizhong Wang and Yao Fu and Xiang Yue and Dawei Zhu and Sujian Li. Arxiv 2024

  24. Two stones hit one bird: Bilevel positional encoding for better length extrapolation. Zhenyu He and Guhao Feng and Shengjie Luo and Kai Yang and Liwei Wang and Jingjing Xu and Zhi Zhang and Hongxia Yang and Di He. Arxiv 2024

  25. Found in the middle: How language models use long contexts better via plug-and-play positional encoding. Zhenyu Zhang and Runjin Chen and Shiwei Liu and Zhewei Yao and Olatunji Ruwase and Beidi Chen and Xiaoxia Wu and Zhangyang Wang. Arxiv 2024

  26. Llm maybe longlm: Self-extend llm context window without tuning. Hongye Jin and Xiaotian Han and Jingfeng Yang and Zhimeng Jiang and Zirui Liu and Chia-Yuan Chang and Huiyuan Chen and Xia Hu. Arxiv 2024

  27. Longrope: Extending llm context window beyond 2 million tokens. Yiran Ding and Li Lyna Zhang and Chengruidong Zhang and Yuanyuan Xu and Ning Shang and Jiahang Xu and Fan Yang and Mao Yang. Arxiv 2024

  28. The impact of positional encoding on length generalization in transformers. Amirhossein Kazemnejad and Inkit Padhi and Karthikeyan Natesan Ramamurthy and Payel Das and Siva Reddy. Arxiv 2024

  29. Roformer: Enhanced transformer with rotary position embedding. Jianlin Su and Yu Lu and Shengfeng Pan and Ahmed Murtadha and Bo Wen and Yunfeng Liu. Arxiv 2023

  30. Training-free long-context scaling of large language models. Chenxin An and Fei Huang and Jun Zhang and Shansan Gong and Xipeng Qiu and Chang Zhou and Lingpeng Kong. Arxiv 2024

  31. PSC: Extending Context Window of Large Language Models via Phase Shift Calibration. Wenqiao Zhu and Chao Xu and Lulu Wang and Jun Wu. EMNLP 2024. GitHub Repo stars

  32. Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models. Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, Hongming Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu. Arxiv 2024.

  33. DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search. Lei Yang, Shaoyang Xu, Deyi Xiong. Arxiv 2024.

  34. Adjoint sharding for very long context training of state space models. Xingzi Xu, Amir Tavanaei, Kavosh Asadi, Karim Bouyarmane. Arxiv 2025.

  35. Information Entropy Invariance: Enhancing Length Extrapolation in Attention Mechanisms. Kewei Li, Yanwen Kong, Yiping Xu, Lan Huang, Ruochi Zhang, Fengfeng Zhou. Arxiv 2025. GitHub Repo stars

  36. LeMo: Enabling LEss Token Involvement for MOre Context Fine-tuning. Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang. Arxiv 2025.

  37. NExtLong: Toward Effective Long-Context Training without Long Documents. Chaochen Gao, Xing Wu, Zijia Lin, Debing Zhang, Songlin Hu. Arxiv 2025. GitHub Repo stars

  38. SEAL: Scaling to Emphasize Attention for Long-Context Retrieval. Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park. Arxiv 2025.

  39. DINT Transformer. Yueyang Cang, Yuhang Liu, Xiaoteng Zhang, Erlu Zhao, Li Shi. Arxiv 2025.

  40. Scalable-Softmax Is Superior for Attention. Ken M. Nakanishi. Arxiv 2025.

  41. Rope to Nope and Back Again: A New Hybrid Attention Strategy. Bowen Yang, Bharat Venkitesh, Dwarak Talupuru, Hangyu Lin, David Cairuz, Phil Blunsom, Acyr Locatelli. Arxiv 2025.

  42. A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI). Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han. Arxiv 2025. GitHub Repo stars

  43. LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation. Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, Weipeng Chen. Arxiv 2025.

  44. Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification. Konstantin Donhauser, Charles Arnal, Mohammad Pezeshki, Vivien Cabannes, David Lopez-Paz, Kartik Ahuja. Arxiv 2025.

  45. The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval. Ting-Rui Chiang, Dani Yogatama. Arxiv 2025.

  46. LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data. Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Shengjie Ma, Aofan Liu, Hui Xiong, Jian Guo. Arxiv 2025. GitHub Repo stars

  47. LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization. Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing. Arxiv 2025. GitHub Repo stars

  48. ParallelComp: Parallel Long-Context Compressor for Length Extrapolation. Jing Xiong, Jianghan Shen, Chuanyang Zheng, Zhongwei Wan, Chenyang Zhao, Chiwun Yang, Fanghua Ye, Hongxia Yang, Lingpeng Kong, Ngai Wong. Arxiv 2025.

  49. Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning. Wenhao Zhu, Pinzhen Chen, Hanxu Hu, Shujian Huang, Fei Yuan, Jiajun Chen, Alexandra Birch. Arxiv 2025. GitHub Repo stars

  50. LongAttn: Selecting Long-context Training Data via Token-level Attention. Longyun Wu, Dawei Zhu, Guangxiang Zhao, Zhuocheng Yu, Junfeng Ran, Xiangyu Wong, Lin Sun, Sujian Li. Arxiv 2025. GitHub Repo stars

  51. WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale. Jiaxi Li, Xingxing Zhang, Xun Wang, Xiaolong Huang, Li Dong, Liang Wang, Si-Qing Chen, Wei Lu, Furu Wei. Arxiv 2025.

  52. Sliding Window Attention Training for Efficient Large Language Models. Zichuan Fu, Wentao Song, Yejing Wang, Xian Wu, Yefeng Zheng, Yingying Zhang, Derong Xu, Xuetao Wei, Tong Xu, Xiangyu Zhao. Arxiv 2025. GitHub Repo stars

  53. LongRoPE2: Near-Lossless LLM Context Window Scaling. Ning Shang, Li Lyna Zhang, Siyuan Wang, Gaokai Zhang, Gilsinia Lopez, Fan Yang, Weizhu Chen, Mao Yang. Arxiv 2025. GitHub Repo stars

  54. ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs. Hao Ge, Junda Feng, Qi Huang, Fangcheng Fu, Xiaonan Nie, Lei Zuo, Haibin Lin, Bin Cui, Xin Liu. Arxiv 2025.

  55. Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration. James Begin, Namit Agrawal, Eshan Singh, Yicheng Fu, Sean O'Brien, Vasu Sharma, Kevin Zhu. Arxiv 2025. GitHub Repo stars

  56. LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs. Jianghao Chen, Junhong Wu, Yangyifan Xu, Jiajun Zhang. Arxiv 2025.

  57. Forgetting Transformer: Softmax Attention with a Forget Gate. Zhixuan Lin, Evgenii Nikishin, Xu Owen He, Aaron Courville. ICLR 2025. GitHub Repo stars

  58. Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling. Zhenghua Wang, Yiran Ding, Changze Lv, Zhibo Xu, Tianlong Li, Tianyuan Shi, Xiaoqing Zheng, Xuanjing Huang. Arxiv 2025.

  59. Token Weighting for Long-Range Language Modeling. Falko Helm, Nico Daheim, Iryna Gurevych. NAACL 2025. GitHub Repo stars

  60. From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models. Chejian Xu, Wei Ping, Peng Xu, Zihan Liu, Boxin Wang, Mohammad Shoeybi, Bo Li, Bryan Catanzaro. Arxiv 2025. Static Badge

  61. SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling. Krishna C. Puvvada, Faisal Ladhak, Santiago Akle Serrano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg. Arxiv 2025.

  62. Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation. Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang. Arxiv 2025.

  63. Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation. Yi Lu, Wanxu Zhao, Xin Zhou, Chenxin An, Chenglong Wang, Shuo Li, Yuming Yang, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang. Arxiv 2025. GitHub Repo stars

  64. Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation. Arthur S. Bianchessi, Rodrigo C. Barros, Lucas S. Kupssinskü. Arxiv 2025. GitHub Repo stars
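Many of the entries above extend RoPE-based models by rescaling positions so that a longer sequence maps back into the trained position range (the idea behind positional interpolation, YaRN, and LongRoPE). As an illustrative aid only, and not taken from any specific paper above, here is a minimal NumPy sketch of rotary position embedding with linear positional interpolation; the function names and the toy scale factor are our own assumptions.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Per-dimension rotation frequencies, as in rotary position embedding.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Positional interpolation: rescale positions so a longer sequence
    # maps back into the original trained position range.
    return np.outer(positions * scale, inv_freq)

def apply_rope(x, positions, scale=1.0):
    # x: (seq_len, dim) with even dim; rotate each consecutive channel pair.
    ang = rope_angles(positions, x.shape[1], scale=scale)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Extending a model trained on 2048 positions to 4096 tokens:
# scale positions by 0.5 so position 4095 is rotated as if it
# were (interpolated) position ~2047.5.
q = np.ones((4, 8))
pos = np.arange(4)
q_rot = apply_rope(q, pos, scale=0.5)
```

Because rotation preserves vector norms, only the relative angle between query and key positions changes; interpolation simply compresses those angles so unseen absolute positions fall inside the trained range.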

Architecture

  1. Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap. Arxiv 2019. GitHub Repo stars

  2. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. ICML 2020. GitHub Repo stars

  3. Block-Recurrent Transformers. DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur. Arxiv 2023. GitHub Repo stars

  4. Memorizing Transformers. Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy. Arxiv 2022. GitHub Repo stars

  5. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. Arxiv 2023.

  6. Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention. Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu. Arxiv 2023.

  7. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. Arxiv 2024.

  8. Weighted Grouped Query Attention in Transformers. Sai Sena Chinnakonduru, Astarag Mohapatra. Arxiv 2024.

  9. Associative Recurrent Memory Transformer. Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev. ICML 2024 Workshop. GitHub Repo stars

  10. Simple linear attention language models balance the recall-throughput tradeoff. Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré. Arxiv 2024. GitHub Repo stars

  11. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han. Arxiv 2024. GitHub Repo stars

  12. TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention. Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia. Arxiv 2024. GitHub Repo stars

  13. Selective Attention Improves Transformer. Yaniv Leviathan, Matan Kalman, Yossi Matias. Arxiv 2024.

  14. SnapKV: LLM Knows What You are Looking for Before Generation. Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen. Arxiv 2024. GitHub Repo stars

  15. Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures. Evan Lucas, Dylan Kangas, Timothy C Havens. Arxiv 2024.

  16. An Empirical Study of Mamba-based Language Models. Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro. Arxiv 2024. GitHub Repo stars

  17. Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models. Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong. Arxiv 2024. GitHub Repo stars

  18. Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention. Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong. Arxiv 2024.

  19. SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs. Yizhao Gao, Zhichen Zeng, Dayou Du, Shijie Cao, Hayden Kwok-Hay So, Ting Cao, Fan Yang, Mao Yang. Arxiv 2024. GitHub Repo stars

  20. Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling. Yingfa Chen, Xinrong Zhang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun. Arxiv 2024. GitHub Repo stars

  21. Taipan: Efficient and Expressive State Space Language Models with Selective Attention. Chien Van Nguyen, Huy Huu Nguyen, Thang M. Pham, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Ryan A. Rossi, Trung Bui, Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen. Arxiv 2024.

  22. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou. Arxiv 2024. GitHub Repo stars

  23. Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling. Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen. Arxiv 2024. GitHub Repo stars

  24. ReMamba: Equip Mamba with Effective Long-Sequence Modeling. Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao. Arxiv 2024.

  25. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention. Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng. Arxiv 2025.

  26. MoBA: Mixture of Block Attention for Long-Context LLMs. Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu. Arxiv 2025. GitHub Repo stars

  27. MiniMax-01: Scaling Foundation Models with Lightning Attention. MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Zheng, Linbo Chai, Long Xing, Meizhi Ju, Mingyuan Chi, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qi Yang, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiao Su, Xiaodong Han, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhen Qin, Zhenhua Fan, Zhihang Yu, Zhuo Jiang, Zijia Wu. Arxiv 2025. GitHub Repo stars

  28. Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks. Jongho Park and Jaeseung Park and Zheyang Xiong and Nayoung Lee and Jaewoong Cho and Samet Oymak and Kangwook Lee and Dimitris Papailiopoulos. Arxiv 2024

  29. A new approach to linear filtering and prediction problems. Basar, Tamer. IEEE 2001

  30. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter. Arxiv 2016

  31. Neural Discrete Representation Learning. Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu. Arxiv 2018

  32. Improving spiking dynamical networks: Accurate delays, higher-order synapses, and time cells. Voelker, Aaron R and Eliasmith, Chris. IEEE 2018

  33. Improving language understanding by generative pre-training. Radford, Alec and Narasimhan, Karthik and Salimans, Tim and Sutskever, Ilya and others. OpenAI 2018

  34. Memformer: The Memory-Augmented Transformer. Qingyang Wu, Zhenzhong Lan, Jing Gu, Zhou Yu. Arxiv 2020

  35. Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. Arxiv 2020

  36. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers. Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré. Arxiv 2021

  37. Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention. Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. Arxiv 2021

  38. Efficient attention: Attention with linear complexities. Zhuoran Shen and Mingyuan Zhang and Haiyu Zhao and Shuai Yi and Hongsheng Li. Arxiv 2024

  39. ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention. Yang Liu, Jiaxiang Liu, Li Chen, Yuxiang Lu, Shikun Feng, Zhida Feng, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. Arxiv 2022

  40. cosFormer: Rethinking Softmax in Attention. Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong. Arxiv 2022

  41. Rethinking Attention with Performers. Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller. Arxiv 2022

  42. Multi-head state space model for speech recognition. Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales. Arxiv 2023

  43. Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. NeurIPS 2017

  44. Retentive Network: A Successor to Transformer for Large Language Models. Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei. Arxiv 2023

  45. Scaling Transformer to 1M tokens and beyond with RMT. Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev. Arxiv 2024

  46. FLatten Transformer: Vision Transformer using Focused Linear Attention. Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang. Arxiv 2023

  47. TRAMS: Training-free Memory Selection for Long-range Language Modeling. Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi. Arxiv 2023

  48. Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model. Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy. Arxiv 2023

  49. Transformer-VQ: Linear-Time Transformers via Vector Quantization. Lucas D. Lingle. Arxiv 2024

  50. Transformers are SSMs: Generalized Models and Efficient Algorithms through Structured State Space Duality. Tri Dao, Albert Gu. Arxiv 2024

  51. Block-State Transformers. Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin. Arxiv 2023

  52. Extensible Embedding: A Flexible Multipler For LLM's Context Length. Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang. Arxiv 2024

  53. DeciMamba: Exploring the Length Extrapolation Potential of Mamba. Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes. Arxiv 2024

  54. CORM: Cache Optimization with Recent Message for Large Language Model Inference. Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi. Arxiv 2024

  55. Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. Arxiv 2020. GitHub Repo stars

  56. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao. ICLR 2024 Oral.

  57. PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling. Zefan Cai, Yichi Zhang, Bofei Gao, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao. Arxiv 2024.

  58. RazorAttention: Efficient KV Cache Compression Through Retrieval Heads. Hanlin Tang, Yang Lin, Jing Lin, Qingsen Han, Shikuan Hong, Yiwu Yao, Gongyi Wang. Arxiv 2024.

  59. Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning. Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao. Arxiv 2024.

  60. Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference. Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han. ICML 2024. GitHub Repo stars

  61. Efficient Streaming Language Models with Attention Sinks. Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis. Arxiv 2023. GitHub Repo stars

  62. PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference. Dongjie Yang, Xiaodong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao. Arxiv 2024. GitHub Repo stars

  63. MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention. Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. Arxiv 2024. GitHub Repo stars Static Badge

  64. LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi. Arxiv 2024.

  65. DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs. Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding. Arxiv 2024.

  66. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, Beidi Chen. Arxiv 2023

  67. Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, Anshumali Shrivastava. Arxiv 2023

  68. Loki: Low-rank Keys for Efficient Sparse Attention. Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi, Abhinav Bhatele. Arxiv 2024

  69. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. Arxiv 2024

  70. Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. Kevin Zhou. Arxiv 2025

  71. LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation. Xuan Zhang, Fengzhuo Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao, Min Lin. Arxiv 2025

  72. Hierarchical Attention Networks for Document Classification. Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, Eduard H. Hovy. Arxiv 2016

  73. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Clément Hongler, Franck Gabriel. Arxiv 2018

  74. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc Viet Le, Ruslan Salakhutdinov. Arxiv 2019

  75. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Arxiv 2019

  76. HiPPO: Recurrent Memory with Optimal Polynomial Projections. Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré. Arxiv 2020

  77. Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. Arxiv 2020

  78. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Arxiv 2020

  79. Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling. Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang. Arxiv 2021

  80. Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention. Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. Arxiv 2021

  81. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang. Arxiv 2022

  82. OPT: Open Pre-trained Transformer Language Models. Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona T. Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer. Arxiv 2022

  83. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilic, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, et al.. Arxiv 2022

  84. Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, Christopher Ré. Arxiv 2022

  85. NTK-ALiBi: Long Text Extrapolation of ALiBi Position Encoding through Interpolation. Arxiv 2023

  86. LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. Arxiv 2023

  87. Position Interpolation Improves ALiBi Extrapolation. Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness. Arxiv 2023

  88. Efficient Prompting via Dynamic In-Context Learning. Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan. Arxiv 2023

  89. RWKV: Reinventing RNNs for the Transformer Era. Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Leon Derczynski, Xingjian Du, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Jiaju Lin, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan S. Wind, Stanislaw Wozniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu. Arxiv 2023

  90. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. Arxiv 2023

  91. Baichuan 2: Open Large-scale Language Models. Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu. Arxiv 2023

  92. Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence. Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Jiaju Lin, Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Cahya Wirawan, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu. Arxiv 2024

  93. Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use. Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan. Arxiv 2024

  94. QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han. Arxiv 2024

  95. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models. Aleksandar Botev, Soham De, Samuel L. Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas. Arxiv 2024

  96. SnapKV: LLM Knows What You are Looking for Before Generation. Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen. Arxiv 2024

  97. PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference. Dongjie Yang, Xiaodong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao. Arxiv 2024

  98. HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position. Kechi Zhang, Ge Li, Huangzhao Zhang, Zhi Jin. Arxiv 2024

  99. DAPE V2: Process Attention Score as Feature Map for Length Extrapolation. Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael K. Ng, Xin Jiang, Zhenguo Li, Yu Li. Arxiv 2024

  100. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. Arxiv 2024

  101. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao. Arxiv 2024

  102. LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models. Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi. Arxiv 2024

  103. LongHeads: Multi-Head Attention is Secretly a Long Context Processor. Yi Lu, Xin Zhou, Wei He, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang. Arxiv 2024

  104. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu, Tri Dao. Arxiv 2024

  105. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, Hao Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, Tao Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun. Arxiv 2024

  106. Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks. Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos. Arxiv 2024

  107. You Only Cache Once: Decoder-Decoder Architectures for Language Models. Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei. Arxiv 2024

  108. Zamba: A Compact 7B SSM Hybrid Model. Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge. Arxiv 2024

  109. Qwen2.5-1M Technical Report. An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang, Yong Li, Zhiying Xu, Zipeng Zhang. Arxiv 2025

  110. RazorAttention: Efficient KV Cache Compression Through Retrieval Heads. Hanlin Tang, Yang Lin, Jing Lin, Qingsen Han, Danning Ke, Shikuan Hong, Yiwu Yao, Gongyi Wang. Arxiv 2025

  111. LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation. Xuan Zhang, Fengzhuo Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao, Min Lin. Arxiv 2025

  112. Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression. Haoyu Wang, Tong Teng, Tianyu Guo, An Xiao, Duyu Tang, Hanting Chen, Yunhe Wang. Arxiv 2025.

  113. Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs. Tao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui. Arxiv 2025.         GitHub Repo stars

  114. SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention. Yankun Hong, Xing Li, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Mingxuan Yuan. Arxiv 2025.

  115. Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference. Yaohua Tang, Zhicheng Hu, Kun Cheng, Fan Mo, Qiheng Lv, Hua Wang, Zhi Chen. Arxiv 2025.

  116. DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance. Xuanfan Ni, Liyan Xu, Chenyang Lyu, Longyue Wang, Mo Yu, Lemao Liu, Fandong Meng, Jie Zhou, Piji Li. Arxiv 2025.

  117. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse. Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang. Arxiv 2025.         GitHub Repo stars

  118. FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference. Bingzhe Zhao, Ke Cheng, Aomufei Yuan, Yuxuan Tian, Ruiguang Zhong, Chengchen Hu, Tong Yang, Lian Yu. Arxiv 2025.

  119. CoKV: Optimizing KV Cache Allocation via Cooperative Game. Qiheng Sun, Hongwei Zhang, Haocheng Xia, Jiayao Zhang, Jinfei Liu, Kui Ren. Arxiv 2025.         GitHub Repo stars

  120. MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference. Zhongwei Wan, Hui Shen, Xin Wang, Che Liu, Zheda Mai, Mi Zhang. NAACL 2025.         GitHub Repo stars

  121. FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference. Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou. ICLR 2025 Oral.

  122. WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models. Jian Yuan, Ziwei He, Haoli Bai, Jingwen Leng, Bo Jiang. ICASSP 2025.

  123. Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs. Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant Nair, Poulami Das. Arxiv 2025.

  124. KVCrush: Key value cache size-reduction using similarity in head-behaviour. Gopi Krishna Jha, Sameh Gobriel, Liubov Talamanova, Alexander Kozlov, Nilesh Jain. Arxiv 2025.

  125. EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection. Yuhao Zhou, Sirui Song, Boyang Liu, Zhiheng Xi, Senjie Jin, Xiaoran Fan, Zhihao Zhang, Wei Li, Xuanjing Huang. Arxiv 2025.

  126. Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving. Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng. Arxiv 2025.

  127. Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression. Nathan Godey, Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini, Éric de la Clergerie, Benoît Sagot. Arxiv 2025.         GitHub Repo stars

  128. TokenButler: Token Importance is Predictable. Yash Akhauri, Ahmed F AbouElhamayed, Yifei Gao, Chi-Chih Chang, Nilesh Jain, Mohamed S. Abdelfattah. Arxiv 2025.         GitHub Repo stars

  129. Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA. Nils Graef, Andrew Wasielewski. Arxiv 2025.         GitHub Repo stars

  130. LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference. Guangtao Wang, Shubhangi Upasani, Chen Wu, Darshan Gandhi, Jonathan Li, Changran Hu, Bo Li, Urmish Thakker. ICLR 2025.

  131. KV-Distill: Nearly Lossless Learnable Context Compression for LLMs. Vivek Chari, Guanghui Qin, Benjamin Van Durme. Arxiv 2025.         GitHub Repo stars

  132. Radar: Fast Long-Context Decoding for Any Transformer. Yongchang Hao, Mengyao Zhai, Hossein Hajimirsadeghi, Sepidehsadat Hosseini, Frederick Tung. ICLR 2025.

  133. PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention. Lida Chen, Dong Xu, Chenxin An, Xintao Wang, Yikai Zhang, Jiangjie Chen, Zujie Liang, Feng Wei, Jiaqing Liang, Yanghua Xiao, Wei Wang. Arxiv 2025.         GitHub Repo stars

  134. Cost-Optimal Grouped-Query Attention for Long-Context LLMs. Yingfa Chen, Yutong Wu, Xu Han, Zhiyuan Liu, Maosong Sun. Arxiv 2025.         GitHub Repo stars

  135. ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs. Xin Liu, Pei Liu, Guoming Tang. Arxiv 2025. GitHub Repo stars

  136. Exploring the Limits of KV Cache Compression in Visual Autoregressive Transformers. Bo Chen, Xiaoyu Li, Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song. Arxiv 2025.

  137. SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs. Shibo Jie, Yehui Tang, Kai Han, Zhi-Hong Deng, Jing Han. Arxiv 2025.

  138. KVShare: Semantic-Aware Key-Value Cache Sharing for Efficient Large Language Model Inference. Huan Yang, Renji Zhang, Deyu Zhang. Arxiv 2025.

  139. xKV: Cross-Layer SVD for KV-Cache Compression. Chi-Chih Chang, Chien-Yu Lin, Yash Akhauri, Wei-Cheng Lin, Kai-Chiang Wu, Luis Ceze, Mohamed S. Abdelfattah. Arxiv 2025. GitHub Repo stars

  140. WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference. Youhui Zuo, Sibo Wei, Chen Zhang, Zhuorui Liu, Wenpeng Lu, Dawei Song. Arxiv 2025. GitHub Repo stars

  141. BitDecoding: Unlocking Tensor Cores for Long-Context LLMs Decoding with Low-Bit KV Cache. Dayou Du, Shijie Cao, Jianyi Cheng, Ting Cao, Mao Yang. Arxiv 2025. GitHub Repo stars

  142. Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization. Minsu Kim, Seongmin Hong, RyeoWook Ko, Soongyu Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park. Arxiv 2025.

  143. LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation. Han Chen, Zicong Jiang, Zining Zhang, Bingsheng He, Pingyi Luo, Mian Lu, Yuqiang Chen. ICLR 2025. GitHub Repo stars

  144. Cocktail: Chunk-Adaptive Mixed-Precision Quantization for Long-Context LLM Inference. Wei Tao, Bin Zhang, Xiaoyang Qu, Jiguang Wan, Jianzong Wang. DATE 2025.

  145. PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference. Weisheng Jin, Maojia Song, Tej Deep Pala, Yew Ken Chia, Amir Zadeh, Chuan Li, Soujanya Poria. Arxiv 2025.

  146. SQuat: Subspace-orthogonal KV Cache Quantization. Hao Wang, Ligong Han, Kai Xu, Akash Srivastava. Arxiv 2025.

  147. Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving. Wei Gao, Xinyu Zhou, Peng Sun, Tianwei Zhang, Yonggang Wen. MLSys 2025. GitHub Repo stars

  148. SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching. Yuxuan Zhu, Ali Falahati, David H. Yang, Mohammad Mohammadi Amiri. Arxiv 2025.

  149. LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important. Manlai Liang, JiaMing Zhang, Xiong Li, Jinlong Li. Arxiv 2025. GitHub Repo stars

  150. FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling. Weiqing Li, Guochao Jiang, Xiangyong Ding, Zhangcheng Tao, Chuzhan Hao, Chenfeng Xu, Yuewei Zhang, Hao Wang. Arxiv 2025.

  151. Apt-Serve: Adaptive Request Scheduling on Hybrid Cache for Scalable LLM Inference Serving. Shihong Gao, Xin Zhang, Yanyan Shen, Lei Chen. Arxiv 2025.

  152. KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference. Yuxuan Tian, Zihan Wang, Yebo Peng, Aomufei Yuan, Zhiming Wang, Bairen Yi, Xin Liu, Yong Cui, Tong Yang. Arxiv 2025.

  153. MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models. Junyang Zhang, Tianyi Zhu, Cheng Luo, Anima Anandkumar. Arxiv 2025. GitHub Repo stars

  154. CAOTE: KV Caching through Attention Output Error based Token Eviction. Raghavv Goel, Junyoung Park, Mukul Gagrani, Dalton Jones, Matthew Morse, Harper Langston, Mingu Lee, Chris Lott. Arxiv 2025.

  155. SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training. Zhouyang Li, Yuliang Liu, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song, Di Zhang. Arxiv 2025.

  156. FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension. Jushi Kai, Boyi Zeng, Yixuan Wang, Haoli Bai, Bo Jiang, Zhouhan Lin. Arxiv 2025.

  157. dKV-Cache: The Cache for Diffusion Language Models. Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang. Arxiv 2025. GitHub Repo stars

  158. PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs. Tengxuan Liu, Shiyao Li, Jiayi Yang, Tianchen Zhao, Feng Zhou, Xiaohui Song, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang. Arxiv 2025. GitHub Repo stars

  159. TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization. Dingyu Yao, Bowen Shen, Zheng Lin, Wei Liu, Jian Luan, Bin Wang, Weiping Wang. Arxiv 2025. GitHub Repo stars

  160. R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration. Zefan Cai, Wen Xiao, Hanshi Sun, Cheng Luo, Yikai Zhang, Ke Wan, Yucheng Li, Yeyang Zhou, Li-Wen Chang, Jiuxiang Gu, Zhen Dong, Anima Anandkumar, Abedelkadir Asi, Junjie Hu. Arxiv 2025. GitHub Repo stars

  161. ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration. Xianglong Yan, Zhiteng Li, Tianao Zhang, Linghe Kong, Yulun Zhang, Xiaokang Yang. Arxiv 2025. GitHub Repo stars

  162. VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models. Ce Zhang, Kaixin Ma, Tianqing Fang, Wenhao Yu, Hongming Zhang, Zhisong Zhang, Yaqi Xie, Katia Sycara, Haitao Mi, Dong Yu. Arxiv 2025.

  163. KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction. Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon, Jae W. Lee, Sangdoo Yun, Hyun Oh Song. Arxiv 2025. GitHub Repo stars

  164. Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference. Donghyeon Joo, Helya Hosseini, Ramyad Hadidi, Bahar Asgari. Arxiv 2025. GitHub Repo stars

  165. Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query. Yixuan Wang, Shiyu Ji, Yijun Liu, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che. Arxiv 2025.

  166. Hardware-Efficient Attention for Fast Decoding. Ted Zadouri, Hubert Strauss, Tri Dao. Arxiv 2025. GitHub Repo stars

  167. Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion. Zhanqiu Hu, Jian Meng, Yash Akhauri, Mohamed S. Abdelfattah, Jae-sun Seo, Zhiru Zhang, Udit Gupta. Arxiv 2025.

  168. AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models. Yifeng Gu, Zicong Jiang, Jianxiu Jin, Kailing Guo, Ziyang Zhang, Xiangmin Xu. Arxiv 2025.

  169. Inference-Time Hyper-Scaling with KV Cache Compression. Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo M. Ponti. Arxiv 2025.

  170. TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering. Vinay Joshi, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum. Arxiv 2025.

  171. Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs. Wanyun Cui, Mingwei Xu. Arxiv 2025.

  172. Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference. Thomas Joshi, Herman Saini, Neil Dhillon, Antoni Viros i Martin, Kaoutar El Maghraoui. Arxiv 2025.

  173. KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache. Fei Li, Song Liu, Weiguo Wu, Shiqiang Nie, Jinyu Wang. Arxiv 2025.

  174. Efficient Long-Context LLM Inference via KV Cache Clustering. Jie Hu, Shengnan Wang, Yutong He, Ping Gong, Jiawei Yi, Juncheng Zhang, Youhui Bai, Renhai Chen, Gong Zhang, Cheng Li, Kun Yuan. Arxiv 2025.

  175. Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache. Xiaoran Liu, Siyang He, Qiqi Wang, Ruixiao Li, Yuerong Song, Zhigeng Liu, Linlin Li, Qun Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu. Arxiv 2025.

  176. Latent Multi-Head Attention for Small Language Models. Sushant Mehta, Raj Dandekar, Rajat Dandekar, Sreedath Panat. Arxiv 2025.

  177. Multipole Attention for Efficient Long Context Reasoning. Coleman Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami. Arxiv 2025. GitHub Repo stars

  178. Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization. Guanghui Song, Dongping Liao, Yiren Zhao, Kejiang Ye, Cheng-zhong Xu, Xitong Gao. Arxiv 2025.

  179. Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?. Adithya Bhaskar, Alexander Wettig, Tianyu Gao, Yihe Dong, Danqi Chen. Arxiv 2025. GitHub Repo stars

  180. LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning. Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo. Arxiv 2025.

  181. CommVQ: Commutative Vector Quantization for KV Cache Compression. Junyan Li, Yang Zhang, Muhammad Yusuf Hassan, Talha Chafekar, Tianle Cai, Zhile Ren, Pengsheng Guo, Foroozan Karimzadeh, Colorado Reed, Chong Wang, Chuang Gan. Arxiv 2025. GitHub Repo stars

  182. X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression. Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang, Vikram Appia, Emad Barsoum. Arxiv 2025.

  183. OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs. Jitai Hao, Yuke Zhu, Tian Wang, Jun Yu, Xin Xin, Bo Zheng, Zhaochun Ren, Sheng Guo. ICLR 2025.

  184. XAttention: Block Sparse Attention with Antidiagonal Scoring. Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Song Han. Arxiv 2025. GitHub Repo stars

  185. The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs. Piotr Nawrot, Robert Li, Renjie Huang, Sebastian Ruder, Kelly Marchisio, Edoardo M. Ponti. Arxiv 2025. GitHub Repo stars

  186. Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing. Piotr Piękos, Róbert Csordás, Jürgen Schmidhuber. Arxiv 2025. GitHub Repo stars

  187. Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs. Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin. ICLR 2024. GitHub Repo stars

  188. Sparsified State-Space Models are Efficient Highway Networks. Woomin Song, Jihoon Tack, Sangwoo Mo, Seunghyuk Oh, Jinwoo Shin. TMLR 2025. GitHub Repo stars

  189. Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers. Woomin Song, Sai Muralidhar Jayanthi, Srikanth Ronanki, Kanthashree Mysore Sathyendra, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati. Arxiv 2025.

  190. Multi-head Temporal Latent Attention. Keqi Deng, Philip C. Woodland. Arxiv 2025. GitHub Repo stars

  191. Scale-invariant Attention. Ben Anson, Xi Wang, Laurence Aitchison. Arxiv 2025.

  192. SageAttention2++: A More Efficient Implementation of SageAttention2. Jintao Zhang, Xiaoming Xu, Jia Wei, Haofeng Huang, Pengle Zhang, Chendong Xiang, Jun Zhu, Jianfei Chen. Arxiv 2025. GitHub Repo stars

  193. HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference. Ping Gong, Jiawei Yi, Shengnan Wang, Juncheng Zhang, Zewen Jin, Ouxiang Zhou, Ruibo Liu, Guanbin Xu, Youhui Bai, Bowen Ye, Kun Yuan, Tong Yang, Gong Zhang, Renhai Chen, Feng Wu, Cheng Li. Arxiv 2025. GitHub Repo stars

  194. Rectified Sparse Attention. Yutao Sun, Tianzhu Ye, Li Dong, Yuqing Xia, Jian Chen, Yizhao Gao, Shijie Cao, Jianyong Wang, Furu Wei. Arxiv 2025. GitHub Repo stars

  195. SeerAttention-R: Sparse Attention Adaptation for Long Reasoning. Yizhao Gao, Shuming Guo, Shijie Cao, Yuqing Xia, Yu Cheng, Lei Wang, Lingxiao Ma, Yutao Sun, Tianzhu Ye, Li Dong, Hayden Kwok-Hay So, Yu Hua, Ting Cao, Fan Yang, Mao Yang. Arxiv 2025. GitHub Repo stars

  196. Lag-Relative Sparse Attention In Long Context Training. Manlai Liang, Wanyi Huang, Mandi Liu, Huaijun Li, Jinlong Li. Arxiv 2025.

  197. DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration. Hanzhi Zhang, Heng Fan, Kewei Sha, Yan Huang, Yunhe Feng. Arxiv 2025. GitHub Repo stars

  198. GTA: Grouped-head latenT Attention. Luoyang Sun, Jiwen Jiang, Cheng Deng, Xinjian Wu, Haifeng Zhang, Lei Chen, Lionel Ni, Jun Wang. Arxiv 2025.

  199. Fast and Simplex: 2-Simplicial Attention in Triton. Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil. Arxiv 2025.

  200. Mitigating Posterior Salience Attenuation in Long-Context LLMs with Positional Contrastive Decoding. Zikai Xiao, Ziyang Wang, Wen Ma, Yan Zhang, Wei Shen, Yan Wang, Luqi Gong, Zuozhu Liu. Arxiv 2025.

  201. Long-Short Alignment for Effective Long-Context Modeling in LLMs. Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang. Arxiv 2025. GitHub Repo stars

  202. Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences. Stas Bekman, Samyam Rajbhandari, Michael Wyatt, Jeff Rasley, Tunji Ruwase, Zhewei Yao, Aurick Qiao, Yuxiong He. Arxiv 2025. GitHub Repo stars

  203. Long-Context Generalization with Sparse Attention. Pavlo Vasylenko, Marcos Treviso, André F. T. Martins. Arxiv 2025. GitHub Repo stars

Hybrid Architecture

  1. C4AI Command R7B: A 7 Billion Parameter Multilingual Model. Cohere, Cohere For AI. Arxiv 2024.

  2. Jamba: A Hybrid Transformer-Mamba Language Model. Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham. Arxiv 2024.

  3. Hymba: A Hybrid-Head Architecture for Small Language Models. Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov. Arxiv 2024.

  4. Zamba: A Compact 7B SSM Hybrid Model. Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge. Arxiv 2024.

  5. GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression. Daniel Goldstein, Fares Obeid, Eric Alcaide, Guangyu Song, Eugene Cheah. Arxiv 2024.

  6. Gemma 2: Improving Open Language Models at a Practical Size. Gemma Team: Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, et al. Arxiv 2024.

  7. Jamba-1.5: Hybrid Transformer-Mamba Models at Scale. Jamba Team: Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, Barak Peleg, Ben Aviram, Chen Almagor, Clara Fridman, Dan Padnos, Daniel Gissin, et al. Arxiv 2024.

  8. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models. Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, et al. Arxiv 2024.

  9. The Zamba2 Suite: Technical Report. Paolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren Millidge. Arxiv 2024.

  10. You Only Cache Once: Decoder-Decoder Architectures for Language Models. Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei. Arxiv 2024.

Workflow Design

Prompt Compression

  1. Prompt Compression for Large Language Models: A Survey. Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier. Arxiv 2024.
Hard Prompt Compression
  1. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu. Arxiv 2023. GitHub Repo stars

  2. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression. Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. Arxiv 2023. GitHub Repo stars

  3. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression. Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang. Arxiv 2024. GitHub Repo stars

  4. Compressing Context to Enhance Inference Efficiency of Large Language Models. Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin. Arxiv 2023. GitHub Repo stars

  5. TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning. Shivam Shandilya, Menglin Xia, Supriyo Ghosh, Huiqiang Jiang, Jue Zhang, Qianhui Wu, Victor Rühle. Arxiv 2024.

  6. Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference. Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke. Arxiv 2024. GitHub Repo stars

  7. AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models. Qianchi Zhang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng. Arxiv 2024.

  8. Learning to Compress Prompt in Natural Language Formats. Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu. Arxiv 2024.

  9. TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction. Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian. Arxiv 2023.

  10. Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation. Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen. Arxiv 2024.

  11. Discrete Prompt Compression With Reinforcement Learning. Hoyoun Jung, Kyung-Joong Kim. Arxiv 2024.

  12. CompAct: Compressing Retrieved Documents Actively for Question Answering. Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang. Arxiv 2024.

  13. EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation. Taeho Hwang, Sukmin Cho, Soyeong Jeong, Hoyun Song, SeungYoon Han, Jong C. Park. Arxiv 2024. GitHub Repo stars

  14. Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability. Tsz Ting Chung, Leyang Cui, Lemao Liu, Xinting Huang, Shuming Shi, Dit-Yan Yeung. EMNLP 2024.

Soft Prompt Compression
  1. Adapting Language Models to Compress Contexts. Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen. Arxiv 2023. GitHub Repo stars

  2. xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token. Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao. Arxiv 2024. GitHub Repo stars

  3. In-context Autoencoder for Context Compression in a Large Language Model. Tao Ge, Hu Jing, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei. ICLR 2024. GitHub Repo stars

  4. The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, Noah Constant. Arxiv 2021.

  5. Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models. David Wingate, Mohammad Shoeybi, Taylor Sorensen. Arxiv 2022.

  6. Learning to Compress Prompts with Gist Tokens. Jesse Mu, Xiang Lisa Li, Noah Goodman. Arxiv 2024.

  7. Unifying Demonstration Selection and Compression for In-Context Learning. Jun Gao, Ziqiang Cao, Wenjie Li. Arxiv 2024.

  8. Long Context Compression with Activation Beacon. Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou. Arxiv 2024.

  9. 500xCompressor: Generalized Prompt Compression for Large Language Models. Zongqian Li, Yixuan Su, Nigel Collier. Arxiv 2024.

  10. DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models. Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang. Arxiv 2025.

  11. EFPC: Towards Efficient and Flexible Prompt Compression. Yun-Hao Cao, Yangsong Wang, Shuzheng Hao, Zhenxing Li, Chengjun Zhan, Sichao Liu, Yi-Qi Hu. Arxiv 2025.

  12. AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation. Yixiong Fang, Tianran Sun, Yuling Shi, Xiaodong Gu. Arxiv 2025.

  13. Limits of KV Cache Compression for Tensor Attention based Autoregressive Transformers. Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yu Tian. Arxiv 2025.

  14. Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models. Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie. CVPR 2025. GitHub Repo stars

  15. Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models. Haichao Zhang, Zhuowei Li, Dimitris Metaxas, Yun Fu. Arxiv 2025.

  16. A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression. Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu, Zhicheng Dou. Arxiv 2024.

  17. Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers. Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin. Arxiv 2024. GitHub Repo stars

  18. Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference. Weizhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, Wei Han. Arxiv 2025.

  19. Understanding and Improving Information Preservation in Prompt Compression for LLMs. Weronika Łajewska, Momchil Hardalov, Laura Aina, Neha Anna John, Hang Su, Lluís Màrquez. Arxiv 2025.

  20. Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck. Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos. Arxiv 2025.

  21. Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation. Weitao Li, Kaiming Liu, Xiangyu Zhang, Xuanyu Lei, Weizhi Ma, Yang Liu. Arxiv 2025. GitHub Repo stars

  22. Saliency-driven Dynamic Token Pruning for Large Language Models. Yao Tao, Yehui Tang, Yun Wang, Mingjian Zhu, Hailin Hu, Yunhe Wang. Arxiv 2025.

  23. Dynamic Compressing Prompts for Efficient Inference of Large Language Models. Jinwu Hu, Wei Zhang, Yufeng Wang, Yu Hu, Bin Xiao, Mingkui Tan, Qing Du. Arxiv 2025. GitHub Repo stars

  24. ACoRN: Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models. Singon Kim, Gunho Jung, Seong-Whan Lee. Arxiv 2025.

  25. MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores. Fengwei Zhou, Jiafei Song, Wenjin Jason Li, Gengjian Xue, Zhikang Zhao, Yichao Lu, Bailin Na. Arxiv 2025.

  26. Token Sequence Compression for Efficient Multimodal Computing. Yasmine Omri, Parth Shroff, Thierry Tambe. Arxiv 2025.

  27. An Empirical Study on Prompt Compression for Large Language Models. Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang. Arxiv 2025. GitHub Repo stars

  28. Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models. Xuyang Liu, Yiyu Wang, Junpeng Ma, Linfeng Zhang. Arxiv 2025. GitHub Repo stars

  29. Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention. Huanxuan Liao, Wen Hu, Yao Xu, Shizhu He, Jun Zhao, Kang Liu. Arxiv 2025. GitHub Repo stars

  30. QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization. Weizhou Shen, Chenliang Li, Fanqi Wan, Shengyi Liao, Shaopeng Lai, Bo Zhang, Yingcheng Shi, Yuning Wu, Gang Fu, Zhansheng Li, Bin Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan. Arxiv 2025. GitHub Repo stars

  31. Lossless Token Sequence Compression via Meta-Tokens. John Harvill, Ziwei Fan, Hao Wang, Yizhou Sun, Hao Ding, Luke Huan, Anoop Deoras. Arxiv 2025.

  32. Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective. Yong Zhang, Yanwen Huang, Ning Cheng, Yang Guo, Yun Zhu, Yanmeng Wang, Shaojun Wang, Jing Xiao. Arxiv 2025. GitHub Repo stars

  33. METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding. Mengyue Wang, Shuo Chen, Kristian Kersting, Volker Tresp, Yunpu Ma. Arxiv 2025.

  34. SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression. Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang, Lili Qiu. Arxiv 2025. GitHub Repo stars

Memory-Based

  1. Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement. Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark. EMNLP 2022.

  2. Augmenting Language Models with Long-Term Memory. Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei. NeurIPS 2023.

  3. MEMORYLLM: Towards Self-Updatable Large Language Models. Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian J. McAuley. ICML 2024.

  4. MemoryBank: Enhancing Large Language Models with Long-Term Memory. Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, Yanlin Wang. Arxiv 2023. GitHub Repo stars

  5. You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL. Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng. NAACL 2025.

RAG-Based

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. ACL 2019.

  2. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. Gautier Izacard, Edouard Grave. ACL 2021.

  3. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen. EMNLP 2023.

  4. Query Rewriting in Retrieval-Augmented Large Language Models. Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan. EMNLP 2023.

  5. REPLUG: Retrieval-Augmented Black-Box Language Models. Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Richard James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih. ACL 2024.

  6. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu. Arxiv 2024.

  7. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli. Arxiv 2024.

  8. Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning. Giulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti. Arxiv 2025.

  9. Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention. Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch. Arxiv 2025. GitHub Repo stars

Agent-Based

  1. Re3: Generating Longer Stories With Recursive Reprompting and Revision. Kevin Yang, Yuandong Tian, Nanyun Peng, Dan Klein. EMNLP 2022.

  2. Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading. Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz. Arxiv 2023.

  3. PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents. Simeng Sun, Yang Liu, Shuohang Wang, Dan Iter, Chenguang Zhu, Mohit Iyyer. EACL 2024. GitHub Repo stars

  4. Learning to Reason and Memorize with Self-Notes. Jack Lanchantin, Shubham Toshniwal, Jason Weston, arthur szlam, Sainbayar Sukhbaatar. NeurIPS 2023.

  5. GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models. Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng. Arxiv 2024.

  6. A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts. Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer. Arxiv 2024.

  7. RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts. Jiaheng Liu, Zehao Ni, Haoran Que, Tao Sun, Noah Wang, Jian Yang, Jiakai Wang, Hongcheng Guo, Z.Y. Peng, Ge Zhang, Jiayi Tian, Xingyuan Bu, Ke Xu, Wenge Rong, Junran Peng, Zhaoxiang Zhang. NeurIPS 2024.

  8. Chain of Agents: Large Language Models Collaborating on Long-Context Tasks. Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik. Arxiv 2024.

  9. LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration. Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang. Arxiv 2024.

Evaluation

Long-Context Comprehension

  1. Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks. Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen. Arxiv 2024. GitHub Repo stars

  2. BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev. Arxiv 2024. GitHub Repo stars

  3. DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities. Hui Dai, Dan Pechi, Xinyi Yang, Garvit Banga, Raghav Mantri. Arxiv 2024. GitHub Repo stars

  4. Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data. Seiji Maekawa, Hayate Iso, Nikita Bhutani. Arxiv 2024. GitHub Repo stars

  5. LongIns: A Challenging Long-context Instruction-based Exam for LLMs. Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang. Arxiv 2024.

  6. Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs. Runchu Tian, Yanghao Li, Yuepeng Fu, Siyang Deng, Qinyu Luo, Cheng Qian, Shuo Wang, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Huadong Wang, Xiaojiang Liu. Arxiv 2024. GitHub Repo stars

  7. LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios. Xiaodong Wu, Minhao Wang, Yichen Liu, Xiaoming Shi, He Yan, Xiangju Lu, Junmin Zhu, Wei Zhang. Arxiv 2024.

  8. Long Range Arena: A Benchmark for Efficient Transformers. Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler. Arxiv 2020.

  9. LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion. Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen. Arxiv 2025.

  10. Evaluating Multilingual Long-Context Models for Retrieval and Reasoning. Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad, Rhitabrat Pokharel, Russell Scheinberg. ACL 2024.

  11. M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models. Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, Qun Liu, Kam-Fai Wong. ACL 2024.

  12. Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries. Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, et al. Arxiv 2024.

  13. Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models. Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty. Arxiv 2024. GitHub Repo stars

  14. Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?. Jonathan Roberts, Kai Han, Samuel Albanie. Arxiv 2024. GitHub Repo stars

  15. NoLiMa: Long-Context Evaluation Beyond Literal Matching. Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, Seunghyun Yoon, Hinrich Schütze. Arxiv 2025.

  16. RULER: What’s the Real Context Size of Your Long-Context Language Models?. Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg. COLM 2024.

  17. S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Model. Fangyu Lei, Qian Liu, Yiming Huang, Shizhu He, Jun Zhao, Kang Liu. NAACL 2024.

  18. Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems. Philippe Laban, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu. Arxiv 2024. GitHub Repo stars

  19. LongHealth: A Question Answering Benchmark with Long Clinical Documents. Lisa Adams, Felix Busch, Tianyu Han, Jean-Baptiste Excoffier, Matthieu Ortala, Alexander Löser, Hugo JWL. Aerts, Jakob Nikolas Kather, Daniel Truhn, Keno Bressem. Arxiv 2024.

  20. MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs. Lei Wang, Shan Dong, Yuhui Xu, Hanze Dong, Yalu Wang, Amrita Saha, Ee-Peng Lim, Caiming Xiong, Doyen Sahoo. Arxiv 2024.

  21. RepoQA: Evaluating Long Context Code Understanding. Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, Lingming Zhang. Arxiv 2024. GitHub Repo stars Static Badge

  22. BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models. Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen. ACL 2024.

  23. CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models. Zexuan Qiu, Jingjing Li, Shijue Huang, Xiaoqi Jiao, Wanjun Zhong, Irwin King. EMNLP 2024.

  24. DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels. Zhe Xu, Jiasheng Ye, Xiangyang Liu, Tianxiang Sun, Xiaoran Liu, Qipeng Guo, Linlin Li, Qun Liu, Xuanjing Huang, Xipeng Qiu. Arxiv 2024.

  25. ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage. Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang. Arxiv 2024. GitHub Repo stars

  26. Extending Long Context Evaluation Beyond 100K Tokens. Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Hao, Xu Han, Zhen Thai, Shuo Wang, Zhiyuan Liu, et al. ACL 2024.

  27. HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly. Howard Yen, Tianyu Gao, Minmin Hou, Ke Ding, Daniel Fleischer, Peter Izsak, Moshe Wasserblat, Danqi Chen. ICLR 2025.

  28. L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?. Zecheng Tang, Keyan Zhou, Juntao Li, Baibei Ji, Jianye Hou, Min Zhang. Arxiv 2024.

  29. L-Eval: Instituting Standardized Evaluation for Long Context Language Models. Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu. ACL 2024.

  30. Long Input Benchmark for Russian Analysis. Igor Churin, Murat Apishev, Maria Tikhonova, Denis Shevelev, Aydar Bulatov, Yuri Kuratov, Sergej Averkiev, Alena Fenogenova. Arxiv 2024.

  31. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?. Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu. Arxiv 2024. GitHub Repo stars

  32. LONG2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall. Zehan Qi, Rongwu Xu, Zhijiang Guo, Cunxiang Wang, Hao Zhang, Wei Xu. ACL 2024.

  33. LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding. Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, et al. ACL 2024.

  34. LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks. Yushi Bai, Shangqing Tu, Jiajie Zhang, Hao Peng, Xiaozhi Wang, Xin Lv, Shulin Cao, Jiazheng Xu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li. Arxiv 2024. GitHub Repo stars

  35. LongCite: Enabling LLMs to generate fine-grained citations in long-context QA. Zhang, Jiajie and Bai, Yushi and Lv, Xin and Gu, Wanjun and Liu, Danqing and Zou, Minhao and Cao, Shulin and Hou, Lei and Dong, Yuxiao and Feng, Ling and others. Arxiv 2024.

  36. Long-context LLMs struggle with long in-context learning. Li, Tianle and Zhang, Ge and Do, Quy Duc and Yue, Xiang and Chen, Wenhu. TMLR 2024.

  37. LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu. Arxiv 2024. GitHub Repo stars

  38. Leave no document behind: Benchmarking long-context llms with extended multi-doc qa. Wang, Minzheng and Chen, Longze and Cheng, Fu and Liao, Shengyi and Zhang, Xinghua and Wu, Bingli and Yu, Haiyang and Xu, Nan and Zhang, Lei and Luo, Run and others. EMNLP 2024.

  39. LooGLE: Can Long-Context Language Models Understand Long Contexts?. Li, Jiaqi and Wang, Mengmeng and Zheng, Zilong and Zhang, Muhan. ACL 2024.

  40. LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K. Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang. Arxiv 2024. GitHub Repo stars

  41. Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation. Kaijian Zou, Muhammad Khalifa, Lu Wang. Arxiv 2024. GitHub Repo stars

  42. Marathon: A race through the realm of long context with large language models. Zhang, Lei and Li, Yunshui and Liu, Ziqiang and Liu, Junhao and Chen, Longze and Luo, Run and Yang, Min and others. ACL 2024.

  43. One Thousand and One Pairs: A "novel" challenge for long-context language models. Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer. Arxiv 2024. GitHub Repo stars         Static Badge

  44. Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding. Zhihan Zhang, Yixin Cao, Chenchen Ye, Yunshan Ma, Lizi Liao, Tat-Seng Chua. Arxiv 2024.

  45. Zeroscrolls: A zero-shot benchmark for long text understanding. Shaham, Uri and Ivgi, Maor and Efrat, Avia and Berant, Jonathan and Levy, Omer. EMNLP 2023.

  46. DocFinQA: A Long-Context Financial Reasoning Dataset. Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner. ACL 2024.

  47. FinTextQA: A Dataset for Long-form Financial Question Answering. Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang. Arxiv 2024.

  48. Long Code Arena: a Set of Benchmarks for Long-Context Code Models. Bogomolov, Egor and Eliseeva, Aleksandra and Galimzyanov, Timur and Glukhov, Evgeniy and Shapkin, Anton and Tigina, Maria and Golubev, Yaroslav and Kovrigin, Alexander and van Deursen, Arie and Izadi, Maliheh and others. Arxiv 2024.

  49. MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens. Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan. Arxiv 2024. GitHub Repo stars

  50. Examining Long-Context Large Language Models for Environmental Review Document Comprehension. Phan, Hung and Acharya, Anurag and Meyur, Rounak and Chaturvedi, Sarthak and Sharma, Shivam and Parker, Mike and Nally, Dan and Jannesari, Ali and Pazdernik, Karl and Halappanavar, Mahantesh and others. Arxiv 2024.

  51. Train short, test long: Attention with linear biases enables input length extrapolation. Ofir Press and Noah A. Smith and Mike Lewis. ICLR 2022.

  52. PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training. Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. Arxiv 2023. GitHub Repo stars

  53. Landmark Attention: Random-Access Infinite Context Length for Transformers. Amirkeivan Mohtashami, Martin Jaggi. Arxiv 2023. GitHub Repo stars

  54. NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?. Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen. Arxiv 2024. GitHub Repo stars

  55. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. Fabbri, Alexander Richard and Li, Irene and She, Tianwei and Li, Suyi and Radev, Dragomir. ACL 2019.

  56. Ms marco: A human-generated machine reading comprehension dataset. Nguyen, Tri and Rosenberg, Mir and Song, Xia and Gao, Jianfeng and Tiwary, Saurabh and Majumder, Rangan and Deng, Li. Arxiv 2016.

  57. U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack. Yunfan Gao, Yun Xiong, Wenlong Wu, Zijing Huang, Bohan Li, Haofen Wang. Arxiv 2025.         GitHub Repo stars

  58. L2M: Mutual Information Scaling Law for Long-Context Language Modeling. Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo, Marin Soljačić. Arxiv 2025.

  59. MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly. Zhaowei Wang, Wenhao Yu, Xiyu Ren, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman. Arxiv 2025. GitHub Repo stars
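Several of the synthetic benchmarks above (e.g. NeedleBench and U-NIAH) build on the needle-in-a-haystack protocol: a single "needle" fact is buried at a chosen depth in filler text, and the model is asked to retrieve it. The sketch below is a minimal illustration of that setup with invented names and a toy exact-match scorer, not any benchmark's official harness; a real harness would call an LLM where the comment indicates.

```python
# Minimal needle-in-a-haystack sketch. NEEDLE, FILLER, and all function
# names are illustrative, not part of any benchmark's API.

NEEDLE = "The secret passphrase is 'blue-harbor-42'."
FILLER = "The quick brown fox jumps over the lazy dog. "

def build_haystack(context_len_chars: int, depth: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    filler = (FILLER * (context_len_chars // len(FILLER) + 1))[:context_len_chars]
    cut = int(len(filler) * depth)
    return filler[:cut] + " " + NEEDLE + " " + filler[cut:]

def score(answer: str) -> bool:
    """Exact-match scoring: did the answer surface the needle fact?"""
    return "blue-harbor-42" in answer

# Sweep context lengths and insertion depths; a real harness would send
# each prompt to an LLM here and tabulate score() over the grid.
for n in (1_000, 10_000):
    for d in (0.0, 0.5, 1.0):
        prompt = build_haystack(n, d) + "\nWhat is the secret passphrase?"
        assert "blue-harbor-42" in prompt  # the needle survived insertion
```

Benchmarks such as NeedleBench extend this basic recipe with multiple needles and reasoning over them.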

Long-Form Generation

  1. ELI5: Long form question answering. Fan, Angela and Jernite, Yacine and Perez, Ethan and Grangier, David and Weston, Jason and Auli, Michael. Arxiv 2019.

  2. Ms marco: A human-generated machine reading comprehension dataset. Nguyen, Tri and Rosenberg, Mir and Song, Xia and Gao, Jianfeng and Tiwary, Saurabh and Majumder, Rangan and Deng, Li. Arxiv 2016.

  3. Expertqa: Expert-curated questions and attributed answers. Malaviya, Chaitanya and Lee, Subin and Chen, Sihao and Sieber, Elizabeth and Yatskar, Mark and Roth, Dan. NAACL 2024.

  4. ProxyQA: An alternative framework for evaluating long-form text generation with large language models. Tan, Haochen and Guo, Zhijiang and Shi, Zhan and Xu, Lu and Liu, Zhili and Feng, Yunlong and Li, Xiaoguang and Wang, Yasheng and Shang, Lifeng and Liu, Qun and others. ACL 2024.

  5. LongGenBench: Long-context Generation Benchmark. Xiang Liu, Peijie Dong, Xuming Hu, Xiaowen Chu. EMNLP 2024.

  6. ASQA: Factoid questions meet long-form answers. Stelmakh, Ivan and Luan, Yi and Dhingra, Bhuwan and Chang, Ming-Wei. EMNLP 2022.

  7. QASA: Advanced Question Answering on Scientific Articles. Lee, Yoonjoo and Lee, Kyungjae and Park, Sunghyun and Hwang, Dasol and Kim, Jaehyeon and Lee, Hong-in and Lee, Moontae. PMLR 2023.

  8. CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems. Sara Rosenthal, Avirup Sil, Radu Florian, Salim Roukos. Arxiv 2024. GitHub Repo stars

  9. LONG2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall. Qi, Zehan and Xu, Rongwu and Guo, Zhijiang and Wang, Cunxiang and Zhang, Hao and Xu, Wei. ACL 2024.

  10. A Benchmark for Long-Form Medical Question Answering. Pedram Hosseini, Jessica M. Sin, Bing Ren, Bryceton G. Thomas, Elnaz Nouri, Ali Farahanchi, Saeed Hassanpour. NeurIPS 2024. GitHub Repo stars

  11. OLAPH: Improving Factuality in Biomedical Long-form Question Answering. Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang. Arxiv 2024. GitHub Repo stars

  12. Factscore: Fine-grained atomic evaluation of factual precision in long form text generation. Min, Sewon and Krishna, Kalpesh and Lyu, Xinxi and Lewis, Mike and Yih, Wen-tau and Koh, Pang Wei and Iyyer, Mohit and Zettlemoyer, Luke and Hajishirzi, Hannaneh. EMNLP 2023.

  13. Long-form factuality in large language models. Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le. Arxiv 2024. GitHub Repo stars

  14. Large Language Models Still Exhibit Bias in Long Text. Wonje Jeung, Dongjae Jeon, Ashkan Yousefpour, Jonghyun Choi. Arxiv 2024.

  15. Aquamuse: Automatically generating datasets for query-based multi-document summarization. Kulkarni, Sayali and Chammas, Sheide and Zhu, Wan and Sha, Fei and Ie, Eugene. Arxiv 2020.

  16. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. Fabbri, Alexander Richard and Li, Irene and She, Tianwei and Li, Suyi and Radev, Dragomir. ACL 2019.

  17. LCFO: Long Context and Long Form Output Dataset and Benchmarking. Marta R. Costa-jussà, Pierre Andrews, Mariano Coria Meglioli, Joy Chen, Joe Chuang, David Dale, Christophe Ropers, Alexandre Mourachko, Eduardo Sánchez, Holger Schwenk, Tuan Tran, Arina Turkatenko, Carleigh Wood. Arxiv 2024.

  18. LongForm: Effective Instruction Tuning with Reverse Instructions. Köksal, Abdullatif and Schick, Timo and Korhonen, Anna and Schütze, Hinrich. EMNLP 2024.

  19. Suri: Multi-constraint Instruction Following for Long-form Text Generation. Chau Minh Pham, Simeng Sun, Mohit Iyyer. EMNLP 2024.         GitHub Repo stars

  20. LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs. Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li. Arxiv 2024. GitHub Repo stars

  21. Language Models can Self-Lengthen to Generate Long Texts. Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin. Arxiv 2024. GitHub Repo stars

  22. LOT: A story-centric benchmark for evaluating Chinese long text understanding and generation. Guan, Jian and Feng, Zhuoer and Chen, Yamei and He, Ruilin and Mao, Xiaoxi and Fan, Changjie and Huang, Minlie. TACL 2022.

  23. LongLaMP: A benchmark for personalized long-form text generation. Kumar, Ishita and Viswanathan, Snigdha and Yerra, Sushrita and Salemi, Alireza and Rossi, Ryan A and Dernoncourt, Franck and Deilamsalehy, Hanieh and Chen, Xiang and Zhang, Ruiyi and Agarwal, Shubham and others. Arxiv 2024.

  24. DOLOMITES: Domain-Specific Long-Form Methodical Tasks. Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti. Arxiv 2024.

  25. LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs. Yuhao Wu, Ming Shan Hee, Zhiqing Hu, Roy Ka-Wei Lee. Arxiv 2024. GitHub Repo stars

  26. LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation. Xi Ye, Fangcong Yin, Yinghui He, Joie Zhang, Howard Yen, Tianyu Gao, Greg Durrett, Danqi Chen. Arxiv 2025. GitHub Repo stars         Static Badge

  27. Hellobench: Evaluating long text generation capabilities of large language models. Que, Haoran and Duan, Feiyu and He, Liqun and Mou, Yutao and Zhou, Wangchunshu and Liu, Jiaheng and Rong, Wenge and Wang, Zekun Moore and Yang, Jian and Zhang, Ge and others. Arxiv 2024.         GitHub Repo stars

  28. The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input. Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, Kate Olszewska, Lukas Haas, Michelle Liu, Nate Keating, Adam Bloniarz, Carl Saroufim, Corey Fry, Dror Marcus, Doron Kukliansky, Gaurav Singh Tomar, James Swirhun, Jinwei Xing, Lily Wang, Madhu Gurumurthy, Michael Aaron, Moran Ambar, Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das. Arxiv 2025. Static Badge

  29. RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery. Hongchao Gu, Dexun Li, Kuicai Dong, Hao Zhang, Hang Lv, Hao Wang, Defu Lian, Yong Liu, Enhong Chen. Arxiv 2025.

  30. DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation. Ming Wang, Fang Wang, Minghao Hu, Li He, Haiyang Wang, Jun Zhang, Tianwei Yan, Li Li, Zhunchen Luo, Wei Luo, Xiaoying Bai, Guotong Geng. Arxiv 2025.         GitHub Repo stars

  31. Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation. Junhao Zhang, Richong Zhang, Fanshuang Kong, Ziyang Miao, Yanhan Ye, Yaowei Zheng. Arxiv 2025.         GitHub Repo stars

  32. Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models. Ruibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Jürgen Schmidhuber. Arxiv 2025.         GitHub Repo stars

AI Infrastructure

Training

  1. Mixed precision training. Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, others. Arxiv 2017

  2. Megatron-lm: Training multi-billion parameter language models using model parallelism. Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. Arxiv 2019

  3. Efficient sequence packing without cross-contamination: Accelerating large language models without impacting performance. Mario Michael Krell, Matej Kosec, Sergio P Perez, Andrew Fitzgibbon. Arxiv 2021

  4. Fptq: Fine-grained post-training quantization for large language models. Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie. Arxiv 2023

  5. Striped attention: Faster ring attention for causal transformers. William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley. Arxiv 2023

  6. Pytorch fsdp: experiences on scaling fully sharded data parallel. Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, others. Arxiv 2023

  7. Deepspeed ulysses: System optimizations for enabling training of extreme long sequence transformer models. Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He. Arxiv 2023

  8. Ring attention with blockwise transformers for near-infinite context. Hao Liu, Matei Zaharia, Pieter Abbeel. Arxiv 2023

  9. Fp8-lm: Training fp8 large language models. Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, others. Arxiv 2023

  10. Structured packing in llm training improves long context utilization. Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś. Arxiv 2023

  11. Understanding llms: A comprehensive overview from training to inference. Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, others. Arxiv 2024

  12. DeepSeek-V3 Technical Report. DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J.L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jiawei Wang, Jin Chen, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R.J. Chen, R.L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S.S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W.L. Xiao, Wangding Zeng et al. (100 additional authors not shown). Arxiv 2025.         GitHub Repo stars

  13. Longalign: A recipe for long context alignment of large language models. Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li. Arxiv 2024

  14. Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models. Longze Chen, Ziqiang Liu, Wanwei He, Yunshui Li, Run Luo, Min Yang. Arxiv 2024.         GitHub Repo stars

  15. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. Tri Dao. Arxiv 2023.         GitHub Repo stars

  16. Longskywork: A training recipe for efficiently extending context length in large language models. Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, others. Arxiv 2024

  17. DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning. Keer Lu, Xiaonan Nie, Zheng Liang, Da Pan, Shusen Zhang, Keshi Zhao, Weipeng Chen, Zenan Zhou, Guosheng Dong, Bin Cui, others. Arxiv 2024

  18. How to Train Long-Context Language Models (Effectively). Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen. Arxiv 2024.         GitHub Repo stars

  19. SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models. Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi. Arxiv 2024

  20. Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum. Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel. Arxiv 2024

  21. Enhancing training efficiency using packing with flash attention. Achintya Kundu, Rhui Dih Lee, Laura Wynter, Raghu Kiran Ganti, Mayank Mishra. Arxiv 2024

  22. FLUX: fast software-based communication overlap on gpus through kernel fusion. Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Chengji Yao, Ziheng Jiang, others. Arxiv 2024

  23. Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies. Felix Brakel, Uraz Odyurt, Ana-Lucia Varbanescu. Arxiv 2024

  24. Demystifying Workload Imbalances in Large Transformer Model Training over Variable-length Sequences. Haoyang Li, Fangcheng Fu, Sheng Lin, Hao Ge, Xuanyu Wang, Jiawen Niu, Jie Jiang, Bin Cui. Arxiv 2024

  25. Collage: Light-Weight Low-Precision Strategy for LLM Training. Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan. Arxiv 2024

  26. COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training. Haocheng Xi, Han Cai, Ligeng Zhu, Yao Lu, Kurt Keutzer, Jianfei Chen, Song Han. Arxiv 2024

  27. When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training. Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang. Arxiv 2024.         GitHub Repo stars

  28. Efficient training of large language models on distributed infrastructures: a survey. Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, others. Arxiv 2024

  29. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention. Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng. Arxiv 2025.

  30. Qwen2.5-1M Technical Report. An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, others. Arxiv 2025

  31. MoBA: Mixture of Block Attention for Long-Context LLMs. Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu. Arxiv 2025.         GitHub Repo stars
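A recurring theme in the training papers above (e.g. "Efficient sequence packing without cross-contamination" and the packing-with-FlashAttention work) is packing variable-length documents into fixed-length training rows while masking attention across document boundaries. The following is a toy sketch of that idea with invented helper names, not any paper's reference implementation; real systems build the equivalent block-diagonal mask inside a fused attention kernel.

```python
# Toy sketch of sequence packing with a block-diagonal attention mask,
# the idea behind "packing without cross-contamination": short documents
# share one training row, but the mask keeps tokens from attending
# across document boundaries. All names here are illustrative.

def pack(lengths, capacity):
    """Greedy first-fit: group sequence lengths into bins of `capacity`."""
    bins = []
    for n in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + n <= capacity:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins

def block_diagonal_mask(bin_lengths, capacity):
    """mask[i][j] = True iff token i may attend to token j (same segment)."""
    seg = []
    for s, n in enumerate(bin_lengths):
        seg.extend([s] * n)
    seg.extend([-1] * (capacity - len(seg)))  # padding attends to nothing
    return [[seg[i] == seg[j] and seg[i] != -1 for j in range(capacity)]
            for i in range(capacity)]
```

For example, `pack([3, 2, 5, 1], 6)` fills two rows of capacity 6 with no padding waste, and the mask for a row holding two 2-token documents blocks every cross-document entry.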

Inference

  1. Speed: Speculative pipelined execution for efficient decoding. Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao. Arxiv 2023

  2. vtensor: Flexible virtual tensor management for efficient llm serving. Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, others. Arxiv 2024

  3. Fastdecode: High-throughput gpu-efficient llm serving using heterogeneous pipelines. Jiaao He, Jidong Zhai. Arxiv 2024

  4. KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head. Isaac Rehg. Arxiv 2024.         GitHub Repo stars

  5. Magicdec: Breaking the latency-throughput tradeoff for long context generation with speculative decoding. Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Beidi Chen. Arxiv 2024

  6. QAQ: Quality Adaptive Quantization for LLM KV Cache. Shichen Dong, Wen Cheng, Jiayu Qin, Wei Wang. Arxiv 2024

  7. Wkvquant: Quantizing weight and key/value cache for large language models gains more. Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie. Arxiv 2024

  8. Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression. Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen. Arxiv 2024.

  9. ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition. Lu Ye, Ze Tao, Yong Huang, Yang Li. Arxiv 2024.

  10. Memserve: Context caching for disaggregated llm serving with elastic memory pool. Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, others. Arxiv 2024

  11. Efficient llm inference with i/o-aware partial kv cache recomputation. Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram. Arxiv 2024

  12. GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM. Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao. Arxiv 2024

  13. Scbench: A kv cache-centric analysis of long-context methods. Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, others. Arxiv 2024

  14. Mooncake: A kvcache-centric disaggregated architecture for llm serving. Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu. Arxiv 2024

  15. LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification. Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang, Chao Du, Bo An. Arxiv 2025.         GitHub Repo stars

  16. Long-Context Inference with Retrieval-Augmented Speculative Decoding. Guanzheng Chen, Qilong Feng, Jinjie Ni, Xin Li, Michael Qizhe Shieh. Arxiv 2025.         GitHub Repo stars

  17. Mamba Drafters for Speculative Decoding. Daewon Choi, Seunghyuk Oh, Saket Dingliwal, Jihoon Tack, Kyuyoung Kim, Woomin Song, Seojin Kim, Insu Han, Jinwoo Shin, Aram Galstyan, Shubham Katiyar, Sravan Babu Bodapati. Arxiv 2025
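Much of the inference work above (QAQ, WKVQuant, KV-Compress, GEAR) compresses the KV cache by quantizing cached keys and values to low bit-widths. The sketch below shows the common round-to-nearest 8-bit baseline with per-vector absmax scaling; it is an illustrative starting point, not any listed paper's method, which typically add quality-adaptive or per-head refinements on top.

```python
# Round-to-nearest int8 quantization with a shared per-vector scale,
# a common baseline for KV-cache compression. Illustrative only.

def quantize(vec):
    """Map floats to int8 range [-127, 127] with one scale per vector."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid 0 for all-zero vectors
    q = [round(x / scale) for x in vec]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [x * scale for x in q]

kv = [0.12, -1.5, 0.33, 0.9]
q, s = quantize(kv)
restored = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(kv, restored))
```

Storing `q` instead of `kv` cuts cache memory roughly 4x versus fp32 (2x versus fp16), at the cost of the bounded rounding error checked above.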

Interpretability

Performance Analysis

  1. LongRoPE: Extending LLM context window beyond 2 million tokens. Ding, Yiran and Zhang, Li Lyna and Zhang, Chengruidong and Xu, Yuanyuan and Shang, Ning and Xu, Jiahang and Yang, Fan and Yang, Mao. ICML 2024. GitHub Repo stars
  2. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Team, Gemini and Georgiev, Petko and Lei, Ving Ian and Burnell, Ryan and Bai, Libin and Gulati, Anmol and Tanzer, Garrett and Vincent, Damien and Pan, Zhufeng and Wang, Shibo and others. Arxiv 2024.
  3. RULER: What's the Real Context Size of Your Long-Context Language Models?. Hsieh, Cheng-Ping and Sun, Simeng and Kriman, Samuel and Acharya, Shantanu and Rekesh, Dima and Jia, Fei and Ginsburg, Boris. Arxiv 2024. GitHub Repo stars
  4. Lost in the middle: How language models use long contexts. Liu, Nelson F and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy. ACL 2024.
  5. Make Your LLM Fully Utilize the Context. Shengnan An and Zexiong Ma and Zeqi Lin and Nanning Zheng and Jian-Guang Lou and Weizhu Chen. NeurIPS 2024. GitHub Repo stars
  6. Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training. He, Junqing and Pan, Kunhao and Dong, Xiaoqun and Song, Zhuoyang and Liu, Yibo and Sun, Qianguo and Liang, Yuxin and Wang, Hao and Zhang, Enming and Zhang, Jiaxing. ACL 2024.
  7. Compression Represents Intelligence Linearly. Huang, Yuzhen and Zhang, Jinghan and Shan, Zifei and He, Junxian. COLM 2024. GitHub Repo stars
  8. Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding?. Hu, Yutong and Huang, Quzhe and Tao, Mingxu and Zhang, Chen and Feng, Yansong. ICLR 2024.
  9. Do Long-Range Language Models Actually Use Long-Range Context?. Sun, Simeng and Krishna, Kalpesh and Mattarella-Micke, Andrew and Iyyer, Mohit. ACL 2021.
  10. Extending context window of large language models via positional interpolation. Chen, Shouyuan and Wong, Sherman and Chen, Liangjian and Tian, Yuandong. Arxiv 2023.
  11. What is Wrong with Perplexity for Long-context Language Modeling?. Fang, Lizhe and Wang, Yifei and Liu, Zhaoyang and Zhang, Chenheng and Jegelka, Stefanie and Gao, Jinyang and Ding, Bolin and Wang, Yisen. ICLR 2025.
  12. Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach. Li, Zhuowan and Li, Cheng and Zhang, Mingyang and Mei, Qiaozhu and Bendersky, Michael. ACL 2024.
  13. Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG. Jin, Bowen and Yoon, Jinsung and Han, Jiawei and Arik, Sercan O. Arxiv 2024.
  14. LongRAG: Enhancing retrieval-augmented generation with long-context LLMs. Jiang, Ziyan and Ma, Xueguang and Chen, Wenhu. Arxiv 2024. GitHub Repo stars
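Two of the papers above (items 8 and 11) question whether aggregate perplexity reflects long-context ability. The toy calculation below, with synthetic numbers that are not from any paper, illustrates the core point: perplexity averages per-token losses, so a small set of tokens that genuinely require long-range context barely moves the aggregate.

```python
import math

# Synthetic illustration: 990 "local" tokens the model predicts easily
# and 10 "key" tokens that depend on distant context and are predicted
# poorly. Loss values are made up for illustration.

def perplexity(nlls):
    """PPL = exp(mean negative log-likelihood), the standard definition."""
    return math.exp(sum(nlls) / len(nlls))

local_nlls = [0.5] * 990  # easy, locally predictable tokens
key_nlls = [6.0] * 10     # tokens that truly need long-range context

overall = perplexity(local_nlls + key_nlls)   # ~1.74: looks healthy
key_only = perplexity(key_nlls)               # ~403: the real failure
assert key_only > 100 * overall  # the aggregate hides the failure mode
```

This is why long-context evaluations increasingly report task accuracy (or perplexity restricted to long-dependency tokens) rather than corpus-level perplexity alone.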

Model Structure Analysis

  1. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, Ren Ng. NeurIPS 2020. GitHub Repo stars

  2. In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah. Arxiv 2022

  3. YaRN: Efficient Context Window Extension of Large Language Models. Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole. ICLR 2024. GitHub Repo stars

  4. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small. Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. ICLR 2023. GitHub Repo stars

  5. Scaling laws of rope-based extrapolation. Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin. ICLR 2024. GitHub Repo stars

  6. Base of RoPE Bounds Context Length. Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen. NeurIPS 2024

  7. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. NAACL 2024. GitHub Repo stars

  8. Neurons in Large Language Models: Dead, N-gram, Positional. Elena Voita, Javier Ferrando, Christoforos Nalmpantis. ACL Findings 2024

  9. Interpreting and Improving Large Language Models in Arithmetic Calculation. Wei Zhang, Chaoqun Wan, Yonggang Zhang, Yiu-Ming Cheung, Xinmei Tian, Xu Shen, Jieping Ye. ICML 2024

  10. Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning. Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao. ICLR 2025. GitHub Repo stars

  11. Rope to Nope and Back Again: A New Hybrid Attention Strategy. Bowen Yang, Bharat Venkitesh, Dwarak Talupuru, Hangyu Lin, David Cairuz, Phil Blunsom, Acyr Locatelli. Arxiv 2025

  12. Retrieval Head Mechanistically Explains Long-Context Factuality. Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu. ICLR 2025. GitHub Repo stars

Application

Agent

  1. Agents: An Open-source Framework for Autonomous Language Agents. Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, Shiding Zhu, Jiyu Chen, Wentao Zhang, Ningyu Zhang, Huajun Chen, Peng Cui, Mrinmaya Sachan. Arxiv 2023

  2. ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, Yuan Cao. ICLR 2023

  3. The Rise and Potential of Large Language Model Based Agents: A Survey. Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Tao Gui. Arxiv 2023

  4. Benchmarking Large Language Models As AI Research Agents. Qian Huang, Jian Vora, Percy Liang, Jure Leskovec. Arxiv 2023

  5. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?. Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue. Arxiv 2024

  6. Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study. Weihao Tan, Ziluo Ding, Wentao Zhang, Boyu Li, Bohan Zhou, Junpeng Yue, Haochong Xia, Jiechuan Jiang, Longtao Zheng, Xinrun Xu, Yifei Bi, Pengjie Gu, Xinrun Wang, Börje F. Karlsson, Bo An, Zongqing Lu. Arxiv 2024

  7. TravelAgent: An AI Assistant for Personalized Travel Planning. Aili Chen, Xuyang Ge, Ziquan Fu, Yanghua Xiao, Jiangjie Chen. Arxiv 2024

  8. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press. NeurIPS 2024

  9. GPTSwarm: Language Agents as Optimizable Graphs. Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber. ICML 2024

  10. SWE-bench: Can Language Models Resolve Real-world GitHub Issues?. Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik R. Narasimhan. ICLR 2024

  11. AutoCodeRover: Autonomous Program Improvement. Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, Abhik Roychoudhury. ISSTA 2024

  12. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu. NeurIPS 2024

  13. WebArena: A Realistic Web Environment for Building Autonomous Agents. Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig. ICLR 2024

  14. Agentless: Demystifying LLM-based Software Engineering Agents. Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang. Arxiv 2024

  15. Symbolic Learning Enables Self-Evolving Agents. Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang. Arxiv 2024

  16. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Madry. Arxiv 2024

RAG

  1. How Can Recommender Systems Benefit from Large Language Models: A Survey. Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang. 2023
  2. A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions. Shailja Gupta, Rajesh Ranjan, Surya Narayan Singh. 2024
  3. Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG. Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan Ö. Arik. 2024
  4. LitLLM: A Toolkit for Scientific Literature Review. Shubham Agarwal, Issam H. Laradji, Laurent Charlin, Christopher Pal. 2024
  5. SPAR: Personalized Content-Based Recommendation via Long Engagement Attention. Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long. 2024
  6. ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation. Jianghao Lin, Rong Shan, Chenxu Zhu, Kounianhua Du, Bo Chen, Shigang Quan, Ruiming Tang, Yong Yu, Weinan Zhang. 2024
  7. HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications. Rishi Kalra, Zekun Wu, Ayesha Gulley, Airlie Hilliard, Xin Guan, Adriano S. Koshiyama, Philip C. Treleaven. 2024
  8. In Defense of RAG in the Era of Long-Context Language Models. Tan Yu, Anbang Xu, Rama Akkiraju. 2024
  9. Let long-term interests talk: An disentangled learning model for recommendation based on short-term interests generation. Sirui Duan, Mengya Ouyang, Rong Wang, Qian Li, Yunpeng Xiao. 2025

Chatbot

  1. MemoryBank: Enhancing Large Language Models with Long-Term Memory. Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, Yanlin Wang. Arxiv 2023.

        GitHub Repo stars

  1. Augmenting Language Models with Long-Term Memory. Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei. 2023
  2. Kimi Chat. Moonshot AI. 2023
  3. Character AI. Character AI. 2023
  4. I’m Pi, Your personal AI. Inflection. 2023
  5. Prompted LLMs as Chatbot Modules for Long Open-domain Conversation. Gibbeum Lee, Volker Hartmann, Jongho Park, Dimitris Papailiopoulos, Kangwook Lee. 2023
  6. Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention. Eunkyung Jo, Yuin Jeong, SoHyun Park, Daniel A. Epstein, Young-Ho Kim. 2024
  7. Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models. Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi. 2024
  8. Memory and New Controls for ChatGPT. OpenAI. 2024

Code

  1. GitHub Copilot. GitHub. 2022
  2. RepoFusion: Training Code Models to Understand Your Repository. Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak. 2023
  3. Repository-Level Prompt Generation for Large Language Models of Code. Disha Shrivastava, Hugo Larochelle, Daniel Tarlow. 2023
  4. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen. 2023
  5. Granite Code Models: A Family of Open Foundation Models for Code Intelligence. Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Yi Zhou, Chris Johnson, Aanchal Goyal, Hima Patel, S. Yousaf Shah, Petros Zerfos, Heiko Ludwig, Asim Munawar, Maxwell Crouse, Pavan Kapanipathi, Shweta Salaria, Bob Calio, Sophia Wen, Seetharami Seelam, Brian Belgodere, Carlos A. Fonseca, Amith Singhee, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda. 2024
  6. RepoHyper: Better Context Retrieval Is All You Need for Repository-Level Code Completion. Huy Nhat Phan, Hoang Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui. 2024
  7. Qwen2.5-Coder Technical Report. Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin. 2024
  8. A Survey on Large Language Models for Code Generation. Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim. 2024
  9. StarCoder 2 and The Stack v2: The Next Generation. Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian J. McAuley, Han Hu, Torsten Scholak, Sébastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, et al. 2024
  10. Cursor - The AI Code Editor. Anysphere. 2025

NLP Tasks

  1. Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. Arxiv 2020.

        GitHub Repo stars

  1. Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. NeurIPS 2020.

        GitHub Repo stars

  1. LongEmbed: Extending Embedding Models for Long Context Retrieval. Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. Arxiv 2024.

        GitHub Repo stars

  1. Document-Level Neural Machine Translation with Hierarchical Attention Networks. Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, James Henderson. 2018
  2. Improving the Transformer Translation Model with Document-Level Context. Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang, Yang Liu. 2018
  3. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization. Xingxing Zhang, Furu Wei, Ming Zhou. 2019
  4. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. 2020
  5. G-Transformer for Document-Level Machine Translation. Guangsheng Bao, Yue Zhang, Zhiyang Teng, Boxing Chen, Weihua Luo. 2021
  6. LongT5: Efficient Text-To-Text Transformer for Long Sequences. Mandy Guo, Joshua Ainslie, David C. Uthus, Santiago Ontañón, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang. 2022
  7. Large Language Models for Information Retrieval: A Survey. Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Zhicheng Dou, Ji-Rong Wen. 2023
  8. Improving Long Context Document-Level Machine Translation. Christian Herold, Hermann Ney. 2023
  9. Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents. Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao. 2023
  10. Document-Level Machine Translation with Large Language Models. Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu. 2023
  11. Benchmarking and Improving Long-Text Translation with Large Language Models. Longyue Wang, Zefeng Du, Wenxiang Jiao, Chenyang Lyu, Jianhui Pang, Leyang Cui, Kaiqiang Song, Derek F. Wong, Shuming Shi, Zhaopeng Tu. 2024
  12. A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models. Chenyang Lyu, Zefeng Du, Jitao Xu, Yitao Duan, Minghao Wu, Teresa Lynn, Alham Fikri Aji, Derek F. Wong, Longyue Wang. 2024
  13. Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT. Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré. 2024
  14. Improving Text Embeddings with Large Language Models. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei. 2024
  15. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. Hanlei Jin, Yang Zhang, Dan Meng, Jun Wang, Jinghua Tan. 2024
  16. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu. 2024
  17. New Embedding Models and API Updates. OpenAI. 2024
  18. A study of extractive summarization of long documents incorporating local topic and hierarchical information. Ting Wang, Chuan Yang, Maoyang Zou, Jiaying Liang, Dong Xiang, Wenjie Yang, Hongyang Wang, Jia Li. 2024
  19. Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications. Aditi S. Godbole, Jabin Geevarghese George, Smita Shandilya. 2024

Multimodal Tasks

  1. Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts. Aditya Sharma, Michael Saxon, William Yang Wang. Arxiv 2024.

        Static Badge

  1. Many-Shot In-Context Learning in Multimodal Foundation Models. Yixing Jiang, Jeremy Irvin, Ji Hun Wang, Muhammad Ahmed Chaudhry, Jonathan H. Chen, Andrew Y. Ng. Arxiv 2024.

        GitHub Repo stars

  1. LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models. Shangqing Tu, Yucheng Wang, Daniel Zhang-Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li. Arxiv 2025.

        GitHub Repo stars

  1. MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly. Zhaowei Wang, Wenhao Yu, Xiyu Ren, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman. Arxiv 2025.

        GitHub Repo stars

  1. EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture. Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang. Arxiv 2024. GitHub Repo stars

  2. VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos. Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal. Arxiv 2024.

  3. PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization. Vijay Jaisankar, Sambaran Bandyopadhyay, Kalp Vyas, Varre Chaitanya, Shwetha Somasundaram. Arxiv 2024.

  4. Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies. Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu. Arxiv 2024. GitHub Repo stars         Static Badge

  5. Towards Event-oriented Long Video Understanding. Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen. Arxiv 2024. GitHub Repo stars

  6. An End-to-End Speech Summarization Using Large Language Model. Hengchao Shang, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Daimeng Wei, Hao Yang. Arxiv 2024.

  7. KeyVideoLLM: Towards Large-scale Video Keyframe Selection. Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang. Arxiv 2024.

  8. OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding. Tiancheng Zhao, Qianqian Zhang, Kyusong Lee, Peng Liu, Lu Zhang, Chunxin Fang, Jiajia Liao, Kelei Jiang, Yibo Ma, Ruochen Xu. Arxiv 2024.

  9. MATE: Meet At The Embedding -- Connecting Images with Long Texts. Young Kyun Jang, Junmo Kang, Yong Jae Lee, Donghyun Kim. Arxiv 2024.

  10. mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models. Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou. Arxiv 2024. GitHub Repo stars

  11. LongVILA: Scaling Long-Context Visual Language Models for Long Videos. Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han. Arxiv 2024. GitHub Repo stars

  12. DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework. Zhifei Xie, Daniel Tang, Dingwei Tan, Jacques Klein, Tegawendé F. Bissyandé, Saad Ezzini. Arxiv 2024.

  13. Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding. Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen, Hung-Ting Su, Winston H. Hsu, Shang-Hong Lai. ECCV 2024 Workshop. GitHub Repo stars         Static Badge

  14. VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges. Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng. Arxiv 2024. GitHub Repo stars         Static Badge

  15. Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation. Nithin Rao Koluguri, Travis Bartley, Hainan Xu, Oleksii Hrinchuk, Jagadeesh Balam, Boris Ginsburg, Georg Kucsko. Arxiv 2024.

  16. LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture. Xidong Wang, Dingjie Song, Shunian Chen, Chen Zhang, Benyou Wang. Arxiv 2024. GitHub Repo stars

  17. VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models. Jiapeng Wang, Chengyu Wang, Kunzhe Huang, Jun Huang, Lianwen Jin. Arxiv 2024.

  18. Rethinking Visual Dependency in Long-Context Reasoning for Large Vision-Language Models. Yucheng Zhou, Zhi Rao, Jun Wan, Jianbing Shen. Arxiv 2024.

  19. SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation. Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Lingjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang. Arxiv 2024. GitHub Repo stars         Static Badge

  20. LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation. Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu. NeurIPS 2024. GitHub Repo stars

  21. ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu, Thomas Seidl, Gedas Bertasius. Arxiv 2024. GitHub Repo stars

  22. T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs. Shukang Yin, Chaoyou Fu, Sirui Zhao, Yunhang Shen, Chunjiang Ge, Yan Yang, Zuwei Long, Yuhan Dai, Tong Xu, Xing Sun, Ran He, Caifeng Shan, Enhong Chen. Arxiv 2024. GitHub Repo stars

  23. Owl-1: Omni World Model for Consistent Long Video Generation. Yuanhui Huang, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Di Zhang, Jie Zhou, Jiwen Lu. Arxiv 2024. GitHub Repo stars

  24. VCA: Video Curious Agent for Long Video Understanding. Zeyuan Yang, Delin Chen, Xueyang Yu, Maohao Shen, Chuang Gan. Arxiv 2024.

  25. Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory. Xingyao Li, Fengzhuo Zhang, Jiachun Pan, Yunlong Hou, Vincent Y. F. Tan, Zhuoran Yang. Arxiv 2024.

  26. ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding. Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie. Arxiv 2024. GitHub Repo stars

  27. LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token. Shaolei Zhang, Qingkai Fang, Zhe Yang, Yang Feng. Arxiv 2024. GitHub Repo stars

  28. Temporal Preference Optimization for Long-Form Video Understanding. Rui Li, Xiaohan Wang, Yuhui Zhang, Zeyu Wang, Serena Yeung-Levy. Arxiv 2025. GitHub Repo stars         Static Badge

  29. Latent Swap Joint Diffusion for Long-Form Audio Generation. Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Jun Du, Kewei Li, Ruoyu Wang, Jiefeng Ma, Lei Sun, Jianqing Gao. Arxiv 2025. Static Badge

  30. MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation. Sihyun Yu, Meera Hahn, Dan Kondratyuk, Jinwoo Shin, Agrim Gupta, José Lezama, Irfan Essa, David Ross, Jonathan Huang. Arxiv 2025.

  31. VideoRoPE: What Makes for Good Video Rotary Position Embedding?. Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Jian Tong, Haodong Duan, Qipeng Guo, Jiaqi Wang, Xipeng Qiu, Dahua Lin. Arxiv 2025. GitHub Repo stars

  32. Adaptive Keyframe Sampling for Long Video Understanding. Xi Tang, Jihao Qiu, Lingxi Xie, Yunjie Tian, Jianbin Jiao, Qixiang Ye. CVPR 2025. GitHub Repo stars

  33. Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing. Yudong Liu, Jingwei Sun, Yueqian Lin, Jingyang Zhang, Ming Yin, Qinsi Wang, Jianyi Zhang, Hai Li, Yiran Chen. Arxiv 2025.

  34. Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding. Weiyu Guo, Ziyang Chen, Shaoguang Wang, Jianxiang He, Yijie Xu, Jinhui Ye, Ying Sun, Hui Xiong. Arxiv 2025.

  35. AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding. Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Li Cao, Liqiang Nie. Arxiv 2025. GitHub Repo stars

  36. Atlas: Multi-Scale Attention Improves Long Context Image Modeling. Kumar Krishna Agrawal, Long Lian, Longchao Liu, Natalia Harguindeguy, Boyi Li, Alexander Bick, Maggie Chung, Trevor Darrell, Adam Yala. Arxiv 2025. GitHub Repo stars

  37. Multimodal Long Video Modeling Based on Temporal Dynamic Context. Haoran Hao, Jiaming Han, Yiyuan Zhang, Xiangyu Yue. Arxiv 2025. GitHub Repo stars

  38. Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding. Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo, Jiahao Li, Bin Li, Houqiang Li, Yan Lu. Arxiv 2025.

  39. DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding. Hongzhi Zhang, Jingyuan Zhang, Xingguang Ji, Qi Wang, Fuzheng Zhang. Arxiv 2025.

  40. EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models. Yantai Yang, Yuhao Wang, Zichen Wen, Luo Zhongwei, Chang Zou, Zhipeng Zhang, Chuan Wen, Linfeng Zhang. Arxiv 2025.

  41. InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding. Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang. Arxiv 2025.

  42. MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference. Kunxi Li, Zhonghua Jiang, Zhouzhou Shen, Zhaode Wang, Chengfei Lv, Shengyu Zhang, Fan Wu, Fei Wu. Arxiv 2025.

  43. Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens. Zeyuan Yang, Xueyang Yu, Delin Chen, Maohao Shen, Chuang Gan. Arxiv 2025. GitHub Repo stars

Specific Domains

  1. Abstractive Text Summarization by Incorporating Reader Comments. Shen Gao, Xiuying Chen, Piji Li, Zhaochun Ren, Lidong Bing, Dongyan Zhao, Rui Yan. 2019
  2. A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges. Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M. Mulvey, H. Vincent Poor, Qingsong Wen, Stefan Zohren. 2024
  3. MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens. Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan. 2024
  4. Promises and pitfalls of artificial intelligence for legal applications. Sayash Kapoor, Peter Henderson, Arvind Narayanan. 2024
  5. Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications. Aditi S. Godbole, Jabin Geevarghese George, Smita Shandilya. 2024
  6. DocFinQA: A Long-Context Financial Reasoning Dataset. Varshini Reddy, Rik Koncel{-}Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner. 2024
  7. Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers. Lukas Hilgert, Danni Liu, Jan Niehues. 2024
  8. LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents. Ahmed Masry, Amir Hajian. 2024

Future Directions

Long CoT

  1. When More is Less: Understanding Chain-of-Thought Length in LLMs. Yuyang Wu, Yifei Wang, Tianqi Du, Stefanie Jegelka, Yisen Wang. Arxiv 2025.

  2. LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!. Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica. Arxiv 2025.         GitHub Repo stars

  3. Monte Carlo Tree Diffusion for System 2 Planning. Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn. Arxiv 2025.

  4. Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning. Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He. Arxiv 2025.         GitHub Repo stars

  5. CoT-Valve: Length-Compressible Chain-of-Thought Tuning. Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang. Arxiv 2025.         GitHub Repo stars

  6. Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity. Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan. Arxiv 2025.

  7. DRT: Deep Reasoning Translation via Long Chain-of-Thought. Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou. Arxiv 2024.         GitHub Repo stars

  8. Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs. Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. Arxiv 2024.

  9. O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?. Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, Weizhe Yuan, Pengfei Liu. Arxiv 2024.         GitHub Repo stars

  10. OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning. Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang. Arxiv 2024.         GitHub Repo stars

  11. Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning. Libo Wang. Arxiv 2025.         GitHub Repo stars

  12. SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities. Fengqing Jiang, Zhangchen Xu, Yuetai Li, Luyao Niu, Zhen Xiang, Bo Li, Bill Yuchen Lin, Radha Poovendran. Arxiv 2025.         Static Badge

  13. Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning. Qingwen Lin, Boyan Xu, Zijian Li, Zhifeng Hao, Keli Zhang, Ruichu Cai. Arxiv 2025.

  14. Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?. Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu. Arxiv 2025.

  15. TokenSkip: Controllable Chain-of-Thought Compression in LLMs. Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li. Arxiv 2025.         GitHub Repo stars

  16. LightThinker: Thinking Step-by-Step Compression. Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang. Arxiv 2025.         GitHub Repo stars

  17. Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning. Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei. Arxiv 2025.

  18. Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?. Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng. Arxiv 2025.         GitHub Repo stars

  19. Towards Widening The Distillation Bottleneck for Reasoning Models. Huifeng Yin, Yu Zhao, Minghao Wu, Xuanfan Ni, Bo Zeng, Hao Wang, Tianqi Shi, Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang. Arxiv 2025.

  20. What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret. Yufeng Yuan, Yu Yue, Ruofei Zhu, Tiantian Fan, Lin Yan. Arxiv 2025.

  21. MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving. Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, Tong Zhang. Arxiv 2025.

  22. START: Self-taught Reasoner with Tools. Chengpeng Li, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang, Beichen Zhang, Xiang Wang, Bowen Yu, Binyuan Hui, Junyang Lin, Dayiheng Liu. Arxiv 2025.

  23. L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning. Pranjal Aggarwal, Sean Welleck. Arxiv 2025.

  24. InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models. Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Mengdi Zhang, Jian Shao, Yueting Zhuang. Arxiv 2025.

  25. Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval. Yuwei Zhang, Jayanth Srinivasa, Gaowen Liu, Jingbo Shang. Arxiv 2025.

  26. "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding. Hyunbin Jin, Je Won Yeom, Seunghyun Bae, Taesup Kim. Arxiv 2025.

  27. Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond. Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang. Arxiv 2025.

        GitHub Repo stars

  1. Unlocking General Long Chain-of-Thought Reasoning Capabilities of Large Language Models via Representation Engineering. Xinyu Tang, Xiaolei Wang, Zhihao Lv, Yingqian Min, Wayne Xin Zhao, Binbin Hu, Ziqi Liu, Zhiqiang Zhang. Arxiv 2025. GitHub Repo stars

  2. Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities. Xueyang Zhou, Guiyao Tie, Guowen Zhang, Weidong Wang, Zhigang Zuo, Di Wu, Duanfeng Chu, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong. Arxiv 2025.

  3. PENCIL: Long Thoughts with Short Memory. Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li. Arxiv 2025.

  4. Long Is More Important Than Difficult for Training Reasoning Models. Si Shen, Fei Huang, Zhixiao Zhao, Chang Liu, Tiansheng Zheng, Danhao Zhu. Arxiv 2025.

  5. SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild. Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He. Arxiv 2025. GitHub Repo stars

  6. TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance. Jingxian Xu, Mengyu Zhou, Weichang Liu, Hanbing Liu, Shi Han, Dongmei Zhang. Arxiv 2025.

  7. SKIntern: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models. Huanxuan Liao, Shizhu He, Yupu Hao, Xiang Li, Yuanzhe Zhang, Jun Zhao, Kang Liu. COLING 2025. GitHub Repo stars

  8. ReTool: Reinforcement Learning for Strategic Tool Use in LLMs. Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, Wanjun Zhong. Arxiv 2025. GitHub Repo stars

  9. Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models. Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He. Arxiv 2025.

  10. THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models. Xiao Pu, Michael Saxon, Wenyue Hua, William Yang Wang. Arxiv 2025.

  11. Dynamic Early Exit in Reasoning Models. Chenxu Yang, Qingyi Si, Yongjie Duan, Zheliang Zhu, Chenyu Zhu, Zheng Lin, Li Cao, Weiping Wang. Arxiv 2025.

  12. Process Reward Models That Think. Muhammad Khalifa, Rishabh Agarwal, Lajanugen Logeswaran, Jaekyeom Kim, Hao Peng, Moontae Lee, Honglak Lee, Lu Wang. Arxiv 2025. GitHub Repo stars

  13. AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization. Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen. Arxiv 2025. GitHub Repo stars

  14. Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs. Jinyan Su, Jennifer Healey, Preslav Nakov, Claire Cardie. Arxiv 2025.

  15. Llama-Nemotron: Efficient Reasoning Models. NVIDIA. Arxiv 2025.

  16. RM-R1: Reward Modeling as Reasoning. Xiusi Chen, Gaotang Li, Ziqi Wang, Bowen Jin, Cheng Qian, Yu Wang, Hongru Wang, Yu Zhang, Denghui Zhang, Tong Zhang, Hanghang Tong, Heng Ji. Arxiv 2025. GitHub Repo stars

  17. Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents. Minzheng Wang, Yongbin Li, Haobo Wang, Xinghua Zhang, Nan Xu, Bingli Wu, Fei Huang, Haiyang Yu, Wenji Mao. Arxiv 2025. GitHub Repo stars

  18. DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models. Yuxuan Jiang, Dawei Li, Frank Ferraro. Arxiv 2025.

  19. Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning. Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith Ross. Arxiv 2025. GitHub Repo stars

  20. ThinkSwitcher: When to Think Hard, When to Think Fast. Guosheng Liang, Longguang Zhong, Ziyi Yang, Xiaojun Quan. Arxiv 2025.

  21. Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning. Jinghui Lu, Haiyang Yu, Siliang Xu, Shiwei Ran, Guozhi Tang, Siqi Wang, Bin Shan, Teng Fu, Hao Feng, Jingqun Tang, Han Wang, Can Huang. Arxiv 2025.

  22. Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space. Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang. Arxiv 2025. GitHub Repo stars

  23. ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy. Gengyang Li, Yifeng Gao, Yuming Li, Yunfang Wu. Arxiv 2025.

  24. Learn to Reason Efficiently with Adaptive Length-based Reward Shaping. Wei Liu, Ruochen Zhou, Yiyun Deng, Yuzhen Huang, Junteng Liu, Yuntian Deng, Yizhe Zhang, Junxian He. Arxiv 2025. GitHub Repo stars

  25. When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning. Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Haodong Zhao, Hao Li, Jiansong Chen, Ke Zeng, Xunliang Cai. Arxiv 2025.

  26. QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning. Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan. Arxiv 2025. GitHub Repo stars

  27. Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN. Yao Xu, Mingyu Xu, Fangyu Lei, Wangtao Sun, Xiangrong Zeng, Bingning Wang, Guang Liu, Shizhu He, Jun Zhao, Kang Liu. Arxiv 2025. GitHub Repo stars

  28. Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning. Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz. Arxiv 2025.

  29. ARM: Adaptive Reasoning Model. Siye Wu, Jian Xie, Yikai Zhang, Aili Chen, Kai Zhang, Yu Su, Yanghua Xiao. Arxiv 2025. GitHub Repo stars

  30. Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens. Xixian Yong, Xiao Zhou, Yingying Zhang, Jinlin Li, Yefeng Zheng, Xian Wu. Arxiv 2025. GitHub Repo stars

  31. AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time. Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang. Arxiv 2025. GitHub Repo stars

  32. A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings. Xiaoang Xu, Shuo Wang, Xu Han, Zhenghao Liu, Huijia Wu, Peipei Li, Zhiyuan Liu, Maosong Sun, Zhaofeng He. Arxiv 2025. GitHub Repo stars

  33. AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models. Feng Luo, Yu-Neng Chuang, Guanchu Wang, Hoang Anh Duy Le, Shaochen Zhong, Hongyi Liu, Jiayi Yuan, Yang Sui, Vladimir Braverman, Vipin Chaudhary, Xia Hu. Arxiv 2025.

  34. Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning. Mingyang Song, Mao Zheng. Arxiv 2025. GitHub Repo stars

  35. Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning. Yang He, Xiao Ding, Bibo Cai, Yufei Zhang, Kai Xiong, Zhouhao Sun, Bing Qin, Ting Liu. Arxiv 2025.

  36. Adaptive Deep Reasoning: Triggering Deep Thinking When Needed. Yunhao Wang, Yuhao Zhang, Tinghao Yu, Can Xu, Feng Zhang, Fengzong Lian. Arxiv 2025.

  37. Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards. Jinyan Su, Claire Cardie. Arxiv 2025.

  38. Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection. Zhihong Pan, Kai Zhang, Yuze Zhao, Yupeng Han. Arxiv 2025. GitHub Repo stars

  39. TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression. Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Ying Nian Wu, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu. Arxiv 2025. GitHub Repo stars

  40. Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning. Chen Qian, Dongrui Liu, Haochen Wen, Zhen Bai, Yong Liu, Jing Shao. Arxiv 2025. GitHub Repo stars

  41. Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models. Ruiqi Zhang, Changyi Xiao, Yixin Cao. Arxiv 2025.

  42. Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models. Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Yifu Lu, Mengdi Wang, Dinesh Manocha, Furong Huang, Mohammad Ghavamzadeh, Amrit Singh Bedi. Arxiv 2025.

  43. Kinetics: Rethinking Test-Time Scaling Laws. Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen. Arxiv 2025. GitHub Repo stars

  44. Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning. Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber. Arxiv 2025.

  45. Through the Valley: Path to Effective Long CoT Training for Small Language Models. Renjie Luo, Jiaxi Li, Chen Huang, Wei Lu. Arxiv 2025.

  46. Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency. Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou. Arxiv 2025.

  47. AdapThink: Adaptive Thinking Preferences for Reasoning Language Model. Xu Wan, Wei Wang, Wenyue Xu, Wotao Yin, Jie Song, Mingyang Sun. Arxiv 2025.

  48. OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling. Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu. Arxiv 2025. GitHub Repo stars

  49. AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control. Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du. Arxiv 2025.

  50. Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model. Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, Tao Lin. Arxiv 2025. GitHub Repo stars

Acknowledgments

Please contact us if we have missed your name in the list; we will add you back ASAP!

Contributors