A curated list of research works on efficient on-device AI systems, methods, and applications for mobile and edge devices.
Note: Some of these works target inference acceleration on cloud/server infrastructure with far greater computational resources, but I include them here when their techniques could potentially generalize to on-device inference use cases.
- [MLSys 2025] MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
- [MLSys 2025] TurboAttention: Efficient Attention Approximation for High Throughputs LLMs
- [ASPLOS 2023] FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
- [NeurIPS 2022] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (the underlying tiling idea is sketched after this list)
- [arXiv 2025] HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs with Heterogeneous AI Accelerators
- [ASPLOS 2025] Fast On-device LLM Inference with NPUs
- [arXiv 2024] PowerInfer-2: Fast Large Language Model Inference on a Smartphone
- [arXiv 2025] HALO: Hardware-Aware Quantization with Low Critical-Path-Delay Weights for LLM Acceleration
- [ISCA 2025] MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
- [MLSys 2024] AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
- [ISCA 2023] OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
- [MLSys 2025] TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives
- [ASPLOS 2024] SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
- [ASPLOS 2024] SoD2: Statically Optimizing Dynamic Deep Neural Network Execution
- [MICRO 2022] GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs
- [PLDI 2021] DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
- [MobiSys 2025] ARIA: Optimizing Vision Foundation Model Inference on Heterogeneous Mobile Processors for Augmented Reality
- [PPoPP 2024] Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous SoCs
- [MobiSys 2024] Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs
- [MobiCom 2024] Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices
- [SenSys 2023] Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
- [MobiSys 2023] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors
- [ATC 2023] Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices
- [IPSN 2023] PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators
- [SenSys 2022] BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference
- [MobiSys 2022] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors
- [MobiSys 2022] CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices
- [RTSS 2024] FLEX: Adaptive Task Batch Scheduling with Elastic Fusion in Multi-Modal Multi-View Machine Perception
- [MobiCom 2024] Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices
- [MobiSys 2023] OmniLive: Super-Resolution Enhanced 360° Video Live Streaming for Mobile Devices
- [MobiSys 2023] HarvNet: Resource-Optimized Operation of Multi-Exit Deep Neural Networks on Energy Harvesting Devices
- [MobiCom 2022] NeuLens: Spatial-based Dynamic Acceleration of Convolutional Neural Networks on Edge
- [MobiCom 2021] Flexible high-resolution object detection on edge devices with tunable latency
- [ASPLOS 2025] Nazar: Monitoring and Adapting ML Models on Mobile Devices
- [SenSys 2024] AdaShadow: Responsive Test-time Model Adaptation in Non-stationary Mobile Environments
- [SenSys 2023] EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge
- [MobiCom 2023] Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
- [MobiCom 2023] AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments
- [MobiSys 2023] ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection
- [SenSys 2023] On-NAS: On-Device Neural Architecture Search on Memory-Constrained Intelligent Embedded Systems
- [MobiCom 2022] Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading
- [MobiSys 2022] Memory-Efficient DNN Training on Mobile Devices
- [MobiCom 2024] MELTing point: Mobile Evaluation of Language Transformers [code]
- [SenSys 2023] nnPerf: Demystifying DNN Runtime Inference Latency on Mobile Platforms
- [MobiSys 2021] nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices
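For readers new to the attention-acceleration entries above (FlashAttention, MAS-Attention, TurboAttention, FLAT), the snippet below is a minimal NumPy sketch of the block-wise, online-softmax trick those works build on: keys/values are streamed tile by tile and the softmax numerator/denominator are rescaled on the fly, so the full L x L score matrix is never materialized. It is not code from any of the listed papers; the function name, block size, and single-head setup are illustrative assumptions.

```python
import numpy as np

def tiled_attention(q, k, v, block=64):
    """Single-head attention computed block-by-block over keys/values,
    using online-softmax rescaling so no L x L score matrix is stored."""
    L, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(v, dtype=np.float64)
    row_max = np.full(L, -np.inf)   # running max score per query row
    row_sum = np.zeros(L)           # running softmax denominator per row
    for start in range(0, L, block):
        kb = k[start:start + block]          # (B, d) tile of keys
        vb = v[start:start + block]          # (B, d) tile of values
        s = (q @ kb.T) * scale               # (L, B) partial score tile
        new_max = np.maximum(row_max, s.max(axis=1))
        # Rescale previously accumulated numerator/denominator to the new max.
        correction = np.exp(row_max - new_max)
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

# Sanity check against the naive dense computation.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 32)) for _ in range(3))
s = (q @ k.T) / np.sqrt(32)
p = np.exp(s - s.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
assert np.allclose(tiled_attention(q, k, v), ref, atol=1e-6)
```

The rescaling step is what lets each key/value tile pass through a small on-chip buffer, keeping the working set at O(L·d) rather than O(L²), which is the property the memory-constrained edge variants above optimize further.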
MLSys 2025
- [MLSys 2025] MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
- [MLSys 2025] Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
- [MLSys 2025] TurboAttention: Efficient Attention Approximation for High Throughputs LLMs
- [MLSys 2025] SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
- [MLSys 2025] LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
ASPLOS 2025
- Fast On-device LLM Inference with NPUs
- Energy-aware Scheduling and Input Buffer Overflow Prevention for Energy-harvesting Systems
- Generalizing Reuse Patterns for Efficient DNN on Microcontrollers
- Nazar: Monitoring and Adapting ML Models on Mobile Devices
EuroSys 2025
- Flex: Fast, Accurate DNN Inference on Low-Cost Edges Using Heterogeneous Accelerator Execution
- T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
SOSP 2025
MobiSys 2025
- ARIA: Optimizing Vision Foundation Model Inference on Heterogeneous Mobile Processors for Augmented Reality
MobiCom 2025
Preprint 2025
- HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs with Heterogeneous AI Accelerators