This list highlights academic work on running AI models efficiently on resource-constrained mobile devices, including (1) edge devices (e.g., NVIDIA Jetson), (2) smartphones (e.g., Snapdragon/Exynos SoCs), and (3) microcontrollers for energy-harvesting or batteryless IoT devices, with a primary focus on research targeting edge devices and smartphones. This repo references Awesome-On-Device-AI-Systems by Jeho Lee.
## Inference using a single processor

### General DNN inference
- [MLSys 2024] AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration (Paper)
- LLM; Desktop & edge devices; GPU (see the quantization sketch after this list)
- [MLSys 2025] MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices (Paper)
- Attention-based NN; Edge devices; NPU
- [MLSys 2025] Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking (Paper)
- LLM; Smartphones; NPU (Simulation)
- [ASPLOS 2025] Fast On-device LLM Inference with NPUs (llm.npu) (Paper)
- LLM; Smartphones; NPU
- [IEEE TMC 2025] NeuroBalancer: Balancing System Frequencies With Punctual Laziness for Timely and Energy-Efficient DNN Inferences (Paper)
- CNN; Smartphones; GPU
- [ASPLOS 2024] SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile (Paper)
- CNN, Transformer, and LLM; Smartphones; GPU
- [MobiCom 2024] FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices (Paper)
- CNN; Smartphones; CPU
- [MobiCom 2024] Mobile Foundation Model as Firmware (Paper)
- Foundation model; Edge devices & smartphones; CPU or GPU
- [MobiSys 2024] Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimization (Paper)
- Transformer; Smartphones & laptops; (Web)GPU
- [ASPLOS 2023] STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining (Paper)
- NLP (BERT); Edge devices; CPU or GPU
- [MobiCom 2022] Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs (Paper)
- CNN; Smartphones; GPU
- [MobiCom 2022] NeuLens: Spatial-based Dynamic Acceleration of Convolutional Neural Networks on Edge (Paper)
- CNN; Edge devices; GPU
- [MICRO 2022] GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs (Paper)
- CNN and GAN; Smartphones; DSP (NPU)
- [MobiCom 2021] AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs (Paper)
- CNN and RNN; Smartphones; CPU
- [PLDI 2021] DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion (Paper)
- CNN and Transformer; Smartphones; CPU or GPU
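Several entries above revolve around low-bit weight quantization. As a concrete illustration, below is a minimal, hypothetical sketch in the spirit of activation-aware weight quantization (AWQ): per-input-channel scales derived from activation magnitudes protect salient weights during low-bit rounding, and the inverse scale is folded into the preceding operator. The function name, the fixed `alpha`, and the normalization are illustrative assumptions, not the paper's implementation (AWQ searches `alpha` per layer).

```python
import numpy as np

def awq_style_quantize(W, act_scale, n_bits=4, alpha=0.5):
    """Hypothetical sketch: scale salient input channels (guided by mean
    activation magnitude, assumed positive) before symmetric low-bit
    rounding, then fold the inverse scale into the previous layer."""
    s = np.power(act_scale, alpha)            # per-input-channel saliency scale
    s /= np.sqrt(s.max() * s.min())           # keep scales centered around 1
    Ws = W * s                                # (out, in) * (in,): scale columns
    qmax = 2 ** (n_bits - 1) - 1
    step = np.abs(Ws).max(axis=1, keepdims=True) / qmax + 1e-12  # per-row step
    Wq = np.clip(np.round(Ws / step), -qmax - 1, qmax)           # int-range values
    return Wq * step, s                       # dequantized weights, channel scales

# At inference time: y ~= (x / s) @ W_deq.T, with the x / s division
# fused into the operator that produces x.
```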
### Application-specific optimization
- [MobiSys 2025] ARIA: Optimizing Vision Foundation Model Inference on Heterogeneous Mobile Processors for Augmented Reality (To Appear)
- Vision foundation model for augmented reality; Smartphones
- [AAAI 2025] E4: Energy-Efficient DNN Inference for Edge Video Analytics Via Early Exiting and DVFS (Paper)
- Video analytics; Edge devices; GPU (see the early-exit sketch after this list)
- [MobiCom 2024] Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices (Paper)
- 3D object detection; Edge devices; GPU
- [MobiSys 2023] OmniLive: Super-Resolution Enhanced 360° Video Live Streaming for Mobile Devices (Paper)
- Video super-resolution; Smartphones; GPU
- [IEEE TMC 2023] NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution (Paper)
- Single-image super-resolution; Smartphones; NPU
- [EuroSys 2022] LiteReconfig: Cost and Content Aware Reconfiguration of Video Object Detection Systems for Mobile GPUs (Paper)
- Video analytics; Edge devices; GPU
- [UbiComp 2022] Efficient On-Device Visual Question Answering (Paper)
- Visual question answering; Edge devices & smartphones; GPU
- [MobiCom 2021] Flexible High-Resolution Object Detection on Edge Devices with Tunable Latency (Paper)
- Object detection; Edge devices; GPU
- [MobiCom 2020] NEMO: Enabling Neural-enhanced Video Streaming on Commodity Mobile Devices (Paper)
- Video super-resolution; Smartphones; GPU
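To make the early-exit idea used by some of these systems (e.g., E4 above) concrete, here is a minimal, hedged sketch of multi-exit inference: a lightweight head after each backbone stage decides whether the later, more expensive stages can be skipped. The `stages`/`heads` structure and the fixed confidence threshold are illustrative assumptions, not any paper's actual pipeline (E4 additionally couples exit decisions with DVFS, omitted here).

```python
import torch

@torch.no_grad()
def early_exit_infer(stages, heads, x, threshold=0.9):
    """Run backbone stages in order; return as soon as an exit head is
    confident enough (batch size 1 assumed for simplicity)."""
    assert len(stages) == len(heads) > 0
    for stage, head in zip(stages, heads):
        x = stage(x)                          # next chunk of the backbone
        probs = head(x).softmax(dim=-1)       # cheap auxiliary classifier
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:          # confident: skip later stages
            return pred.item(), conf.item()
    return pred.item(), conf.item()           # final exit (no early stop)
```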
## Inference using heterogeneous processors

### General DNN inference
- [EuroSys 2025] Flex: Fast, Accurate DNN Inference on Low-Cost Edges Using Heterogeneous Accelerator Execution (Paper)
- Smartphones; CPU + GPU + NPU (TPU/DSP)
- [arXiv 2025] HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs with Heterogeneous AI Accelerators (Paper)
- LLM; Smartphones; CPU + GPU + NPU
- [arXiv 2024] PowerInfer-2: Fast Large Language Model Inference on a Smartphone (Paper)
- LLM; Smartphones; CPU + NPU
- [IEEE TMC 2024] Thermal-Aware Scheduling for Deep Learning on Mobile Devices with NPU (Paper)
- CNN; Smartphones; GPU + NPU
- [ICDE 2023] EdgeNN: Efficient Neural Network Inference for CPU-GPU Integrated Edge Devices (Paper)
- CNN; Edge devices; CPU + GPU
- [ATC 2023] Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices (Paper)
- CNN; Smartphones; CPU + GPU
- [IPSN 2021] Efficient Execution of Deep Neural Networks on Mobile Devices with NPU (Paper)
- CNN; Smartphones; CPU + NPU
- [EuroSys 2019] µLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization (Paper)
- CNN; Smartphones; CPU + GPU (see the layer-mapping sketch after this list)
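A recurring ingredient in this group (e.g., µLayer above) is deciding, per layer, which processor should execute it. The following is a small, hypothetical dynamic-programming sketch of that mapping problem, assuming offline-profiled per-layer latencies and inter-processor transfer costs; real systems such as µLayer or CoDL additionally split individual layers across processors, which is not modeled here.

```python
def map_layers(latency, transfer):
    """latency[l][p]: profiled time of layer l on processor p (assumed input).
    transfer[p][q]: cost of moving activations from p to q (0 when p == q).
    Returns (total latency, processor index per layer) for a linear DNN."""
    procs = range(len(latency[0]))
    # best[p] = (cost of layers[0..l] with layer l on p, mapping so far)
    best = {p: (latency[0][p], [p]) for p in procs}
    for l in range(1, len(latency)):
        nxt = {}
        for q in procs:
            cost, path = min(
                (best[p][0] + transfer[p][q] + latency[l][q], best[p][1])
                for p in procs
            )
            nxt[q] = (cost, path + [q])
        best = nxt
    return min(best.values())
```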
### Application-specific optimization
- [MobiCom 2024] Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices (Paper)
- Single-image super-resolution; Smartphones; GPU + NPU
- [ICDE 2024] COUPLE: Orchestrating Video Analytics on Heterogeneous Mobile Processors (Paper)
- Video analytics (object detection); Smartphones; GPU + DSP (NPU)
- [IPSN 2023] PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators (Paper)
- 3D object detection; Edge devices; GPU + NPU
- [MobiCom 2019] MobiSR: Efficient On-Device Super-Resolution through Heterogeneous Mobile Processors (Paper)
- Single-image super-resolution; Smartphones; CPU + GPU + DSP (NPU)
- [INFOCOM 2024] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference (Paper)
- Transformer; Multiple edge devices; CPU + GPU
## Multi-DNN inference

### General DNN inference
- [IEEE TMC 2024] SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget (Paper)
- CNN; Edge devices; GPU
- [MobiSys 2024] Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs (Paper)
- Edge devices; GPU (see the scheduling sketch after this list)
- [MICRO 2023] Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads (Paper)
- CNN, Attention-based NN, and NLP; From smartphones to data centers; NPU (Simulation)
- [SenSys 2023] Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU (Paper)
- Edge devices; GPU
- [HPCA 2021] Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling (Paper)
- NPU (Simulation)
- [MobiCom 2021] LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision (Paper)
- Edge devices & smartphones; CPU or GPU
- [PerCom 2021] MASA: Responsive Multi-DNN Inference on the Edge (Paper)
- CNN; Edge devices (Raspberry Pi); CPU
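Most systems in this group boil down to deciding which DNN's next piece of work gets the accelerator. As a rough illustration (loosely inspired by layer-granularity preemption in systems like Pantheon, but not their implementation), here is a hedged sketch of an earliest-deadline-first scheduler that re-arbitrates after every layer; `requests` and `run_layer` are hypothetical stand-ins for profiled models and a kernel launcher.

```python
import heapq

def edf_layer_scheduler(requests, run_layer):
    """requests: list of {'deadline': float, 'layers': [callable, ...]}.
    Executes one layer at a time and re-picks the earliest-deadline
    request after each layer, approximating preemption at layer
    boundaries without killing in-flight kernels."""
    ready = [(r["deadline"], i) for i, r in enumerate(requests)]
    heapq.heapify(ready)
    progress = [0] * len(requests)
    while ready:
        deadline, i = heapq.heappop(ready)
        run_layer(requests[i]["layers"][progress[i]])  # one layer on the GPU
        progress[i] += 1
        if progress[i] < len(requests[i]["layers"]):
            heapq.heappush(ready, (deadline, i))       # back into the queue
```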
### Application-specific optimization
- [MobiCom 2020] Heimdall: Mobile GPU Coordination Platform for Augmented Reality Applications (Paper)
- Augmented reality; Smartphones; GPU
## Concurrent DNN execution on heterogeneous processors

### General DNN inference
- [PPoPP 2024] Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous SoCs (HaX-CoNN) (Paper)
- Edge devices; GPU + DLA (NPU) (see the mapping sketch after this list)
- [SEC 2024] Elastic Execution of Multi-Tenant DNNs on Heterogeneous Edge MPSoCs (Paper)
- Smartphones; CPU + GPU + DSP (NPU)
- [MobiSys 2023] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors (Paper)
- Smartphones; CPU + GPU + DSP (NPU)
- [MobiSys 2022] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors (Paper)
- Smartphones; CPU + GPU + DSP + NPU
- [MobiSys 2022] CoDL: Efficient CPU-GPU Co-execution for Deep Learning Inference on Mobile Devices (Paper)
- Smartphones; CPU + GPU
- [SenSys 2022] BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference (Paper)
- Edge devices; CPU + GPU
- [ACM TACO 2021] SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms (Paper)
- Smartphones, edge devices, and desktop computers; CPU + GPU + DSP (NPU)
- [RTSS 2019] Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference (Paper)
- Edge devices (NVIDIA TX2) and desktop computers (Intel x86 Xeon); CPU + GPU
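When several DNNs share a heterogeneous SoC, the core problem these systems tackle is assigning (and re-assigning) models to processors under contention. Below is a deliberately simplified, hypothetical greedy sketch of that assignment, assuming offline-profiled latencies; real schedulers such as Band or HaX-CoNN also model operator-level fallbacks and shared-memory contention, which this ignores.

```python
def greedy_assign(dnns, procs, latency):
    """latency[d][p]: profiled runtime of DNN d on processor p (assumed).
    Longest-job-first greedy: each DNN goes to the processor whose queue
    would finish it earliest. Returns {dnn: processor}."""
    finish = {p: 0.0 for p in procs}          # per-processor queue length
    plan = {}
    for d in sorted(dnns, key=lambda d: -min(latency[d].values())):
        p = min(procs, key=lambda p: finish[p] + latency[d][p])
        plan[d] = p
        finish[p] += latency[d][p]
    return plan
```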
## Inference on microcontrollers (energy-harvesting / batteryless devices)

- [SenSys 2025] Lupe: Integrating the Top-down Approach with DNN Execution on Ultra-Low-Power Devices (To Appear)
- Ultra-low-power MCU (MSP430 series)
- [SenSys 2024] Intermittent Inference: Trading a 1% Accuracy Loss for a 1.9x Throughput Speedup (Paper)
- High-performance MCU (ARM Cortex-M series) (see the checkpointing sketch after this list)
- [SenSys 2024] Fast-Inf: Ultra-Fast Embedded Intelligence on the Batteryless Edge (Paper)
- Ultra-low-power MCU (MSP430 series)
- [MobiSys 2023] HarvNet: Resource-Optimized Operation of Multi-Exit Deep Neural Networks on Energy Harvesting Devices (Paper)
- Ultra-low-power MCU (MSP430 series)
- [ASPLOS 2023] Space-Efficient TREC for Enabling Deep Learning on Microcontrollers (Paper)
- High-performance MCU (ARM Cortex-M series)
- [MLSys 2021] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers (Paper)
- High-performance MCU (ARM Cortex-M series)
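The batteryless systems above share one mechanism: progress must survive power failures, so inference state is checkpointed to non-volatile memory between steps. The sketch below is a purely conceptual, hypothetical illustration of layer-granularity checkpointing (a file stands in for FRAM, and Python for MCU C); the cited papers use far finer-grained and more carefully engineered schemes.

```python
import os
import pickle

def intermittent_infer(layers, x, ckpt="act.ckpt"):
    """Checkpoint the activation and layer index after every layer so a
    power failure resumes from the last completed layer instead of
    restarting the whole network."""
    start = 0
    if os.path.exists(ckpt):                  # power came back: resume
        with open(ckpt, "rb") as f:
            start, x = pickle.load(f)
    for i in range(start, len(layers)):
        x = layers[i](x)
        with open(ckpt, "wb") as f:           # persist progress (FRAM stand-in)
            pickle.dump((i + 1, x), f)
    os.remove(ckpt)                           # inference complete
    return x
```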