
Awesome-On-Device-AI-Inference 🔍📲

This list highlights academic work on running AI models efficiently on resource-constrained mobile devices, including (1) edge devices (e.g., NVIDIA Jetson), (2) smartphones (e.g., Snapdragon/Exynos), and (3) microcontrollers for energy-harvesting or batteryless IoT devices, with a primary focus on edge devices and smartphones. This repo references Awesome-On-Device-AI-Systems by Jeho Lee.

A-1. Single-DNN Inference on Single Mobile Processors

General DNN inference

  • [MLSys 2025] AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration (Paper)
    • LLM; Desktop & edge devices; GPU (core scaling trick sketched after this list)
  • [MLSys 2025] MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices (Paper)
    • Attention-based NN; Edge devices; NPU
  • [MLSys 2025] Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking (Paper)
    • LLM; Smartphones; NPU (Simulation)
  • [ASPLOS 2025] Fast On-device LLM Inference with NPUs (llm.npu) (Paper)
    • LLM; Smartphones; NPU
  • [IEEE TMC 2025] NeuroBalancer: Balancing System Frequencies With Punctual Laziness for Timely and Energy-Efficient DNN Inferences (Paper)
    • CNN; Smartphones; GPU
  • [ASPLOS 2024] SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile (Paper)
    • CNN, Transformer, and LLM; Smartphones; GPU
  • [MobiCom 2024] FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices (Paper)
    • CNN; Smartphones; CPU
  • [MobiCom 2024] Mobile Foundation Model as Firmware (Paper)
    • Foundation model; Edge devices & smartphones; CPU or GPU
  • [MobiSys 2024] Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimization (Paper)
    • Transformer; Smartphones & laptops; (Web)GPU
  • [ASPLOS 2023] STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining (Paper)
    • NLP (BERT); Edge devices; CPU or GPU
  • [MobiCom 2022] Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs (Paper)
    • CNN; Smartphones; GPU
  • [MobiCom 2022] NeuLens: Spatial-based Dynamic Acceleration of Convolutional Neural Networks on Edge (Paper)
    • CNN; Edge devices; GPU
  • [MICRO 2022] GCD2: A Globally Optimizing Compiler for Mapping DNNs to Mobile DSPs (Paper)
    • CNN and GAN; Smartphones; DSP (NPU)
  • [MobiCom 2021] AsyMo: Scalable and Efficient Deep-Learning Inference on Asymmetric Mobile CPUs (Paper)
    • CNN and RNN; Smartphones; CPU
  • [PLDI 2021] DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion (Paper)
    • CNN and Transformer; Smartphones; CPU or GPU
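
The AWQ entry above rests on a simple identity: because (w * s) @ (x / s) == w @ x, salient weight columns can be scaled up before rounding so that quantization error shifts away from channels with large activations. Below is a toy NumPy sketch of that idea; the function names, the alpha = 0.5 heuristic, and the per-row symmetric quantizer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantize_per_channel(w, n_bits=4):
    """Symmetric round-to-nearest quantization with one scale per output row."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    return np.round(w / scale) * scale          # return "fake-quantized" weights

def awq_style_quantize(w, act_samples, n_bits=4, alpha=0.5):
    """Scale salient input channels up before quantizing. Because
    (w * s) @ (x / s) == w @ x, the transform itself is lossless; only the
    distribution of rounding error across channels changes."""
    act_scale = np.abs(act_samples).mean(axis=0)   # per-input-channel magnitude
    s = np.power(act_scale + 1e-8, alpha)
    s /= np.sqrt(s.max() * s.min())                # center the scales around 1
    w_q = quantize_per_channel(w * s, n_bits)
    return w_q / s     # fold scales back so the result is directly comparable

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128)).astype(np.float32)
x = rng.normal(size=(256, 128)).astype(np.float32)
x[:, :4] *= 20.0       # a few channels with much larger activations

err_plain = np.abs(x @ quantize_per_channel(w).T - x @ w.T).mean()
err_aware = np.abs(x @ awq_style_quantize(w, x).T - x @ w.T).mean()
print(f"plain RTN error: {err_plain:.4f}  activation-aware: {err_aware:.4f}")
```

At deployment the per-channel scales would be folded into the preceding operator instead of being divided back out as done here for comparison.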

Application-specific optimization

  • [MobiSys 2025] ARIA: Optimizing Vision Foundation Model Inference on Heterogeneous Mobile Processors for Augmented Reality (To Appear)
    • Vision foundation model for augmented reality; Smartphones
  • [AAAI 2025] E4: Energy-Efficient DNN Inference for Edge Video Analytics Via Early Exiting and DVFS (Paper)
    • Video analytics; Edge devices; GPU (early-exit pattern sketched after this list)
  • [MobiCom 2024] Panopticus: Omnidirectional 3D Object Detection on Resource-constrained Edge Devices (Paper)
    • 3D object detection; Edge devices; GPU
  • [MobiSys 2023] OmniLive: Super-Resolution Enhanced 360° Video Live Streaming for Mobile Devices (Paper)
    • Video super-resolution; Smartphones; GPU
  • [IEEE TMC 2023] NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution (Paper)
    • Single-image super-resolution; Smartphones; NPU
  • [EuroSys 2022] LiteReconfig: Cost and Content Aware Reconfiguration of Video Object Detection Systems for Mobile GPUs (Paper)
    • Video analytics; Edge devices; GPU
  • [UbiComp 2022] Efficient On-Device Visual Question Answering (Paper)
    • Visual question answering; Edge devices & smartphones; GPU
  • [MobiCom 2021] Flexible High-Resolution Object Detection on Edge Devices with Tunable Latency (Paper)
    • Object detection; Edge devices; GPU
  • [MobiCom 2020] NEMO: Enabling Neural-Enhanced Video Streaming on Commodity Mobile Devices (Paper)
    • Video super-resolution; Smartphones; GPU
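
Several entries in this subsection (e.g., E4) lean on early exiting: intermediate heads attached to the backbone let easy inputs stop computing before the full network runs. A minimal sketch of that control flow, with toy stand-in blocks and a made-up confidence threshold; E4's DVFS coupling is omitted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_infer(x, blocks, exit_heads, threshold=0.9):
    """Run backbone blocks in order; after each one, ask the attached exit
    head for a prediction and stop as soon as it is confident enough."""
    h = x
    for i, (block, head) in enumerate(zip(blocks, exit_heads)):
        h = block(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:        # confident enough: skip the rest
            return probs.argmax(), i
    return probs.argmax(), len(blocks) - 1  # fell through to the final head

# Toy stand-ins: random tanh blocks and linear heads over a 16-d feature.
rng = np.random.default_rng(1)
blocks = [lambda h, W=rng.normal(size=(16, 16)) / 4: np.tanh(W @ h) for _ in range(4)]
heads = [lambda h, W=rng.normal(size=(10, 16)): W @ h for _ in range(4)]
pred, exit_at = early_exit_infer(rng.normal(size=16), blocks, heads)
print(f"predicted class {pred}, exited after block {exit_at}")
```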

A-2. Single-DNN Inference on Heterogeneous Mobile Processors

General DNN inference

  • [EuroSys 2025] Flex: Fast, Accurate DNN Inference on Low-Cost Edges Using Heterogeneous Accelerator Execution (Paper)
    • Smartphones; CPU + GPU + NPU (TPU/DSP)
  • [arXiv 2025] HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs with Heterogeneous AI Accelerators (Paper)
    • LLM; Smartphones; CPU + GPU + NPU
  • [arXiv 2024] PowerInfer-2: Fast Large Language Model Inference on a Smartphone (Paper)
    • LLM; Smartphones; CPU + NPU
  • [IEEE TMC 2024] Thermal-Aware Scheduling for Deep Learning on Mobile Devices with NPU (Paper)
    • CNN; Smartphones; GPU + NPU
  • [ICDE 2023] EdgeNN: Efficient Neural Network Inference for CPU-GPU Integrated Edge Devices (Paper)
    • CNN; Edge devices; CPU + GPU
  • [ATC 2023] Decentralized Application-Level Adaptive Scheduling for Multi-Instance DNNs on Open Mobile Devices (Paper)
    • CNN; Smartphones; CPU + GPU
  • [IPSN 2021] Efficient Execution of Deep Neural Networks on Mobile Devices with NPU (Paper)
    • CNN; Smartphones; CPU + NPU
  • [EuroSys 2019] µLayer: Low Latency On-Device Inference Using Cooperative Single-Layer Acceleration and Processor-Friendly Quantization (Paper)
    • CNN; Smartphones; CPU + GPU (layer-splitting sketched after this list)
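
A recurring pattern in this group (e.g., µLayer here, CoDL in B-2 below) is intra-layer co-execution: one layer's output channels are split across processors so both compute in parallel, and the concatenated result is identical to single-processor execution. A minimal sketch under the assumption of a fixed split ratio, with both "processors" simulated by CPU threads:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def run_layer_split(x, w, fast_fraction=0.7):
    """Split a fully connected layer's output channels between two workers,
    proportionally to their assumed relative throughput, then concatenate."""
    cut = int(w.shape[0] * fast_fraction)       # rows handed to the faster unit
    with ThreadPoolExecutor(max_workers=2) as pool:
        fast_part = pool.submit(lambda: w[:cut] @ x)   # stand-in for the GPU kernel
        slow_part = pool.submit(lambda: w[cut:] @ x)   # stand-in for the CPU kernel
        return np.concatenate([fast_part.result(), slow_part.result()])

rng = np.random.default_rng(2)
w, x = rng.normal(size=(512, 256)), rng.normal(size=256)
assert np.allclose(run_layer_split(x, w), w @ x)  # co-execution preserves the output
print("channel-split result matches single-processor execution")
```

In a real system the split ratio comes from per-layer latency models of each processor rather than a constant.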

Application-specific optimization

  • [MobiCom 2024] Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices (Paper)
    • Single-image super-resolution; Smartphones; GPU + NPU
  • [ICDE 2024] COUPLE: Orchestrating Video Analytics on Heterogeneous Mobile Processors (Paper)
    • Video analytics (object detection); Smartphones; GPU + DSP (NPU)
  • [IPSN 2023] PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators (Paper)
    • 3D object detection; Edge devices; GPU + NPU
  • [MobiCom 2019] MobiSR: Efficient On-Device Super-Resolution through Heterogeneous Mobile Processors (Paper)
    • Single-image super-resolution; Smartphones; CPU + GPU + DSP (NPU) (patch routing sketched after this list)
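
MobiSR-style pipelines decompose each frame into patches, score every patch's difficulty with a cheap heuristic, and route the two classes to different model/processor pairs. A hedged sketch of just the routing step; the total-variation score, the threshold, and the queue names are illustrative assumptions, and the actual policy in MobiSR differs in detail.

```python
import numpy as np

def total_variation(patch):
    """Mean absolute gradient: a cheap proxy for how textured a patch is."""
    return np.abs(np.diff(patch, axis=0)).mean() + np.abs(np.diff(patch, axis=1)).mean()

def route_patches(image, patch=32, tv_threshold=0.08):
    """Split the image into patches and assign each to a processor queue."""
    queues = {"compact_model": [], "full_model": []}
    for i in range(0, image.shape[0] - patch + 1, patch):
        for j in range(0, image.shape[1] - patch + 1, patch):
            p = image[i:i + patch, j:j + patch]
            key = "compact_model" if total_variation(p) > tv_threshold else "full_model"
            queues[key].append((i, j))      # remember where the patch goes back
    return queues

img = np.tile(np.linspace(0, 1, 128), (128, 1))                   # smooth gradient
img[:64, :64] += 0.5 * np.random.default_rng(3).random((64, 64))  # one textured region
print({k: len(v) for k, v in route_patches(img).items()})
```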

A-3. Single-DNN Inference across Multiple Mobile Devices

  • [INFOCOM 2024] Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference (Paper)
    • Transformer; Multiple edge devices; CPU + GPU

B-1. Multi-DNN Inference on Single Mobile Processors

General DNN inference

  • [IEEE TMC 2024] SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget (Paper)
    • CNN; Edge devices; GPU
  • [MobiSys 2024] Pantheon: Preemptible Multi-DNN Inference on Mobile Edge GPUs (Paper)
    • Edge devices; GPU
  • [MICRO 2023] Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads (Paper)
    • CNN, Attention-based NN, and NLP; From smartphones to data centers; NPU (Simulation)
  • [SenSys 2023] Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU (Paper)
    • Edge devices; GPU
  • [HPCA 2021] Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling (Paper)
    • NPU (Simulation) (layer-granularity scheduling sketched after this list)
  • [MobiCom 2021] LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision (Paper)
    • Edge devices & smartphones; CPU or GPU
  • [PerCom 2021] MASA: Responsive Multi-DNN Inference on the Edge (Paper)
    • CNN; Edge devices (Raspberry Pi); CPU
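
What the layer-wise schedulers above (e.g., Layerweaver, MASA) share is time-sharing one processor at layer granularity, so a latency-critical model never waits behind an entire inference of a background model. A toy earliest-deadline-first rendition of that idea; the job mix, per-layer costs, and deadlines are all invented.

```python
import heapq

def edf_layerwise(jobs):
    """jobs: {name: (deadline_ms, [layer_cost_ms, ...])}. Repeatedly run one
    layer of whichever unfinished model has the earliest deadline."""
    heap = [(deadline, name, list(layers)) for name, (deadline, layers) in jobs.items()]
    heapq.heapify(heap)
    now, trace = 0.0, []
    while heap:
        deadline, name, layers = heapq.heappop(heap)
        now += layers.pop(0)                 # run exactly one layer, then yield
        trace.append((round(now, 1), name))
        if layers:
            heapq.heappush(heap, (deadline, name, layers))
        elif now > deadline:
            print(f"{name} missed its {deadline} ms deadline (done at {now:.1f} ms)")
    return trace

print(edf_layerwise({
    "detector":   (30.0, [4.0] * 5),   # latency-critical foreground model
    "classifier": (80.0, [6.0] * 6),   # best-effort background model
}))
```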

Application-specific optimization

  • [MobiCom 2020] Heimdall: Mobile GPU Coordination Platform for Augmented Reality Applications (Paper)
    • Augmented reality; Smartphones; GPU

B-2. Multi-DNN Inference on Heterogeneous Mobile Processors

General DNN inference

  • [PPoPP 2024] Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous SoCs (HaX-CoNN) (Paper)
    • Edge devices; GPU + DLA (NPU)
  • [SEC 2024] Elastic Execution of Multi-Tenant DNNs on Heterogeneous Edge MPSoCs (Paper)
    • Smartphones; CPU + GPU + DSP (NPU)
  • [MobiSys 2023] NN-Stretch: Automatic Neural Network Branching for Parallel Inference on Heterogeneous Multi-Processors (Paper)
    • Smartphones; CPU + GPU + DSP (NPU)
  • [MobiSys 2022] Band: Coordinated Multi-DNN Inference on Heterogeneous Mobile Processors (Paper)
    • Smartphones; CPU + GPU + DSP + NPU
  • [MobiSys 2022] CoDL: Efficient CPU-GPU Co-Execution for Deep Learning Inference on Mobile Devices (Paper)
    • Smartphones; CPU + GPU
  • [SenSys 2022] BlastNet: Exploiting Duo-Blocks for Cross-Processor Real-Time DNN Inference (Paper)
    • Edge devices; CPU + GPU
  • [ACM TACO 2021] SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms (Paper)
    • Smartphones, edge devices, and desktop computers; CPU + GPU + DSP (NPU)
  • [RTSS 2019] Pipelined Data-Parallel CPU/GPU Scheduling for Multi-DNN Real-Time Inference (Paper)
    • Edge devices (NVIDIA TX2) and desktop computers (Intel x86 Xeon); CPU + GPU (pipelining sketched after this list)
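
The pipelined approach above (RTSS 2019) assigns the front layers of a network to one processor and the back layers to another, so consecutive frames overlap across the two stages instead of running serially. A minimal sketch with plain threads and bounded queues standing in for the CPU and GPU stages; the per-stage costs are made up.

```python
import queue, threading, time

def stage(name, in_q, out_q, cost_s):
    """One pipeline stage: pull a frame, 'compute' for cost_s, pass it on."""
    while True:
        frame = in_q.get()
        if frame is None:              # poison pill: propagate and shut down
            if out_q is not None:
                out_q.put(None)
            return
        time.sleep(cost_s)             # stand-in for running this stage's layers
        print(f"{name} finished frame {frame}")
        if out_q is not None:
            out_q.put(frame)

q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)   # bounded: backpressure
threads = [threading.Thread(target=stage, args=("cpu-front", q1, q2, 0.010)),
           threading.Thread(target=stage, args=("gpu-back", q2, None, 0.015))]
for t in threads:
    t.start()
for frame in range(5):                 # frames stream in; the stages overlap
    q1.put(frame)
q1.put(None)
for t in threads:
    t.join()
```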

C. Single-DNN Inference on Microcontrollers

  • [SenSys 2025] Lupe: Integrating the Top-down Approach with DNN Execution on Ultra-Low-Power Devices (To Appear)
    • Ultra-low-power MCU (MSP430 series)
  • [SenSys 2024] Intermittent Inference: Trading a 1% Accuracy Loss for a 1.9x Throughput Speedup (Paper)
    • High-performance MCU (ARM Cortex-M series) (checkpointed execution sketched after this list)
  • [SenSys 2024] Fast-Inf: Ultra-Fast Embedded Intelligence on the Batteryless Edge (Paper)
    • Ultra-low-power MCU (MSP430 series)
  • [MobiSys 2023] HarvNet: Resource-Optimized Operation of Multi-Exit Deep Neural Networks on Energy Harvesting Devices (Paper)
    • Ultra-low-power MCU (MSP430 series)
  • [ASPLOS 2023] Space-Efficient TREC for Enabling Deep Learning on Microcontrollers (Paper)
    • High-performance MCU (ARM Cortex-M series)
  • [MLSys 2021] MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers (Paper)
    • High-performance MCU (ARM Cortex-M series)
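
Common to the energy-harvesting entries above is intermittent execution: power can vanish at any moment, so progress is committed to non-volatile memory (MSP430-class devices use FRAM) after every unit of work, and a brown-out costs only the layer in flight. A toy simulation of that checkpoint-and-resume loop; the failure probability, the layer functions, and the nvm dict are illustrative stand-ins.

```python
import random

nvm = {"layer_idx": 0, "activation": 1.0}    # stand-in for FRAM contents

def run_with_power_failures(layers, fail_prob=0.3, seed=4):
    """Re-boot and resume from the last committed layer until inference ends."""
    rng = random.Random(seed)
    boots = 0
    while nvm["layer_idx"] < len(layers):
        boots += 1                           # device wakes with a fresh energy burst
        while nvm["layer_idx"] < len(layers):
            if rng.random() < fail_prob:     # brown-out mid-layer: RAM is lost,
                break                        # but the NVM checkpoint survives
            i = nvm["layer_idx"]
            out = layers[i](nvm["activation"])
            nvm.update(layer_idx=i + 1, activation=out)   # commit progress
    return boots

layers = [lambda x, k=k: x * 1.1 + k for k in range(8)]   # toy per-layer compute
boots = run_with_power_failures(layers)
print(f"finished after {boots} boots; output = {nvm['activation']:.3f}")
```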

Challenges
