Towards Efficient Privacy-Preserving Machine Learning: A Systematic Review from Protocol, Model, and System Perspectives

🎉 Welcome! This repository collects the papers covered in our survey, Towards Efficient Privacy-Preserving Machine Learning: A Systematic Review from Protocol, Model, and System Perspectives.

🔧 This survey and repository will be continuously updated to reflect the latest advances. If you notice a relevant paper that is missing from the survey or the repository, please open a pull request. Suggestions and corrections that improve quality and coverage are also welcome.

💡 If you find our work helpful, please consider citing the survey and sharing it with others.

👀 Introduction

Privacy-preserving machine learning (PPML) based on cryptographic protocols has emerged as a promising paradigm for protecting user data privacy in cloud-based machine learning services. While it provides formal privacy guarantees, PPML often incurs orders-of-magnitude overhead compared with its plaintext counterpart, which limits its efficiency and scalability. Considerable effort has therefore gone into closing this efficiency gap. In this survey, we provide a comprehensive and systematic review of recent PPML studies with a focus on cross-level optimizations. Specifically, we categorize existing papers into the protocol level, model level, and system level, and review progress at each level. We also provide qualitative and quantitative comparisons of existing works with technical insights, based on which we discuss future research directions and highlight the necessity of integrating optimizations across the protocol, model, and system levels.

📚 Table of Contents

👀 Introduction
🔒 Protocol-Level Optimization
🤖 Model-Level Optimization
⚙️ System-Level Optimization
- Compiler
- GPU Optimization
📌 Citation and Feedback

🔒 Protocol-Level Optimization

Linear Layer Optimization

OT-based protocols:

[CCS 2020] CrypTFlow2: Practical 2-party secure inference [paper] [code]
[CCS 2020] Delphi: A cryptographic inference system for neural networks [paper]
[S&P 2021] SiRNN: A math library for secure RNN inference [paper] [code]
[NeurIPS 2023] CoPriv: Network/protocol co-optimization for communication-efficient private inference [paper]
[arXiv 2023] CipherGPT: Secure two-party GPT inference [paper]
[ICCAD 2024] PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization [paper]
[arXiv 2024] EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [paper]

HE-based protocols:

[Security 2018] GAZELLE: A low latency framework for secure neural network inference [paper]
[CCS 2018] Secure outsourced matrix computation and application to neural networks [paper]
[Nature Communications 2022] Secure human action recognition by encrypted neural network inference [paper]
[ICML 2022] Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions [paper]
[Security 2022] Cheetah: Lean and fast secure Two-Party deep neural network inference [paper] [code]
[NeurIPS 2022] Iron: Private inference on transformers [paper]
[ICCAD 2023] Falcon: Accelerating homomorphically encrypted convolutions for efficient private mobile network inference [paper]
[TIFS 2023] Optimized privacy-preserving CNN inference with fully homomorphic encryption [paper] [code]
[S&P 2024] Bolt: Privacy-preserving, accurate and efficient inference for transformers [paper] [code]
[CCS 2024] NeuJeans: Private Neural Network Inference with Joint Optimization of Convolution and FHE Bootstrapping [paper]
[CCS 2024] Rhombus: Fast homomorphic matrix-vector multiplication for secure two-party inference [paper]
[NeurIPS 2024] PrivCirNet: Efficient Private Inference via Block Circulant Transformation [paper] [code]
[NDSS 2025] BumbleBee: Secure two-party inference framework for large transformers [paper] [code]
[ACL 2025] Powerformer: Efficient privacy-preserving transformer with batch rectifier-power max function and optimized homomorphic attention [paper]
[Security 2025] Breaking the layer barrier: Remodeling private Transformer inference with hybrid CKKS and MPC
[CCS 2025] Lodia: Towards Optimal Sparse Matrix-Vector Multiplication for Batched Fully Homomorphic Encryption [paper]

Replicated secret sharing (RSS)-based protocols and function secret sharing (FSS)-based protocols:

[CCS 2018] ABY3: A Mixed Protocol Framework for Machine Learning [paper]
[arXiv 2022] LLAMA: A low latency math library for secure inference [paper]
[arXiv 2023] SIGMA: Secure GPT inference with function secret sharing [paper] [code]
[arXiv 2023] PUMA: Secure inference of LLaMA-7B in five minutes [paper] [code]

Non-Linear Layer Optimization

Secret Sharing-based Protocols:

[NDSS 2015] ABY: A framework for efficient mixed-protocol secure two-party computation [paper] [code]
[CCS 2017] Oblivious neural network predictions via MiniONN transformations [paper]
[Security 2019] XONN: XNOR-based oblivious deep neural network inference [paper](https://www.usenix.org/system/files/sec19-riazi.pdf)
[CCS 2020] CrypTFlow2: Practical 2-party secure inference [paper] [code]
[S&P 2021] SiRNN: A math library for secure RNN inference [paper] [code]
[CCS 2021] COINN: Crypto/ML codesign for oblivious inference via neural networks [paper] [code]
[DAC 2022] ABNN2: Secure two-party arbitrary-bitwidth quantized neural network predictions [paper]
[arXiv 2025] Privacy-Preserving Inference for Quantized BERT Models [paper]
[IEEE TDSC 2025] Antelope: Fast and Secure Neural Network Inference [paper]

HE-based protocols:

[NutMic 2019] Chimera: a unified framework for B/FV, TFHE and HEAAN fully homomorphic encryption and predictions for deep learning [paper]
[CSCML 2019] Simulating homomorphic evaluation of deep learning predictions [paper]
[TDSC 2021] Minimax approximation of sign function by composite polynomial for homomorphic comparison [paper]
[CSCML 2021] Programmable bootstrapping enables efficient homomorphic inference of deep neural networks [paper]
[ePrint 2021] REDsec: Running encrypted discretized neural networks in seconds [paper]
[IEEE Access 2022] Optimization of homomorphic comparison algorithm on RNS-CKKS scheme [paper]
[CSCML 2023] Deep neural networks for encrypted inference with TFHE [paper]

Graph-Level Techniques

Interactive Protocols with SS-HE Conversion:

[Security 2020] DELPHI: A Cryptographic Inference Service for Neural Networks [paper]
[Security 2025] Breaking the layer barrier: Remodeling private Transformer inference with hybrid CKKS and MPC

Non-Interactive Protocols with Embedded Bootstrapping Components:

[TIFS 2023] Optimized privacy-preserving CNN inference with fully homomorphic encryption [paper] [code]
[NeurIPS 2024] PrivCirNet: Efficient Private Inference via Block Circulant Transformation [paper]

Non-Interactive Protocols with Level Consumption Reduction:

[CF 2019] nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data [paper] [code]
[ICML 2022] Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions [paper]

🤖 Model-Level Optimization

Linear Layer Optimization

[CCS 2021] COINN: Crypto/ML Codesign for Oblivious Inference via Neural Networks [paper] [code]
[ASIA CCS 2022] Hunter: HE-Friendly Structured Pruning for Efficient Privacy-Preserving Deep Learning [paper]
[arXiv 2022] Efficient ML Models for Practical Secure Inference [paper]
[NeurIPS 2023] CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference [paper]
[ICCV 2023] MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention [paper] [code]
[NeurIPS 2024] PrivCirNet: Efficient Private Inference via Block Circulant Transformation [paper] [code]
[arXiv 2024] AERO: Softmax-Only LLMs for Efficient Private Inference [paper]
[arXiv 2024] EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [paper]

Non-Linear ReLU and GeLU Optimization

[Security 2020] DELPHI: A Cryptographic Inference Service for Neural Networks [paper]
[NeurIPS 2020] CryptoNAS: Private Inference on a ReLU Budget [paper]
[ICML 2021] DeepReDuce: ReLU Reduction for Fast Private Inference [paper]
[ICLR 2021] SAFENet: A secure, accurate and fast neural network inference [paper]
[IEEE Security & Privacy] Sphynx: A Deep Neural Network Design for Private Inference [paper]
[NeurIPS 2021] Circa: Stochastic ReLUs for Private Deep Learning [paper]
[ICML 2022] Selective Network Linearization for Efficient Private Inference [paper] [code]
[arXiv 2022] AESPA: Accuracy Preserving Low-degree Polynomial Activation for Fast Private Inference [paper]
[arXiv 2023] RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference [paper]
[ACL 2022] THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [paper]
[ICLR 2023] MPCFormer: fast, performant and private Transformer inference with MPC [paper] [code]
[ICCV 2023] MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention [paper] [code]
[arXiv 2023] PriViT: Vision Transformers for Fast Private Inference [paper] [code]
[NeurIPS 2023] CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference [paper]
[ICCV 2023] AutoReP: Automatic ReLU Replacement for Fast Private Network Inference [paper] [code]
[ICLR 2023] Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference [paper]
[DAC 2023] PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment [paper] [code]
[arXiv 2023] Securing Neural Networks with Knapsack Optimization [paper] [code]
[arXiv 2023] LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers [paper]
[arXiv 2023] Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption [paper]
[Security 2024] Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions [paper] [code]
[ICCAD 2023] RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference [paper]
[arXiv 2024] AERO: Softmax-Only LLMs for Efficient Private Inference [paper]
[ICML 2024] Seesaw: Compensating for Nonlinear Reduction with Linear Computations for Private Inference [paper]
[TMLR 2024] DeepReShape: Redesigning Neural Networks for Efficient Private Inference [paper]
[ICML 2024] Ditto: Quantization-aware Secure Inference of Transformers upon MPC [paper]

Non-Linear Softmax Optimization

[ACL 2022] THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption [paper]
[ICLR 2023] MPCFormer: fast, performant and private Transformer inference with MPC [paper] [code]
[ICCV 2023] MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention [paper] [code]
[arXiv 2023] PriViT: Vision Transformers for Fast Private Inference [paper] [code]
[ICCV 2023] SAL-ViT: Towards Latency Efficient Private Inference on ViT using Selective Attention Search with a Learnable Softmax Approximation [paper]
[ICCAD 2023] RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference [paper]
[Journal of Cryptographic Engineering 2025] MLFormer: a high performance MPC linear inference framework for transformers [paper]
[arXiv 2024] MPC-Minimized Secure LLM Inference [paper]
[arXiv 2024] Power-Softmax: Towards Secure LLM Inference over Encrypted Data [paper]
[ICML 2024] Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption [paper]
[ACL 2024] SecFormer: Fast and Accurate Privacy-Preserving Inference for Transformer Models via SMPC [paper] [code]
[arXiv 2025] MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference [paper]
[ICLR 2025] CipherPrune: Efficient and Scalable Private Transformer Inference [paper]

PPML-Friendly Quantization Optimization

[CCS 2021] COINN: Crypto/ML Codesign for Oblivious Inference via Neural Networks [paper] [code]
[Security 2019] XONN: XNOR-based Oblivious Deep Neural Network Inference [paper]
[ICCAD 2024] PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization [paper]
[arXiv 2024] EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization [paper]
[ICML 2024] Ditto: Quantization-aware Secure Inference of Transformers upon MPC [paper]
[DAC 2024] FastQuery: Communication-efficient Embedding Table Query for Private LLMs inference
[arXiv 2025] Privacy-Preserving Inference for Quantized BERT Models [paper]

⚙ System-Level Optimization

Compiler

[CF 2019] nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data [paper] [code]
[WAHC 2019] nGraph-HE2: A High-Throughput Framework for Neural Network Inference on Encrypted Data [paper] [code]
[PLDI 2019] CHET: An Optimizing Compiler for Fully Homomorphic Neural Network Inferencing [paper]
[PLDI 2020] Optimizing homomorphic evaluation circuits by program synthesis and term rewriting [paper] [code]
[PLDI 2020] EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation [paper] [code]
[WAHC 2020] Concrete: Concrete Operates on Ciphertexts Rapidly by Extending TFHE [paper] [code]
[arXiv 2021] A General Purpose Transpiler for Fully Homomorphic Encryption [paper] [code]
[PLDI 2021] Porcupine: A Synthesizing Compiler for Vectorized Homomorphic Encryption [paper]
[CGO 2022] HECATE: Performance-Aware Scale Optimization for Homomorphic Encryption Compiler [paper] [code]
[PoPETs 2023] HElayers: A Tile Tensors Framework for Large Neural Networks on Encrypted Data [paper] [code]
[ASPLOS 2023] Coyote: A Compiler for Vectorizing Encrypted Arithmetic Circuits [paper] [code]
[Security 2023] ELASM: Error-Latency-Aware Scale Management for Fully Homomorphic Encryption [paper]
[Security 2023] HECO: Fully Homomorphic Encryption Compiler [paper] [code]
[PLDI 2024] A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption [paper] [code]
[Security 2024] DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption [paper] [code]
[CGO 2025] ANT-ACE: An FHE Compiler Framework for Automating Neural Network Inference [paper] [code]
[ASPLOS 2025] HALO: Loop-aware Bootstrapping Management for Fully Homomorphic Encryption [paper]
[ASPLOS 2025] ReSBM: Region-based Scale and Minimal-Level Bootstrapping Management for FHE via Min-Cut [paper]
[ASPLOS 2025] Orion: A Fully Homomorphic Encryption Framework for Deep Learning [paper] [code]
[ePrint 2025] Bridging Usability and Performance: A Tensor Compiler for Autovectorizing Homomorphic Encryption [paper]

GPU Optimization

[ICML 2025] EncryptedLLM: Privacy-Preserving Large Language Model Inference via GPU-Accelerated Fully Homomorphic Encryption [paper]
[ISCA 2025] Neo: Towards Efficient Fully Homomorphic Encryption Acceleration using Tensor Core [paper]
[HPCA 2025] WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores [paper]
[HPCA 2025] Anaheim: Architecture and Algorithms for Processing Fully Homomorphic Encryption in Memory [paper]
[TCHES 2025] VeloFHE: GPU Acceleration for FHEW and TFHE Bootstrapping [paper]
[TCHES 2025] GPU Acceleration for FHEW/TFHE Bootstrapping [paper]
[PoPETs 2025] Hardware-Accelerated Encrypted Execution of General-Purpose Applications [paper] [video]
[arXiv 2025] CAT: A GPU-Accelerated FHE Framework with Its Application to High-Precision Private Dataset Query [paper] [code]
[PACT 2024] BoostCom: Towards Efficient Universal Fully Homomorphic Encryption by Boosting the Word-wise Comparisons [paper]
[NeurIPS Safe Generative AI Workshop 2024] Privacy-Preserving Large Language Model Inference via GPU-Accelerated Fully Homomorphic Encryption [paper] [code]
[TDSC 2024] Phantom: A CUDA-Accelerated Word-Wise Homomorphic Encryption Library [paper] [code]
[arXiv 2024] Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs [paper] [code]
[MICRO 2023] GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption [paper]
[HPCA 2023] TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU [paper] [code]
[IPDPS 2023] Towards Faster Fully Homomorphic Encryption Implementation with Integer and Floating-point Computing Power of GPUs [paper]
[TPDS 2023] HE-Booster: An Efficient Polynomial Arithmetic Acceleration on GPUs for Fully Homomorphic Encryption [paper]
[WAHC 2023] GPU Acceleration of High-Precision Homomorphic Computation Utilizing Redundant Representation [paper] [code]
[Access 2023] Homomorphic Encryption on GPU [paper] [code]
[TC 2022] CARM: CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes for Internet of Things [paper]
[IPDPS 2022] Accelerating Encrypted Computing on Intel GPUs [paper]
[TCHES 2021] Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs [paper] [code]
[Access 2021] Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization [paper]

📌 Citation and Feedback

If you find this survey or repository helpful, please consider citing our work. We also warmly welcome feedback, suggestions, and contributions that improve this survey and keep the repository up to date.

Feel free to open an issue or pull request.

Below is the BibTeX entry for this PPML survey:

@misc{zeng2025towards,
  title={Towards Efficient Privacy-Preserving Machine Learning: A Systematic Review from Protocol, Model, and System Perspectives}, 
  author={Wenxuan Zeng and Tianshi Xu and Yi Chen and Yifan Zhou and Mingzhe Zhang and Jin Tan and Cheng Hong and Meng Li},
  year={2025},
  eprint={2507.14519},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2507.14519}, 
}

Below are the BibTeX entries for the PPML papers published by our lab:

@article{xu2025blb,
  title={Breaking the Layer Barrier: Remodeling Private Transformer Inference with Hybrid CKKS and MPC},
  author={Xu, Tianshi and Lu, Wen-jie and Yu, Jiangrui and Chen, Yi and Lin, Chenqi and Wang, Runsheng and Li, Meng},
  journal={USENIX Security Symposium},
  year={2025}
}

@inproceedings{zhang2025flash,
  title={FLASH: An Efficient Hardware Accelerator Leveraging Approximate and Sparse FFT for Homomorphic Encryption},
  author={Zhang, Tengyu and Xue, Yufei and Liang, Ling and Gu, Zhen and Wang, Yuan and Wang, Runsheng and Huang, Ru and Li, Meng},
  booktitle={2025 Design, Automation \& Test in Europe Conference (DATE)},
  pages={1--7},
  year={2025},
  organization={IEEE}
}

@article{xu2024privcirnet,
  title={Privcirnet: Efficient private inference via block circulant transformation},
  author={Xu, Tianshi and Wu, Lemeng and Wang, Runsheng and Li, Meng},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={111802--111831},
  year={2024}
}

@inproceedings{xu2024privquant,
  title={PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization},
  author={Xu, Tianshi and Zhong, Shuzhang and Zeng, Wenxuan and Wang, Runsheng and Li, Meng},
  booktitle={Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design},
  pages={1--9},
  year={2024}
}

@inproceedings{yu2024flexhe,
  title={FlexHE: A flexible Kernel Generation Framework for Homomorphic Encryption-Based Private Inference},
  author={Yu, Jiangrui and Zeng, Wenxuan and Xu, Tianshi and Chen, Renze and Liang, Yun and Wang, Runsheng and Huang, Ru and Li, Meng},
  booktitle={Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design},
  pages={1--9},
  year={2024}
}

@inproceedings{lin2024fastquery,
  title={FastQuery: Communication-efficient Embedding Table Query for Private LLMs inference},
  author={Lin, Chenqi and Xu, Tianshi and Yang, Zebin and Wang, Runsheng and Huang, Ru and Li, Meng},
  booktitle={Proceedings of the 61st ACM/IEEE Design Automation Conference},
  pages={1--6},
  year={2024}
}

@article{zeng2023copriv,
  title={Copriv: Network/protocol co-optimization for communication-efficient private inference},
  author={Zeng, Wenxuan and Li, Meng and Yang, Haichuan and Lu, Wen-jie and Wang, Runsheng and Huang, Ru},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={78906--78925},
  year={2023}
}

@inproceedings{zeng2023mpcvit,
  title={Mpcvit: Searching for accurate and efficient mpc-friendly vision transformer with heterogeneous attention},
  author={Zeng, Wenxuan and Li, Meng and Xiong, Wenjie and Tong, Tong and Lu, Wen-jie and Tan, Jin and Wang, Runsheng and Huang, Ru},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={5052--5063},
  year={2023}
}
