Scanning Trojaned Models Using Out-of-Distribution Samples

Official PyTorch implementation of "Scanning Trojaned Models Using Out-of-Distribution Samples" (NeurIPS 2024) by Hossein Mirzaei, Ali Ansari*, Bahar Dibaei Nia*, Mojtaba Nafez†, Moein Madadi†, Sepehr Rezaee†, Zeinab Sadat Taghavi, Arad Maleki, Kian Shamsaie, Mahdi Hajialilue, Jafar Habibi, Mohammad Sabokrou, and Mohammad Hossein Rohban.

TRODO is a trojan (backdoor) scanning method for deep neural networks. It identifies "blind spots": regions where a trojaned classifier mistakenly classifies out-of-distribution (OOD) samples as in-distribution (ID). By adversarially shifting OOD samples closer to ID, TRODO detects trojans without assumptions about the attack type. It remains effective against adversarially trained trojaned models and in scenarios where no training data is available.
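The idea of shifting OOD samples toward ID can be illustrated with a minimal sketch. This is not the code from this repository. It assumes that "ID-ness" is measured by the maximum softmax probability and that the shift is an L-infinity PGD attack. The function names (`id_score`, `shift_toward_id`, `id_score_gain`) and the default values for `eps`, `step`, and `iters` are hypothetical choices made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def id_score(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability per sample (assumed ID-ness score)."""
    return F.softmax(model(x), dim=1).max(dim=1).values


def shift_toward_id(model, x_ood, eps=8 / 255, step=2 / 255, iters=10):
    """L-inf PGD that perturbs OOD inputs so that their ID-ness score rises."""
    x_adv = x_ood.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        score = id_score(model, x_adv).sum()
        (grad,) = torch.autograd.grad(score, x_adv)
        with torch.no_grad():
            # Gradient ascent step on the ID-ness score.
            x_adv = x_adv + step * grad.sign()
            # Project back into the eps-ball and the valid pixel range [0, 1].
            x_adv = torch.min(torch.max(x_adv, x_ood - eps), x_ood + eps)
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()


def id_score_gain(model, x_ood, **pgd_kwargs):
    """Mean increase in ID-ness score of OOD samples after the shift."""
    model.eval()
    with torch.no_grad():
        before = id_score(model, x_ood)
    x_adv = shift_toward_id(model, x_ood, **pgd_kwargs)
    with torch.no_grad():
        after = id_score(model, x_adv)
    return (after - before).mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in classifier and random "OOD" images, for demonstration only.
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    x_ood = torch.rand(16, 3, 8, 8)
    print("mean ID-score gain:", id_score_gain(toy_model, x_ood))
```

Under these assumptions, a classifier with blind spots would be expected to show a larger gain than a clean one, so the gain could be compared against a threshold. How that threshold is chosen is outside the scope of this sketch; see the notebooks for the actual procedure.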

Requirements

The current version requires the following Python and CUDA versions:

Python 3.7+
CUDA 11.1+

Additionally, the packages used by this implementation are listed in the requirements.txt file. To install them, use the following command:

pip install -r requirements.txt
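After installation, a quick sanity check of the environment can save time before running the notebooks. The following is a minimal sketch, not part of this repository. The helper name `check_environment` is hypothetical, and it assumes that the CUDA version that matters is the one PyTorch was built against (`torch.version.cuda`).

```python
import sys


def check_environment(min_python=(3, 7), min_cuda=(11, 1)):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            "Python %d.%d+ is required, found %d.%d."
            % (min_python + tuple(sys.version_info[:2]))
        )
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed; run `pip install -r requirements.txt`.")
        return problems
    cuda = torch.version.cuda  # None for builds without CUDA support
    if cuda is None:
        problems.append("This PyTorch build has no CUDA support.")
    elif tuple(int(part) for part in cuda.split(".")[:2]) < min_cuda:
        problems.append("CUDA %d.%d+ is required, found %s." % (min_cuda + (cuda,)))
    elif not torch.cuda.is_available():
        problems.append("PyTorch has CUDA support, but no usable GPU was detected.")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```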

Demo

TRODO Benchmark (Note: Although separate notebooks are provided for different attack settings, the hyperparameters are fixed and the method is agnostic to both the attack and the label mapping.)
- This notebook replicates and analyzes the results of TRODO on our crafted benchmark, specifically for models backdoored with all-to-one label mapping. Note that these models are not adversarially trained.
TrojAI Benchmark:
- This notebook demonstrates TRODO's performance on the TrojAI benchmark.

Citation

Please cite our work if you use the codebase:

@inproceedings{
mirzaei2022scan,
title={Scanning Trojaned Models Using Out-of-Distribution Samples},
author={Hossein Mirzaei and Ali Ansari and Bahar Dibaei Nia and Mojtaba Nafez and Moein Madadi and Sepehr Rezaee and Zeinab Sadat Taghavi and Arad Maleki and Kian Shamsaie and Mahdi Hajialilue and Jafar Habibi and Mohammad Sabokrou and Mohammad Hossein Rohban},
booktitle={Advances in Neural Information Processing Systems},
year={2024},
url={https://neurips.cc/virtual/2024/poster/93781}
}
