Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors
Official Implementation of the Interspeech 2025 paper Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors
Voice conversion (VC) enables natural speech synthesis with minimal data; however, it poses security risks, e.g., identity theft and privacy breaches. To address this, we propose Mimic Blocker, an active defense mechanism that prevents VC models from extracting speaker characteristics while preserving audio quality. Our method employs adversarial training, an audio quality preservation strategy, and an attack strategy. It relies on only publicly available pretrained feature extractors, which ensures model-agnostic protection. Furthermore, it enables self-supervised learning using only the original speaker's speech. Experimental results demonstrate that our method achieves robust defense performance in both white-box and black-box scenarios. Notably, the proposed approach maintains audio quality by generating noise imperceptible to human listeners, thereby enabling protection while retaining natural voice characteristics in practical applications.
You can find our Demo here
```
Mimic-Blocker/
│
├── requirements.txt
│
├── FreeVC/ (utils.py updated)
│
├── TriAAN-VC/ (src/vocoder.py, config/base.yaml updated)
│
├── data/
│   ├── VCTK-Corpus-0.92/
│   ├── {FreeVC/TriAAN-VC}_original/
│   ├── {FreeVC/TriAAN-VC}_noise_{wavlm/hubert}/
│   ├── {FreeVC/TriAAN-VC}_noisy_style_{wavlm/hubert}/
│   ├── {FreeVC/TriAAN-VC}_test_pairs_{wavlm/hubert}.txt
│   ├── {FreeVC/TriAAN-VC}_test_noisy_pairs_{wavlm/hubert}.txt
│   ├── train.txt
│   ├── val.txt
│   └── test.txt
│
├── model/
│   ├── checkpoints/
│   │   └── generator_{wavlm/hubert}.pth
│   ├── inference.py
│   ├── main.py
│   ├── model.py
│   ├── train.py
│   └── single_audio_inference.py
│
├── VC_inference/
│   ├── Freevc_inference.py
│   └── TriAANVC_inference.py
│
├── evaluation/
│   ├── pretrained models/
│   └── evaluation.py
│
└── data_processing/
    ├── test_split.py
    └── train_test_split.py
```
```shell
git clone https://github.com/yugwangyeol/Mimic-Blocker.git
cd Mimic-Blocker
```
```shell
git clone https://github.com/OlaWod/FreeVC.git
```

- Download `freevc.pth` and put it under the `checkpoints/` directory.
- Download `WavLM-Large` and put it under the `wavlm/` directory.
- Rename the `configs` folder to `logs`.
```shell
git clone https://github.com/winddori2002/TriAAN-VC.git
```

- Download `model-cpc-split.pth` and put it under the `checkpoints/` directory.
- Download `cpc.pt` and put it under the `cpc/` directory.
- Download `vocoder.pkl` and put it under the `vocoder/` directory.
Modify the code in `FreeVC` and `TriAAN-VC` as follows:
`FreeVC/utils.py`

```python
# Original code (line 24)
checkpoint = torch.load('wavlm/WavLM-Large.pt')

# Modified
checkpoint = torch.load('FreeVC/wavlm/WavLM-Large.pt')
```
`TriAAN-VC/src/vocoder.py`

```python
# Original code (line 20)
checkpoint = './vocoder/vocoder.pkl'

# Modified
checkpoint = 'TriAAN-VC/vocoder/vocoder.pkl'
```
`TriAAN-VC/config/base.yaml`

```yaml
# Original code (lines 1-8)
data_path: ./base_data
wav_path: ./vctk/wav48_silence_trimmed
txt_path: ./vctk/txt
spk_info_path: ./vctk/speaker-info.txt
converted_path:
vocoder_path: ./vocoder
cpc_path: ./cpc
n_uttr:

# Modified
data_path: TriAAN-VC/base_data
wav_path: TriAAN-VC/vctk/wav48_silence_trimmed
txt_path: TriAAN-VC/vctk/txt
spk_info_path: TriAAN-VC/vctk/speaker-info.txt
converted_path:
vocoder_path: TriAAN-VC/vocoder
cpc_path: TriAAN-VC/cpc
n_uttr:
```
```shell
pip install -r requirements.txt
```
```python
import torchaudio

# Download the VCTK 0.92 corpus into data/
torchaudio.datasets.VCTK_092(root="data", download=True)
```
```shell
python data_processing/train_test_split.py
```

- Generates `train.txt`, `val.txt`, and `test.txt` based on the VCTK speakers.
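The speaker-based split can be sketched as follows. This is an illustrative stand-in for `data_processing/train_test_split.py`, not the script itself; the file-name parsing and the split ratios are assumptions.

```python
# Illustrative speaker-disjoint split: group utterances by speaker so that
# no speaker appears in more than one of train/val/test.
import random

def split_by_speaker(wav_paths, val_ratio=0.1, test_ratio=0.1, seed=0):
    # Group utterances by speaker ID parsed from "pXXX_YYY.wav"-style names
    # (naming convention assumed here).
    by_spk = {}
    for p in wav_paths:
        spk = p.split("/")[-1].split("_")[0]
        by_spk.setdefault(spk, []).append(p)
    spks = sorted(by_spk)
    random.Random(seed).shuffle(spks)
    n_test = max(1, int(len(spks) * test_ratio))
    n_val = max(1, int(len(spks) * val_ratio))
    test_spk = spks[:n_test]
    val_spk = spks[n_test:n_test + n_val]
    train_spk = spks[n_test + n_val:]
    pick = lambda group: [p for s in group for p in by_spk[s]]
    return pick(train_spk), pick(val_spk), pick(test_spk)

# Toy file list: 10 speakers, 3 utterances each.
paths = [f"wav48/p{225 + i}/p{225 + i}_{j:03d}.wav"
         for i in range(10) for j in range(3)]
train, val, test = split_by_speaker(paths)
```

Splitting by speaker rather than by utterance keeps evaluation honest: the generator is never tested on voices it saw during training.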
- Download the pretrained models and put them under the `model/checkpoints/` directory.

```shell
python model/single_audio_inference.py --input_path </path/to/wavs> --output_path </path/to/outputdir> --model_path </path/to/pretrained_model>
```

- `input_path`: path to a single `.wav` file to which adversarial noise will be added.
Use `train.txt` to train the adversarial noise generator and generate checkpoints.

```shell
python model/main.py --feature_extractor wavlm
```
- You can choose the `feature_extractor`:
  - If `wavlm` is selected, the model will generate `generator_wavlm.pth`.
  - If `hubert` is selected, the model will generate `generator_hubert.pth`.
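The training objective can be sketched conceptually: push the pretrained extractor's features of the protected audio away from those of the original, while penalizing audible perturbation. Everything below (`toy_extractor`, the loss weight) is an illustrative stand-in, not the repo's `model/train.py`:

```python
import numpy as np

def toy_extractor(wav):
    # Stand-in for a pretrained extractor such as WavLM or HuBERT:
    # mean energy over fixed-size frames.
    return wav.reshape(-1, 100).mean(axis=1)

def training_loss(wav, delta, lambda_quality=10.0):
    feat_orig = toy_extractor(wav)
    feat_adv = toy_extractor(wav + delta)
    # Maximize the feature distance (negated, since we minimize the loss) ...
    feature_term = -np.mean((feat_adv - feat_orig) ** 2)
    # ... while keeping the waveform perturbation small (audio quality term).
    quality_term = lambda_quality * np.mean(delta ** 2)
    return feature_term + quality_term

rng = np.random.default_rng(0)
wav = rng.uniform(-0.5, 0.5, 1600)
loss = training_loss(wav, 0.001 * np.ones(1600))
```

Because both terms need only the original speaker's speech and a frozen public extractor, the setup is self-supervised and model-agnostic, as described in the abstract.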
Extract valid `(x, t)` pairs based on ASV results for VC input generation.
```shell
# FreeVC model (default)
python data_processing/test_split.py --model FreeVC

# TriAAN-VC model
python data_processing/test_split.py --model TriAAN-VC
```
- Saves `(x, t)` pairs to `data/{model}_test_pairs.txt`
- Saves `F(x, t)` to `data/{model}_original`
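A pairs file such as `data/{model}_test_pairs.txt` can be consumed as sketched below. The whitespace-separated `source target` layout is an assumption for illustration, not verified against `test_split.py`:

```python
def load_pairs(lines):
    # One "source target" pair per whitespace-separated line (layout assumed).
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            pairs.append((parts[0], parts[1]))
    return pairs

pairs = load_pairs(["p225_001.wav p226_003.wav",
                    "p227_010.wav p228_002.wav"])
```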
Add adversarial noise to `x` to generate `x'` for VC input.
```shell
# FreeVC model (default)
python model/inference.py --model FreeVC --feature_extractor wavlm

# TriAAN-VC model
python model/inference.py --model TriAAN-VC --feature_extractor wavlm
```
- Takes `x` from `data/{model}_test_pairs.txt`
- Saves `x'` to `data/{model}_noisy_style_{feature_extractor}`
- Saves `(x', t)` pairs to `data/{model}_test_noisy_pairs_{feature_extractor}.txt`
Note on the feature extractor (default: `wavlm`):

- FreeVC: `wavlm` is the white-box scenario; `hubert` is the black-box scenario.
- TriAAN-VC: always black-box (with both `wavlm` and `hubert`).
Generate `F(x', t)` by feeding `(x', t)` into the VC model.
```shell
# FreeVC
python VC_inference/FreeVC_inference.py --feature_extractor wavlm

# TriAAN-VC
python VC_inference/TriAANVC_inference.py --feature_extractor wavlm
```
- Takes `(x', t)` from `data/{model}_test_noisy_pairs_{feature_extractor}.txt`
- Saves `F(x', t)` to `data/{model}_noise_{feature_extractor}`
- Appends the `F(x', t)` path as the third column of `data/{model}_test_noisy_pairs_{feature_extractor}.txt`
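Appending the converted-audio path as a third column can be sketched as follows; the whitespace-separated column layout is an assumption, not verified against the inference scripts:

```python
def append_converted(lines, converted_paths):
    # Append each converted-audio path as a third whitespace-separated column.
    return [line.rstrip() + " " + conv
            for line, conv in zip(lines, converted_paths)]

rows = append_converted(["x1.wav t1.wav\n", "x2.wav t2.wav"],
                        ["f1.wav", "f2.wav"])
```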
Evaluate performance using PESQ/STOI/ASR/PSR metrics.
```shell
# FreeVC model (default)
python evaluation/evaluation.py --model FreeVC --feature_extractor wavlm

# TriAAN-VC model
python evaluation/evaluation.py --model TriAAN-VC --feature_extractor wavlm
```
- Retrieves:
  - `x` from `data/{model}_test_pairs.txt`
  - `x'` and `F(x', t)` from `data/{model}_test_noisy_pairs_{feature_extractor}.txt`
```bibtex
@inproceedings{2025mimicblocker,
  title={Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors},
  author={Yu, Gwang Yeol and Lee, Jun Hyeok and Kim, Seo Ryeong and Lee, Ji Min},
  booktitle={Interspeech},
  year={2025}
}
```