Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors

Official implementation of the Interspeech 2025 paper "Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors".

Voice conversion (VC) enables natural speech synthesis with minimal data; however, it poses security risks, e.g., identity theft and privacy breaches. To address this, we propose Mimic Blocker, an active defense mechanism that prevents VC models from extracting speaker characteristics while preserving audio quality. Our method employs adversarial training, an audio quality preservation strategy, and an attack strategy. It relies on only publicly available pretrained feature extractors, which ensures model-agnostic protection. Furthermore, it enables self-supervised learning using only the original speaker's speech. Experimental results demonstrate that our method achieves robust defense performance in both white-box and black-box scenarios. Notably, the proposed approach maintains audio quality by generating noise imperceptible to human listeners, thereby enabling protection while retaining natural voice characteristics in practical applications.

Demo

You can find our Demo here

Project Structure

Mimic-Blocker/
│
├── requirements.txt
│
├── FreeVC/ (utils.py updated)                
│
├── TriAAN-VC/ (src/vocoder.py, config/base.yaml updated)
│
├── data/
│   ├── VCTK-Corpus-0.92/                     
│   ├── {FreeVC/TriAAN-VC}_original/          
│   ├── {FreeVC/TriAAN-VC}_noise_{wavlm/hubert}/ 
│   ├── {FreeVC/TriAAN-VC}_noisy_style_{wavlm/hubert}/ 
│   ├── {FreeVC/TriAAN-VC}_test_pairs_{wavlm/hubert}.txt 
│   ├── {FreeVC/TriAAN-VC}_test_noisy_pairs_{wavlm/hubert}.txt 
│   ├── train.txt                             
│   ├── val.txt                                
│   └── test.txt                               
│
├── model/
│   ├── checkpoints/                          
│   │   └── generator_{wavlm/hubert}.pth
│   ├── inference.py                          
│   ├── main.py                               
│   ├── model.py                              
│   ├── train.py                              
│   └── single_audio_inference.py             
│
├── VC_inference/
│   ├── Freevc_inference.py                   
│   └── TriAANVC_inference.py                 
│
├── evaluation/  
│   ├── pretrained models/                    
│   └── evaluation.py                         
│
└── data_processing/ 
    ├── test_split.py                         
    └── train_test_split.py

Pre-requisites

1. Clone the repository

git clone https://github.com/yugwangyeol/Mimic-Blocker.git
cd Mimic-Blocker

git clone https://github.com/OlaWod/FreeVC.git

In the cloned FreeVC/ directory:
- Download freevc.pth and put it under 'checkpoints/'.
- Download WavLM-Large and put it under 'wavlm/'.
- Rename the folder 'configs' to 'logs'.

git clone https://github.com/winddori2002/TriAAN-VC.git

In the cloned TriAAN-VC/ directory:
- Download model-cpc-split.pth and put it under 'checkpoints/'.
- Download cpc.pt and put it under 'cpc/'.
- Download vocoder.pkl and put it under 'vocoder/'.

Modify the code in 'FreeVC' and 'TriAAN-VC'

FreeVC/utils.py


# Original code (line 24)
checkpoint = torch.load('wavlm/WavLM-Large.pt')

# Modified
checkpoint = torch.load('FreeVC/wavlm/WavLM-Large.pt')

TriAAN-VC/src/vocoder.py


# Original code (line 20)
checkpoint = './vocoder/vocoder.pkl'

# Modified
checkpoint = 'TriAAN-VC/vocoder/vocoder.pkl'

TriAAN-VC/config/base.yaml


# Original code (line 1~8)
data_path:       ./base_data
wav_path:        ./vctk/wav48_silence_trimmed
txt_path:        ./vctk/txt
spk_info_path:   ./vctk/speaker-info.txt
converted_path: 
vocoder_path:    ./vocoder
cpc_path:        ./cpc
n_uttr:

# Modified
data_path:       TriAAN-VC/base_data
wav_path:        TriAAN-VC/vctk/wav48_silence_trimmed
txt_path:        TriAAN-VC/vctk/txt
spk_info_path:   TriAAN-VC/vctk/speaker-info.txt
converted_path: 
vocoder_path:    TriAAN-VC/vocoder
cpc_path:        TriAAN-VC/cpc
n_uttr:
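All three edits above do the same thing: they re-anchor paths that were relative to each sub-repository so that they resolve from the Mimic-Blocker root. As a rough sketch only (this helper is not part of the repository), the base.yaml change could be scripted. The function name `reroot_yaml_paths` is hypothetical, and the sketch assumes that only values starting with `./` need rewriting.

```python
import re


def reroot_yaml_paths(text: str, prefix: str = "TriAAN-VC") -> str:
    """Rewrite values such as './vocoder' to 'TriAAN-VC/vocoder'.

    Only 'key: ./value' lines are touched; keys with empty values
    (converted_path, n_uttr) are left unchanged.
    """
    pattern = re.compile(r"^([ \t]*\w+:[ \t]+)\./(\S+)[ \t]*$", flags=re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}{prefix}/{m.group(2)}", text)


original = "\n".join([
    "data_path:       ./base_data",
    "vocoder_path:    ./vocoder",
    "converted_path:",
    "n_uttr:",
]) + "\n"

patched = reroot_yaml_paths(original)
print(patched)
```

Editing the file by hand, as shown above, is equally valid; the script is only convenient when the sub-repository is re-cloned often.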

2. Install requirements

pip install -r requirements.txt

3. Download VCTK Dataset

import torchaudio

# Downloads and extracts VCTK 0.92 into data/VCTK-Corpus-0.92/
torchaudio.datasets.VCTK_092(root="data", download=True)

4. Split Dataset (Train/Val/Test)

python data_processing/train_test_split.py

Generates train.txt, val.txt, and test.txt based on the VCTK speakers.
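A speaker-level split keeps every utterance of a given speaker in exactly one subset. The sketch below illustrates that idea only: the 80/10/10 ratios, the seed, and the function name `split_speakers` are assumptions, not the values or code used by train_test_split.py.

```python
import random


def split_speakers(speakers, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Partition speaker IDs into disjoint train/val/test lists."""
    ids = sorted(set(speakers))
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_val = int(len(ids) * val_ratio)
    n_test = int(len(ids) * test_ratio)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# 20 illustrative speaker IDs in the VCTK naming style
speakers = [f"p{n}" for n in range(225, 245)]
train, val, test = split_speakers(speakers)
print(len(train), len(val), len(test))
```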

Inference

python model/single_audio_inference.py --input_path </path/to/input.wav> --output_path </path/to/outputdir> --model_path </path/to/pretrained_model>

input_path: Path to a single .wav file to which adversarial noise will be added.
Download the pretrained models and put them under the 'model/checkpoints/' directory.
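Because the script takes one .wav file per call, protecting a whole folder means invoking it once per file. A minimal sketch of that loop follows; the helper `build_commands` is hypothetical, and it uses only the three flags shown above.

```python
from pathlib import Path


def build_commands(wav_dir, output_dir, model_path):
    """Return one argv list per .wav file found directly in wav_dir."""
    commands = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        commands.append([
            "python", "model/single_audio_inference.py",
            "--input_path", str(wav),
            "--output_path", str(output_dir),
            "--model_path", str(model_path),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, for example with `model_path="model/checkpoints/generator_wavlm.pth"`.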

Training

Use train.txt to train the adversarial noise generator and generate checkpoints.

python model/main.py --feature_extractor wavlm

You can choose the feature_extractor:
- If wavlm is selected, the model will generate generator_wavlm.pth.
- If hubert is selected, the model will generate generator_hubert.pth.
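The mapping from feature extractor to checkpoint file can be written down as a small helper. This is a sketch under the naming shown in the project structure (`model/checkpoints/generator_{wavlm/hubert}.pth`); the function name `checkpoint_path` is hypothetical.

```python
from pathlib import Path

SUPPORTED_EXTRACTORS = ("wavlm", "hubert")


def checkpoint_path(feature_extractor, checkpoint_dir="model/checkpoints"):
    """Return the generator checkpoint path for a feature extractor."""
    if feature_extractor not in SUPPORTED_EXTRACTORS:
        raise ValueError(
            f"feature_extractor must be one of {SUPPORTED_EXTRACTORS}, "
            f"got {feature_extractor!r}"
        )
    return Path(checkpoint_dir) / f"generator_{feature_extractor}.pth"


print(checkpoint_path("wavlm").as_posix())
```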

Evaluation

1. Generate Test Pairs for VC

Extract valid (x, t) pairs based on ASV results for VC input generation.

# FreeVC model (default)
python data_processing/test_split.py --model FreeVC

# TriAAN-VC model
python data_processing/test_split.py --model TriAAN-VC

Saves (x, t) pairs to data/{model}_test_pairs.txt
Saves F(x, t) to data/{model}_original
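The README does not spell out the validity criterion. One plausible reading is that a pair is kept when an ASV system accepts the converted audio F(x, t) as the speaker of x, so that only successful conversions are used to measure the defense. The sketch below illustrates that reading with cosine similarity between speaker embeddings; the embedding inputs, the 0.5 threshold, and both function names are assumptions rather than what test_split.py does.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keep_valid_pairs(pairs, threshold=0.5):
    """Keep (x, t) when the converted audio still verifies as speaker x.

    pairs: iterable of (x_path, t_path, emb_x, emb_converted), where the
    embeddings come from whatever ASV model is in use (not shown here).
    """
    return [
        (x, t)
        for x, t, emb_x, emb_converted in pairs
        if cosine_similarity(emb_x, emb_converted) >= threshold
    ]
```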

2. Generate Noisy Styles `(x → x')`

Add adversarial noise to x to generate x' for VC input.

# FreeVC model (default)
python model/inference.py --model FreeVC --feature_extractor wavlm

# TriAAN-VC model
python model/inference.py --model TriAAN-VC --feature_extractor wavlm

Takes x from data/{model}_test_pairs.txt
Saves x' to data/{model}_noisy_style_{feature_extractor}
Saves (x', t) pairs to data/{model}_test_noisy_pairs_{feature_extractor}.txt

Note on feature extractor (default: wavlm)

FreeVC:
- wavlm: White-box scenario
- hubert: Black-box scenario
TriAAN-VC: Always Black-box (both wavlm and hubert)
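The rule above reduces to a single condition: the setting is white-box only when the noise generator was trained with the same feature extractor that the VC model uses, which here means FreeVC with WavLM. A small sketch encoding the table (the function name `defense_scenario` is hypothetical):

```python
def defense_scenario(model: str, feature_extractor: str) -> str:
    """Return 'white-box' or 'black-box' for a model/extractor combination."""
    if model not in {"FreeVC", "TriAAN-VC"}:
        raise ValueError(f"unknown model: {model!r}")
    if feature_extractor not in {"wavlm", "hubert"}:
        raise ValueError(f"unknown feature extractor: {feature_extractor!r}")
    if model == "FreeVC" and feature_extractor == "wavlm":
        return "white-box"
    return "black-box"


for m in ("FreeVC", "TriAAN-VC"):
    for fe in ("wavlm", "hubert"):
        print(m, fe, defense_scenario(m, fe))
```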

3. Convert Noisy Pairs with VC Model

Generate F(x', t) by feeding (x', t) into the VC model.

# FreeVC
python VC_inference/FreeVC_inference.py --feature_extractor wavlm

# TriAAN-VC
python VC_inference/TriAANVC_inference.py --feature_extractor wavlm

Takes (x', t) from data/{model}_test_noisy_pairs_{feature_extractor}.txt
Saves F(x', t) to data/{model}_noise_{feature_extractor}
Appends F(x', t) path to the 3rd column of data/{model}_test_noisy_pairs_{feature_extractor}.txt
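The third-column update can be pictured as follows. This is a sketch, not the repository's code: the `|` separator and the function name `append_converted_paths` are assumptions, since the README does not state how columns are delimited in the pairs file.

```python
def append_converted_paths(rows, converted_paths, sep="|"):
    """Return rows with the converted-audio path as the third column.

    Any existing third column is replaced, so re-running the step does
    not keep adding columns.
    """
    if len(rows) != len(converted_paths):
        raise ValueError("need exactly one converted path per pair")
    updated = []
    for row, converted in zip(rows, converted_paths):
        x_noisy, target = row.rstrip("\n").split(sep)[:2]
        updated.append(sep.join([x_noisy, target, converted]))
    return updated


rows = ["x1_noisy.wav|t1.wav", "x2_noisy.wav|t2.wav|old.wav"]
result = append_converted_paths(rows, ["out1.wav", "out2.wav"])
print(result)
```

Replacing rather than blindly appending keeps the file at three columns if the conversion step is run more than once.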

4. Evaluate Defense Performance

Evaluate performance using PESQ/STOI/ASR/PSR metrics.

# FreeVC model (default)
python evaluation/evaluation.py --model FreeVC --feature_extractor wavlm

# TriAAN-VC model
python evaluation/evaluation.py --model TriAAN-VC --feature_extractor wavlm

Retrieves:
- x from data/{model}_test_pairs.txt
- x', F(x', t) from data/{model}_test_noisy_pairs_{feature_extractor}.txt
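The four steps share one naming scheme for their inputs and outputs. The sketch below collects those names in one place, following the paths given in the step descriptions; note that the project structure lists the test-pairs file with a feature-extractor suffix, so the exact file name should be checked against the repository. The function name `artifact_paths` and the dictionary keys are hypothetical.

```python
from pathlib import Path


def artifact_paths(model, feature_extractor, data_dir="data"):
    """Map each pipeline artifact to its path under data/."""
    root = Path(data_dir)
    fe = feature_extractor
    return {
        "pairs": root / f"{model}_test_pairs.txt",                 # (x, t), step 1
        "original": root / f"{model}_original",                    # F(x, t), step 1
        "noisy_style": root / f"{model}_noisy_style_{fe}",         # x', step 2
        "noisy_pairs": root / f"{model}_test_noisy_pairs_{fe}.txt",  # (x', t), step 2
        "converted": root / f"{model}_noise_{fe}",                 # F(x', t), step 3
    }


for name, path in artifact_paths("FreeVC", "wavlm").items():
    print(f"{name}: {path.as_posix()}")
```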

Citation

@article{2025mimicblocekr,
  title={Mimic Blocker: Self-Supervised Adversarial Training for Voice Conversion Defense with Pretrained Feature Extractors},
  author={Yu, Gwang Yeol and Lee, Jun Hyeok and Kim, Seo Ryeong and Lee, Ji Min},
  journal={},
  year={2025}
}
