Spoofed Speech Attribution

This repository extends the 'AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks' [1] model to predict attributes that characterize spoofed speech. The approach introduces a bank of probabilistic detectors trained to identify specific features associated with selected spoofing techniques, yielding a comprehensive attribute-based representation of each audio sample. This representation is then analyzed with a decision tree model to enable accurate spoofed speech detection and detailed explanations of the model's decisions. The experiments use the logical access (LA) scenario of the ASVspoof 2019 dataset [2].

Figure: Complete implementation workflow of the proposed architecture for explainable spoofed speech detection. Phase I demonstrates the extraction of embeddings using the AASIST model and the subsequent processing of these embeddings through a bank of seven probabilistic feature detectors. Phase II illustrates the concatenation of the outputs from these detectors to create a 25-dimensional vector, which is then fed into a decision tree model for classification. This decision tree model is used for both bonafide/spoofed classification and spoofing attack algorithm characterization.

Getting Started

  1. Create a virtual environment using conda (recommended for visualizing decision trees with the Graphviz application).

    • Download miniconda (https://docs.conda.io/en/latest/miniconda.html).

    • Install miniconda by running the downloaded script.

    • Create a new environment (python=3.10 recommended):

      conda create -n spoof_env python=3.10
      
    • To install a package:

      conda install -n spoof_env <package_name>
      
    • To install the Graphviz executables:

      conda install -n spoof_env graphviz
      
    • Activate the conda environment:

      conda activate spoof_env
      
  2. Install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Data Preparation

To download the ASVspoof 2019 logical access dataset [2]:

python download_dataset.py

(Alternative) The dataset can also be downloaded and prepared manually.

Phase I

1. Inference Embedding Extraction

The binary output layer of the AASIST model is stripped, and the remaining architecture is used to produce 160-dimensional embeddings for all audio files in the training, development, and evaluation sets.

To extract AASIST embeddings:

python inference_embedding_extraction.py

A set of embeddings is available in Embeddings/AASIST/ for further use.
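
For illustration, the sketch below shows one way to obtain such embeddings in PyTorch: a forward hook captures the input reaching the model's final binary classification layer. The attribute name out_layer is an assumption and may differ in the actual codebase; see inference_embedding_extraction.py for the repository's implementation.

import torch

def extract_embedding(model, waveform):
    """Return the penultimate (160-dim) representation for one utterance.

    `model` is assumed to be a trained AASIST network whose binary
    (bonafide/spoof) head is a linear module named `model.out_layer`.
    """
    captured = {}

    def hook(module, inputs, output):
        # The input arriving at the output layer is the embedding we want.
        captured["emb"] = inputs[0].detach()

    handle = model.out_layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        model(waveform.unsqueeze(0))  # add a batch dimension: (1, num_samples)
    handle.remove()
    return captured["emb"].squeeze(0)  # shape: (160,)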

2. Training Probabilistic Feature Detectors

The ASVspoof 2019 dataset provides detailed metadata about the characteristics of each spoofing attack by organizing the spoofing methods into seven attribute sets: Input, Input Processor, Duration, Conversion, Speaker Representation, Output, and Waveform Generation. A probabilistic detector is trained for each attribute set: it takes the 160-dimensional "raw" AASIST embedding as its (shared) input and is trained against ground-truth labels to predict posterior probabilities for the presence or absence of the attributes associated with each spoofing attack algorithm.

To train a probabilistic feature detector for an attribute set:

python emb_main.py

The attribute set number, the model architecture of the probabilistic feature detector, and related parameters can be set in the configuration file emb_model_AASIST.conf. In our experiments, a two-layer neural network with 64 and 32 neurons in its hidden layers outperformed the other tested architectures (a single hidden layer of 0, 4, 8, 32, or 64 neurons) and suited all attribute sets best. A set of trained probabilistic feature detectors is available in probabilistic_detectors/ for further use.
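
As a minimal sketch, the best-performing detector architecture described above could look as follows in PyTorch; num_classes depends on the attribute set, and the class name AttributeDetector is illustrative rather than the repository's actual code.

import torch.nn as nn

class AttributeDetector(nn.Module):
    """Small MLP mapping a 160-dim AASIST embedding to attribute posteriors."""

    def __init__(self, num_classes, emb_dim=160):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 64),      # first hidden layer: 64 neurons
            nn.ReLU(),
            nn.Linear(64, 32),           # second hidden layer: 32 neurons
            nn.ReLU(),
            nn.Linear(32, num_classes),  # logits; softmax yields posteriors
        )

    def forward(self, x):
        return self.net(x)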

Phase II

1. Concatenation of Posterior Probabilities

All audio recordings in the dataset are passed through the seven probabilistic feature detectors, and the resulting posterior probabilities are concatenated for each audio file to form a 25-dimensional embedding.

To compute and concatenate the outputs of the probabilistic feature detectors:

python create_df.py

The dataset split (train/dev/eval), the choice of applying the softmax or logit function to the probabilistic feature detectors' outputs, and a common model architecture for all detectors can be set in the configuration file emb_model_AASIST.conf.

A set of dataframes obtained this way is available in df_posterior_probabilities/ for further use.
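
For illustration, a minimal sketch of the concatenation step, assuming detectors is a list of the seven trained detector models (e.g., instances of the AttributeDetector sketched above); whether softmax scores or raw logits are concatenated is the configuration choice mentioned above.

import torch

def attribute_vector(embedding, detectors, apply_softmax=True):
    """Concatenate the seven detectors' outputs into one 25-dim vector."""
    outputs = []
    with torch.no_grad():
        for detector in detectors:
            logits = detector(embedding.unsqueeze(0))  # (1, num_classes)
            scores = torch.softmax(logits, dim=-1) if apply_softmax else logits
            outputs.append(scores.squeeze(0))
    # The class counts of the seven attribute sets sum to 25.
    return torch.cat(outputs)  # shape: (25,)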

2. Decision Tree Modelling

Decision tree models are trained on these 25-dimensional embeddings for two tasks:

  • Bonafide versus spoof classification:

      python decision_tree.py --BonafideSpoof

  • Spoofing attack algorithm attribution:

      python decision_tree.py --SpoofAttacks

Results are stored in decision_tree_results/. The relative paths of the dataframes and the maximum depth of the decision tree can be set in the configuration file emb_model_AASIST.conf.
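
As a minimal sketch of this step, assuming a 25-dimensional feature matrix X_train and label vector y_train, a scikit-learn decision tree can be trained and exported for Graphviz rendering as follows; the max_depth value and function names are illustrative, not the repository's configuration.

from sklearn.tree import DecisionTreeClassifier, export_graphviz

def train_tree(X_train, y_train, max_depth=5):
    """Fit a depth-limited decision tree on the 25-dim attribute vectors."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return tree

def save_tree(tree, feature_names, path="tree.dot"):
    """Write a Graphviz .dot file; render with: dot -Tpng tree.dot -o tree.png"""
    export_graphviz(tree, out_file=path, feature_names=feature_names,
                    filled=True, rounded=True)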

License

MIT License

Copyright (c) 2024 Manasi Chhibber

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Acknowledgements

  • This research has been partially supported by the Academy of Finland (Decision No. 349605, project "SPEECHFAKES"). The author additionally acknowledges CSC – IT Center for Science, Finland, for the use of computational resources.
  • This repository is built on top of the AASIST repository.
  • The dataset used here is ASVspoof 2019 [2].

References

[1] AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks

@article{Jung2021AASIST,
  author={Jung, Jee-weon and Heo, Hee-Soo and Tak, Hemlata and Shim, Hye-jin and Chung, Joon Son and Lee, Bong-Jin and Yu, Ha-Jin and Evans, Nicholas},
  title={AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks},
  journal={arXiv preprint arXiv:2110.01200},
  year={2021}
}
[2] ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

@article{wang2020asvspoof,
  title={ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech},
  author={Wang, Xin and Yamagishi, Junichi and Todisco, Massimiliano and Delgado, H{\'e}ctor and Nautsch, Andreas and Evans, Nicholas and Sahidullah, Md and Vestman, Ville and Kinnunen, Tomi and Lee, Kong Aik and others},
  journal={Computer Speech \& Language},
  volume={64},
  pages={101114},
  year={2020},
  publisher={Elsevier}
}
