anupsingh15/LAST

About

This repository presents a fast and efficient speech tokenization framework based on bidirectional Mamba, designed for spoken term detection (STD). The method introduces a speech tokenizer that produces language-agnostic and speaker-independent tokens, ensuring consistent token sequences across different utterances of the same word. The repository includes the implementation, datasets, and pre-trained models.

Language and Speaker-Agnostic Tokenizer

Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval
Anup Singh, Kris Demuynck, Vipul Arora
Paper: https://www.isca-archive.org/interspeech_2025/singh25d_interspeech.html

Setup

Clone the Repository

git clone https://github.com/anupsingh15/LAST.git
cd LAST

Create a Virtual Environment

conda create -n mSTD anaconda
conda activate mSTD

Install Dependencies

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install mamba-ssm
pip install "causal-conv1d>=1.4.0"
python -m pip install tslearn
pip install -U tensorboard
pip install POT
pip install librosa
pip install npy-append-array
pip install faiss-cpu
pip install Levenshtein
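
After installing, a quick import check can confirm that the environment is complete. The script below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for the packages listed above (for example, POT is imported as `ot` and faiss-cpu as `faiss`).

```python
import importlib.util

# Import names assumed to correspond to the pip packages installed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "mamba_ssm", "causal_conv1d",
    "tslearn", "tensorboard", "ot",
    "librosa", "npy_append_array", "faiss", "Levenshtein",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not verify versions or that the CUDA builds match your GPU driver.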

Usage

To train the model, run:

python main.py

For creating the database, building the index, performing retrieval, and tokenizing word pairs, see the examples in demo/
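
The demo/ folder is the reference for the actual pipeline. As a rough illustration of what comparing the token sequences of a word pair can look like, the sketch below computes a normalized edit-distance similarity between two sequences of integer token IDs. Everything here is an assumption made for illustration: the function names, the token IDs, and the choice of Levenshtein distance as the comparison are not taken from the repository, and the distance is implemented in pure Python so the example runs without extra packages.

```python
from typing import Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two token sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def token_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity in [0, 1]: 1.0 for identical sequences, 0.0 for maximally different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical token IDs: two utterances of the same word and one of a different word.
utt_1 = [12, 7, 7, 45, 3]
utt_2 = [12, 7, 45, 3]
utt_3 = [9, 30, 30, 18]

print(token_similarity(utt_1, utt_2))
print(token_similarity(utt_1, utt_3))
```

A tokenizer that is consistent across speakers and languages should give a high similarity for the first pair and a low one for the second.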

Datasets & Pre-trained Models

Citation

If you find our work useful, please cite:

@inproceedings{singh25d_interspeech,
  title     = {{Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval}},
  author    = {Anup Singh and Kris Demuynck and Vipul Arora},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {2630--2634},
  doi       = {10.21437/Interspeech.2025-2722},
  issn      = {2958-1796},
}

👉 You may also check out our earlier work on the Monolingual Speech Tokenizer:

BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection
Anup Singh, Kris Demuynck, Vipul Arora
Paper: https://ieeexplore.ieee.org/abstract/document/10889633

🚀 Coming Soon

We are actively working on enhancing this method with new features and improvements. Stay tuned for upcoming upgrades, including:

  • More efficient tokens
  • Improved token consistency across different noise conditions
