This repository presents a fast and efficient speech tokenization framework based on bidirectional Mamba, designed for spoken term detection (STD). The method introduces a speech tokenizer that produces language-agnostic and speaker-independent tokens, ensuring consistent token sequences across different utterances of the same word. The repository includes the implementation, datasets, and pre-trained models.
Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval
Anup Singh, Kris Demuynck, Vipul Arora
Paper: https://www.isca-archive.org/interspeech_2025/singh25d_interspeech.html
git clone https://github.com/anupsingh15/LAST.git
cd LAST
conda create -n mSTD anaconda
conda activate mSTD
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install mamba-ssm
pip install "causal-conv1d>=1.4.0"
pip install tslearn
pip install -U tensorboard
pip install POT
pip install librosa
pip install npy-append-array
pip install faiss-cpu
pip install Levenshtein
To train the model, run:
python main.py
To create the database, build the index, and perform retrieval and word-pair tokenization, see: demo/
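As an illustrative sketch (not the repo's actual API), the core retrieval idea is that the tokenizer maps every utterance of the same word to a near-identical discrete token sequence, so spoken term detection reduces to comparing token sequences. The example below uses a plain Levenshtein edit distance over hypothetical token sequences; the actual pipeline in demo/ builds a FAISS index for scalable search, and the token values and word labels here are made up for clarity.

```python
# Illustrative only: token-sequence matching for spoken term detection.
# Real token sequences would come from the LAST tokenizer; the values
# and labels below are hypothetical.

def edit_distance(a, b):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical database: word label -> token sequence for one utterance.
database = {
    "hello": [12, 7, 7, 44, 3],
    "world": [9, 15, 22, 22, 6],
}

# A different utterance of "hello" yields a near-identical token sequence,
# so it is closest in edit distance to the stored "hello" entry.
query = [12, 7, 44, 3]
best = min(database, key=lambda w: edit_distance(database[w], query))
print(best)  # -> hello
```

In the full system, a FAISS index prunes the database to a small candidate set before any sequence-level comparison, so the edit-distance step only runs on a handful of candidates rather than every stored utterance.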
- Dataset: Kathbath Word Alignments
- Pre-trained Models: Download from Google Drive
If you find our work useful, please cite:
@inproceedings{singh25d_interspeech,
  title     = {{Language-Agnostic Speech Tokenizer for Spoken Term Detection with Efficient Retrieval}},
  author    = {Anup Singh and Kris Demuynck and Vipul Arora},
  year      = {2025},
  booktitle = {Interspeech 2025},
  pages     = {2630--2634},
  doi       = {10.21437/Interspeech.2025-2722},
  issn      = {2958-1796},
}
👉 You may also check out our earlier work on the Monolingual Speech Tokenizer:
BEST-STD: Bidirectional Mamba-Enhanced Speech Tokenization for Spoken Term Detection
Anup Singh, Kris Demuynck, Vipul Arora
Paper: https://ieeexplore.ieee.org/abstract/document/10889633
We are actively working on enhancing this method. Stay tuned for upcoming improvements, including:
- More efficient tokens
- Improved token consistency across different noise conditions