ggiggit/phoneme_codec_visualization

Phoneme and Codec Token Visualization

Created by Zixiang Wan, 2025

Purpose

Visualize the relationships between phonemes and codec tokens in a specialized speech dataset.

Dataset

LJSpeech Dataset

Preprocessing

Use Montreal Forced Aligner (MFA) to obtain phoneme timestamps.

Visualization

  • Co-occurrence heatmap
  • t-SNE visualization

Usage

1. Download and Unzip the LJSpeech Dataset

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjvf LJSpeech-1.1.tar.bz2

2. Install Montreal Forced Aligner (Approximately 5-10 Minutes)

conda install -c conda-forge montreal-forced-aligner

Note: This step may take a while.

3. Obtain Phoneme Timestamps Using MFA

3.1 Resample WAV Files to 16kHz

MFA works with 16 kHz audio, and LJSpeech is distributed at 22.05 kHz, so resample the recordings first.

mkdir -p LJSpeech-1.1/wavs_16k

for file in LJSpeech-1.1/wavs/*.wav; do
    base=$(basename "$file")
    sox "$file" -r 16000 "LJSpeech-1.1/wavs_16k/$base"
done
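
Before running the aligner, it may be worth confirming that the resampling loop produced what was intended. The following is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `find_wrong_rate` are hypothetical and not part of this repository, and the check assumes the files are plain PCM WAV.

```python
import wave
from pathlib import Path


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_wrong_rate(wav_dir, expected_hz=16000):
    """List WAV file names in wav_dir whose sample rate is not expected_hz."""
    return [
        p.name
        for p in sorted(Path(wav_dir).glob("*.wav"))
        if wav_sample_rate(p) != expected_hz
    ]
```

Calling `find_wrong_rate("LJSpeech-1.1/wavs_16k")` should return an empty list if every file was resampled.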

3.2 Prepare .lab Files for MFA

python prepare_files_for_MFA.py
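
The contents of prepare_files_for_MFA.py are not shown here. As a rough illustration of what this step involves, MFA expects a transcript file with the same base name next to each audio file, and LJSpeech stores its transcripts in a pipe-delimited metadata.csv (ID, raw text, normalized text). The sketch below assumes that layout; `write_lab_files` is a hypothetical helper and may differ from what the repository's script actually does.

```python
import csv
from pathlib import Path


def write_lab_files(metadata_path, out_dir):
    """Write one <utterance_id>.lab transcript per row of LJSpeech metadata.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(metadata_path, encoding="utf-8", newline="") as f:
        # Rows are pipe-delimited: id | raw text | normalized text.
        # QUOTE_NONE keeps quotation marks inside transcripts literal.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            utt_id = row[0]
            # Prefer the normalized text (last field); fall back to the raw text.
            text = row[-1].strip() or row[1].strip()
            (out_dir / f"{utt_id}.lab").write_text(text + "\n", encoding="utf-8")
            count += 1
    return count
```

For example, `write_lab_files("LJSpeech-1.1/metadata.csv", "LJSpeech-1.1/wavs_16k")` would place the .lab files alongside the resampled audio.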

3.3 Download the English Acoustic Model and Dictionary

mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa

Optional: Check available models.

mfa model list acoustic
mfa model list dictionary

More details about MFA commands can be found in the MFA User Guide.

Details about MFA models and dictionaries can be found in the MFA Models Documentation.

3.4 Run MFA for Alignment (Approximately 20 Minutes)

mfa align LJSpeech-1.1/wavs_16k english_us_arpa english_us_arpa textgrids

After alignment is completed, the output folder textgrids will contain TextGrid files corresponding to the audio files. These files contain phoneme-level timestamp information.
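
To show how those timestamps can be read back, here is a minimal sketch of a parser for the long (verbose) TextGrid text format. It assumes the phoneme tier is named "phones" and that labels contain no quotation marks; `read_tier` is a hypothetical helper, and a dedicated TextGrid library would be more robust for anything beyond a quick look.

```python
import re

# One interval in the long TextGrid format: xmin, xmax, then the quoted label.
INTERVAL_RE = re.compile(
    r'xmin = ([\d.eE+-]+)\s+xmax = ([\d.eE+-]+)\s+text = "([^"]*)"'
)


def read_tier(textgrid_text, tier_name="phones"):
    """Return [(start_sec, end_sec, label), ...] for one interval tier."""
    # Each tier begins with a line such as `item [1]:`. The `item []:` header
    # has no index, so it is not treated as a split point.
    for chunk in re.split(r"item \[\d+\]:", textgrid_text)[1:]:
        name = re.search(r'name = "([^"]*)"', chunk)
        if name and name.group(1) == tier_name:
            return [
                (float(start), float(end), label)
                for start, end, label in INTERVAL_RE.findall(chunk)
            ]
    raise KeyError(f"tier {tier_name!r} not found")
```

Intervals with an empty label are kept in the result; they typically correspond to silence and can be filtered out by the caller.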

4. Generate Co-occurrence Heatmap and t-SNE Visualization

python draw.py
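
draw.py itself is not reproduced here. As an illustration of the idea behind the co-occurrence heatmap, the sketch below counts how often each phoneme overlaps each codec token for a single utterance. The function name `cooccurrence_counts`, the frame-centre assignment rule, and the `frame_rate_hz` parameter are all assumptions; the codec and its frame rate depend on the setup and are not specified in this README.

```python
from collections import Counter


def cooccurrence_counts(phone_intervals, codec_tokens, frame_rate_hz):
    """Count how often each (phoneme, codec token) pair shares a frame.

    phone_intervals: [(start_sec, end_sec, label), ...] from the alignment.
    codec_tokens:    one token ID per codec frame, in time order.
    frame_rate_hz:   codec frames per second (assumed, codec-specific).
    """
    counts = Counter()
    for frame_index, token in enumerate(codec_tokens):
        # Assign the frame to the phoneme that covers its centre time.
        centre = (frame_index + 0.5) / frame_rate_hz
        for start, end, label in phone_intervals:
            if start <= centre < end:
                if label:  # skip unlabelled (silence) intervals
                    counts[(label, token)] += 1
                break
    return counts
```

Summing these counters over all utterances gives a phoneme-by-token table that could be plotted as a heatmap.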

Results

Example output from draw.py:

Co-occurrence Heatmap

[Figure: co-occurrence heatmap]

t-SNE Visualization

[Figure: t-SNE visualization]
