Phoneme and Codec Token Visualization

Created by Zixiang Wan, 2025

Purpose

Visualize the relationships between phonemes and codec tokens in a single-speaker English speech dataset (LJSpeech).

Dataset

LJSpeech Dataset

Preprocessing

Use Montreal Forced Aligner (MFA) to obtain phoneme timestamps.

Visualization

Co-occurrence heatmap
t-SNE visualization

Usage

1. Download and Unzip the LJSpeech Dataset

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xjvf LJSpeech-1.1.tar.bz2

2. Install Montreal Forced Aligner (Approximately 5-10 Minutes)

conda install -c conda-forge montreal-forced-aligner

Note: This step may take a while.

3. Obtain Phoneme Timestamps Using MFA

3.1 Resample WAV Files to 16kHz

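The MFA models used here work with 16 kHz audio, while LJSpeech is recorded at 22,050 Hz, so the files are resampled with SoX (which must be installed) before alignment.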

mkdir -p LJSpeech-1.1/wavs_16k

for file in LJSpeech-1.1/wavs/*.wav; do
    base=$(basename "$file")
    sox "$file" -r 16000 "LJSpeech-1.1/wavs_16k/$base"
done
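Before aligning, it can help to confirm that every file in `wavs_16k` was actually resampled. The following is a minimal sketch using Python's standard `wave` module; the helper name `find_wrong_rate` is hypothetical and not part of this repository, and it assumes the outputs are uncompressed PCM WAV files (which is what SoX writes for the LJSpeech input).

```python
import sys
import wave
from pathlib import Path
from typing import List


def find_wrong_rate(wav_dir: str, expected_rate: int = 16000) -> List[str]:
    """Return names of WAV files in wav_dir whose sample rate differs from expected_rate."""
    mismatched = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # wave only needs the header to report the sample rate (PCM WAV only)
        with wave.open(str(path), "rb") as wf:
            if wf.getframerate() != expected_rate:
                mismatched.append(path.name)
    return mismatched


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "LJSpeech-1.1/wavs_16k"
    bad = find_wrong_rate(target)
    print(f"{len(bad)} file(s) not at 16 kHz")
```

An empty result means every file is ready for MFA; any listed file can simply be resampled again with the loop above.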

3.2 Prepare `.lab` Files for MFA

python prepare_files_for_MFA.py
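MFA pairs each audio file with a transcript that has the same base name (a `.lab` or `.txt` file) in the corpus folder. `prepare_files_for_MFA.py` is not reproduced here; the snippet is only a sketch of what such a step can look like, assuming the transcripts come from LJSpeech's `metadata.csv` (pipe-separated `ID|transcription|normalized transcription`, no header) and that the `.lab` files are written next to the resampled audio in `LJSpeech-1.1/wavs_16k`, the corpus folder passed to `mfa align` in step 3.4. The function name `write_lab_files` is hypothetical.

```python
import csv
from pathlib import Path


def write_lab_files(metadata_csv: str, wav_dir: str) -> int:
    """Write one <ID>.lab transcript per row of metadata.csv into wav_dir; return the count.

    Uses the normalized column (numbers and abbreviations spelled out) when present,
    falling back to the raw transcription otherwise.
    """
    out_dir = Path(wav_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(metadata_csv, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: transcripts contain literal double quotes that must not be parsed
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row:
                continue
            utt_id = row[0]
            text = row[2] if len(row) > 2 and row[2] else row[1]
            (out_dir / f"{utt_id}.lab").write_text(text.strip() + "\n", encoding="utf-8")
            count += 1
    return count
```

Example call: `write_lab_files("LJSpeech-1.1/metadata.csv", "LJSpeech-1.1/wavs_16k")`.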

3.3 Download the English Acoustic Model and Dictionary

mfa model download acoustic english_us_arpa
mfa model download dictionary english_us_arpa

Optional: Check available models.

mfa model list acoustic
mfa model list dictionary

More details about MFA commands can be found in the MFA User Guide.

Details about MFA models and dictionaries can be found in the MFA Models Documentation.

3.4 Run MFA for Alignment (Approximately 20 Minutes)

mfa align LJSpeech-1.1/wavs_16k english_us_arpa english_us_arpa textgrids

After alignment completes, the `textgrids` output folder contains one TextGrid file per audio file, holding word- and phoneme-level timestamps.
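To use these timestamps in Python, the phone tier of each TextGrid has to be read. Libraries such as `praatio` handle this; the sketch here instead parses the long ("ooTextFile") TextGrid format with regular expressions so it has no dependencies. It assumes the phone tier is named `phones` (as MFA typically names it) and that silence is written as an empty label or as `sil`/`sp`; adjust `skip` if your output differs. `read_tier` is a hypothetical helper, not code from `draw.py`.

```python
import re
from typing import Iterable, List, Tuple

Interval = Tuple[float, float, str]  # (start_sec, end_sec, label)

# One IntervalTier: captures its name and its body up to the next "item [n]:" or end of text
_TIER_RE = re.compile(
    r'item \[\d+\]:\s*class = "IntervalTier"\s*name = "([^"]*)"(.*?)(?=item \[\d+\]:|\Z)',
    re.S,
)
# One interval: xmin, xmax and text on consecutive lines (tier headers have no text line)
_INTERVAL_RE = re.compile(
    r'xmin = ([-+.\deE]+)\s*xmax = ([-+.\deE]+)\s*text = "((?:[^"]|"")*)"'
)


def read_tier(textgrid_text: str, tier_name: str = "phones",
              skip: Iterable[str] = ("", "sil", "sp")) -> List[Interval]:
    """Return the labeled intervals of one IntervalTier, dropping labels listed in skip."""
    skip_set = set(skip)
    for name, body in _TIER_RE.findall(textgrid_text):
        if name != tier_name:
            continue
        intervals = []
        for xmin, xmax, label in _INTERVAL_RE.findall(body):
            label = label.replace('""', '"').strip()  # Praat escapes quotes by doubling
            if label in skip_set:
                continue
            intervals.append((float(xmin), float(xmax), label))
        return intervals
    raise KeyError(f"tier {tier_name!r} not found")
```

Typical use: `read_tier(Path("textgrids/LJ001-0001.TextGrid").read_text(encoding="utf-8"))`.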

4. Generate Co-occurrence Heatmap and t-SNE Visualization

python draw.py
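A co-occurrence heatmap needs each codec frame paired with a phoneme. One straightforward way, sketched here under stated assumptions, is to assign codec frame k to the phone interval that contains its center time (k + 0.5) / frame_rate, count (phone, token) pairs, and row-normalize the counts into an estimate of P(token | phone). The codec, its frame rate, and the use of a single codebook's token sequence are all assumptions, and `count_cooccurrence`/`to_matrix` are hypothetical names; this is not necessarily how `draw.py` computes its figures.

```python
import bisect
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float, str]  # (start_sec, end_sec, phone_label)


def phone_at(t: float, intervals: Sequence[Interval], starts: Sequence[float]) -> Optional[str]:
    """Return the label of the interval containing time t, or None if t falls in a gap."""
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and intervals[i][0] <= t < intervals[i][1]:
        return intervals[i][2]
    return None


def count_cooccurrence(tokens: Sequence[int], intervals: Sequence[Interval],
                       frame_rate: float) -> Counter:
    """Count (phone, token) pairs; frame_rate is the codec's frames per second (assumed known)."""
    ordered = sorted(intervals)
    starts = [s for s, _, _ in ordered]
    counts: Counter = Counter()
    for k, tok in enumerate(tokens):
        center = (k + 0.5) / frame_rate  # center time of codec frame k in seconds
        phone = phone_at(center, ordered, starts)
        if phone is not None:
            counts[(phone, tok)] += 1
    return counts


def to_matrix(counts: Counter) -> Tuple[List[str], List[int], List[List[float]]]:
    """Row-normalize counts into P(token | phone); rows are phones, columns are tokens."""
    phones = sorted({p for p, _ in counts})
    toks = sorted({t for _, t in counts})
    rows = []
    for p in phones:
        row = [float(counts[(p, t)]) for t in toks]
        total = sum(row)  # > 0, since every phone in counts occurred at least once
        rows.append([c / total for c in row])
    return phones, toks, rows
```

Counters from many utterances can be summed with `+` before calling `to_matrix`, and the resulting rows can be drawn with any heatmap routine (for example matplotlib's `imshow`) using `phones` and `toks` as axis labels.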

Results

Below are examples of the code execution results:

Co-occurrence Heatmap

t-SNE Visualization
