NVIDIA-Nemo-Parakeet-TDT-0-6B-V2-Audio-to-Text

A Python script that uses NVIDIA NeMo Parakeet TDT 0.6B V2 to convert a WAV or MP3 file into text.

This script is built around Parakeet TDT 0.6B V2, a 600M-parameter automatic speech recognition (ASR) model that NVIDIA recently open-sourced. It tops the Hugging Face Open ASR Leaderboard with an RTFx of 3380.

It's open-sourced under CC-BY-4.0, ready for commercial use.
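For context on the RTFx figure quoted above: RTFx is the inverse real-time factor, that is, seconds of audio processed per second of compute. The short sketch below only works through that arithmetic; the function names are illustrative, and the result applies to the benchmark's conditions rather than to any particular machine.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Compute time implied by a given RTFx for a given amount of audio."""
    if rtfx_value <= 0:
        raise ValueError("rtfx_value must be positive")
    return audio_seconds / rtfx_value


if __name__ == "__main__":
    one_hour = 60 * 60  # seconds of audio
    # At RTFx 3380, one hour of audio corresponds to roughly one second of compute.
    print(f"{processing_time(one_hour, 3380):.2f} s")  # prints "1.07 s"
```

Measuring `rtfx(...)` on your own files and hardware is the only reliable way to know what throughput to expect locally.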

⚙️ The Details

→ Built on a FastConformer encoder and a TDT decoder, the model handles audio chunks of up to 24 minutes with full attention and outputs text with punctuation, capitalization, and accurate word-, character-, and segment-level timestamps.

→ It achieves RTFx 3380 at batch size 128 on the Open ASR leaderboard, but performance varies with audio duration and batch size.

→ Available via NVIDIA NeMo, optimized for GPU inference, and installable via pip install -U nemo_toolkit['asr'].

→ Compatible with Linux; runs on Ampere, Blackwell, Hopper, and Volta GPU architectures and requires at least 2GB of RAM.
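The NeMo route and the timestamp output described above can also be tried directly from Python. This is a minimal sketch that follows the usage pattern shown on the model's Hugging Face card, not the code in transcribe_script.py. The helper names (`load_model`, `format_segments`, `transcribe_with_timestamps`) are hypothetical, and the shape of the object returned by `transcribe` has changed between NeMo releases, so treat the `timestamp["segment"]` access as an assumption to verify against your installed version.

```python
from typing import Dict, List


def load_model(model_name: str = "nvidia/parakeet-tdt-0.6b-v2"):
    """Download (on first use) and load the pretrained checkpoint."""
    # Imported lazily so the formatting helper below works without NeMo installed.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)


def format_segments(segments: List[Dict]) -> List[str]:
    """Render segment timestamps as 'start - end : text' lines.

    Assumes each entry carries 'start' and 'end' (seconds) and 'segment' (text),
    which is the layout shown on the model card.
    """
    return [
        f"{s['start']:.2f}s - {s['end']:.2f}s : {s['segment']}"
        for s in segments
    ]


def transcribe_with_timestamps(model, audio_path: str) -> List[str]:
    """Transcribe one file and return its segment-level lines."""
    # timestamps=True asks NeMo to attach timing information to each hypothesis.
    output = model.transcribe([audio_path], timestamps=True)
    return format_segments(output[0].timestamp["segment"])
```

A typical call would be `transcribe_with_timestamps(load_model(), "sample.wav")`, where `sample.wav` stands in for your own file.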

How to use:

In a terminal (bash on Linux), run the following, substituting python for python3 if that is how Python 3 is invoked on your system: python3 transcribe_script.py [audio_filename.wav]

Run the script with your audio file path. You can optionally specify a segment length:

python3 transcribe_script.py /path/to/your/long_audio.wav

python3 transcribe_script.py /path/to/your/long_audio.mp3 --segment_length 30
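This README does not say how the script applies --segment_length or what unit it uses. The sketch below shows one plausible approach, assuming the value is in seconds and that the audio is cut into consecutive fixed-length pieces with pydub. All function names and the "no flag means no splitting" default are assumptions made for illustration, not a description of transcribe_script.py.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the two invocations shown above."""
    parser = argparse.ArgumentParser(description="Transcribe a WAV or MP3 file.")
    parser.add_argument("audio_file", help="path to the input audio")
    # Assumption: the unit is seconds and omitting the flag means no splitting.
    parser.add_argument("--segment_length", type=int, default=None,
                        help="segment length in seconds (assumed unit)")
    return parser


def segment_bounds(total_ms: int,
                   segment_seconds: Optional[int]) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs that cover the whole file in order."""
    if total_ms <= 0:
        return []
    if not segment_seconds or segment_seconds <= 0:
        return [(0, total_ms)]
    step = segment_seconds * 1000
    # The last segment is clipped so it never runs past the end of the audio.
    return [(start, min(start + step, total_ms))
            for start in range(0, total_ms, step)]


def split_audio(path: str, segment_seconds: Optional[int]) -> list:
    """Cut the file into pydub AudioSegment pieces."""
    # Lazy import: pydub needs ffmpeg available to decode MP3 input.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(path)
    # pydub measures length and slices in milliseconds.
    return [audio[start:end]
            for start, end in segment_bounds(len(audio), segment_seconds)]
```

Under these assumptions, a 70-second file with `--segment_length 30` yields three pieces of 30, 30, and 10 seconds. Each piece could then be exported with pydub's `export(..., format="wav")` and passed to the model.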

What you need to do before you use this:

pip install "nemo_toolkit[asr]"

pip install pydub

sudo apt-get update && sudo apt-get install ffmpeg

pip install "cuda-python>=12.3"
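Once the installs above have run, a quick check like the following can confirm that each prerequisite is visible from the Python interpreter you plan to use. It is a small sketch using only the standard library. The function name is illustrative, and it only tests that the packages and the ffmpeg binary can be found, not that a GPU is usable.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites are discoverable from this interpreter."""
    return {
        # nemo_toolkit is imported as 'nemo'; cuda-python is imported as 'cuda'.
        "nemo_toolkit": importlib.util.find_spec("nemo") is not None,
        "pydub": importlib.util.find_spec("pydub") is not None,
        "cuda-python": importlib.util.find_spec("cuda") is not None,
        # ffmpeg is a system binary, so look for it on PATH instead.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:<14}{'ok' if found else 'MISSING'}")
```

Anything reported as MISSING points back to the corresponding install command above.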


alby13/NVIDIA-Nemo-Parakeet-TDT-0-6B-V2-Audio-to-Text
