Skip to content

🎧 11K High Quality Anime Audio Clips, Transcriptions & Speaker Labels for TTS, ASR & Voice Cloning ✨

Notifications You must be signed in to change notification settings

taresh18/AnimeVox

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

11 Commits
Β 
Β 
Β 
Β 

Repository files navigation

🎧 AnimeVox: Character TTS Corpus ✨

Banner

πŸ—£οΈ Dataset Overview

AnimeVox is an English Text-to-Speech (TTS) dataset featuring 11,020 audio clips from 19 distinct anime characters across popular series. Each clip includes a high-quality transcription, character name, and anime title, making it ideal for voice cloning, custom TTS model fine-tuning, and character voice synthesis research.

The dataset was created and processed using TTSizer, an open-source tool that automates creating high-quality TTS datasets from raw media.

Dataset Links:

Watch the Demo Video:

AnimeVox Dataset Demo Video

πŸ“Š Dataset Statistics

  • Total samples: 11,020
  • Characters: 19
  • Anime series: 15
  • Audio format: 44.1kHz mono WAV
  • Storage size: ~3.5GB

πŸ—οΈ Dataset Structure

  • Instances: Each sample is a dictionary with the following structure:
    {
      "audio": {"path": "...", "array": ..., "sampling_rate": 44100},
      "transcription": "English text spoken by the character.",
      "character_name": "Character Name",
      "anime": "Anime Series Title"
    }
  • Fields:
    • audio: Audio object (44.1kHz).
    • transcription: (str) English transcription.
    • character_name: (str) Name of the speaking character.
    • anime: (str) Anime series title.
  • Splits: A single train split with all 11,020 samples from 19 characters.

πŸ› οΈ Dataset Creation

Source

Audio clips were sourced from official English-dubbed versions of popular anime series. The clips were selected to capture diverse emotional tones and vocal characteristics unique to each character.

Processing with TTSizer

This dataset was generated using TTSizer, which offers an end-to-end automated pipeline for creating TTS-ready datasets. Key features utilized include:

  • Advanced Multi-Speaker Diarization: To accurately identify and segment speech for each of the characters, even in complex audio environments.
  • State-of-the-Art Model Integration: Leveraging models such as MelBandRoformer (for vocal separation), Gemini (for diarization), CTC-Aligner (for precise audio-text alignment), and WeSpeaker (for speaker embedding/verification).
  • Quality Control: Implementing automatic outlier detection to flag and help refine potentially problematic audio-text pairs, ensuring higher dataset quality.

The tool's configurable nature allowed for fine-tuning the entire process to suit the specific needs of this anime voice dataset.

πŸš€ How to Use

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("taresh18/AnimeVox")

# Access the training split
train_data = dataset["train"]

# Print dataset information
print(f"Dataset contains {len(train_data)} samples")

# Access a specific sample
sample = train_data[0]
print(f"Character: {sample['character_name']}")
print(f"From anime: {sample['anime']}")
print(f"Transcription: {sample['transcription']}")

πŸ“œ Licensing & Usage

  • License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

About

🎧 11K High Quality Anime Audio Clips, Transcriptions & Speaker Labels for TTS, ASR & Voice Cloning ✨

Topics

Resources

Stars

Watchers

Forks