extract-audio

Extract audio files from a Parquet or Arrow file generated by the Hugging Face `datasets` library.

Usage

Usage: extract-audio [OPTIONS] --input <INPUT> --output <OUTPUT>

Options:
      --input <INPUT>    The path to the input file
      --format <FORMAT>  File format [default: parquet] [possible values: arrow, parquet]
      --output <OUTPUT>  The path to the output files
  -h, --help             Print help
  -V, --version          Print version

Example

extract-audio --format parquet --input train-00000-of-00010.parquet --output files-parquet/

extract-audio --format arrow --input data-00000-of-01189.arrow --output files-arrow/
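
Datasets are often split into many shards, so the single-file commands above may need to be repeated. The following is a minimal sketch of a shell helper that runs the tool once per Parquet shard. The function name `extract_all`, the `.parquet` suffix check, and the per-shard output subdirectory layout are assumptions made for illustration; only the `--format`, `--input`, and `--output` flags come from the usage text above.

```shell
#!/usr/bin/env bash
# Sketch: run extract-audio once per Parquet shard in a directory.
# Assumptions (not from this README): shards end in .parquet, and each
# shard gets its own output subdirectory named after the shard.

extract_all() {
  input_dir="$1"
  output_root="$2"
  for shard in "$input_dir"/*.parquet; do
    # Skip the unexpanded pattern when no shard matches.
    [ -e "$shard" ] || continue
    name="$(basename "$shard" .parquet)"
    # Create the output directory up front; harmless if the tool also creates it.
    mkdir -p "$output_root/$name"
    extract-audio --format parquet --input "$shard" --output "$output_root/$name/"
  done
}

# Usage: extract_all ./shards ./files-parquet
```

Writing each shard to its own subdirectory is one way to avoid name clashes between shards; adjust the layout if all files should land in a single directory.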

Build

You need: cargo, rustc, cross, podman, goreleaser.

  1. Build the images and increase the resources available to podman:
podman build --platform=linux/amd64 -f dockerfiles/Dockerfile.aarch64-unknown-linux-gnu -t aarch64-unknown-linux-gnu:my-edge .
podman build --platform=linux/amd64 -f dockerfiles/Dockerfile.x86_64-unknown-linux-gnu -t x86_64-unknown-linux-gnu:my-edge .

podman machine set --cpus 4 --memory 8192
  2. Make the binaries:
goreleaser build --clean --snapshot --id extract-audio --timeout 60m
