Extract audio files from a parquet or arrow file generated by the Hugging Face datasets
library.
Usage: extract-audio [OPTIONS] --input <INPUT> --output <OUTPUT>
Options:
  --input <INPUT>    The path to the input file
  --format <FORMAT>  File format [default: parquet] [possible values: arrow, parquet]
  --output <OUTPUT>  The path to the output files
  -h, --help         Print help
  -V, --version      Print version
Examples:
extract-audio --format parquet --input train-00000-of-00010.parquet --output files-parquet/
extract-audio --format arrow --input data-00000-of-01189.arrow --output files-arrow/
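Datasets are usually split into several shards (for example train-00000-of-00010.parquet), and each invocation above handles one input file. The following is a minimal sketch of a wrapper that runs the tool once per shard. The function name extract_all_shards, the EXTRACT_AUDIO_BIN variable, and the one-subdirectory-per-shard layout are assumptions made for illustration; they are not part of extract-audio. The per-shard subdirectory is a precaution, since it is not stated here whether files from different shards can collide in a single output directory.

```shell
#!/usr/bin/env bash
# Sketch: run extract-audio once per parquet shard found in a directory.
# EXTRACT_AUDIO_BIN and extract_all_shards are hypothetical helpers, not
# part of the tool; only --format, --input and --output come from its help.
set -euo pipefail

EXTRACT_AUDIO_BIN="${EXTRACT_AUDIO_BIN:-extract-audio}"

extract_all_shards() {
    local input_dir="$1"
    local output_dir="$2"
    local shard shard_name

    for shard in "$input_dir"/*.parquet; do
        # Skip the unexpanded glob when the directory contains no shards.
        [ -e "$shard" ] || continue

        # Assumption: give each shard its own output subdirectory.
        shard_name="$(basename "$shard" .parquet)"
        mkdir -p "$output_dir/$shard_name"

        "$EXTRACT_AUDIO_BIN" --format parquet \
            --input "$shard" \
            --output "$output_dir/$shard_name/"
    done
}

# Example call (assumed paths):
# extract_all_shards ./data files-parquet
```

For arrow shards, the same loop should work with the glob changed to *.arrow and --format set to arrow.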
To build the binaries, you need: cargo, rustc, cross, podman, goreleaser.
- build the images and increase the resources of the podman machine:
podman build --platform=linux/amd64 -f dockerfiles/Dockerfile.aarch64-unknown-linux-gnu -t aarch64-unknown-linux-gnu:my-edge .
podman build --platform=linux/amd64 -f dockerfiles/Dockerfile.x86_64-unknown-linux-gnu -t x86_64-unknown-linux-gnu:my-edge .
podman machine set --cpus 4 --memory 8192
- build the binaries:
goreleaser build --clean --snapshot --id extract-audio --timeout 60m