kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Issues with the code or demo are welcome.
This repo provides inference code for kNN-SVC. The project is managed with Poetry for reproducible, isolated runs.
- Prereqs: Python 3.12, Poetry
- Install deps:
  poetry install
- Checkpoints can be found under the Releases tab. Place them in a folder and pass that folder as a command-line argument (--ckpt_dir), or set it in the notebook.
- Run conversions using any of the three pathways below: a single source/target pair, a whole dataset, or the notebook demo. Feel free to report bugs or points of confusion via Issues.
All examples assume 16kHz, mono audio inputs.
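To catch format problems before running a conversion, here is a minimal sketch that checks whether WAV files are 16 kHz mono. It is a hypothetical helper (the function name and the check_wavs.py script name are illustrative, not part of the repo) and uses only Python's standard `wave` module, so it reads uncompressed PCM WAVs only. It reports problems but does not resample or downmix.

```python
import sys
import wave
from pathlib import Path

EXPECTED_RATE = 16_000  # sample rate assumed by the examples (Hz)
EXPECTED_CHANNELS = 1   # mono


def check_wav_format(path):
    """Return a list of format problems; an empty list means 16 kHz mono.

    Hypothetical helper, not part of the kNN-SVC repo. The stdlib `wave`
    module only reads uncompressed integer-PCM WAV files and raises
    wave.Error for other encodings (e.g. 32-bit float).
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"has {channels} channels, expected mono")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_wavs.py src.wav style.wav ...
    for arg in sys.argv[1:]:
        issues = check_wav_format(Path(arg))
        print(f"{arg}: {'OK' if not issues else '; '.join(issues)}")
```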
Single-file conversion runs the main entrypoint and saves the output next to the source file as: <src_basename>_to_<tgt_basename>_knn_<ckpt_type>_<post_opt>.wav
poetry run python ddsp_inference.py /path/to/src.wav /path/to/style.wav \
--ckpt_dir /path/to/ckpt_dir \
--ckpt_type mix \
--post_opt post_opt_0.2 \
--topk 4 \
--device cuda \
--prioritize_f0 true \
--tgt_loudness_db -16
Notes:
- --ckpt_type options include mix, mix_harm_no_amp_*, mix_no_harm_no_amp_*, wavlm_only, and wavlm_only_original, where "harm" indicates additive synthesis conditioning.
- --post_opt selects the concatenation smoothness optimization and can be no_post_opt or post_opt_0.2.
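If you want to script the single-file command (for example, to try one source against several style references), a minimal sketch is to build the same argument list in Python and call it with `subprocess.run`. The function names below are hypothetical; the flags and their values are copied from the example command, and the sketch assumes it is run from the repo root so that ddsp_inference.py resolves.

```python
import subprocess


def build_knnsvc_command(src, tgt, ckpt_dir, ckpt_type="mix", post_opt="post_opt_0.2",
                         topk=4, device="cuda", prioritize_f0="true", tgt_loudness_db=-16):
    """Assemble the argv for one single-file conversion.

    Hypothetical wrapper; the flags and default values are copied from the
    example command, and every value is passed to the CLI as a string.
    """
    return [
        "poetry", "run", "python", "ddsp_inference.py", str(src), str(tgt),
        "--ckpt_dir", str(ckpt_dir),
        "--ckpt_type", ckpt_type,
        "--post_opt", post_opt,
        "--topk", str(topk),
        "--device", device,
        "--prioritize_f0", prioritize_f0,
        "--tgt_loudness_db", str(tgt_loudness_db),
    ]


def convert_with_styles(src, style_files, ckpt_dir, **options):
    """Convert one source against several style references, one process per pair."""
    for tgt in style_files:
        # check=True raises CalledProcessError if a conversion exits with an error.
        subprocess.run(build_knnsvc_command(src, tgt, ckpt_dir, **options), check=True)
```

Spawning one process per pair keeps the sketch simple but reloads the model each time; for many pairs, the dataset mode described next is likely the better fit.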
For dataset-level conversion, both src and tgt should be dataset roots that contain speaker subfolders.
Converted audio is written under the parent directory of the target dataset folder, in an automatically created directory named:
<parent_of_tgt>/{src_name}_to_{tgt_name}_{ckpt_type}_post_opt_{post_opt}/
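Before a dataset run, it can help to confirm which speaker subfolders will be picked up and where the results should land. The sketch below is hypothetical (not the repo's code): it lists the immediate subdirectories of a dataset root and fills the output-directory template above verbatim. The real script may format {post_opt} differently, so treat the predicted path as a guide and check the directory that actually appears.

```python
from pathlib import Path


def list_speakers(dataset_root):
    """Names of the speaker subfolders directly under a dataset root, sorted."""
    return sorted(p.name for p in Path(dataset_root).iterdir() if p.is_dir())


def expected_output_dir(src_root, tgt_root, ckpt_type, post_opt):
    """Fill <parent_of_tgt>/{src_name}_to_{tgt_name}_{ckpt_type}_post_opt_{post_opt}/ literally."""
    src_root, tgt_root = Path(src_root), Path(tgt_root)
    dir_name = f"{src_root.name}_to_{tgt_root.name}_{ckpt_type}_post_opt_{post_opt}"
    # The output directory is a sibling of the target dataset root.
    return tgt_root.parent / dir_name
```

Loose files at the root (for example a README or CSV) are ignored by `list_speakers`, since only directories count as speakers.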
poetry run python ddsp_inference.py /path/to/src_dataset_root /path/to/tgt_dataset_root \
--ckpt_dir /path/to/ckpt_dir \
--ckpt_type mix \
--post_opt post_opt_0.2 \
--required_subset_file /path/to/split.csv
Notes:
- --required_subset_file filters which files are processed (a CSV in the format expected by the code).
- --dur_limit restricts the target pool to the first N minutes (set it to a number, or leave it empty to use all).
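To illustrate what a duration cap on the target pool means, here is a minimal sketch that keeps files in order until their cumulative length reaches N minutes. It is an assumption-laden illustration, not the repo's implementation: the function names are hypothetical, whole files are kept (so the file that crosses the limit is included rather than trimmed), and the actual ordering and cutoff rules in ddsp_inference.py may differ.

```python
import wave


def wav_duration_seconds(path):
    """Duration of an uncompressed PCM WAV file in seconds (stdlib only)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def limit_pool_by_minutes(files_with_durations, dur_limit_minutes=None):
    """Keep files in order until their cumulative duration reaches the limit.

    files_with_durations: sequence of (path, seconds) pairs.
    dur_limit_minutes: None keeps everything (like leaving --dur_limit empty).
    Assumption: whole files are kept, so the file that crosses the limit is
    included rather than trimmed.
    """
    if dur_limit_minutes is None:
        return [path for path, _ in files_with_durations]
    budget = dur_limit_minutes * 60.0
    kept, total = [], 0.0
    for path, seconds in files_with_durations:
        if total >= budget:
            break
        kept.append(path)
        total += seconds
    return kept
```

For example, with files of 30 s, 40 s and 50 s and a 1-minute limit, the first two are kept (70 s in total) and the third is dropped.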
Colab demo: https://colab.research.google.com/github/SmoothKen/knn-svc/blob/master/knnsvc_demo.ipynb
Open knnsvc_demo.ipynb for an interactive, quick demo that uses the same ddsp_inference.py entrypoint under the hood.
Steps:
- Ensure you have 16kHz, mono WAVs for the source (content) and target (style).
- In the first cell, set src_wav_path and ref_wav_path, and optionally tweak ckpt_type, post_opt, and topk.
- Run the next cell to perform the conversion. The result will be saved next to the source file as:
  <src_basename>_to_<tgt_basename>_knn_<ckpt_type>_<post_opt>.wav
- Subsequent cells will load and play the result inside the notebook.
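As a rough picture of the first cell and of where the result ends up, here is a sketch using the variable names from the steps above. The paths are placeholders, and `expected_output_path` is a hypothetical helper that assumes "basename" means the file name without its .wav extension; it is not taken from the notebook itself.

```python
from pathlib import Path

# First-cell settings (placeholder paths; point them at your own 16 kHz mono WAVs).
src_wav_path = "/path/to/src.wav"    # source (content)
ref_wav_path = "/path/to/style.wav"  # target (style)
ckpt_type = "mix"
post_opt = "post_opt_0.2"
topk = 4


def expected_output_path(src_wav_path, ref_wav_path, ckpt_type, post_opt):
    """Predict <src_basename>_to_<tgt_basename>_knn_<ckpt_type>_<post_opt>.wav.

    Assumes "basename" means the file name without its .wav extension and
    that the result sits in the same folder as the source.
    """
    src, ref = Path(src_wav_path), Path(ref_wav_path)
    return src.with_name(f"{src.stem}_to_{ref.stem}_knn_{ckpt_type}_{post_opt}.wav")


print(expected_output_path(src_wav_path, ref_wav_path, ckpt_type, post_opt))
```

Inside Jupyter, the predicted file can then be played with `IPython.display.Audio(filename=...)` once the conversion cell has finished.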
Tip: ckpt_type options include mix, mix_harm_no_amp_*, mix_no_harm_no_amp_*, wavlm_only, wavlm_only_original. post_opt can be no_post_opt or post_opt_0.2.
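Because some ckpt_type names end in a checkpoint-specific suffix, a quick sanity check before launching a long run can catch typos. This is a hypothetical sketch, not the repo's own validation: it treats the "*" in the tip as a shell-style wildcard via Python's `fnmatch`, and the real script may accept other names or check differently.

```python
from fnmatch import fnmatchcase

# Accepted values as listed in the tip; "*" stands for a checkpoint-specific suffix.
CKPT_TYPE_PATTERNS = (
    "mix",
    "mix_harm_no_amp_*",
    "mix_no_harm_no_amp_*",
    "wavlm_only",
    "wavlm_only_original",
)
POST_OPT_VALUES = ("no_post_opt", "post_opt_0.2")


def option_errors(ckpt_type, post_opt):
    """Return human-readable problems with a (ckpt_type, post_opt) choice; empty if valid."""
    errors = []
    # fnmatchcase is case-sensitive, so "MIX" is rejected while "mix" passes.
    if not any(fnmatchcase(ckpt_type, pattern) for pattern in CKPT_TYPE_PATTERNS):
        errors.append(f"unknown ckpt_type {ckpt_type!r}; expected one of {CKPT_TYPE_PATTERNS}")
    if post_opt not in POST_OPT_VALUES:
        errors.append(f"unknown post_opt {post_opt!r}; expected one of {POST_OPT_VALUES}")
    return errors
```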
We plan to standardize the ckpt_type naming to reduce confusion, though the details may depend on how this research develops. The options listed above will continue to work for now.
Links:
- arXiv paper: https://arxiv.org/abs/2504.05686
- Demo page with samples: http://knnsvc.com/
Citation:
@inproceedings{shao2025knn,
title={kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization},
author={Shao, Keren and Chen, Ke and Baas, Matthew and Dubnov, Shlomo},
booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
organization={IEEE}
}