Description
Hi,
I am trying to convert a pretrained LJSpeech TTS model, based on kan-bayashi/ljspeech_fastspeech2 and parallel_wavegan/ljspeech_parallel_wavegan.v1, to ONNX using the code below:
########################### ONNX Conversion ############################
from espnet2.bin.tts_inference import Text2Speech
from espnet_onnx.export import TTSModelExport
m = TTSModelExport()
tag_exp = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth"
train_config="exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/config.yaml"
vocoder_tag = 'parallel_wavegan.v1/checkpoint-400000steps.pkl'
vocoder_config= 'parallel_wavegan.v1/config.yml'
text2speech = Text2Speech.from_pretrained(
    train_config=train_config,
    model_file=tag_exp,
    vocoder_file=vocoder_tag,
    vocoder_config=vocoder_config,
    speed_control_alpha=1.0,
    always_fix_seed=False,
)
tag_name = 'ljspeech_pretrained'
m.export(text2speech, tag_name, quantize=True)
########################### Inference ############################
from espnet_onnx import Text2Speech
import soundfile
import numpy as np
import time
text2speech = Text2Speech(tag_name)
text = 'hello world!'
output_dict = text2speech(text)  # run inference with the exported ONNX model
wav = output_dict['wav']
soundfile.write("ljspeech_pretrained_test.wav", wav, 22050, "PCM_16")
######################################################################
When I synthesize speech with the exported model, the audio quality is very low.
I noticed that the exported ONNX folder does not contain the stats.h5 file from the Parallel WaveGAN (PWG) vocoder folder:
~/.cache/espnet_onnx/ljspeech_pretrained/: config.yaml feats_stats.npz full quantize
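The check itself can be reproduced with a short script. This is only a sketch: `find_missing` is a hypothetical helper and not part of espnet_onnx, and the directory in the usage comment assumes the default cache location and the tag name used in the export code.

```python
from pathlib import Path


def find_missing(export_dir, expected=("stats.h5",)):
    """Return the names in `expected` that are not present in `export_dir`.

    `find_missing` is a hypothetical helper for illustration only.
    If `export_dir` does not exist, every expected name is reported as missing.
    """
    export_dir = Path(export_dir).expanduser()
    if export_dir.is_dir():
        present = {p.name for p in export_dir.iterdir()}
    else:
        present = set()
    return [name for name in expected if name not in present]


# Usage (path assumed from the tag name used during export):
# print(find_missing("~/.cache/espnet_onnx/ljspeech_pretrained"))
```

An empty list would mean every expected file is in the export directory; any name that is returned was not copied over by the export.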
Could anyone please explain how to include stats.h5 during inference with espnet_onnx?