Skip to content

How to include stats.h5 of PWG Vocoder during ONXX conversion for TTS #94

@anirpipi

Description

@anirpipi

Hi..
I am trying to convert pretrained LJSpeech TTS model based on kan-bayashi/ljspeech_fastspeech2 and parallel_wavegan/ljspeech_parallel_wavegan.v1 using the below code:

########################### ONNX Conversion ############################

from espnet2.bin.tts_inference import Text2Speech
from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

tag_exp = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth"
train_config="exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/config.yaml"

vocoder_tag = 'parallel_wavegan.v1/checkpoint-400000steps.pkl'
vocoder_config= 'parallel_wavegan.v1/config.yml'

text2speech = Text2Speech.from_pretrained(
train_config=train_config,
model_file=tag_exp,
vocoder_file=vocoder_tag,
vocoder_config=vocoder_config,
speed_control_alpha=1.0,
always_fix_seed=False
)

tag_name = 'ljspeech_pretrained'
m.export(text2speech, tag_name, quantize=True)

########################### Inference ############################

from espnet_onnx import Text2Speech
import soundfile
import numpy as np
import time

text2speech = Text2Speech(tag_name)

text = 'hello world!'
wav = wav['wav']

soundfile.write("ljspeech_pretrained_test.wav", wav, 22050, "PCM_16")

######################################################################

On synthesizing, the audio quality is very low.
I realized that the converted ONNX folder did not have stats.h5 file from the pwg vocoder folder.
~/.cache/espnet_onnx/ljspeesch_pretrained/: config.yaml feats_stats.npz full quantize

Can anyone please help how to include the stats.h5 during inference using espnet_onnx

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions