How do I use the exported FastPitch ONNX model to generate a spectrogram from raw text using onnxruntime? #4130
@borisfom, could you please have a look at this? Thanks.
In my case, I succeeded with the same solution that @godspirit00 explained. But when I try to make the speech more solemn, as explained in this tutorial, it does not work correctly with ONNX. Here is the code of interest:

```python
with torch.no_grad():
    spec_norm, audio_norm, durs_norm_pred, pitch_norm_pred = str_to_audio(input_string)
    # Let's try to make the speech more solemn
    # Let's deamplify the pitch and shift the pitch down by 75% of 1 standard deviation
    pitch_sol = pitch_norm_pred * 0.75 - 0.75
    # FastPitch tends to raise the pitch before "loss", which sounds inappropriate. Let's just remove that pitch raise
    pitch_sol[0][-5] += 0.2
    # Now let's pass our new pitch to FastPitch with a 90% pacing to slow it down
    spec_sol, audio_sol, durs_sol_pred, _ = str_to_audio(input_string, pitch=pitch_sol, pace=0.9)
```

As you can see, a first inference is run to obtain pitch_norm_pred; that array is then modified and passed back as input to obtain a solemn voice. With the ONNX model, the result sounds as if the speech runs out of air from the third word of each sentence onward. If I run the same experiment with the original FastPitch model, it works fine, as explained in the tutorial. Is it possible that the pitch input for ONNX should be unnormalized? If that is the case, how do I do that?
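One way to test the unnormalized-pitch hypothesis is to undo the normalization with the dataset's pitch statistics before handing the pitch to the ONNX session. The sketch below is an assumption-laden illustration, not a confirmed fix: the config attribute names (`pitch_mean`, `pitch_std`), the ONNX input names (`"text"`, `"pitch"`, `"pace"`), the idea that the export expects pitch in Hz, and the `spec_model` variable name from the tutorial are all guesses that need checking against your own export (`sess.get_inputs()` lists the real names).

```python
# Minimal sketch, NOT a confirmed fix: it assumes the exported graph takes
# per-token pitch in Hz and inputs named "text", "pitch", "pace".
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("fastpitch.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])  # verify the real input names/shapes first

# spec_model is the NeMo FastPitchModel from the tutorial (name assumed);
# pitch_sol and input_string come from the snippet above.
pitch_mean = float(spec_model.cfg.pitch_mean)  # assumed config attribute, dataset F0 mean in Hz
pitch_std = float(spec_model.cfg.pitch_std)    # assumed config attribute, dataset F0 std in Hz
pitch_hz = pitch_sol.cpu().numpy().astype(np.float32) * pitch_std + pitch_mean  # undo normalization

tokens = spec_model.parse(input_string).cpu().numpy().astype(np.int64)
spec_sol_onnx = sess.run(
    None,
    {
        "text": tokens,                                         # [1, n_tokens], int64
        "pitch": pitch_hz,                                      # [1, n_tokens], float32
        "pace": np.full_like(pitch_hz, 0.9, dtype=np.float32),  # 90% pace, per token
    },
)[0]  # first output is assumed to be the mel spectrogram
```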
Hello,
I managed to export a FastPitch model I trained to ONNX using export.py, according to #4100.
My question is: how do I use the exported FastPitch ONNX model to generate a spectrogram from raw text using onnxruntime? Since I intend to run the model on a Windows environment with no GPU, I chose onnxruntime instead of nemo2riva.
The code I used is:
I am not sure if it is correct. Also, the text input needs to be int64 instead of string, so it looks like the raw text has to be parsed and tokenized first. In NeMo, that is done by the model itself. So how can I do the same with the ONNX model?
Thanks in advance.
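A common pattern for this is to keep the NeMo checkpoint around purely for its text front end (parsing and tokenizing the raw string into int64 IDs) and hand the resulting token array to onnxruntime. The sketch below is a rough outline under several assumptions, not the confirmed solution from this thread: it assumes the model was exported with NeMo's export.py so the graph takes `"text"`, `"pitch"` and `"pace"` inputs (check `sess.get_inputs()` for the real names and shapes), that the first output is the mel spectrogram, and that the file names are placeholders for your own artifacts.

```python
# Rough sketch: tokenize with the NeMo model, run spectrogram generation in onnxruntime.
# Input/output names and shapes are assumptions -- inspect the session to confirm them.
import numpy as np
import onnxruntime as ort
from nemo.collections.tts.models import FastPitchModel

# Load the original checkpoint only for its parser/tokenizer; CPU is fine on Windows.
nemo_model = FastPitchModel.restore_from("fastpitch.nemo", map_location="cpu")
nemo_model.eval()

sess = ort.InferenceSession("fastpitch.onnx", providers=["CPUExecutionProvider"])
print([(i.name, i.shape, i.type) for i in sess.get_inputs()])  # verify before running

text = "Hello world, this is a test."
tokens = nemo_model.parse(text).cpu().numpy().astype(np.int64)  # shape [1, n_tokens]

outputs = sess.run(
    None,
    {
        "text": tokens,
        # Zero pitch and a pace of 1.0 are assumed "neutral" values here;
        # check how your export actually consumes these inputs.
        "pitch": np.zeros(tokens.shape, dtype=np.float32),
        "pace": np.ones(tokens.shape, dtype=np.float32),
    },
)
spectrogram = outputs[0]  # assumed to be the mel spectrogram, e.g. [1, n_mels, n_frames]
print(spectrogram.shape)
```

The spectrogram still has to be passed through a vocoder (for example a HiFi-GAN exported to ONNX the same way) to obtain audio; that step is outside this sketch.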