The generated wav is not good

Hi, thank you for open-sourcing this wonderful work!
I followed your instructions: 1) installed `lightconv_cuda`, 2) downloaded the [checkpoint](https://drive.google.com/drive/folders/1QszdJC7dzBrQHntiLxYcG8ewczvoK4q1), and 3) downloaded the [speaker embedding npy](https://drive.google.com/drive/folders/1a4YW2UWdlF9RTqG_phv_VbRjyEcAld7t).
However, the generated audio is not good; the spectrogram and the wav file are attached below.

Below is the command I ran, followed by its output:

```
python3 synthesize.py \
  --text "Hello world" \
  --speaker_id Actor_22 \
  --emotion_id sad \
  --restore_step 450000 \
  --mode single \
  --dataset RAVDESS
```

```
# sh run.sh 
2022-11-30 13:45:22.626404: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Device of XSpkEmoTrans: cuda
Removing weight norm...
Raw Text Sequence: Hello world
Phoneme Sequence: {HH AH0 L OW1 W ER1 L D}
```

Environment:
```
python 3.6.8
fairseq                 0.10.2
torch                   1.7.0+cu110
CUDA 11.0
```

![Hello world_Actor_22_sad](https://user-images.githubusercontent.com/108344115/204957468-1ba60db9-98c0-483f-9189-fea335a3bc26.png)


[Hello world_Actor_22_sad.wav.zip](https://github.com/keonlee9420/Cross-Speaker-Emotion-Transfer/files/10123751/Hello.world_Actor_22_sad.wav.zip)


