How can I output the 48 kHz audio mentioned in the paper?

The paper says:

first converting semantic tokens into the Mel spectrogram via a Mel decoder, and then generating the audio with a high sampling rate of 48 kHz via a super-resolution neural vocoder. 

However, the example in the README saves the audio at a sampling rate of 24000 Hz.

What do I need to do to get 48 kHz audio output?