a better solution for mismatch of speech feat len and speech token len when training #1232

boji123 · 2025-04-24T09:24:34Z

aluminumbox · 2025-04-24T09:48:42Z

Has this been validated? Does it train better than simply dropping the last frame?

boji123 · 2025-04-24T10:05:55Z

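Not compared yet. My main concern is deyi's version: it force-trims the whole batch, so some longer samples end up trimmed by more than one token, which is quite inelegant.
Also, the original flow adds an interpolate step to force the alignment, which artificially introduces frame-level errors; that is inelegant too.
My version works directly on each individual sample, with just two options: pad to longest or trim to shortest. I'll run a comparison on my side.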
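A minimal per-sample sketch of the two options, assuming a fixed ratio of 2 mel frames per speech token. The function name `align_sample` is hypothetical, and plain Python lists stand in for the token and mel tensors; "pad" repeats the last token or last frame, "trim" keeps only the tokens that the frames fully cover.

```python
from typing import Sequence, Tuple


def align_sample(tokens: Sequence[int], feats: Sequence[list], ratio: int = 2,
                 mode: str = "trim") -> Tuple[list, list]:
    """Align ONE sample so that len(feats) == ratio * len(tokens) (hypothetical helper).

    mode="trim": trim to shortest, keeping only tokens fully covered by frames.
    mode="pad":  pad to longest, repeating the last token / last frame as needed.
    """
    if mode == "trim":
        n = min(len(tokens), len(feats) // ratio)
        return list(tokens[:n]), list(feats[: n * ratio])
    if mode == "pad":
        n = max(len(tokens), -(-len(feats) // ratio))  # ceil(len(feats) / ratio)
        padded_tokens = list(tokens) + [tokens[-1]] * (n - len(tokens))
        padded_feats = list(feats) + [feats[-1]] * (n * ratio - len(feats))
        return padded_tokens, padded_feats
    raise ValueError(f"unknown mode: {mode!r}")


# 3 tokens but only 5 frames (one frame short of 2 * 3)
tokens = [11, 12, 13]
feats = [[0.1], [0.2], [0.3], [0.4], [0.5]]
print(align_sample(tokens, feats, mode="trim"))  # ([11, 12], [[0.1], [0.2], [0.3], [0.4]])
print(align_sample(tokens, feats, mode="pad"))   # ([11, 12, 13], [[0.1], [0.2], [0.3], [0.4], [0.5], [0.5]])
```

Because the decision is made per sample, a one-frame mismatch costs at most one token in that sample, which is the contrast drawn above with a trim applied to the whole batch.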

aluminumbox · 2025-04-25T01:23:40Z

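During training we have always dropped one frame by rounding down, and inference does the same; so far we have not seen this noticeably affect quality. Duplicating a frame probably makes no difference either. I suspect the two approaches are essentially equivalent.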
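To make the "drop one frame vs. duplicate one frame" comparison concrete, the sketch below counts how many frames each strategy touches for a single sample. It assumes a fixed ratio of 2 mel frames per token and a per-sample floor rule; the function name `frames_changed` is hypothetical.

```python
def frames_changed(feat_len: int, token_len: int, ratio: int = 2) -> tuple:
    """Return (frames dropped by floor-trim, frames added by last-frame duplication).

    Floor-trim keeps min(token_len, feat_len // ratio) tokens and exactly `ratio`
    frames per kept token. Duplication repeats the last frame until every token is
    covered; it adds nothing when the frames already cover all tokens.
    """
    kept_frames = min(token_len, feat_len // ratio) * ratio
    dropped = feat_len - kept_frames
    added = max(0, token_len * ratio - feat_len)
    return dropped, added


print(frames_changed(199, 100))  # (1, 1): one frame short
print(frames_changed(201, 100))  # (1, 0): one frame too many
print(frames_changed(200, 100))  # (0, 0): already aligned
```

With a one-frame mismatch, either strategy changes a single frame; the only asymmetry is that in the one-frame-short case trimming also discards the last token.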

boji123 · 2025-04-25T02:03:48Z

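In flow training, alignment is done by forcibly resampling with interpolate; I don't see any drop-one-frame strategy there. The only alignment strategy I see is the one applied to the prompt at inference time.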
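The frame-level error attributed to interpolate can be illustrated with scalar "token features". This assumes the alignment linearly resamples token-rate features to the mel length; `linear_resample` is a plain align-corners-style resampler written for illustration, not PyTorch's `torch.nn.functional.interpolate` (whose default sampling positions differ, though it blends neighbours in the same way).

```python
def linear_resample(seq, out_len):
    """Linearly resample a 1-D sequence to out_len points (align-corners style)."""
    n = len(seq)
    if n == 1 or out_len == 1:
        return [float(seq[0])] * out_len
    out = []
    for i in range(out_len):
        pos = i * (n - 1) / (out_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append((1 - w) * seq[lo] + w * seq[hi])  # blend of two neighbouring tokens
    return out


def repeat_upsample(seq, ratio=2):
    """Exact 2x upsampling: every frame copies exactly one token."""
    return [float(x) for x in seq for _ in range(ratio)]


tokens = [0, 1, 2]
print(repeat_upsample(tokens))                                  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
print([round(v, 2) for v in linear_resample(tokens, 7)])        # [0.0, 0.33, 0.67, 1.0, 1.33, 1.67, 2.0]
```

With resampling, most frames are conditioned on a mixture of two tokens instead of exactly one, and a single extra frame shifts every interior position.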

aluminumbox · 2025-04-25T02:05:01Z

Actually, this interpolate can be removed, because mel extraction already guarantees the 2x relation (two mel frames per token).
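If every sample already satisfies feat_len == 2 * token_len, a resize to the mel length is unnecessary and token features can simply be repeated. A minimal sketch of a sanity check plus an exact upsampler; both function names are hypothetical and plain lists stand in for tensors.

```python
from typing import List, Sequence


def ratio_violations(feat_lens: Sequence[int], token_lens: Sequence[int],
                     ratio: int = 2) -> List[int]:
    """Indices of samples where feat_len != ratio * token_len."""
    return [i for i, (f, t) in enumerate(zip(feat_lens, token_lens)) if f != ratio * t]


def upsample_by_repeat(token_feats: Sequence, ratio: int = 2) -> list:
    """Exact token-to-frame upsampling: each token vector is repeated `ratio` times."""
    return [x for x in token_feats for _ in range(ratio)]


bad = ratio_violations([200, 198, 201], [100, 99, 100])
print(bad)  # [2]: only the third sample breaks the 2x relation
```

Running such a check over a dataset is one way to confirm the 2x guarantee before dropping the interpolate step.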

aluminumbox · 2025-04-25T02:07:35Z

Oh, I forgot to add this. I had planned to put this truncation code in compute_fbank but never did. Could you change this PR to trim one frame instead?

aluminumbox · 2025-04-25T02:08:42Z

The original design was to pass a token2mel ratio parameter and force the truncation based on it; I must have forgotten to add it later.
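The token2mel-ratio design described above could look roughly like the following. This is a hypothetical sketch, not the code that was merged: the generator name `compute_fbank_sketch`, the `token_mel_ratio` argument, and the sample keys `speech`, `speech_token`, `speech_feat` are assumptions, and plain Python lists stand in for tensors.

```python
from typing import Callable, Dict, Iterable, Iterator


def compute_fbank_sketch(data: Iterable[Dict], feat_extractor: Callable[[object], list],
                         token_mel_ratio: int = 0) -> Iterator[Dict]:
    """Hypothetical pipeline stage: extract mel frames, then trim per sample."""
    for sample in data:
        feat = list(feat_extractor(sample["speech"]))
        if token_mel_ratio != 0:
            # Floor rounding: keep as many whole tokens as both sequences can cover,
            # so len(feat) == token_mel_ratio * len(tokens) afterwards.
            token_len = min(len(feat) // token_mel_ratio, len(sample["speech_token"]))
            feat = feat[: token_len * token_mel_ratio]
            sample["speech_token"] = sample["speech_token"][:token_len]
        sample["speech_feat"] = feat
        yield sample


# Dummy extractor (assumption): "speech" is just a frame count here.
samples = [{"speech": 7, "speech_token": [1, 2, 3]}]
out = list(compute_fbank_sketch(samples, lambda n: [[0.0]] * n, token_mel_ratio=2))
print(len(out[0]["speech_feat"]), out[0]["speech_token"])  # 6 [1, 2, 3]
```

Passing `token_mel_ratio=0` leaves features untouched, so the trimming stays opt-in.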

boji123 · 2025-04-25T02:16:50Z

OK, changed.

boji123 changed the title ~~a better solution for mismatch of speech feat len and speech token len~~ a better solution for mismatch of speech feat len and speech token len when training Apr 24, 2025

[debug] a better solution for mismatch of speech feat len and speech token len, refer to FunAudioLLM#1051 (65ad448)

boji123 force-pushed the bj_dev_feat_len_pad branch from f6a8c80 to 74fea6a on April 25, 2025 02:36

[feature] modify pad to trim (038ff9f)

boji123 force-pushed the bj_dev_feat_len_pad branch from 74fea6a to 038ff9f on April 25, 2025 02:39
