Request: Word-Level Timestamps Support in Fish Audio WebSocket API #1107

@dainis-g

Self Checks

  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find any relevant information that meets my needs.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell us your story.

We are developing a conversational AI project that uses the Fish Audio TTS WebSocket API to generate speech synchronized with VRM model facial morphs and animations.

To align the visual expressions with the spoken output, we need word-level timestamps (not phoneme-level). These would allow us to trigger facial morphs and animations, such as [Sad], [Happy], or [::Waving], at the exact moment each word is spoken.

2. What is your suggested solution?

Could you please advise on the following:

  • Whether Fish Audio’s WebSocket API currently provides word-level timestamps or alignment data.
  • If not, whether this feature is planned or under consideration.
  • Whether there is a recommended workaround for estimating word timing without adding latency.
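
To make the last point concrete, here is a minimal sketch of the kind of naive fallback we have in mind. It is not based on anything the Fish Audio API is known to return: `estimate_word_timings`, `WordTiming`, and `TAG_PATTERN` are hypothetical names of our own, and the sketch assumes we already know the duration of a synthesized segment (for example, computed from the number of audio bytes received and the sample rate). It simply spreads that duration across the words in proportion to their character count, after stripping inline tags such as [Sad].

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for inline tags such as [Sad] or [::Waving].
TAG_PATTERN = re.compile(r"\[[^\]]*\]")


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the segment
    end: float    # seconds from the start of the segment


def estimate_word_timings(text: str, duration_s: float) -> list[WordTiming]:
    """Spread duration_s across the words of text, weighted by character count.

    This is a rough heuristic, not real alignment: it ignores pauses,
    punctuation, and speaking-rate changes.
    """
    # Remove inline tags so they do not consume any of the duration.
    words = TAG_PATTERN.sub(" ", text).split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0 or duration_s <= 0:
        return []

    timings: list[WordTiming] = []
    cursor = 0.0
    for w in words:
        span = duration_s * len(w) / total_chars
        timings.append(WordTiming(w, cursor, cursor + span))
        cursor += span
    return timings


if __name__ == "__main__":
    for t in estimate_word_timings("[Happy] Hello there, how are you?", 2.0):
        print(f"{t.start:.2f}-{t.end:.2f}  {t.word}")
```

An estimate like this adds no network round trip, but it can only be computed once a segment's duration is known, and it drifts noticeably on longer sentences, which is why real word-level timestamps from the API would be much preferable.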

3. Additional context or comments

We’ve contacted support@fish.audio but haven’t received a response yet, so we’re posting here for visibility.

We really like Fish Audio’s emotion and prosody control features and would love to continue building on your platform if we can find a way to integrate this timing data.

Thank you for your time and for the great work you’re doing with Fish Speech!

4. Can you help us with this feature?

  • I am interested in contributing to this feature.

Labels: enhancement
