Context-Sensitive Pronunciation Missing #10

@thuongvovan

I’ve noticed that the training data for the phonemizer seems to consist entirely of isolated words. This may cause the model to consistently produce canonical (dictionary-style) pronunciations and fail to handle context-sensitive phonetic variation. For example, it always predicts “the” as /ðə/ regardless of the following word, even though “the” is usually pronounced /ðiː/ when the following word starts with a vowel sound (typically spelled with a, e, i, o, or u). It would be great to consider incorporating sentence-level data to account for connected-speech phenomena like weak forms, assimilation, and linking.
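
To make the “the” case concrete, here is a minimal sketch of the behaviour I mean, written as a post-processing rule over isolated-word output. Everything in it is an assumption for illustration: `apply_the_rule`, the `(word, ipa)` token format, and the vowel list are hypothetical and not part of this project's API. It is not a proposed fix, since sentence-level training data would cover far more than this one rule.

```python
# Hypothetical illustration only: not this project's API.
# IPA vowel symbols that can begin an English word (illustrative, not exhaustive).
IPA_VOWEL_STARTS = tuple("aeiouæɑɒɔəɛɜɪʊʌ")


def apply_the_rule(tokens):
    """Adjust "the" based on the following word's pronunciation.

    tokens: list of (word, ipa) pairs, where each ipa string is the
    isolated-word (dictionary-style) pronunciation.
    """
    out = []
    for i, (word, ipa) in enumerate(tokens):
        if word.lower() == "the" and i + 1 < len(tokens):
            # Look at the next word's sounds, not its spelling,
            # and skip any leading stress marks.
            next_ipa = tokens[i + 1][1].lstrip("ˈˌ")
            if next_ipa.startswith(IPA_VOWEL_STARTS):
                ipa = "ðiː"
        out.append((word, ipa))
    return out


if __name__ == "__main__":
    print(apply_the_rule([("the", "ðə"), ("apple", "ˈæpəl")]))
    print(apply_the_rule([("the", "ðə"), ("university", "ˌjuːnɪˈvɜːsɪti")]))
```

The check uses the next word's phonemes rather than its letters, so “the hour” would switch to /ðiː/ while “the university” (which starts with /j/) would stay /ðə/. A model trained only on isolated words has no way to see that context.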
