Emotive speech markup #3

Sciumo · 2024-03-24T00:49:35Z

Sciumo
Mar 24, 2024

A Markdown extension for emotive speech could beat SSML in both expressiveness and ML training efficiency, because its syntax is simple and tokenizer-friendly.
Markdown's straightforward annotations are also easier to read and edit than an XML-like markup language.

fakerybakery · 2024-03-24T01:02:16Z

fakerybakery
Mar 24, 2024
Maintainer

That's a cool idea! How would it be integrated into the phonemizer?

0 replies

Sciumo · 2024-03-26T13:37:27Z

Sciumo
Mar 26, 2024
Author

There would need to be an extension to the token set to encode breath and tone variance:
ingressive [1] and egressive [2] breath control, and emphasis.

Extended characters would be needed for:
strong
soft
whisper
yell

The Markdown encoding would then drive emotive post-processing.

Possible:
Hey (capitalized sentence emphasis and timing break)
Hey! (exclaim interest sentence)
**Hey!** (bold ~ strong emote interest)
*hey* (italic ~ quizzical interest)
*(hey)* (italic paren ~ whisper)
HEY (cap ~ yelling)
**HEY!** (bold cap ~ angry yelling)
: hey this is a sentence. (definition ~ narrative voice)

[1] https://en.wikipedia.org/wiki/Ingressive_sound
[2] https://en.wikipedia.org/wiki/Egressive_sound
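
The list of possible encodings above can be sketched as a small classifier. This is only an illustration, assuming the examples use standard Markdown emphasis markers (`**bold**`, `*italic*`) and a leading `: ` for definitions. The `Emote` type, the `classify` function, and the style names are hypothetical and not part of any existing phonemizer.

```python
import re
from dataclasses import dataclass


@dataclass
class Emote:
    text: str   # the words to be spoken, with markup removed
    style: str  # hypothetical emotive label derived from the markup


def classify(line: str) -> Emote:
    """Map one marked-up utterance to a (text, style) pair."""
    s = line.strip()
    # ": text" is treated as a definition, i.e. narrative voice.
    if s.startswith(": "):
        return Emote(s[2:], "narrative")

    bold = re.fullmatch(r"\*\*(.+)\*\*", s)
    italic = None if bold else re.fullmatch(r"\*(.+)\*", s)
    inner = bold.group(1) if bold else italic.group(1) if italic else s

    # "Caps" means at least two letters, all upper case.
    letters = [c for c in inner if c.isalpha()]
    is_caps = len(letters) > 1 and all(c.isupper() for c in letters)

    if bold:
        style = "angry_yell" if is_caps else "strong"
    elif italic:
        if inner.startswith("(") and inner.endswith(")"):
            style, inner = "whisper", inner[1:-1]
        else:
            style = "quizzical"
    elif is_caps:
        style = "yell"
    elif inner.endswith("!"):
        style = "exclaim"
    else:
        style = "plain"
    return Emote(inner, style)


if __name__ == "__main__":
    for sample in ["Hey", "Hey!", "**Hey!**", "*hey*", "*(hey)*", "HEY", "**HEY!**"]:
        print(sample, "->", classify(sample))
```

A real implementation would have to handle emphasis inside longer sentences, not just whole-line markup as here.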

4 replies

fakerybakery Mar 26, 2024
Maintainer

Cool! So these tokens would be added to the phonemizer?

Sciumo Mar 26, 2024
Author

I need to review the literature for existing references [1], including tonality [2].

I see the workflow like this:

1) Enumerate an initial set of definitions with IPA references.
2) Assign tokens to the definitions.
3) Extend the token dictionary.
4) Map Markdown to the token/phoneme methodology.
5) Build a training dataset with emotive Markdown [3,4].
6) Verify the Markdown-to-phoneme translation.
7) Fine-tune the base model with the dataset.

[1] https://en.wikipedia.org/wiki/Phonological_hierarchy
[2] https://en.wikipedia.org/wiki/Tone_terracing
[3] https://zenodo.org/records/1188976
[4] https://www.youtube.com/watch?v=Y7OQoNEu3dY
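
The "assign tokens" and "extend token dictionary" steps above might look roughly like the following. It is a minimal sketch under stated assumptions: the base symbol table, the angle-bracket token spellings, and the helper names `extend_vocab` and `encode` are all hypothetical. The real phonemizer's vocabulary format is not described in this thread.

```python
from typing import Dict, List, Optional

# Hypothetical emotive tokens, one per category named in the discussion.
EMOTIVE_TOKENS = [
    "<strong>", "<soft>", "<whisper>", "<yell>",
    "<ingressive>", "<egressive>",
]


def extend_vocab(symbol_to_id: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Return a copy of the vocabulary with new tokens appended.

    Existing ids are left untouched so that pretrained embeddings stay
    aligned; new tokens get the next free ids.
    """
    extended = dict(symbol_to_id)
    next_id = max(extended.values(), default=-1) + 1
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = next_id
            next_id += 1
    return extended


def encode(phonemes: List[str], style: Optional[str], vocab: Dict[str, int]) -> List[int]:
    """Encode a phoneme sequence, prefixed by an optional emotive token."""
    ids = []
    if style is not None:
        ids.append(vocab["<" + style + ">"])
    ids.extend(vocab[p] for p in phonemes)
    return ids


if __name__ == "__main__":
    base = {"_": 0, "h": 1, "e": 2, "ɪ": 3}  # toy symbol table, an assumption
    vocab = extend_vocab(base, EMOTIVE_TOKENS)
    print(encode(["h", "e", "ɪ"], "whisper", vocab))
```

Appending new ids rather than renumbering is a design choice that keeps a fine-tuned model compatible with the base model's existing embedding rows. Only the rows for the new tokens would need to be initialized.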

fakerybakery Mar 26, 2024
Maintainer

Cool! Do you have any ideas on how the dataset would be created?

Sciumo Mar 26, 2024
Author

I sent an email to the maintainers of the RAVDESS dataset [1], which would be the training dataset.

1) Run the dataset through STT and create a JSONL dataset.
2) Write a script to do first-class Markdown and canonical phoneme translation.
3) Review by hand and edit the JSONL, probably with an editing script.
...
X) Publish

[1] https://psychlabs.torontomu.ca/smartlab/resources/speech-song-database-ravdess/
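
The first step above could start from something like the sketch below. It relies on the RAVDESS filename convention, seven hyphen-separated two-digit fields in which the third is the emotion code and the fourth the intensity. The transcript is passed in as a plain string from whatever STT system is used. The record fields and the function names `make_record` and `to_jsonl` are assumptions for illustration. The `markdown` and `phonemes` fields are left empty for the later translation and hand-review steps.

```python
import json
from pathlib import Path
from typing import Dict, Iterable

# Emotion codes from the RAVDESS filename convention.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def make_record(filename: str, transcript: str) -> Dict[str, object]:
    """Build one JSONL record from a RAVDESS filename and an STT transcript."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError("unexpected RAVDESS filename: " + filename)
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = parts
    return {
        "audio": filename,
        "text": transcript,
        "emotion": EMOTIONS[emotion],
        "intensity": "strong" if intensity == "02" else "normal",
        "actor": int(actor),
        "markdown": "",   # hypothetical field, filled by the translation script
        "phonemes": "",   # hypothetical field, filled by the phonemizer
    }


def to_jsonl(records: Iterable[Dict[str, object]]) -> str:
    """Serialize records as one JSON object per line."""
    # ensure_ascii=False keeps IPA characters readable in the file.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    rec = make_record("03-01-05-02-01-01-12.wav", "Kids are talking by the door")
    print(to_jsonl([rec]), end="")
```

Keeping one record per line makes the hand-review step straightforward, since an editing script can rewrite individual lines without reparsing the whole file.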
