Skip to content

A testing and development environment for algorithms that find the timestamps of sentence and phrase breaks in recorded speech based on an audio file and a transcript.

Notifications You must be signed in to change notification settings

BloomBooks/Spoken-Sentence-Divider

 
 

Repository files navigation

A development and testing environment for algorithms that identify breaks between phrases and sentences in recorded speech.

Originally developed for Bloom Desktop (https://github.com/BloomBooks/BloomDesktop)

To see the results, run the command "npx vite dev" from the root directory, then open "http://localhost:5173" in a web browser

To see a different page, modify the BOOK_NAME and PAGE_INDEX constants in ./src/entryPoints/onepage.test

The meaning of the parameters/constants at the start of the file can be found in the descriptions of the algorithms that use them in "./Algorithm Documentation.pdf"

To add a book to this tool, hand-align two versions of it - one with the phrase breaks at the beginning of each pause and one with phrase breaks at the end.
place both versions in ./public/data/HandAligned, naming them <BOOK_NAME>-Beginning and <BOOK_NAME>-End respectively
place either book's audio directory in ./public/data/HandAlgined and name it <BOOK_NAME>-Beginning-audio

To run a file in ./src/entryPoints other than onepage.ts, change the src attribute on the script element in ./index.html to point to the desired file path.

To test an algorithm's performance on a significant number of bloom books, run ./src/entryPoints/BulkSplits.ts
When you do, be sure to add the names of any additional books you wish to test to the bookNames array starting on line 12

onePangloss.ts and TestJobExample.ts were early tests that were later broken by changes to underlying method signatures.

About

A testing and development environment for algorithms that find the timestamps of sentence and phrase breaks in recorded speech based on an audio file and a transcript.

Resources

Stars

Watchers

Forks

Releases

No releases published

Sponsor this project

Packages

No packages published

Languages

  • HTML 95.5%
  • TypeScript 4.5%