Correct the synced lyrics heuristically #1

This issue is a follow-on from this short thread in a related side-project: https://github.com/karaokenerds/python-audio-separator/issues/8#issuecomment-1685325394

Problem: `lyrics-transcriber` currently [transcribes](https://github.com/karaokenerds/python-lyrics-transcriber/blob/main/lyrics_transcriber/transcriber.py#L76) the given audio file using [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped) and [writes](https://github.com/karaokenerds/python-lyrics-transcriber/blob/main/lyrics_transcriber/transcriber.py#L88) the detected words to a lyrics file directly with no cleanup or modification.

This results in highly variable accuracy in the lyrics output, as Whisper is far from perfect at detecting lyrics in music audio.
For example, compare these two synced lyrics videos:
- [Synced manually by me](https://www.dropbox.com/scl/fi/9nf2omf5d9y718gqrrs3e/Rock-On-Rockall-Andrew-Manual-Sync.mp4?rlkey=updqkrfrbqpq7qnoxdjnx5ve3&dl=0) with the correct lyrics, timed by hand in [MidiCo](https://www.midicokaraoke.com/). The manually [synced .lrc file is here](https://www.dropbox.com/scl/fi/vxht3sn5ecqrj1ph965vl/Rock-On-Rockall-Andrew-Manual-Sync.lrc?rlkey=as0oqmp6h29qkw16u9m3z2kjb&dl=0) for comparison.
- [Generated using lyrics-transcriber](https://www.dropbox.com/scl/fi/4lpa85av5sfngxj89ruyw/Rock-On-Rockall-Pure-Whisper.mp4?rlkey=kmbkpqfbskueu2ogsmhxrj3rh&dl=0): pure Whisper transcription only, with [the output .lrc file](https://www.dropbox.com/scl/fi/gc8i33pnyhvw45e7cdy93/Rock-On-Rockall-Pure-Whisper.lrc?rlkey=dpg4fcag441ysseeau66qmg5a&dl=0) loaded into MidiCo.
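For context, the pure-Whisper path boils down to something like the minimal sketch below: every detected segment is written out verbatim behind a `[mm:ss.xx]` LRC timestamp. It assumes the result dict shape documented by whisper-timestamped (a `segments` list whose items carry `start`, `text` and a per-word `words` list). The helper names are hypothetical and this is not the project's actual writer, which may well emit word-level timings rather than one timestamp per line.

```python
def format_lrc_timestamp(seconds: float) -> str:
    """Format a time in seconds as an LRC "[mm:ss.xx]" tag (hundredths of a second)."""
    total_centis = round(seconds * 100)
    minutes, centis = divmod(total_centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def segments_to_lrc(result: dict) -> str:
    """Emit one LRC line per segment, using Whisper's text exactly as detected (no cleanup)."""
    return "\n".join(
        format_lrc_timestamp(segment["start"]) + segment["text"].strip()
        for segment in result["segments"]
    )


if __name__ == "__main__":
    # Made-up segment in the assumed whisper-timestamped result shape.
    example = {"segments": [{"start": 65.25, "text": " example line", "words": []}]}
    print(segments_to_lrc(example))  # prints [01:05.25]example line
```

Whatever Whisper misheard goes straight into the file, which is exactly what the correction step below needs to intercept.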

Fortunately, for most songs, as long as we know the artist and title, we can download the lyrics from the internet and hopefully use them to _correct_ the lyrics detected by Whisper.

I've already [implemented](https://github.com/karaokenerds/python-lyrics-transcriber/blob/main/lyrics_transcriber/transcriber.py#L80) fetching lyrics from both Genius and Spotify.

This issue is to track the implementation of the _hard part_ - using those lyrics to correct the detected lyrics.

Before discussing ways to approach this, it's worth being aware of the biggest limitations:

**1 - Lyrics from the internet are often _wrong_ in various ways**
Common examples include:
- Missing repetitions of the chorus/refrain or bridge sections
- Missing intro or outro sections
- Incorrect words, e.g. where the person typing up the lyrics misheard them
- Incorrect words, e.g. where the "official" lyrics don't match what the artist actually sang on the commercial recording
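Because internet lyrics can disagree with what was actually sung, an exact string comparison between a Whisper line and an internet lyrics line will often fail. One tolerant alternative, which is my assumption here rather than something lyrics-transcriber does today, is a word-level similarity ratio using Python's standard-library `difflib.SequenceMatcher`. The helper names and the example line are made up for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize_words(line: str) -> List[str]:
    """Lowercase a lyric line and split it into words, dropping punctuation (ASCII-only for simplicity)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def line_similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of how many words two lyric lines share, in order."""
    return SequenceMatcher(None, normalize_words(a), normalize_words(b)).ratio()


if __name__ == "__main__":
    # Hypothetical line: one misheard word out of six still scores well above unrelated text.
    internet = "Down in Whitehall, they're making plans"
    whisper = "down in phytol they're making plans"
    print(round(line_similarity(internet, whisper), 2))  # 0.83
```

Comparing word lists rather than raw characters keeps punctuation and casing differences between Genius, Spotify and Whisper from dragging the score down.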

**2 - Whisper-timestamped transcriptions are almost always wrong in various places**
- It will almost always get some words wrong, depending on the singer's style and accent, background music, recording quality, etc. This is especially likely when the lyrics include names or less common words, and the results are sometimes hilarious, e.g. mishearing "Whitehall" as "Phytol" in one song I recently made a karaoke version of 😄
- While it usually gets word timestamps right (even when the word itself is wrong), there are still some timing issues which may need to be fixed in the whisper-timestamped project itself, e.g. it commonly gets the timestamp of the very first word wrong, and occasionally starts sentences too early.
- Fortunately, it does provide a **confidence score** for each detected word, which we can hopefully use to improve the transcription by replacing low-confidence words with more likely words from the internet lyrics.
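Finding candidates for replacement could start from a simple threshold on that score. A minimal sketch, assuming the whisper-timestamped result shape (each segment's `words` entries carry `text` and `confidence`) and using the tentative 50% cut-off from the approach below as the default; `FlaggedWord` and `find_low_confidence_words` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlaggedWord:
    """A transcribed word whose confidence fell below the threshold (hypothetical type)."""
    segment_index: int
    word_index: int
    text: str
    confidence: float


def find_low_confidence_words(result: dict, threshold: float = 0.5) -> List[FlaggedWord]:
    """Return every word in a whisper-timestamped-style result with confidence < threshold."""
    flagged = []
    for seg_i, segment in enumerate(result["segments"]):
        for word_i, word in enumerate(segment.get("words", [])):
            if word["confidence"] < threshold:
                flagged.append(FlaggedWord(seg_i, word_i, word["text"], word["confidence"]))
    return flagged


if __name__ == "__main__":
    # Made-up result in the assumed shape; real output also has start/end times per word.
    result = {"segments": [{"words": [{"text": "down", "confidence": 0.95},
                                      {"text": "phytol", "confidence": 0.31}]}]}
    for w in find_low_confidence_words(result):
        print(w.text, w.confidence)  # phytol 0.31
```

Keeping the segment and word indices means a later step can swap the text in place without disturbing the timestamps Whisper got right.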

So, given these challenges, I'm holding out hope for the following approach (roughly):
- Split the internet lyrics into lines (from both Genius and Spotify, if both were fetched successfully)
- For each line in the Whisper transcription, find a couple of "anchor words" with a high confidence score
- Attempt to match the line to a line in the internet lyrics using these "anchor words"
- Attempt to replace the low-confidence (below 50%?) words with words from the matched internet lyrics line, potentially replacing the entire line if it contains several low-confidence words or if the word counts don't match
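A rough sketch of the four steps above, under several assumptions that are mine rather than settled decisions: fetched lyrics arrive as plain text that may contain `[Chorus]`-style section headers (as Genius lyrics often do), an "anchor word" is a word with confidence of at least 0.9, a match needs at least two shared anchors, and a whole-line replacement spreads the new words evenly over Whisper's time span for that line. All names (`Word`, `correct_line`, etc.) are hypothetical.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float
    end: float
    confidence: float
    corrected: bool = False  # True when the text came from the internet lyrics


def normalize(token: str) -> str:
    """Lowercase and strip punctuation so 'Whitehall,' and 'whitehall' compare equal."""
    return re.sub(r"[^a-z0-9']", "", token.lower())


def split_reference_lyrics(raw: str) -> List[str]:
    """Step 1: split fetched lyrics into lines, dropping blanks and '[Chorus]'-style headers."""
    lines = [line.strip() for line in raw.splitlines()]
    return [line for line in lines if line and not re.fullmatch(r"\[.*\]", line)]


def find_best_reference_line(words: List[Word], reference_lines: List[str],
                             anchor_threshold: float = 0.9, min_anchors: int = 2) -> Optional[str]:
    """Steps 2-3: pick the reference line sharing the most high-confidence anchor words."""
    anchors = {normalize(w.text) for w in words if w.confidence >= anchor_threshold}
    anchors.discard("")
    best_line, best_hits = None, 0
    # Every line stays a candidate, even if matched before, since choruses repeat.
    for line in reference_lines:
        hits = len(anchors & {normalize(t) for t in line.split()})
        if hits > best_hits:
            best_line, best_hits = line, hits
    return best_line if best_hits >= min_anchors else None


def correct_line(words: List[Word], reference_lines: List[str],
                 low_threshold: float = 0.5, max_low: int = 1) -> List[Word]:
    """Step 4: replace low-confidence words, or the whole line, from the matched reference."""
    reference = find_best_reference_line(words, reference_lines)
    if not words or reference is None:
        return words  # no confident match: keep Whisper's output as-is
    ref_tokens = reference.split()
    low = {i for i, w in enumerate(words) if w.confidence < low_threshold}
    if len(ref_tokens) == len(words) and len(low) <= max_low:
        # Word counts line up: swap only the low-confidence words, keeping Whisper's timings.
        return [replace(w, text=ref_tokens[i], corrected=True) if i in low else w
                for i, w in enumerate(words)]
    # Counts differ or too many doubtful words: take the whole reference line and spread
    # it evenly across Whisper's time span (these timings are only a guess, hence 0.0).
    start, end = words[0].start, words[-1].end
    step = (end - start) / len(ref_tokens)
    return [Word(tok, start + i * step, start + (i + 1) * step, 0.0, corrected=True)
            for i, tok in enumerate(ref_tokens)]


if __name__ == "__main__":
    # Hypothetical data: a made-up lyric line and a made-up Whisper transcription of it.
    reference = split_reference_lyrics(
        "[Chorus]\nDown in Whitehall they make the plans\n\nSomething else entirely")
    heard = [Word("down", 0.0, 0.4, 0.95), Word("in", 0.4, 0.6, 0.97),
             Word("phytol", 0.6, 1.2, 0.31), Word("they", 1.2, 1.4, 0.93),
             Word("make", 1.4, 1.7, 0.92), Word("the", 1.7, 1.8, 0.88),
             Word("plans", 1.8, 2.4, 0.96)]
    print(" ".join(w.text for w in correct_line(heard, reference)))
    # prints: down in Whitehall they make the plans
```

Letting every reference line remain a candidate means a chorus that appears only once in the internet lyrics can still correct each sung repetition. The flip side is that a short, common phrase may anchor to the wrong line, so the thresholds here would need tuning against real songs.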

This is a very rough set of thoughts though, and I'm sure the realities of this approach will become apparent when I attempt to implement it ;)
