Overlap vs. Context Mode #311
Mattk70 started this conversation in Show and tell
The classifiers (models) used by Chirpity segment audio into 3-second chunks and analyse a spectrogram (or two, in the case of BirdNET) generated from each chunk. By default, the segments run in sequence with no overlap. This is generally fine; however, because the model 'sees' only 3 seconds of audio, there is no context for the classification, and a call that lies at the edge of one of those windows is sometimes cut short.
Although this case is factored into model training, a truncated call can still have one of two undesirable effects:
1. The call is missed entirely, because the truncated snippet does not reach the confidence threshold.
2. The truncated snippet is misidentified, producing a false positive.
Two approaches can be used to mitigate these problems, and both involve overlapping the segments.
The simple approach is to analyse the overlapping segments and report the results for each one. This addresses issue 1, but increases the occurrence of problem 2: more windows are scored, so more truncated views of each call get reported.
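
To make this concrete, here is a minimal sketch of back-to-back versus overlapping 3-second windowing. It assumes NumPy audio arrays and a placeholder sample rate; the `segment_audio` helper, its padding of the final window, and the 1.5-second overlap in the demo are illustrative rather than Chirpity's actual code.

```python
import numpy as np

SAMPLE_RATE = 24_000   # placeholder value; Chirpity's real input rate may differ
SEGMENT_SECS = 3.0     # window length the models analyse

def segment_audio(audio: np.ndarray, overlap_secs: float = 0.0):
    """Yield (start_time_secs, chunk) pairs of 3-second windows.

    With overlap_secs == 0 the windows run back-to-back (the default).
    A positive overlap advances the window by less than 3 s, so a call
    cut short at one window's edge falls wholly inside a neighbour.
    """
    seg_len = int(SEGMENT_SECS * SAMPLE_RATE)
    hop = int((SEGMENT_SECS - overlap_secs) * SAMPLE_RATE)
    for start in range(0, len(audio), hop):
        chunk = audio[start:start + seg_len]
        if len(chunk) < seg_len:   # pad a short final window (an assumption)
            chunk = np.pad(chunk, (0, seg_len - len(chunk)))
        yield start / SAMPLE_RATE, chunk

# Ten seconds of audio: the default gives windows at 0, 3, 6 and 9 s;
# a 1.5 s overlap gives windows every 1.5 s instead.
audio = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)
print([round(t, 1) for t, _ in segment_audio(audio)])
print([round(t, 1) for t, _ in segment_audio(audio, overlap_secs=1.5)])
```

With an overlap, each instant of audio is covered by more than one window, so a call truncated at one edge appears whole in a neighbour; but because every window is still scored and reported independently, spurious extra detections become more likely.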
In Chirpity, a different approach is available for the Nocmig models: "Context mode".

The model generates predictions for each 3-second audio segment and compares them with predictions for the surrounding audio, using segments offset by 1.5 seconds before and after the current one. Any result that does not meet the chosen confidence threshold in at least 2 of the 3 segments is discarded. This approach mitigates both issues 1 and 2, although a lower confidence threshold is usually required for best results.
N.B. Because of the additional processing, you can expect Context Mode to take about 50% longer to complete an analysis.
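
As a worked illustration of the 2-of-3 consensus, here is a sketch of the filtering step. The function name, the per-segment score dictionaries, and the species and confidence values are all hypothetical; Chirpity's internal implementation may differ.

```python
def passes_context_filter(species: str,
                          current: dict[str, float],  # scores for this 3 s segment
                          before: dict[str, float],   # segment offset -1.5 s
                          after: dict[str, float],    # segment offset +1.5 s
                          threshold: float) -> bool:
    """Keep a detection only if it clears the confidence threshold in at
    least 2 of the 3 overlapping windows, per the consensus rule above."""
    votes = sum(scores.get(species, 0.0) >= threshold
                for scores in (current, before, after))
    return votes >= 2

# Hypothetical scores: Redwing clears 0.5 in two windows and is kept;
# Song Thrush clears it in only one window and is discarded.
current = {"Redwing": 0.62, "Song Thrush": 0.31}
before  = {"Redwing": 0.55}
after   = {"Redwing": 0.48, "Song Thrush": 0.64}

print(passes_context_filter("Redwing", current, before, after, 0.5))      # True
print(passes_context_filter("Song Thrush", current, before, after, 0.5))  # False
```

This also suggests why a slightly lower threshold helps in Context mode: the offset neighbouring windows tend to score a genuine call lower than the best-centred one, yet they still need to clear the threshold to supply the second vote.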