the best way to preprocess audio to address domain adaptation + relevance for projects #8
-
Two thoughts on domain shift problem:
Thing I am very intrigued by:
Paper On SoTA Audio Recognition
cc @pcbermant @bs
-
This is a great paper that you referenced @aza!!! So much good information there (the references in the paper are also very informative). I love the approach. I think it would be great to combine this with something Jeremy Howard has mentioned on a couple of occasions: using trained models to clean the data in order to train better models down the road. I think this could work quite well because, on the one hand, we have relatively a lot of data, and the labels are even weaker than in the scenario they mention in the paper. The setup I am thinking of is this -> train a set of models on the current train set, use them to predict on the data, and either fine-tune them or train models from scratch on the new dataset - singling out a negative class, discarding the examples where the model is most in doubt, etc. In fact, I like this idea so much that I think I will start working towards it 😊 Thanks so much for the paper and your suggestions 🙏

I 100% agree with the list of things you mention that are worth a try. What is problematic for some techniques (style transfer, deep-learned augmentations) is the lack of access to the test set - in both scenarios it would be extremely helpful to have examples of what the test set looks like. Another two techniques I would add to the list are:
The wealth of techniques one can try is very, very large 😊 I am also trying to optimize for what might be useful to us down the road when choosing what to focus on. I really, really like the idea of using the current models to train the next generation of models, so that is definitely on the plate atm. Same for SGD with restarts, and for hand-labeling data and training on it (I do think this can be useful for the competition, but above all it can be extremely useful to us - going through the entire pipeline and having it in working order).
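The "use current models to clean the data for the next generation" idea above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the function name `relabel_dataset` and the threshold value are my own assumptions, and the model probabilities would come from whatever models were trained on the current train set.

```python
import numpy as np

def relabel_dataset(probs, labels, keep_threshold=0.2):
    """Decide which weakly labeled examples to keep for the next
    training round.

    probs  : (n_examples, n_classes) predicted probabilities from the
             current generation of models (e.g. an ensemble average).
    labels : (n_examples,) the current weak labels.

    Returns a boolean mask: True where the models assign at least
    `keep_threshold` probability to the given label, False where the
    models are most in doubt (candidates for discarding or for the
    negative class). The threshold here is an arbitrary placeholder.
    """
    confidence_in_label = probs[np.arange(len(labels)), labels]
    return confidence_in_label >= keep_threshold
```

The kept subset (or the model predictions themselves) then becomes the training set for fine-tuning or for training from scratch, as described above.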
-
I am continuing to run experiments wrt the birdcall competition. A significant finding: to address domain shift, the normalization that works best is mean centering followed by standard deviation normalization. Close behind in generalization ability is mean centering and standard deviation normalization followed by rescaling the values to between 0 and 1. All statistics are computed on a per-example basis.
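For concreteness, the two normalization variants above can be sketched like this. This is a minimal illustration of per-example normalization (each example's own mean/std/min/max, not dataset-level statistics); the function name and the `scale_01` flag are my own choices, and `x` stands in for whatever representation is fed to the model (e.g. a spectrogram).

```python
import numpy as np

def normalize_example(x, scale_01=False):
    """Per-example normalization for domain shift.

    Default: mean centering + standard deviation normalization
    (the best-performing variant). With scale_01=True, additionally
    rescale to the [0, 1] range (the close runner-up).
    All statistics come from this example alone.
    """
    x = x - x.mean()          # mean centering
    std = x.std()
    if std > 0:
        x = x / std           # standard deviation normalization
    if scale_01:
        rng = x.max() - x.min()
        if rng > 0:
            x = (x - x.min()) / rng  # rescale to [0, 1]
    return x
```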
I exchanged emails with Christos - he is busy with fieldwork at this time of year. Given his input that we probably do not want to look at shot detection initially, from the list of projects we discussed, grey parrot detection (one project) and cattle detection (a second project) are the ones to jump into for starters.
The grey parrot project is important from a conservation perspective but has limited data. There is some annotated data coming from the sensors, but we might also have to rely on data from directional microphones for training (exactly the same setup as in the birdcall competition! being able to diminish the impact of domain shift will be key). I discussed this with Christos, and posing the problem as detection on segments of 5 seconds in length would work well. This simplifies the architecture vs the original plan of using object detection, will most likely mean we need less labeled data (and we will have an easier time generating more data from the data we do have), and lets us leverage methods field-tested in the birdcall competition. We will have cleaner data on input than in the birdcall competition, but the big question is this -> will the false positive rate be good enough if we frame the problem in this way? A significant concern, but not something we can find out without actually giving this a try!
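The 5-second-segment framing above reduces to slicing a recording into fixed-length windows and running a presence/absence classifier on each. A minimal sketch, assuming a 1-D waveform array and a known sample rate (the function name and zero-padding of the last partial segment are my own assumptions):

```python
import numpy as np

def to_segments(waveform, sr, seg_seconds=5):
    """Split a 1-D waveform into fixed-length segments of
    `seg_seconds` each; the final partial segment is zero-padded.
    Each row can then be scored by a detection classifier."""
    seg_len = sr * seg_seconds
    n_segs = int(np.ceil(len(waveform) / seg_len))
    padded = np.zeros(n_segs * seg_len, dtype=waveform.dtype)
    padded[:len(waveform)] = waveform
    return padded.reshape(n_segs, seg_len)
```

A segment-level classifier over these windows is also exactly the shape of model the birdcall competition exercises, which is what makes the methods transferable.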