the best way to preprocess audio to address domain adaptation + relevance for projects #8
-
Two thoughts on domain shift problem:
Thing I am very intrigued by:
Paper On SoTA Audio Recognition
cc @pcbermant @bs
-
This is a great paper that you referenced @aza!!! So much good information there (the references in the paper are also very informative). I love the approach. I think it would be great to combine this with something Jeremy Howard has mentioned on a couple of occasions: using trained models to clean the data in order to train better models down the road. I think this could work quite well because, on the one hand, we have relatively a lot of data, and the labels are even weaker than in the scenario they mention in the paper. The setup I am thinking of is this -> train a set of models on the current train set, use them to predict on the data, and either fine-tune them or train models from scratch on the new dataset - singling out a negative class, discarding the examples where the model is most in doubt, etc. In fact, I like this idea so much that I think I will start working towards it 😊 Thanks so much for the paper and your suggestions 🙏

I 100% agree with the list of things you mention that are worth a try. What is problematic for some techniques (style transfer, deep-learned augmentations) is the lack of access to the test set - in both scenarios it would be extremely helpful to have examples of what the test set looks like. Another two techniques I would add to the list are:
The wealth of techniques one can try is very, very large 😊 I am also trying to optimize for what might be useful to us down the road when choosing what to focus on. I really, really like the idea of using the current models to train the next generation of models, so that is definitely on the plate atm. Same for SGD with restarts, and for hand-labeling data and training on it (I do think this can be useful for the competition, but above all it can be extremely useful to us - going through the entire pipeline and having it in working order).
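The "use current models to clean the data for the next generation" idea above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the function name `relabel_dataset` and the threshold value are my own assumptions, and the model probabilities would come from whatever models were trained on the current train set.

```python
import numpy as np

def relabel_dataset(probs, labels, keep_threshold=0.2):
    """Decide which weakly labeled examples to keep for the next
    training round.

    probs  : (n_examples, n_classes) predicted probabilities from the
             current generation of models (e.g. an ensemble average).
    labels : (n_examples,) the current weak labels.

    Returns a boolean mask: True where the models assign at least
    `keep_threshold` probability to the given label, False where the
    models are most in doubt (candidates for discarding or for the
    negative class). The threshold here is an arbitrary placeholder.
    """
    confidence_in_label = probs[np.arange(len(labels)), labels]
    return confidence_in_label >= keep_threshold
```

The kept subset (or the model predictions themselves) then becomes the training set for fine-tuning or for training from scratch, as described above.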
-
I am continuing to run experiments wrt the birdcall competition. A significant finding: to address domain shift, the normalization that works best is mean centering followed by standard deviation normalization. Close behind in generalization ability is mean centering and standard deviation normalization followed by rescaling the values to between 0 and 1. All statistics are computed on a per-example basis.
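For concreteness, the two normalization variants above can be sketched like this. This is a minimal illustration of per-example normalization (each example's own mean/std/min/max, not dataset-level statistics); the function name and the `scale_01` flag are my own choices, and `x` stands in for whatever representation is fed to the model (e.g. a spectrogram).

```python
import numpy as np

def normalize_example(x, scale_01=False):
    """Per-example normalization for domain shift.

    Default: mean centering + standard deviation normalization
    (the best-performing variant). With scale_01=True, additionally
    rescale to the [0, 1] range (the close runner-up).
    All statistics come from this example alone.
    """
    x = x - x.mean()          # mean centering
    std = x.std()
    if std > 0:
        x = x / std           # standard deviation normalization
    if scale_01:
        rng = x.max() - x.min()
        if rng > 0:
            x = (x - x.min()) / rng  # rescale to [0, 1]
    return x
```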
I exchanged emails with Christos - he is busy with fieldwork at this time of year. Given his input that we probably do not want to look at shot detection initially, from the list of projects we discussed, grey parrot detection (one project) and cattle detection (a second project) are the ones to jump into for starters.
The grey parrot project is important from a conservation perspective but has limited data. There is some annotated data coming from the sensors, but we might also have to rely on data from directional microphones for training (exactly the same setup as in the birdcall competition! being able to diminish the impact of domain shift will be key). I discussed this with Christos, and posing the problem as detection on segments of 5 seconds in length would work well. This simplifies the architecture vs the original plan of using object detection, will most likely mean we need less labeled data (and we will have an easier time generating more data from the data we do have), and lets us leverage methods field-tested in the birdcall competition. We will have cleaner data on input than in the birdcall competition, but the big question is this -> will the false positive rate be good enough if we frame the problem in this way? A significant concern, but not something we can find out without actually giving this a try!
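The 5-second-segment framing above reduces to slicing a recording into fixed-length windows and running a presence/absence classifier on each. A minimal sketch, assuming a 1-D waveform array and a known sample rate (the function name and zero-padding of the last partial segment are my own assumptions):

```python
import numpy as np

def to_segments(waveform, sr, seg_seconds=5):
    """Split a 1-D waveform into fixed-length segments of
    `seg_seconds` each; the final partial segment is zero-padded.
    Each row can then be scored by a detection classifier."""
    seg_len = sr * seg_seconds
    n_segs = int(np.ceil(len(waveform) / seg_len))
    padded = np.zeros(n_segs * seg_len, dtype=waveform.dtype)
    padded[:len(waveform)] = waveform
    return padded.reshape(n_segs, seg_len)
```

A segment-level classifier over these windows is also exactly the shape of model the birdcall competition exercises, which is what makes the methods transferable.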