-
Thanks Ben, this is extremely useful and I appreciate you taking the time to make these notes!
-
Continuing on day 2. I was expecting to find a 'trim'-style frequency clip, since after my bandpass, the data looks like this:
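In case it helps others later, here is my current understanding as a minimal sketch, with a made-up file path: bandpassing the Audio filters the waveform but leaves the spectrogram's frequency axis at full range, while Spectrogram.bandpass is what actually crops the frequency axis.

```python
from opensoundscape import Audio, Spectrogram

# load a clip (path is illustrative)
audio = Audio.from_file("example.wav")

# filter the waveform: out-of-band energy is attenuated, but a spectrogram
# of the result still spans the full frequency axis
filtered = audio.bandpass(low_f=500, high_f=8000, order=9)

# crop the spectrogram's frequency axis to the band of interest
spec = Spectrogram.from_audio(filtered).bandpass(min_f=500, max_f=8000)
spec.plot()
```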
-
Still working on the general flow of fine-tuning: whether to use the https://opensoundscape.org/en/latest/source/opensoundscape.ml.html#opensoundscape.ml.datasets.AudioFileDataset or create my own dataset and dataloader classes and use PyTorch Lightning. I can't follow the docs in terms of how the dataframes should be formatted. [images] But looking at the API source, the key sentence is that the filename is the index, and the rest is one-hot encoded. Most of the examples come out of Raven annotations and the clip_labels functions, making it hard to match against what I've got. I succeeded with [image]
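For anyone following along, this is my reading of that key sentence as a sketch; the file paths and class names are invented, and the point is just the shape: one row per file, one column per class.

```python
import pandas as pd

# index = audio file path, columns = classes, values = 0/1 (one-hot per file)
labels = pd.DataFrame(
    {
        "oriole": [1, 0, 0],
        "wren": [0, 1, 0],
        "tanager": [0, 0, 1],
    },
    index=[
        "train_audio/a/clip1.ogg",  # invented paths
        "train_audio/b/clip2.ogg",
        "train_audio/c/clip3.ogg",
    ],
)
labels.index.name = "file"
```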
-
I think this is related to the high-level concept of how annotations are structured; all the examples come from the boxed-annotations concept. When you have full recordings, it's difficult to translate that to the docs. For example, in this Kaggle competition you get a minute-long recording of an Oriole; should I mock a dataframe in 5-second increments that labels each clip as 'oriole'? The ghost of these choices is clear in the embedding structure. For example, here is my input sample.
The index is the path to a sample, with labels one-hot encoded for the entire file. I grabbed one sample per class.
But when you embed, somewhere the 5-second idea re-emerges.
In general, once an annotation concept is bypassed, you might expect the rest of the functions to operate on those structures, provided they are well formatted. Instead, in this case, the model reaches back and forces an annotation structure. Because the input labels are per file (shape 192) and the embeddings have been subsetted (shape 1315), you get a shape error when fitting the classifier, understandably.
yields
Stepping into this, we already knew the answer: the train features and labels don't have the same shape.
Potential workarounds

I haven't done this yet; I'll need to check the docs (feedback welcome).
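One workaround I may try, sketched below under the assumption that the embeddings come back with a (file, start_time, end_time) multi-index: broadcast each per-file label row to every clip of that file, so the labels and embeddings line up.

```python
import pandas as pd

def expand_file_labels_to_clips(labels: pd.DataFrame, clip_index: pd.MultiIndex) -> pd.DataFrame:
    """Broadcast per-file one-hot labels to clip-level rows.

    labels: one row per file (index = file path, columns = classes).
    clip_index: (file, start_time, end_time) index of the clip embeddings.
    """
    clip_files = clip_index.get_level_values("file")
    clip_labels = labels.loc[clip_files]  # repeat each file's row once per clip
    clip_labels.index = clip_index        # shapes now match for classifier fitting
    return clip_labels
```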
-
I've been trying in vain to silence all these UserWarnings. They really fill up stdout. I've never truly understood this area of Python and how it relates to user code versus external libraries.
https://www.google.com/search?q=warnings.filterwarnings(%22ignore%22)+still+raises+warnings&oq=warnings.filterwarnings(%22ignore%22)+still+raises+warnings&gs_lcrp=EgZjaHJvbWUyBggAEEUYOdIBCDMwNDZqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8
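What I eventually pieced together, as a standard-library sketch: warning filters are per process and can be overridden when a library re-registers its own filters, so a single filterwarnings("ignore") at the top of a notebook is not always enough, and worker subprocesses don't inherit it (the PYTHONWARNINGS=ignore environment variable does propagate to them).

```python
import warnings

def noisy():
    # stand-in for a warning-heavy library call
    warnings.warn("something non-fatal", UserWarning)

# blanket ignore: effective until some library resets or re-registers filters
warnings.filterwarnings("ignore")
noisy()  # silenced

# surgical alternative: suppress only inside a known-noisy block,
# restoring the previous filter state on exit
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    noisy()  # silenced here regardless of the outer filter state
```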
-
@sammlapp, I know this is upstream, but I'm really struggling to get opensoundscape installed on Kaggle servers; it's due to grad-cam https://www.kaggle.com/code/benweinstein/medell-n-workshop/log?scriptVersionId=235231319 Any suggestions?
-
Hi Kitzes lab, Ben Weinstein here. I thought I'd narrate a somewhat unorganized unboxing story as I play with opensoundscape for the first time. These are just thoughts on user design and experience. I would love it if anyone wanted to do the same for DeepForest.
User
Ben Weinstein
Experience Level
10+ years developing machine learning models and open source packages. No experience with audio data. Working on a workshop in Colombia with Santiago.
User goal
Getting Started
Tutorials and getting started docs are clean and compelling. They gave me confidence and made sense.
Sample data
https://drive.google.com/file/d/11BT4trlQsUsSfrRm79rsn3pnSl9zNQ9m/view?usp=sharing
Installation
The installation process, with multiple packages, models, and dependencies, tripped me up. Easy to solve for me, probably not for others.
There are lots of models to choose from; as a novice user, I would have liked the experts (you) to pick a horse in the race. A default model of course doesn't make sense for all problems, but it just gets me off the ground. Having one default model in opensoundscape, with all its dependencies, would probably be a good compromise, with the rest of the models living in the zoo. I am biased towards PyTorch over TensorFlow; all the tools are there.
Load the audio
```python
from opensoundscape import Audio  # import needed for this to run standalone

audio = Audio.from_file("../birdclef-2025/train_audio/21038/iNat65519.ogg")
audio
```
That error made me think I had to normalize, or that normalize would be an argument. Then I realized this is just about connecting with Jupyter notebooks.
Predicting with a prebuilt model
At first it felt like the Audio class is the centerpiece of the package, so I thought the predict function would operate on that class.
But that's not right; the prediction happens at the file level?
Works. But now I'm confused: if I perform operations with the Audio class (lots of cool functionality there), do I then save that file to disk just to predict on it?
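For later readers, the pattern I landed on, sketched with an assumed model-zoo entry point and an illustrative file path; predict operates on a list of file paths rather than an Audio object:

```python
import torch

# load a prebuilt model from the Kitzes lab model zoo
# (the entry point name is an assumption; check the zoo's README)
model = torch.hub.load("kitzeslab/bioacoustics-model-zoo", "BirdNET", trust_repo=True)

files = ["../birdclef-2025/train_audio/21038/iNat65519.ogg"]
scores = model.predict(files)

# scores is a dataframe indexed by (file, start_time, end_time),
# with one column per class
print(scores.head())
```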
Analyzing results
The function returns the embeddings, but most users won't know what that is. It feels like a similar theme to the above: I expected the softmax scores to be the default, and the embeddings to be an optional argument, not the reverse. Perhaps this is the difference between DeepForest and opensoundscape users; we try to always prioritize the more novice user, since they are more likely to get discouraged early. Anyone who knows what an embedding is will know to look for an argument for it.
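To make the distinction concrete, reusing model and files from the sketch above (embed as the method name is my assumption from the model zoo docs):

```python
# clip-level class scores: one column per species; what most users want first
scores = model.predict(files)

# clip-level feature vectors: one column per embedding dimension;
# useful as input to a small custom classifier
embeddings = model.embed(files)

print(scores.shape, embeddings.shape)
```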
Other assorted notes
Or maybe m.labels? Clicking into the model link got me the repo, but no closer to finding which species labels it uses; you can see them after prediction. Given the brutal nature of taxonomy, especially in birds, connecting to some kind of taxize-type service feels key. Or at least documenting (for a default model) what taxonomy was used.
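The one route that definitely works is reading the labels off the prediction output, since they are the columns of the scores dataframe (reusing scores from the sketch above; whether the model also exposes something like m.classes is what I couldn't confirm):

```python
# the species list is recoverable from the prediction columns
species = scores.columns.tolist()
print(len(species), species[:5])
```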
Fine-tuning
The annotation data structure isn't clear to me; most/all of the tutorial examples use 'BoxedAnnotations', which, looking at the source, assumes that you are subsetting longer recordings. What if the data you have is already clipped?
My data looks like
I'll update here as I continue. Thanks for your interest and amazing work. I know how hard it is.