Hello there. I have been testing small-object detection on music scores with Detectron2. The results are pretty good overall, but not good enough for small objects such as note heads or stems. I am wondering whether there is a way to improve that, or whether I need a completely different approach (maybe a different system from Detectron?).

I am using the largest music score dataset available online (DeepScores V2), so I have a pretty good dataset of over 100,000 images.

In my current case, I am trying to detect note stems, and as you can see from the example below, the model I trained for over 8,000 iterations already gives good results:

But that's not enough. I'd like to be able to detect almost all the stems on that score, just as an example. Unfortunately, I don't see any improvement with Detectron beyond that point: the total loss starts bouncing after around 8,000 iterations and does not go down any further.

Here is what I have tried:

- I have tried different starting models for music score detection, and the best one I found is "faster_rcnn_X_101_32x8d_FPN_3x.yaml".
- I have tried different batch sizes (16, 32, 64, 128); for note stems, a batch size of 64 seems to work best.

Besides that, I don't know what else to do to improve the detection of small objects on the music score. Here is the simple Python program I have set up for training:
Do you have any ideas I could try? Or would you suggest a different approach? I look forward to hearing from you. Thank you in advance.
Replies: 1 comment
My recommendation is to keep your model and wrap it in a different pre- and post-processing approach, such as SAHI (Slicing Aided Hyper Inference). Running the detector on small overlapping slices of the page makes small objects larger relative to the model input, which improves their detection, but it may require extra post-processing to eliminate incorrect or duplicate detections. For example, train the model on images or sub-images of 640 pixels in size, then run inference on 256-pixel slices.
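To make the slicing idea concrete, here is a minimal, dependency-free sketch of what sliced inference does under the hood. It is not SAHI's actual API: the function names (`slice_windows`, `sliced_predict`, `nms`), the 20% overlap, and the 0.5 IoU threshold are all assumptions chosen for illustration, and `detect` is a hypothetical callable standing in for "crop this window and run your trained Detectron2 predictor on it".

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]            # x1, y1, x2, y2 in full-image pixels
Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def slice_windows(width: int, height: int, size: int = 256,
                  overlap: float = 0.2) -> List[Window]:
    """Return overlapping square windows that together cover the image."""
    step = max(1, int(size * (1 - overlap)))

    def starts(length: int) -> List[int]:
        if length <= size:
            return [0]
        s = list(range(0, length - size, step))
        s.append(length - size)  # last window sits flush with the image edge
        return s

    return [(x, y, min(x + size, width), min(y + size, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy non-maximum suppression: keep the highest-scoring box of each cluster."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept


def sliced_predict(width: int, height: int,
                   detect: Callable[[Window], List[Box]],
                   size: int = 256, overlap: float = 0.2,
                   iou_threshold: float = 0.5) -> List[Box]:
    """Run `detect` on every window, shift boxes to full-image coordinates, merge."""
    merged: List[Box] = []
    for (x1, y1, x2, y2) in slice_windows(width, height, size, overlap):
        # `detect` is assumed to return boxes in window-local coordinates.
        for (bx1, by1, bx2, by2, score) in detect((x1, y1, x2, y2)):
            merged.append((bx1 + x1, by1 + y1, bx2 + x1, by2 + y1, score))
    return nms(merged, iou_threshold)
```

Because neighbouring windows overlap, the same stem is often detected twice; the final NMS pass is the "post-processing to eliminate incorrect detections" mentioned above. In practice the SAHI library handles the slicing, coordinate shifting, and merging for you, so a hand-rolled version like this is mainly useful for understanding or for tuning the slice size and overlap on a few score pages.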