How to increase the detection of small objects on grey-scale/monochromatic images such as music scores? #5243

fablau · 2024-03-22T22:46:38Z

Hello here.

I have been testing small-object detection on music scores with Detectron2. The results are fairly good overall, but not good enough for small objects such as note heads or stems. Is there a way to improve that, or do I need a completely different approach (perhaps a different model or system altogether)?

I am using the largest music score dataset available online (DeepScoresV2), so I have a solid dataset of over 100,000 images. In my current case, I am trying to detect note stems, and as you can see from the example below, the model I trained for a little over 8,000 iterations already gives good results:

But that's not enough. I'd like to detect almost all the stems on that score, to give one example. Unfortunately, I don't see any improvement with Detectron2 beyond that point: the total loss starts fluctuating after around 8,000 iterations and does not improve further.

Here is what I have tried:

I have tried different starting models for music score detection; the best one I found is "faster_rcnn_X_101_32x8d_FPN_3x.yaml".

I have tried different batch sizes (16, 32, 64, 128), and in the case of note stems, a batch size of 64 seems to work best. But besides all that, I don't know what else to do to improve the detection of small objects on the music score.

Here is the simple Python program I have set up for training:

############################
import detectron2
from detectron2.utils.logger import setup_logger
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2 import model_zoo
from detectron2.data.datasets import register_coco_instances
import os

setup_logger()

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.MAX_ITER = 10000
cfg.SOLVER.CHECKPOINT_PERIOD = 2000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 64
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2

json_path = 'data/json'
images_path = 'data/images_selected'

# Register every COCO annotation file found under json_path as its own dataset.
coco_names = []
for root, dirs, files in os.walk(json_path):
    for file in files:
        if file.endswith(".json"):
            # Join with root so files inside subdirectories resolve correctly.
            json_file = os.path.join(root, file)
            name = "coco_train_" + str(len(coco_names))
            register_coco_instances(name, {}, json_file, images_path)
            coco_names.append(name)

# Pass the registered dataset names as a tuple.
cfg.DATASETS.TRAIN = tuple(coco_names)

# Initialize the model using the configuration
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)

# Train the model
trainer.resume_or_load(resume=False)
trainer.train()

# Save the configuration for later use (e.g. inference).
with open("mycfgALL.yaml", "w") as f:
    f.write(cfg.dump())

############################

Do you have any ideas I could try? Or would you suggest a different approach?

I look forward to hearing from you.

Thank you in advance.

Answered by dlegor

dlegor · 2025-06-24T13:00:18Z

My recommendation is to keep your model but combine it with a different pre- and post-processing approach, such as SAHI (Slicing Aided Hyper Inference). This will improve the detection of small objects, but may require post-processing to eliminate incorrect detections. For example: train a model on images or sub-images 640 pixels in size, then run inference on 256-pixel slices.
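To make the slicing idea concrete, here is a minimal pure-Python sketch of the two steps involved: cutting a page into overlapping 256-pixel windows, then mapping per-slice boxes back to page coordinates and suppressing duplicates. This is not SAHI's code. The helper names (`slice_starts`, `make_slices`, `to_full_image`, `nms`), the 25% overlap, and the 0.5 IoU threshold are all assumptions chosen for illustration; in practice the SAHI library handles these steps itself.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def slice_starts(length: int, size: int, overlap: float) -> List[int]:
    """Start offsets of windows of `size` pixels that cover [0, length)."""
    if length <= size:
        return [0]
    step = max(1, int(size * (1 - overlap)))
    starts = list(range(0, length - size, step))
    starts.append(length - size)  # last window sits flush with the edge
    return starts


def make_slices(width: int, height: int, size: int = 256,
                overlap: float = 0.25) -> List[Tuple[int, int, int, int]]:
    """Slice rectangles (x1, y1, x2, y2), row by row."""
    return [(x, y, min(x + size, width), min(y + size, height))
            for y in slice_starts(height, size, overlap)
            for x in slice_starts(width, size, overlap)]


def to_full_image(box: Box, slice_xy: Tuple[int, int]) -> Box:
    """Shift a box predicted inside a slice into full-page coordinates."""
    ox, oy = slice_xy
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(dets: List[Tuple[Box, float]],
        iou_thresh: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy non-maximum suppression: keep the best-scoring box of each cluster."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept


# Usage: a 640x640 training-size tile cut into 256-pixel inference slices.
slices = make_slices(640, 640)

# Hypothetical detections: the same stem seen in two overlapping slices.
dets = [
    (to_full_image((200.0, 10.0, 204.0, 60.0), (0, 0)), 0.90),
    (to_full_image((8.0, 10.0, 12.0, 60.0), (192, 0)), 0.80),
]
merged = nms(dets)  # the duplicate from the second slice is suppressed
```

Note that stems are only a few pixels wide, so the IoU between two detections of the same stem can drop sharply with a one- or two-pixel horizontal offset. The merge threshold is therefore something to tune on your own data rather than a value to copy.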
