Modifying Detectron2 to consume 4 channel input #5466
Kolkhoznyk asked this question in Q&A. Unanswered, 0 replies.
Hello,
I want to pass motion information into the model so that Detectron2 pays more attention to moving objects. I created a custom dataset mapper that loads two consecutive images from the dataset, computes a motion-strength map, and stacks that map onto the image as a 4th channel. I also customized the backbone to accept 4-channel input. It seems to work and produces the right dimensions, but it crashes after the 1st batch; somehow it does not manage to proceed through subsequent batches.
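For reference, the per-image stacking step can be sketched in plain NumPy. This is a minimal sketch under an assumption: here the motion-strength map is just the mean absolute difference of two consecutive frames, standing in for the external `calculate_optical_flow` function mentioned below.

```python
import numpy as np

def stack_motion_channel(frame_prev, frame_curr):
    """Stack a motion-strength map as a 4th channel onto an RGB frame.

    frame_prev, frame_curr: uint8 arrays of shape (H, W, 3).
    Returns a float32 array of shape (4, H, W) in CHW layout, the layout
    a Detectron2 mapper stores in dataset_dict["image"].
    """
    # Crude motion strength: mean absolute difference across RGB
    # (a stand-in for the external calculate_optical_flow function).
    motion = np.abs(frame_curr.astype(np.float32)
                    - frame_prev.astype(np.float32)).mean(axis=2)
    chw = frame_curr.astype(np.float32).transpose(2, 0, 1)  # (3, H, W)
    return np.concatenate([chw, motion[None]], axis=0)      # (4, H, W)

prev = np.zeros((1080, 1920, 3), dtype=np.uint8)
curr = np.full((1080, 1920, 3), 30, dtype=np.uint8)
print(stack_motion_channel(prev, curr).shape)  # (4, 1080, 1920)
```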
```
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
🧪 data[0]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[1]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[2]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[3]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[4]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[5]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[6]['image'].shape = torch.Size([4, 1080, 1920])
🧪 data[7]['image'].shape = torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
✅ Mapper is executing
image_flow_shape torch.Size([4, 1080, 1920])
ERROR [05/15 12:41:07 d2.engine.train_loop]: Exception during training:
RuntimeError: Given groups=1, weight of size [64, 4, 7, 7], expected input[8, 3, 1080, 1920] to have 4 channels, but got 3 channels instead
```
So it loads images of the correct size, the model correctly expects 4 channels as input, and the debug statements show that everything works until the end of the 1st batch. Then it crashes again, with 3-channel input ([8, 3, 1080, 1920] instead of [8, 4, 1080, 1920]). If I increase the batch size, it iterates until
```
🧪 data[15]['image'].shape = torch.Size([4, 1080, 1920])
```
and then the same error.
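For what the error message itself means: a `Conv2d` weight has shape `(out_ch, in_ch, kH, kW)`, and the batch's channel dimension must equal `in_ch` (here 4). A tiny shape check, using only the shapes from the log above, shows why the stem conv rejects the second batch:

```python
def mismatched_items(item_shapes, expected_channels=4):
    """Return indices of CHW items whose channel dim != expected_channels."""
    return [i for i, s in enumerate(item_shapes) if s[0] != expected_channels]

# First batch: 8 items of (4, 1080, 1920) -> passes the stem conv.
first_batch = [(4, 1080, 1920)] * 8
# Failing batch: arrives as plain RGB, (3, 1080, 1920) per item.
failing_batch = [(3, 1080, 1920)] * 8

print(mismatched_items(first_batch))    # []
print(mismatched_items(failing_batch))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Putting an assertion like this inside the mapper (or a collate wrapper) would surface the offending sample before it reaches the model.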
I suppose the problem is in building batches. Maybe something is hardcoded to 3 channels, or my custom data mapper fails to iterate correctly through the dataset after the 1st batch. Maybe it's not enough to just write the custom mapper, use it in training, and adjust the backbone. I am not using any data augmentation or transformation that could cause the issue.
Here is my custom data mapper code:

```python
from detectron2.data import DatasetMapper

class SequentialDatasetMapper(DatasetMapper):
    def __init__(self, cfg, is_train=True):
        super().__init__(cfg, is_train)
```

`calculate_optical_flow` is an external function called from another file.
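One cause consistent with this symptom is that the custom mapper is not actually the one building every batch: if the training loader is constructed by `DefaultTrainer` without an override, it builds a standard 3-channel `DatasetMapper` itself. A wiring sketch (assuming `DefaultTrainer` is in use; not runnable here without a Detectron2 install):

```python
from detectron2.data import build_detection_train_loader
from detectron2.engine import DefaultTrainer

class SequentialTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        # Force the training loader to use the 4-channel mapper;
        # otherwise DefaultTrainer builds a standard DatasetMapper.
        return build_detection_train_loader(
            cfg, mapper=SequentialDatasetMapper(cfg, is_train=True)
        )
```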
Any ideas and suggestions would be very helpful! Thanks a lot in advance!