Early training bias due to random sampling with uneven # from sites or cameras #34

@abfleishman

Description

Another thought I have been pondering with the active-learning pipeline: how do you avoid biasing your detector toward the types of images that were labeled first? For instance, say I have 10 cameras, and each camera has taken a different number of images, from 1,000 to 100,000, over a 1-year deployment. If you label 100 randomly selected images to start with, the majority will come from the camera that took the most images, and the background in that camera may be distinct. If you train a model on those initial 100 images, it may be heavily biased toward detecting things in images from that camera (because of some characteristic of those images). Images from other cameras might not even get detections, and so might never get "served" to the person tagging.
Essentially I see it as the same idea as class imbalance, but instead it is an imbalance in the raw data. How does this normally get addressed in active learning?
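One common way to address this is to stratify the initial labeling batch by camera (or site) instead of sampling uniformly over all images. The sketch below is not from this project's pipeline; it is a minimal illustration, and the function name, the `camera_of` mapping, and the ID scheme are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_initial_sample(image_ids, camera_of, n_total, seed=0):
    """Draw an initial labeling batch with an (approximately) equal quota
    per camera, instead of uniform random sampling, which would
    over-represent the most prolific camera.

    image_ids  -- list of image identifiers (hypothetical scheme)
    camera_of  -- dict mapping image id -> camera id (hypothetical)
    n_total    -- total number of images to select for labeling
    """
    rng = random.Random(seed)

    # Group images by the camera that took them.
    by_camera = defaultdict(list)
    for img in image_ids:
        by_camera[camera_of[img]].append(img)

    # Give each camera an equal share of the labeling budget.
    quota = n_total // len(by_camera)
    sample = []
    for cam, pool in by_camera.items():
        sample.extend(rng.sample(pool, min(quota, len(pool))))

    # Top up from the leftover pool if some cameras had too few images
    # to fill their quota.
    chosen = set(sample)
    remaining = [img for img in image_ids if img not in chosen]
    rng.shuffle(remaining)
    sample.extend(remaining[: n_total - len(sample)])
    return sample
```

For example, with one camera contributing 1,000 images and another only 100, a uniform draw of 100 would land roughly 91 images on the first camera, while the stratified draw gives each camera 50. The same idea extends to stratifying later active-learning rounds, so low-volume cameras keep getting served to the annotator even if the early model scores them poorly.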

Metadata

Labels

    needs testing (Workaround is provided, needs verification)
