
SegmentationEvaluation

Robert Sachunsky edited this page Dec 10, 2021 · 6 revisions

This is about plans to implement an OCR-D processor for OLR (optical layout recognition) evaluation. There are two possibilities:

Options

pixel-based layout evaluation

Both GT and prediction are represented as images, with (specially coded) colour values serving as (multi-class) classification labels, e.g. foreground vs. background, text vs. graphics vs. background, line labels vs. background, etc.

Evaluation calculates statistics for the usual (multi-class) error measures, e.g. precision, recall, F-measure, class-averaged accuracy/success rate (mACC), frequency-weighted Jaccard index (fwIoU), average precision over confidence levels (mAP), etc.
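A minimal sketch of this approach (not an existing OCR-D tool; the function name and interface are assumptions for illustration), computing the measures above from a confusion matrix over two integer label maps:

```python
import numpy as np

def pixel_metrics(gt, pred, n_classes):
    """Compute per-class precision/recall/F1, class-averaged accuracy (mACC)
    and frequency-weighted IoU (fwIoU) from two integer label maps of equal
    shape, with labels in 0..n_classes-1."""
    assert gt.shape == pred.shape
    # confusion[i, j] = number of pixels with GT class i predicted as class j
    confusion = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(confusion, (gt.ravel(), pred.ravel()), 1)
    tp = np.diag(confusion).astype(float)
    fp = confusion.sum(axis=0) - tp   # predicted as this class, GT differs
    fn = confusion.sum(axis=1) - tp   # GT is this class, prediction differs
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-9)
    iou = tp / np.maximum(tp + fp + fn, 1)
    freq = confusion.sum(axis=1) / confusion.sum()  # class frequency in GT
    return {'precision': precision, 'recall': recall, 'f1': f1,
            'mACC': float(recall.mean()), 'fwIoU': float((freq * iou).sum())}
```

(mAP over confidence levels would additionally require per-pixel class scores, which a plain label map does not carry.)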

Examples of existing tools:

geometry-based layout evaluation

Both GT and prediction are represented as PAGE-XML annotations. Specifically, we use the polygon coordinates and the segment and @type classification of the elements. Segments are compared by extent and class, considering the following error cases:

  • Merge of two or more neighbouring segments
    (predicted segment overlaps multiple GT segments)
    • Sub-case: Allowable Merge
      (predicted segment equals GT segments if concatenated by reading-order / textline-order, e.g. 2 left-to-right top-to-bottom blocks above each other)
  • Split of a segment into two or more
    (GT segment overlaps multiple predicted segments)
    • Sub-case: Allowable Split
      (GT segment equals predicted segments if concatenated by reading-order / textline-order, e.g. 2 left-to-right top-to-bottom blocks above each other)
  • Miss of a segment
    (false negative; GT segment has no overlapping predicted segment)
  • Partial Miss of a segment
    (false negative; GT segment overlaps predicted segments only partially, i.e. with significant gaps)
  • False Detection of a segment
    (false positive; no GT segment overlaps predicted segment)
  • Misclassification of a segment
    (GT segment and overlapping predicted segment do not belong to the same class or @type)

Evaluation calculates error rates for all error cases relative to the segment's full area or its foreground area, aggregated over segment types and/or error cases, and weighted by scenario-specific evaluation profiles. In addition, pixel-based precision/recall/F-measure values are given, either strict (with classification) or non-strict (geometry only).
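The matching step behind these error cases can be sketched as follows, using Shapely polygons in place of PAGE-XML coordinates. The function name and the `min_overlap` threshold are illustrative assumptions, not part of any published tool; allowable merges/splits and misclassification are left out for brevity:

```python
from shapely.geometry import Polygon

def match_segments(gt_polys, pred_polys, min_overlap=0.1):
    """Classify merge/split/miss/false-detection error cases from the
    pairwise overlap (intersection area) matrix of GT and predicted
    segment polygons. Overlaps below min_overlap of the reference
    segment's area are ignored."""
    overlaps = [[gt.intersection(pr).area for pr in pred_polys]
                for gt in gt_polys]
    errors = []
    for i, gt in enumerate(gt_polys):
        hits = [j for j, a in enumerate(overlaps[i])
                if a > min_overlap * gt.area]
        if not hits:
            errors.append(('miss', i, None))       # false negative
        elif len(hits) > 1:
            errors.append(('split', i, hits))      # GT covered by several
    for j, pr in enumerate(pred_polys):
        hits = [i for i in range(len(gt_polys))
                if overlaps[i][j] > min_overlap * pr.area]
        if not hits:
            errors.append(('false-detection', None, j))  # false positive
        elif len(hits) > 1:
            errors.append(('merge', hits, j))      # prediction spans several GT
    return errors
```

The quadratic pairwise loop is the efficiency concern noted under Considerations below; in practice it could be pruned with a spatial index such as Shapely's STRtree so that only candidate pairs with intersecting bounding boxes are compared.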

Examples of existing tools:

Considerations

  • PAGE-XML annotated GT can (in general, provided that segments of the same hierarchy level do not overlap) be converted to image GT (i.e. colour-coded label maps). The reverse is not true, that's why PAGE-XML GT is not as easy to come by.
  • Conversion takes some computation time. Image GT takes (much) more space than PAGE-XML GT.
  • PAGE-XML GT allows identifying individual segments (distinguishing them from neighbours, even if they touch or overlap), while image GT does not (e.g. lines could be labelled incrementally, but that would make comparison difficult).
  • We probably have to implement both approaches anyway, to be comparable across different kinds of (new / already published) datasets.
  • For the pixel-based evaluation approach, as for the NN segmentation itself, it is probably best to wrap around an image-only API.
  • We should also explicitly support the error metric used by our NN segmentation in its objective function.
  • Computing overlaps between each pair of segments can take a lot of time (needs to be implemented efficiently).
  • Evaluation results should not only contain metrics/measures, but also the actual matches. PRImA's layout evaluation schema seems most suited for this (but the only GUI so far is LayoutEval itself).
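The PAGE-XML→image conversion mentioned in the first consideration can be sketched as follows (an assumed helper, not an OCR-D API), given that segments of the same hierarchy level do not overlap:

```python
from PIL import Image, ImageDraw

def polygons_to_labelmap(size, segments):
    """Rasterize (polygon, class_label) pairs into an integer label map.
    size is (width, height); label 0 is reserved for background.
    Later segments overwrite earlier ones where polygons do overlap."""
    img = Image.new('I', size, 0)  # 32-bit integer pixels, background = 0
    draw = ImageDraw.Draw(img)
    for polygon, label in segments:
        # polygon is a list of (x, y) points in image coordinates,
        # as obtained from the PAGE-XML Coords/@points attribute
        draw.polygon(polygon, fill=label)
    return img
```

As noted above, the reverse direction is not generally possible: once neighbouring segments of the same class share a label value, their identities cannot be recovered from the image alone.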