Commit a5c14c6

committed
update README
1 parent c8041ea commit a5c14c6

2 files changed

+27
-19
lines changed

README.md

Lines changed: 26 additions & 19 deletions
@@ -1,37 +1,44 @@
 # ocrd_segment
 
-This repository aims to provide a number of [OCR-D-compliant processors](https://ocr-d.github.io/cli) for layout analysis and evaluation.
+This repository aims to provide a number of [OCR-D](https://ocr-d.de) [compliant](https://ocr-d.de/en/spec) [processors](https://ocr-d.de/en/spec/cli) for layout analysis and evaluation.
 
 [![image](https://img.shields.io/pypi/v/ocrd_segment.svg)](https://pypi.org/project/ocrd_segment/)
 
 ## Installation
 
-In your virtual environment, run:
-```bash
-pip install .
-```
+In your [Python virtual environment](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/), run:
+
+    pip install ocrd_segment
+
 
 ## Usage
 
-- exporting page images (including results from preprocessing like cropping/masking, deskewing, dewarping or binarization) along with region polygon coordinates and metadata, also MS-COCO:
-  - [ocrd-segment-extract-pages](ocrd_segment/extract_pages.py)
-- exporting region images (including results from preprocessing like cropping/masking, deskewing, dewarping or binarization) along with region polygon coordinates and metadata:
-  - [ocrd-segment-extract-regions](ocrd_segment/extract_regions.py)
-- exporting line images (including results from preprocessing like cropping/masking, deskewing, dewarping or binarization) along with line polygon coordinates and metadata:
-  - [ocrd-segment-extract-lines](ocrd_segment/extract_lines.py)
-- importing layout segmentations from other formats (mask images, MS-COCO JSON annotation):
-  - [ocrd-segment-from-masks](ocrd_segment/import_image_segmentation.py)
-  - [ocrd-segment-from-coco](ocrd_segment/import_coco_segmentation.py)
-- repairing layout segmentations (input file groups N >= 1, based on heuristics implemented using Shapely):
-  - [ocrd-segment-repair](ocrd_segment/repair.py) :construction: (much to be done)
-- comparing different layout segmentations (input file groups N = 2, compute the distance between two segmentations, e.g. automatic vs. manual):
-  - [ocrd-segment-evaluate](ocrd_segment/evaluate.py) :construction: (very early stage)
+Contains processors for various tasks:
+
+- exporting segment images (including results from preprocessing like cropping/masking, deskewing, dewarping or binarization) along with polygon coordinates and metadata:
+  - [ocrd-segment-extract-pages](ocrd_segment/extract_pages.py) (for pages, also exports [MS-COCO](https://cocodataset.org/) format)
+  - [ocrd-segment-extract-regions](ocrd_segment/extract_regions.py) (for regions)
+  - [ocrd-segment-extract-lines](ocrd_segment/extract_lines.py) (for lines, also exports text and .xlsx)
+  - [ocrd-segment-extract-words](ocrd_segment/extract_words.py) (for words, also exports text)
+  - [ocrd-segment-extract-glyphs](ocrd_segment/extract_glyphs.py) (for glyphs, also exports text)
+- importing layout segmentations from other formats:
+  - [ocrd-segment-from-masks](ocrd_segment/import_image_segmentation.py) (for mask/label images, i.e. semantic segmentation)
+  - [ocrd-segment-from-coco](ocrd_segment/import_coco_segmentation.py) (for [MS-COCO](https://cocodataset.org/) annotation)
+- post-processing or repairing layout segmentations:
+  - [ocrd-segment-repair](ocrd_segment/repair.py) (validity and consistency of coordinates, reducing overlaps/redundancy between neighbours, shrinking regions to the alpha shape of their lines)
+  - [ocrd-segment-project](ocrd_segment/project.py) (remake segment coordinates into the convex hull of their constituents)
+  - [ocrd-segment-replace-original](ocrd_segment/replace_original.py) (rebase all segments on cropped+deskewed border frame as new full page)
+  - [ocrd-segment-replace-page](ocrd_segment/replace_page.py) (2 input fileGrps; overwrite segmentation below page of first fileGrp by all segments of second fileGrp, rebasing all coordinates; "inverse" of `replace-original`)
+- comparing different layout segmentations:
+  - [ocrd-segment-evaluate](ocrd_segment/evaluate.py) :construction: (2 input fileGrps; align, compare and evaluate page segmentations; early stage)
+  - [page-segment-evaluate](ocrd_segment/evaluate.py) (same with standalone CLI)
 - pattern-based segmentation (input file groups N=1, based on a PAGE template, e.g. from Aletheia, and some XSLT or Python to apply it to the input file group)
   - `ocrd-segment-via-template` :construction: (unpublished)
 - data-driven segmentation (input file groups N=1, based on a statistical model, e.g. Neural Network)
   - `ocrd-segment-via-model` :construction: (unpublished)
 
-For detailed description on input/output and parameters, see [ocrd-tool.json](ocrd_segment/ocrd-tool.json)
+For detailed behaviour, see `--help` on each processor CLI.
+For detailed description on input/output and parameters, see [ocrd-tool.json](ocrd_segment/ocrd-tool.json) or `--dump-json` on each processor CLI.
 
 ## Testing
 

setup.py

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 - ocrd-segment-replace-original
 - ocrd-segment-replace-page
 - ocrd-segment-evaluate
+- page-segment-evaluate
 """
 import codecs
 
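After `pip install ocrd_segment`, one way to confirm that the executables named in this commit actually landed on the `PATH` is a small check like the one below. This is only a sketch: the list of names is copied from the README and setup.py changes above (the two unpublished `via-*` processors are left out), and the helper `installed_processors` is a hypothetical name, not part of ocrd_segment.

```python
import shutil

# Executable names as listed in the README and setup.py changes above.
# The unpublished ocrd-segment-via-template / -via-model are omitted.
PROCESSORS = [
    "ocrd-segment-extract-pages",
    "ocrd-segment-extract-regions",
    "ocrd-segment-extract-lines",
    "ocrd-segment-extract-words",
    "ocrd-segment-extract-glyphs",
    "ocrd-segment-from-masks",
    "ocrd-segment-from-coco",
    "ocrd-segment-repair",
    "ocrd-segment-project",
    "ocrd-segment-replace-original",
    "ocrd-segment-replace-page",
    "ocrd-segment-evaluate",
    "page-segment-evaluate",
]


def installed_processors(names=PROCESSORS):
    """Return the subset of names that resolve to an executable on PATH.

    Hypothetical helper for illustration; it only looks the names up
    and does not run any processor.
    """
    return [name for name in names if shutil.which(name) is not None]


if __name__ == "__main__":
    found = set(installed_processors())
    for name in PROCESSORS:
        status = "found" if name in found else "missing"
        print(f"{name}: {status}")
```

For any name reported as found, the README's advice applies: run it with `--help` for behaviour, or with `--dump-json` for the input/output and parameter description.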