Python AnnotationParser

AnnotationParser is a universal Python library that parses annotation files from various formats (LabelMe, COCO, VOC, etc.) and converts them into a single, unified Shape type. This approach allows you to read, filter, and save shapes using the same interface, regardless of the original annotation format.

Note: Currently only the LabelMe format is fully implemented and tested. Other formats are planned for future releases (see Limitations & Roadmap).


Features

  • Unified API for reading, saving, and filtering shapes in annotation files
  • Converts any supported format into a universal Shape type for downstream processing
  • Extensible adapter system for multiple formats (LabelMe, COCO, VOC, ...)
  • Functional and OOP usage styles
  • High-level filtering and transformation functions for shape objects
  • Clean, type-safe, and well-documented codebase
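
To make the idea of a unified Shape type more concrete, here is a minimal sketch of converting a loaded LabelMe document into one common record type. It is not the library's implementation: `SketchShape` and `labelme_to_shapes` are hypothetical names, and the assumption that `number` mirrors LabelMe's `group_id` is mine. Only the JSON keys (`shapes`, `label`, `points`, `shape_type`, `group_id`) come from the LabelMe format itself.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchShape:
    """Hypothetical stand-in for the library's Shape type."""
    label: str
    coords: tuple[tuple[float, float], ...]
    shape_type: str
    number: Optional[int] = None  # assumption: mirrors LabelMe's group_id


def labelme_to_shapes(data: dict[str, Any]) -> tuple[SketchShape, ...]:
    """Convert a loaded LabelMe JSON document into a tuple of SketchShape."""
    return tuple(
        SketchShape(
            label=item["label"],
            coords=tuple((float(x), float(y)) for x, y in item["points"]),
            shape_type=item.get("shape_type", "polygon"),
            number=item.get("group_id"),
        )
        for item in data.get("shapes", [])
    )


# A tiny in-memory LabelMe document, so the sketch runs without any files.
labelme_doc = {
    "shapes": [
        {"label": "person", "points": [[10, 20], [30, 40]],
         "shape_type": "rectangle", "group_id": 1},
        {"label": "crop", "points": [[0, 0], [5, 0], [5, 5], [0, 5]],
         "shape_type": "polygon", "group_id": None},
    ]
}
shapes = labelme_to_shapes(labelme_doc)
print([s.label for s in shapes])
```

Once every format is reduced to the same record type, downstream code such as filtering or saving no longer needs to know which format the data came from.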

Usage Examples

Installation

Requires Python 3.10 or later.

Install with pip (recommended for most users):

pip install annotation-parser

Or, if you have the source code locally:

pip install -e .

Parse and Filter (LabelMe)

from annotation_parser import create, get_shapes_by_label

file = "tests/labelme/labelme_test.json"
parser = create(file, "labelme")
shapes = parser.parse()  # tuple of Shape

# Get all shapes with label "person"
persons = get_shapes_by_label(shapes, "person")
print(persons)

Save Annotations

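from annotation_parser import parse_labelme, save_labelme

# Parse first so there is something to save
shapes = parse_labelme("tests/labelme/labelme_test.json")
save_labelme(shapes, "result.json", backup=True)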
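
This README does not spell out what `backup=True` does. One plausible reading, shown here purely as an assumption, is that an existing output file is copied aside before it is overwritten. The function `save_with_backup` and the `.bak` suffix are hypothetical and are not taken from the library.

```python
import json
import shutil
import tempfile
from pathlib import Path


def save_with_backup(data: dict, path: Path, backup: bool = False) -> None:
    """Write data as JSON; optionally copy an existing file to <name>.bak first."""
    if backup and path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "result.json"
    backup_path = Path(tmp) / "result.json.bak"

    save_with_backup({"shapes": []}, target)                # first write, nothing to back up
    save_with_backup({"shapes": [1]}, target, backup=True)  # old content is copied aside first

    backup_exists = backup_path.exists()
    old = json.loads(backup_path.read_text(encoding="utf-8"))
    new = json.loads(target.read_text(encoding="utf-8"))
```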

Filter by Working Zone, Group Number, Custom Predicate

from annotation_parser import get_shapes_by_wz_number, get_shapes_by_number, filter_shapes

# `shapes` is the tuple returned by parse() in the examples above

# Filter by working zone (wz_number)
zone2 = get_shapes_by_wz_number(shapes, wz_number=2)

# Filter by instance/group number
group_1 = get_shapes_by_number(shapes, number=1)

# Filter with a custom predicate (here: shapes with more than three points)
big_shapes = filter_shapes(shapes, lambda s: hasattr(s, "coords") and len(s.coords) > 3)
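
The helpers above can all be thought of as predicate filters over a tuple of shapes. The sketch below illustrates that idea with stand-in code that runs without the library installed. `DemoShape`, `filter_by` and `by_label` are hypothetical names, and the attribute names are assumed from the examples above. The library's own functions may be implemented differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class DemoShape:
    """Hypothetical stand-in with the attributes used in the examples above."""
    label: str
    coords: tuple
    number: Optional[int] = None
    wz_number: Optional[int] = None


def filter_by(shapes: Iterable[DemoShape],
              predicate: Callable[[DemoShape], bool]) -> tuple:
    """Keep only the shapes for which predicate returns True."""
    return tuple(s for s in shapes if predicate(s))


def by_label(shapes: Iterable[DemoShape], label: str) -> tuple:
    """Label filtering expressed as a special case of the generic filter."""
    return filter_by(shapes, lambda s: s.label == label)


demo = (
    DemoShape("person", ((0, 0), (1, 1)), number=1, wz_number=2),
    DemoShape("crop", ((0, 0), (4, 0), (4, 4), (0, 4)), number=2, wz_number=2),
)
persons = by_label(demo, "person")
many_points = filter_by(demo, lambda s: len(s.coords) > 3)
zone2 = filter_by(demo, lambda s: s.wz_number == 2)
```

Returning a tuple keeps the result immutable, which matches the "tuple of Shape" returned by `parse()`.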

OOP Style (Advanced)

from annotation_parser import create

parser = create("tests/labelme/labelme_test.json", "labelme")
shapes = parser.parse()
# The parser object also exposes save() and filter_shapes() alongside parse()

Functional Style (shortcut)

from annotation_parser import parse_labelme

shapes = parse_labelme("tests/labelme/labelme_test.json")

Command-Line Interface (CLI) [experimental]

The CLI is experimental and not fully tested. See cli.py for the current options.

python cli.py parse --file tests/labelme/labelme_test.json --adapter labelme
python cli.py save --file tests/labelme/labelme_test.json --adapter labelme --out result.json --backup
python cli.py filter --file tests/labelme/labelme_test.json --adapter labelme --label crop

Supported Formats

Format       Status
LabelMe      ✅ Supported
COCO         🕑 Planned
Pascal VOC   🕑 Planned
YOLO         🕑 Planned
...          (Suggest yours!)

💡 Want to see your annotation format supported? Open an issue or PR, or help implement a new adapter. Contributions and feedback on new formats are very welcome.


Limitations & Roadmap

  • LabelMe format is currently the only one fully implemented and tested.
  • Adapters for other formats (COCO, Pascal VOC, YOLO, etc.) are planned but not yet implemented.
  • Standard logging (with configurable log levels and error handling) will be added in future releases.
  • The command-line interface (cli.py) is experimental, not fully tested, and still needs improvement.

Contributing

  • PRs, bug reports, and suggestions are welcome!
  • For new formats, contribute an adapter in src/annotation_parser/adapters/
  • All code should be type-checked (mypy), formatted (black), and covered by tests (pytest).
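
As a starting point for a new format, the sketch below shows the kind of conversion an adapter has to perform, using COCO bounding boxes as the example. The `Adapter` protocol, `CocoBoxAdapter` and the plain-dict shape records are hypothetical. The real adapter interface lives in src/annotation_parser/adapters/ and may look different. The only format fact relied on is that a COCO `bbox` is stored as `[x, y, width, height]`.

```python
from typing import Any, Protocol


class Adapter(Protocol):
    """Hypothetical adapter interface: raw annotation dict in, shape records out."""

    def load(self, data: dict[str, Any]) -> tuple[dict[str, Any], ...]: ...


class CocoBoxAdapter:
    """Turns COCO bounding boxes into two-corner rectangle records."""

    def load(self, data: dict[str, Any]) -> tuple[dict[str, Any], ...]:
        # Map category ids to human-readable names.
        names = {c["id"]: c["name"] for c in data.get("categories", [])}
        records = []
        for ann in data.get("annotations", []):
            x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
            records.append({
                "label": names.get(ann["category_id"], str(ann["category_id"])),
                "coords": ((x, y), (x + w, y + h)),  # top-left and bottom-right corners
                "shape_type": "rectangle",
            })
        return tuple(records)


coco_doc = {
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [
        {"id": 7, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ],
}
records = CocoBoxAdapter().load(coco_doc)
print(records)
```

A real adapter would return the library's Shape objects rather than dicts and would also need tests under pytest, as described in the list above.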

Development & Testing

  • Install dev dependencies:

    poetry install --with dev
  • Run tests:

    pytest

FAQ / Common Issues

Q: Why do only LabelMe files work?
A: Only the LabelMe adapter is currently implemented. COCO/VOC support is planned.

Q: CLI throws errors or doesn't work as expected?
A: cli.py is not fully tested. Check Limitations & Roadmap and use the Python API for production.


Author

Project: github.com/omigutin/annotation_parser
Project Tracker: annotation_parser Project Board
Contact: migutin83@yandex.ru


License

MIT License. See LICENSE for details.
