Wouldn't this be more versatile if it were integrated into [ocr-fileformat](https://github.com/UB-Mannheim/ocr-fileformat) or [ocrd_fileformat](https://github.com/OCR-D/ocrd_fileformat)?