slim containers... #465

bertsky · 2025-04-29T11:00:50Z

Since we agreed it is not feasible to continue supporting native installation into a shared venv (with its risk of dependency clashes and the need to find compromises or sidestep them via sub-venvs), and since that will not be needed for the network (WebAPI server-client) installation anyway, this implements the first step: keeping backwards-compatible CLIs, but delegating to Docker images throughout. (The second step will be about replacing or complementing the CLIs with a server-client setup, in line with #449.)

generate executables by delegating to slim-container Docker images
automagically prepare a shared named volume for models with user-friendly permissions and copying pre-installed models
create convenient CLIs for ocrd resmgr in each slim image
remove unnecessary native installation rules and definitions
remove unnecessary fat-container Docker build rules and definitions
find a new solution for the ocrd-all-*.json targets
...


stweil · 2025-04-29T12:27:06Z

> we agreed it is not feasible to continue supporting native installation into a shared venv

Did we? Will there be support for another kind of native installation? I have no intention to use a dockerized OCR-D.

bertsky · 2025-04-29T12:58:32Z

> Did we? Will there be support for another kind of native installation? I have no intention to use a dockerized OCR-D.

We talked about this over and over, and repeatedly asked for commentary – esp. in the Tech Call. I kept this alive for a few years with lots and lots of effort, but not only is my time limited – with slim containers, there is no use for this anymore. Container images are much better anyway.

You can still install modules individually from their respective readmes if you want.

bertsky · 2025-04-30T09:30:20Z

So, to illustrate: if you run make all (with the default options DOCKER_PULL_POLICY=pull DOCKER_VOL_MODELS=ocrd-models DOCKER_RUN_OPTS="-v $(DOCKER_VOL_MODELS):/usr/local/share/ocrd-resources -v $$PWD:/data -u $$UID"), this will docker pull all images and install a delegator shell script under venv/bin/ocrd-... for each executable. For example, ocrd-tesserocr-recognize will become:

#!/usr/bin/env bash
docker run --rm "${DOCKER_RUN_OPTS[@]}" -v ocrd-models:/usr/local/share/ocrd-resources -v $PWD:/data -u $UID ocrd/tesserocr ocrd-tesserocr-recognize "$@"
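
As a rough sketch of how such a delegator could be generated (this is not the actual Makefile rule in this PR): the hypothetical helper write_delegator below writes one script per executable into a target directory. The function name, its arguments and the hard-coded options are assumptions for illustration only; the real scripts also pass through DOCKER_RUN_OPTS, as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ocrd_all): write one delegator script.
# Usage: write_delegator IMAGE EXECUTABLE BINDIR
write_delegator() {
    local image=$1 exe=$2 bindir=$3
    mkdir -p "$bindir"
    # Unquoted heredoc: $image and $exe are filled in now, while the escaped
    # \$PWD, \$UID and \$@ stay literal and are only expanded when the
    # generated delegator itself is run.
    cat > "$bindir/$exe" <<EOF
#!/usr/bin/env bash
docker run --rm -v ocrd-models:/usr/local/share/ocrd-resources -v "\$PWD":/data -u "\$UID" $image $exe "\$@"
EOF
    chmod +x "$bindir/$exe"
}
```

For example, write_delegator ocrd/tesserocr ocrd-tesserocr-recognize venv/bin would produce a script equivalent to the one above, except that the $PWD and $UID expansions are quoted.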

make all will then proceed to build ocrd-all-tool.json and ocrd-all-meta.json from the checked-out ocrd-tool.json of every submodule.

If you pass DOCKER_PULL_POLICY=build, then make docker will be run in each checked-out submodule to rebuild the images locally, instead of pulling them from Docker Hub.

To just pull (or build) the images, without (re-)installing the executable shell scripts, run make images.

Finally, to initialise the named volume ocrd-models from the pre-installed processor resources in the images and fix their permissions, just do make init-vol-models once.
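
What such an initialisation might look like, as a minimal sketch only: the helper below is hypothetical and is not the actual init-vol-models recipe. It assumes that each image keeps its pre-installed resources under /usr/local/share/ocrd-resources (the mount point used above), that /models is a free path inside the image, and that "fixing permissions" means copying as root and then opening up read/write access for other users. The DOCKER variable is an assumption as well, added so the commands can be printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the recipe from this PR.
# Usage: init_vol_models VOLUME IMAGE...
init_vol_models() {
    local volume=$1 image
    shift
    for image in "$@"; do
        # Mount the volume at a side path (/models, an assumed free path) so the
        # resources shipped in the image stay visible, then copy them over and
        # make them readable and writable for non-root users.
        ${DOCKER:-docker} run --rm -u 0 -v "$volume":/models "$image" \
            sh -c 'cp -a /usr/local/share/ocrd-resources/. /models/ && chmod -R go+rwX /models'
    done
}
```

Setting DOCKER=echo before calling init_vol_models ocrd-models ocrd/tesserocr prints the docker arguments rather than running them, which is a cheap way to inspect what would happen.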

To then manage processor resources (list or download), there are now additional delegator shell scripts, one per image, that just wrap the ocrd CLI. For example, to see what is installed for ocrd/tesserocr, run ocrd-tesserocr-ocrd resmgr list-installed -e ocrd-tesserocr-recognize. To install all registered models, run ocrd-tesserocr-ocrd resmgr download ocrd-tesserocr-recognize "*". (That's exactly what make install-models-tesseract now does.)

(We cannot just use ocrd resmgr for this directly, as that only delegates to the ocrd/core image, which has no other processors installed, so it does not know about any resources.)

Using the processor CLIs is as simple as calling them by name, like in the native installation before, but now these will automatically start the respective container, mount the model volume, mount the current working directory into /data (the internal CWD) and run the processor in there.
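
One consequence of mounting only the current working directory at /data is that the container can see files below the CWD but nothing else on the host, so relative paths keep working while absolute host paths do not. The helper below is a hypothetical illustration of that mapping (its name is an assumption, and it is not part of this PR); it does not resolve ".." components or symlinks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a host path into the path the processor
# sees inside the container, given that $PWD is mounted at /data.
to_container_path() {
    local path=$1 abs
    # Make the path absolute without requiring that it exists.
    case $path in
        /*) abs=$path ;;
        *)  abs=$PWD/$path ;;
    esac
    case $abs in
        "$PWD")   printf '/data\n' ;;
        "$PWD"/*) printf '/data/%s\n' "${abs#"$PWD"/}" ;;
        *)        printf 'not under %s: %s\n' "$PWD" "$path" >&2
                  return 1 ;;
    esac
}
```

Called from a workspace directory, to_container_path mets.xml yields /data/mets.xml, whereas a path outside the workspace makes the function fail, mirroring the fact that the container has no access to it.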

So what is not possible anymore is using multi-processor tools like ocrd process ... (as ocrd just delegates to the ocrd/core image, which has no processors besides ocrd-dummy and ocrd-filter, and you cannot spin up containers from other containers), or ocrd-make (as that's just the ocrd/workflow-configuration image, which only has ocrd-page-transform installed internally). One could install core and workflow-configuration natively, so when ocrd process ... or ocrd-make start other processors, they get to use the delegator scripts...

delegate to slim-container Docker images, remove native installation …

0acbfa1

…and fat-container Docker builds

adapt ocrd-all-*.json rules, separate init-vol-models from images / all

d32441c

bertsky mentioned this pull request Jul 4, 2025

slim containers, local or ocrd_network #468

Draft
