Question
Is there a way to improve the inference latency of Docling on a GPU by creating a batch of page images as the input to the different models (EasyOCR, Layout Detection, and TableFormer)?
I am using a single A10 GPU for inference, and it is significantly underutilized (~15%). It would be ideal if we could batch inputs to these models to make better use of it.
Looking into the Docling documentation, I have tried increasing num_threads, but that seems to affect only CPU execution, not the GPU (snippet of my setup below).
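For reference, this is roughly how I am configuring the converter. The accelerator options follow the documented Docling example; the specific values (num_threads=8, CUDA device) and the input file name are just placeholders from my setup:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

# Request CUDA and more threads; in my runs this only changes
# CPU-side behaviour and GPU utilization stays around 15%.
pipeline_options = PdfPipelineOptions()
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=8, device=AcceleratorDevice.CUDA
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
result = converter.convert("sample.pdf")  # placeholder input file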
When I did a little digging into the code, I saw that Docling iterates over the pages in a page_batch and passes only a single page at a time as input to these models, like so:
def __call__(
    self, conv_res: ConversionResult, page_batch: Iterable[Page]
) -> Iterable[Page]:
    for page in page_batch:
        assert page._backend is not None
        if not page._backend.is_valid():
            yield page
        else:
            with TimeRecorder(conv_res, "layout"):
                assert page.size is not None
                page_image = page.get_image(scale=1.0)
                assert page_image is not None

                clusters = []
                for ix, pred_item in enumerate(
                    self.layout_predictor.predict(page_image)
                ):
                    label = DocItemLabel(
                        pred_item["label"]
                        .lower()
                        .replace(" ", "_")
                        .replace("-", "_")
                    )
                    ...
It would be great if we could batch the page images here and make full use of the GPU; a rough sketch of what I have in mind is below.
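To illustrate the idea, here is how the layout step could collect the page images and run one batched forward pass instead of one call per page. predict_batch is a hypothetical method (as far as I can tell, the current predictor only accepts a single image), so this is only meant to show the pattern, not a working patch:

def __call__(
    self, conv_res: ConversionResult, page_batch: Iterable[Page]
) -> Iterable[Page]:
    pages = list(page_batch)
    valid_pages = [
        p for p in pages if p._backend is not None and p._backend.is_valid()
    ]

    with TimeRecorder(conv_res, "layout"):
        # Render all page images up front...
        page_images = [p.get_image(scale=1.0) for p in valid_pages]

        # ...and run a single batched forward pass on the GPU.
        # predict_batch is hypothetical: one prediction list per image.
        all_predictions = self.layout_predictor.predict_batch(page_images)

    for page, predictions in zip(valid_pages, all_predictions):
        for ix, pred_item in enumerate(predictions):
            label = DocItemLabel(
                pred_item["label"].lower().replace(" ", "_").replace("-", "_")
            )
            ...  # same per-cluster post-processing as today

    # Pages with an invalid backend are passed through unchanged.
    yield from pages

Presumably the same pattern would also help for TableFormer and EasyOCR, since they likewise receive one page or crop at a time.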
Looking forward to hearing back!
Thank you!