Memory leak caused by EasyOCR in docker #1401

evelyn023 · 2025-04-15T19:28:41Z

Hello. I have been experimenting with Docling for a while and am impressed by its performance. Everything runs well in my local environment.
The only problem is that when I ran the same code in a container, CPU memory (RAM) kept increasing until the process ran out of memory and the container was killed. I traced the problem to a memory leak in EasyOCR's reader.readtext function; a similar issue has been reported in EasyOCR's repo but remains unresolved. If I switch the OCR engine to Tesseract, the problem goes away. But I still want to use EasyOCR, as its OCR quality was much better in my tests.
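
One way to confirm that memory grows from one conversion to the next is to sample the process's peak resident set size after each call. The following is a minimal sketch using only the standard library; the helper names `peak_rss_mib` and `track_peak_memory` are hypothetical and not part of Docling or EasyOCR, and the `resource` module is available on Unix-like systems only.

```python
import resource
import sys
from typing import Callable, List


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_peak_memory(task: Callable[[], object], runs: int) -> List[float]:
    """Call `task` `runs` times and record the peak RSS after each call."""
    samples = []
    for _ in range(runs):
        task()
        samples.append(peak_rss_mib())
    return samples
```

With the converter from this post, the call could look like `track_peak_memory(lambda: doc_converter.convert(input_doc_path), runs=10)`. Peak RSS never decreases, so a series that levels off suggests stable memory use, while one that keeps climbing with every run points to growth such as the one described above.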

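The code I used is:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    EasyOcrOptions,
    PdfPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption

# Enable OCR and table-structure recognition, with full-page EasyOCR in Spanish on 8 CPU threads.
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.ocr_options = EasyOcrOptions(force_full_page_ocr=True, lang=["es"])
pipeline_options.accelerator_options = AcceleratorOptions(
    num_threads=8, device=AcceleratorDevice.CPU
)
doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
conv_result = doc_converter.convert(input_doc_path)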

My local environment is macOS (M3) with 18 GB of RAM. My container environment is Linux with an 18 GB memory limit.
I am using the latest Docling version with Python 3.12.
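
A common mitigation for leaks in native code, which does not fix the leak itself, is to run each conversion in a short-lived child process so that the operating system reclaims all of its memory when it exits. The sketch below assumes the conversion code can be packaged as a standalone Python snippet; the helper `run_in_fresh_process` is hypothetical and uses only the standard library.

```python
import subprocess
import sys


def run_in_fresh_process(script: str, *args: str, timeout: float = 600.0) -> str:
    """Run a Python snippet in a new interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", script, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

The snippet passed as `script` would build the converter, convert the path given in `sys.argv[1]`, and print the result. The trade-off is that the OCR models are reloaded for every document, so this favors bounded memory over throughput.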

evelyn023 · 2025-04-16T20:14:03Z

I'm also curious about the reason behind the big accuracy gap between EasyOCR and Tesseract. For the attached image of a table, Docling parsed it perfectly with EasyOCR, but with Tesseract it produced something like:

| Lt. PROVISIO                                                     |
|------------------------------------------------------------------|
| Outputs collectible accredited 2,402 2,139                       |
| City shore puts 3,9/6 4,455                                      |
| Suspended 2,946 832                                              |
| Fold toshiba recovery 6,114 9,9/8                                |
| Sunshine barbara pontiac (48) (9,936)                            |
| Aggregate voyuer examines 300 1,482                              |
| Flights succeed daughters Ld Jd Jd ee ae ee ee ee a f,6/0  merry |
| After replaced 3,/39 240                                         |
| rm amanda villages de 2,3/3 6,094                                |
| Nigeria strategic neither titanium (6,354) (2,694)               |
| Thal replace terrorism 3,/QO/ 2,636                              |
| Postposted Invalid — BE Weettiws 4 5/9 creations 5               |
| Sacred behavior henry 3,836 9,610                                |
| riltering (/,490) (6,8//)                                        |
| Married silicon jim request 2,208 4,04/                          |
| Mailed (6,139)                                                   |
| Valley hospitality 5 A75 descending                              |
| lawyer leon 3,915 922                                            |
