Replies: 1 comment
I'm also curious about the reason for the large gap in OCR quality between EasyOCR and Tesseract. For the attached image of a table, Docling parsed it perfectly with EasyOCR, but with Tesseract it produced something like:
|
Hello. I have been experimenting with Docling for a while and am impressed by its performance. Everything runs well in my local environment.
The only problem is that when I ran the same code in a container, memory usage kept growing until the container ran out of memory and was killed. I traced the problem to a memory leak in EasyOCR's `reader.readtext` function; a similar issue has been reported in EasyOCR's repo but remains unresolved. Switching the OCR engine to Tesseract makes the problem go away, but I would still like to use EasyOCR because its results are much better. The code I used is
My local environment is macOS (M3) with 18 GB of RAM. My container environment is Linux with an 18 GB memory limit.
Docling version is the latest. Python version is 3.12.