Replies: 1 comment
https://pypdf2.readthedocs.io/en/latest/user/extract-text.html
Some PDF files are scanned documents that consist of images rather than text. Others contain images alongside text. In both cases, text extraction loses some information.
Could PyPDF2's extract_text() function be extended to process images automatically along with the regular text? It would make our lives easier. I don't know the backend, though. Is this possible to implement?
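To make the request concrete, here is a rough sketch of the behaviour I have in mind. It is not an existing PyPDF2 feature. The helper names `extract_page_text` and `extract_pdf_text` and the `ocr` callable are hypothetical; the sketch only assumes that each page offers `extract_text()` and an `images` collection whose items expose raw bytes as `.data`, as in recent PyPDF2 releases.

```python
from typing import Callable, List


def extract_page_text(page, ocr: Callable[[bytes], str]) -> str:
    """Return the page's text layer followed by OCR text from its images.

    `page` is any object with extract_text() and an iterable `images`
    attribute whose items carry raw image bytes in `.data`.
    `ocr` is a hypothetical callable mapping image bytes to text.
    """
    parts: List[str] = []

    # Regular text layer, as extract_text() returns it today.
    text = page.extract_text() or ""
    if text.strip():
        parts.append(text.strip())

    # Text recognized in each embedded image, in the order the page lists them.
    for image in page.images:
        recognized = ocr(image.data) or ""
        if recognized.strip():
            parts.append(recognized.strip())

    return "\n".join(parts)


def extract_pdf_text(path: str, ocr: Callable[[bytes], str]) -> str:
    """Apply extract_page_text to every page of the PDF at `path`."""
    # Imported lazily so the helper above works without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return "\n\n".join(extract_page_text(page, ocr) for page in reader.pages)
```

The `ocr` callable is left open on purpose. One option, assuming Pillow, pytesseract and the Tesseract binary are installed, would be `lambda data: pytesseract.image_to_string(Image.open(io.BytesIO(data)))`. Keeping OCR as a parameter would avoid making it a hard dependency of PyPDF2.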