ImportError: Failed to import 'PDFToTextConverter' #4531

Chance-Obondo · 2023-03-28T20:00:06Z

I am trying to use the PDF-to-text converter, and it raises the following error:

ImportError: Failed to import 'PDFToTextConverter', which is an optional component in Haystack. Run 'pip install 'farm-haystack[ocr]'' to install the required dependencies and make this component available. (Original error: Failed to import 'haystack.nodes.file_converter.pdf', which is an optional component in Haystack. Run 'pip install 'farm-haystack[ocr]'' to install the required dependencies and make this component available. (Original error: No module named 'pdf2image'))

I then installed pdf2image again, but it still does not work. I get the same error even after reinstalling Haystack with all its dependencies.

Here is the code I am using. It is a basic script I wrote to test how the PDF-to-text converter works before modifying it.

# this will be a demonstration of how the pdf to text converter works!

from haystack.utils.import_utils import Path
from haystack.nodes import PDFToTextConverter

converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])

docs = converter.convert(file_path=Path("https://www.cdc.gov/cancer/breast/pdf/breast-cancer-fact-sheet-508.pdf", remove_numeric_tables=False, valid_languages=["en"]))

Is there a solution to this, or what should I try next?

Answered by anakin87

Mar 29, 2023

abdusalam7474 · 2023-03-29T07:37:05Z

I am facing the same problem right now. An answer would be greatly appreciated.

anakin87 · 2023-03-29T12:38:25Z

Maintainer

Hello.
This node was recently refactored (xpdf was dropped in favor of PyMuPDF).

Until Haystack 1.15.0 is officially released, you can use it by installing the pre-release:
pip install farm-haystack[pdf]==1.15.0-rc2

Your code should then work correctly (I tested it with a local PDF).

Once Haystack 1.15.0 is released, you can install it by running:
pip install farm-haystack[pdf]==1.15.0
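
If the same ImportError still appears after installing, one possible cause (an assumption; nothing in this thread confirms it) is that pip installed the packages into a different Python environment than the one running the script. The sketch below prints the active interpreter and reports which of the relevant modules it cannot locate. `MODULES` and `find_missing` are hypothetical names used only for illustration; `fitz` is the import name of PyMuPDF, and `pdf2image` is the module named in the original error.

```python
import importlib.util
import sys

# Import names to probe. "fitz" is the import name of PyMuPDF; "pdf2image" is
# the module named in the original error message.
MODULES = ("haystack", "fitz", "pdf2image")


def find_missing(modules=MODULES):
    """Return the names in `modules` that this interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # If this path is not the environment where pip installed the packages,
    # the ImportError will persist no matter how often you reinstall.
    print("Interpreter:", sys.executable)
    missing = find_missing()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter (or notebook kernel) that runs the converter script. If a module is reported missing there, installing with `python -m pip install ...` from that same interpreter is one way to make sure the package lands in the right environment.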

0 replies

Chance-Obondo · 2023-03-30T09:16:30Z

Author

Hi, this worked very well for me. Thanks for the suggestion!
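
A side note on the snippet in the question: it wraps a URL in `Path(...)` and passes the converter options to `Path` rather than to `convert`, whereas the answer above was tested with a local PDF. A minimal sketch of one way to handle a remote file is to download it first and convert the local copy. This is an illustration under stated assumptions, not the maintainer's method: `local_path_for` and `convert_remote_pdf` are hypothetical helper names, and it assumes the server allows a plain `urllib` download.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL taken from the question.
PDF_URL = "https://www.cdc.gov/cancer/breast/pdf/breast-cancer-fact-sheet-508.pdf"


def local_path_for(url: str, dest_dir: Path) -> Path:
    """Build a local file path in `dest_dir` from the last segment of `url`."""
    return dest_dir / Path(urlparse(url).path).name


def convert_remote_pdf(url: str, dest_dir: Path = Path(".")):
    """Download the PDF at `url`, then convert the local copy with Haystack."""
    # Imported here so the helper above can be used without Haystack installed.
    from haystack.nodes import PDFToTextConverter

    pdf_path = local_path_for(url, dest_dir)
    urlretrieve(url, pdf_path)  # save the remote file to disk

    # Converter options go to the converter, not to Path().
    converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])
    return converter.convert(file_path=pdf_path)
```

Calling `convert_remote_pdf(PDF_URL)` needs network access and a Haystack 1.x install with the PDF extra, as described in the answer above.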
