ImportError: Failed to import 'PDFToTextConverter' #4531
-
I am trying to use the PDF to text converter, and it raises the ImportError quoted in the title.
I then reinstalled pdf2image, but it still does not work. I get the same error even after reinstalling Haystack with all its dependencies. Here is the code I am using. It is a basic snippet I wanted to use to test how the PDF to text converter works before modifying it.

# This will be a demonstration of how the PDF to text converter works!
from haystack.utils.import_utils import Path
from haystack.nodes import PDFToTextConverter

converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=["en"])
docs = converter.convert(file_path=Path("https://www.cdc.gov/cancer/breast/pdf/breast-cancer-fact-sheet-508.pdf"), remove_numeric_tables=False, valid_languages=["en"])

Is there a solution to this, or what should I do as a next step?
Replies: 3 comments
-
I am facing the same problem right now. An answer would be greatly appreciated.
-
Hi, this has worked very well for me. Thanks for the suggestion!
Hello.
This node was refactored recently (dropping xpdf in favor of PyMuPDF).
To use it before the official release of Haystack 1.15.0, you can install the pre-release by running:

pip install farm-haystack[pdf]==1.15.0-rc2

Your code should then work correctly (I tested it with a local PDF).
Once Haystack 1.15.0 is released, you can install it by running:

pip install farm-haystack[pdf]==1.15.0
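
If the import still fails after installing, it may help to check what is actually present in the active environment. Below is a minimal sketch using only the standard library. It assumes that Haystack 1.x is distributed as "farm-haystack" (as in the commands above) and that PyMuPDF is distributed as "PyMuPDF" and imported as "fitz". The helper names are hypothetical and not part of Haystack.

```python
# Sketch: report whether the packages the refactored converter relies on are installed.
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


def pdf_support_report():
    # Assumption: distribution names "farm-haystack" and "PyMuPDF", module name "fitz".
    return {
        "farm-haystack": installed_version("farm-haystack"),
        "PyMuPDF": installed_version("PyMuPDF"),
        "fitz importable": module_available("fitz"),
    }


if __name__ == "__main__":
    for name, value in pdf_support_report().items():
        print(f"{name}: {value}")
```

A value of None for "PyMuPDF", or False for "fitz importable", would suggest that the [pdf] extra was not installed into the interpreter that runs the script.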