This is an all-in-one tool that:
- converts PDF pages to images, and
- extracts text from PDF documents using Optical Character Recognition (OCR) via pytesseract.
Pipeline: PDF pages --> page images --> text extracted with OCR
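A minimal sketch of this pipeline, assuming pdf2image's convert_from_path renders the pages and pytesseract's image_to_string reads them; it is an illustration, not necessarily how main.py does it. The 300 DPI default and the helper join_page_texts (with its "--- Page N ---" labels) are hypothetical choices.

```python
from typing import Iterable


def join_page_texts(page_texts: Iterable[str]) -> str:
    """Label each page's OCR output with its page number and join them (hypothetical format)."""
    blocks = []
    for number, text in enumerate(page_texts, start=1):
        blocks.append(f"--- Page {number} ---\n{text.strip()}")
    return "\n\n".join(blocks)


def pdf_to_text(pdf_path: str, dpi: int = 300) -> str:
    """PDF pages -> page images -> OCR text."""
    # Imported inside the function so join_page_texts works without the OCR stack installed.
    from pdf2image import convert_from_path  # needs Poppler on the system
    import pytesseract  # needs the Tesseract binary on the system

    # Step 1: render every page to a PIL image (300 DPI is an assumed value; pdf2image defaults to 200).
    pages = convert_from_path(pdf_path, dpi=dpi)
    # Step 2: run OCR on each page image.
    page_texts = (pytesseract.image_to_string(page) for page in pages)
    # Step 3: stitch the per-page results together.
    return join_page_texts(page_texts)
```

For example, join_page_texts(["a", "b"]) returns "--- Page 1 ---\na\n\n--- Page 2 ---\nb". Rendering at a higher DPI generally gives Tesseract sharper glyphs to work with, at the cost of memory and speed.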
Usage: python main.py <pdf_file_path>
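A sketch of a command-line entry point that accepts this usage, built on the standard library's argparse. It is a hypothetical stand-in for main.py, not the project's actual code; the exit code 1 for a missing file and the per-page headers it prints are assumptions.

```python
import argparse
import sys
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert PDF pages to images and extract their text with OCR."
    )
    # Single positional argument, matching: python main.py <pdf_file_path>
    parser.add_argument("pdf_file_path", type=Path, help="path to the input PDF")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    pdf_path: Path = args.pdf_file_path
    if not pdf_path.is_file():
        print(f"error: {pdf_path} is not a file", file=sys.stderr)
        return 1  # assumed exit code for a missing input file

    # Imported only after validation, so a bad path fails fast without the OCR stack.
    from pdf2image import convert_from_path
    import pytesseract

    for number, page in enumerate(convert_from_path(str(pdf_path)), start=1):
        print(f"--- Page {number} ---")
        print(pytesseract.image_to_string(page))
    return 0


# When saved as a script, finish with:
# if __name__ == "__main__":
#     sys.exit(main())
```

Passing argv explicitly (main(["report.pdf"])) makes the entry point easy to call from tests; with no argument, argparse falls back to sys.argv.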
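Python libraries required to run the project:
- pdf2image
- pillow
- pytesseract
pdf2image also needs the Poppler utilities installed on the system, and pytesseract needs the Tesseract OCR engine.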
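To confirm these requirements before running the tool, a small check like the following could help. It is a sketch that uses only the standard library (importlib.util.find_spec and shutil.which); it also looks for the system programs that pytesseract (tesseract) and pdf2image (Poppler's pdftoppm and pdfinfo) call, assuming they carry those usual executable names on the PATH.

```python
import importlib.util
import shutil
from typing import Dict, List

# pip package name -> importable module name (Pillow is imported as PIL)
PYTHON_PACKAGES = {"pdf2image": "pdf2image", "pillow": "PIL", "pytesseract": "pytesseract"}
# System executables invoked by pytesseract and pdf2image (assumed standard names)
SYSTEM_TOOLS = ("tesseract", "pdftoppm", "pdfinfo")


def check_dependencies() -> Dict[str, bool]:
    """Return a mapping of dependency name -> whether it was found."""
    report: Dict[str, bool] = {}
    for package, module in PYTHON_PACKAGES.items():
        # find_spec returns None when a top-level module cannot be imported
        report[package] = importlib.util.find_spec(module) is not None
    for tool in SYSTEM_TOOLS:
        # which returns None when the executable is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


def missing_dependencies() -> List[str]:
    """List only the dependencies that were not found."""
    return [name for name, found in check_dependencies().items() if not found]
```

An empty missing_dependencies() result means every Python package and system tool in the sketch was located.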
Any contributions or suggestions are greatly appreciated.
If you have a suggestion that would make this tool better, please fork the repo and create a pull request. You can also simply open an issue with the label "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the project
- Create your feature branch (git checkout -b PDF-to-Text-Tool/suggestion)
- Commit your changes (git commit -m 'Add some suggestion')
- Push to the branch (git push origin PDF-to-Text-Tool/suggestion)
- Open a pull request
Ritvik Patil - pritvik0@gmail.com
Project Link: https://github.com/RitvikPatil/PDF-to-Text-Tool