This project extracts handwritten Kannada text from page images in PDF files using PyMuPDF and Tesseract OCR. It uses OpenCV to increase image contrast before recognition, which improves OCR accuracy. The script also strips unwanted patterns such as URLs and digits from the recognized text to produce clean output. The tool is intended for digitizing handwritten Kannada documents.
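
The pipeline described above (render pages, boost contrast, run OCR, filter the text) could look roughly like the sketch below. This is not the project's actual code: the function names `clean_text` and `extract_kannada_text`, the 300 dpi render, the choice of CLAHE as the contrast step, and the exact URL pattern are all assumptions made for illustration. It also assumes Tesseract's Kannada language data (`kan`) is installed.

```python
import re

# Hypothetical patterns; the project's real filters may differ.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# In Python 3 str patterns, \d matches any Unicode decimal digit,
# so this covers both ASCII digits and Kannada digits.
DIGIT_RE = re.compile(r"\d+")


def clean_text(text: str) -> str:
    """Remove URLs and digits, then collapse whitespace line by line."""
    cleaned_lines = []
    for line in text.splitlines():
        line = URL_RE.sub(" ", line)    # drop URLs first
        line = DIGIT_RE.sub(" ", line)  # then drop remaining digits
        line = " ".join(line.split())   # normalize runs of whitespace
        if line:
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


def extract_kannada_text(pdf_path: str, dpi: int = 300) -> str:
    """Render each PDF page, enhance contrast, OCR it, and clean the result."""
    # Third-party imports are kept local so clean_text works without them.
    import cv2
    import fitz  # PyMuPDF
    import numpy as np
    import pytesseract

    # Assumed contrast step: CLAHE (adaptive histogram equalization).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    page_texts = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # The dpi argument needs a reasonably recent PyMuPDF release.
            pix = page.get_pixmap(dpi=dpi)  # RGB, no alpha by default
            img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, pix.n
            )
            gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
            enhanced = clahe.apply(gray)
            raw = pytesseract.image_to_string(enhanced, lang="kan")
            page_texts.append(clean_text(raw))
    return "\n\n".join(t for t in page_texts if t)
```

A call such as `extract_kannada_text("notes.pdf")` (with a hypothetical file name) would return the cleaned text of all pages. Keeping the filter in its own function makes it easy to adjust the patterns without touching the OCR step.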
Updated Sep 19, 2024 - Python