Vision Language Model: tailored for tasks that involve [messy] optical character recognition (OCR), image-to-text conversion, and math problem solving with LaTeX formatting.
pillow video-processing opencv-python video-understanding ocr-recognition ocr-python huggingface-transformers qwen2-vl-2b qwen2-5-vl monkey-ocr
Updated Jul 26, 2025 - Python