Run generative AI models on Sophgo BM1684X/BM1688
Dedicated Colab notebooks for experimenting with Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more on a free-tier T4 GPU.
A vision-language model tailored for tasks involving [messy] optical character recognition (OCR), image-to-text conversion, and math problem solving with LaTeX formatting.
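For reference, a minimal sketch of running a Qwen2.5-VL-style OCR model with Hugging Face transformers (the model id, image file name, and prompt below are assumptions; the checkpoints listed above follow the same pattern, assuming a recent transformers release with Qwen2.5-VL support):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Assumed model id; swap in any Qwen2.5-VL-based OCR checkpoint.
MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("receipt.png")  # hypothetical input image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe all text in this image as Markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the generated transcription is decoded.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```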
Use two lines of code to give Qwen2.5-VL absolute time awareness.
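The repository's exact two lines aren't reproduced here, but the idea can be sketched as injecting the current wall-clock time into the system prompt before the model call:

```python
from datetime import datetime

# Hedged sketch: prepend the current date/time so the model can resolve
# "today", "yesterday", and other relative time expressions.
system_prompt = "You are a helpful vision-language assistant."
system_prompt = f"The current date and time is {datetime.now():%Y-%m-%d %H:%M:%S}. " + system_prompt
```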
Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface to upload images, query models, adjust generation settings, and export results in Markdown or PDF.
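A rough sketch of what such a Gradio front end looks like (the function body is a placeholder, not the project's actual inference code):

```python
import gradio as gr

def answer(image, question, max_new_tokens):
    # Placeholder for the actual VLM call; plug in any lightweight model here.
    return f"(model output for: {question!r})"

demo = gr.Interface(
    fn=answer,
    inputs=[
        gr.Image(type="pil", label="Image"),
        gr.Textbox(label="Question"),
        gr.Slider(64, 1024, value=256, step=64, label="Max new tokens"),
    ],
    outputs=gr.Markdown(label="Answer"),
    title="Tiny VLMs Lab (sketch)",
)
demo.launch()
```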
A thinking/reasoning multimodal vision-language model (VLM) trained to enhance spatial reasoning.
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
Transform your documents into intelligent conversations. This open-source RAG chatbot combines semantic search with fine-tuned language models (LLaMA, Qwen2.5VL-3B) to deliver accurate, context-aware responses from your own knowledge base. Join our community!
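A hedged sketch of the retrieve-then-generate loop behind such a RAG chatbot, using sentence-transformers and FAISS (the example documents, embedding model, and single-document retrieval are assumptions; the project's chunking, index, and generator may differ):

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "The warranty covers manufacturing defects for 24 months.",
    "Returns are accepted within 30 days with the original receipt.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(doc_vecs, dtype="float32"))

query = "How long is the warranty?"
q_vec = embedder.encode([query], normalize_embeddings=True)
_, hits = index.search(np.asarray(q_vec, dtype="float32"), 1)

context = docs[hits[0][0]]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be passed to the fine-tuned generator (e.g. LLaMA or Qwen2.5-VL-3B).
print(prompt)
```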
ArtSeek: Deep artwork understanding via multimodal in-context reasoning and late interaction retrieval
A comprehensive Gradio-based interface for running multiple state-of-the-art Vision-Language Models (VLMs) for Optical Character Recognition (OCR) and Visual Question Answering (VQA) tasks.
A collection of mainstream and cutting-edge zero-shot, training-free stock price prediction algorithms (ARIMA, time-series foundation models, vision foundation models, and more), compared on real data to find the model best suited to A-share stocks.
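For context, a minimal sketch of the ARIMA baseline on a synthetic close-price series (the synthetic data, order (1,1,1), and 20-day horizon are assumptions, not the repository's settings):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a daily close-price series.
rng = np.random.default_rng(0)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 250)))

train, test = close[:-20], close[-20:]
model = ARIMA(train, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=len(test))

mae = np.mean(np.abs(forecast.values - test.values))
print(f"ARIMA(1,1,1) 20-day MAE: {mae:.3f}")
```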
MedScan AI is a modular multimodal medical assistant designed to streamline clinical data processing.
Understands physical common sense and generates appropriate embodied decisions. Optimized for document-level optical character recognition and long-context vision-language understanding. Built with a hand-curated dataset for text-to-image models, providing significantly more detailed descriptions and captions of given images.
[GENERIC] API for practical large models
Optimized for document-level optical character recognition and long-context vision-language understanding.
Extracts data and answers questions from images, charts, tables, and documents.
A modern Streamlit application for extracting structured data from invoice images using Qwen 2.5 VL via OpenRouter API.
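OpenRouter exposes an OpenAI-compatible endpoint, so the core call in such an app can be sketched roughly as follows (the model slug, file name, and prompt are assumptions, not the app's actual configuration):

```python
import base64
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

with open("invoice.png", "rb") as f:  # hypothetical invoice image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen/qwen2.5-vl-72b-instruct",  # assumed model slug
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract invoice number, date, vendor, and total as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```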
Code and dataset for evaluating Multimodal LLMs on indexical, iconic, and symbolic gestures (Nishida et al., ACL 2025)