Run generative AI models on Sophgo BM1684X/BM1688
Dedicated Colab notebooks for experimenting with Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more on a free-tier T4 GPU.
A vision-language model tailored for tasks involving [messy] optical character recognition (OCR), image-to-text conversion, and math problem solving with LaTeX formatting.
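For reference, a minimal sketch of running a Qwen2.5-VL-style OCR model with Hugging Face transformers (the model id, image file name, and prompt below are assumptions; the checkpoints listed above follow the same pattern, assuming a recent transformers release with Qwen2.5-VL support):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Assumed model id; swap in any Qwen2.5-VL-based OCR checkpoint.
MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("receipt.png")  # hypothetical input image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe all text in this image as Markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the generated transcription is decoded.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```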
Use two lines of code to give Qwen2.5-VL absolute time awareness.
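The repository's exact two lines aren't reproduced here, but the idea can be sketched as injecting the current wall-clock time into the system prompt before the model call:

```python
from datetime import datetime

# Hedged sketch: prepend the current date/time so the model can resolve
# "today", "yesterday", and other relative time expressions.
system_prompt = "You are a helpful vision-language assistant."
system_prompt = f"The current date and time is {datetime.now():%Y-%m-%d %H:%M:%S}. " + system_prompt
```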
Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface to upload images, query models, adjust generation settings, and export results in Markdown or PDF.
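A rough sketch of what such a Gradio front end looks like (the function body is a placeholder, not the project's actual inference code):

```python
import gradio as gr

def answer(image, question, max_new_tokens):
    # Placeholder for the actual VLM call; plug in any lightweight model here.
    return f"(model output for: {question!r})"

demo = gr.Interface(
    fn=answer,
    inputs=[
        gr.Image(type="pil", label="Image"),
        gr.Textbox(label="Question"),
        gr.Slider(64, 1024, value=256, step=64, label="Max new tokens"),
    ],
    outputs=gr.Markdown(label="Answer"),
    title="Tiny VLMs Lab (sketch)",
)
demo.launch()
```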
A thinking/reasoning multimodal vision-language model (VLM) trained to enhance spatial reasoning.
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
Transform your documents into intelligent conversations. This open-source RAG chatbot combines semantic search with fine-tuned language models (LLaMA, Qwen2.5VL-3B) to deliver accurate, context-aware responses from your own knowledge base. Join our community!
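A hedged sketch of the retrieve-then-generate loop behind such a RAG chatbot, using sentence-transformers and FAISS (the example documents, embedding model, and single-document retrieval are assumptions; the project's chunking, index, and generator may differ):

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "The warranty covers manufacturing defects for 24 months.",
    "Returns are accepted within 30 days with the original receipt.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vecs.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(doc_vecs, dtype="float32"))

query = "How long is the warranty?"
q_vec = embedder.encode([query], normalize_embeddings=True)
_, hits = index.search(np.asarray(q_vec, dtype="float32"), 1)

context = docs[hits[0][0]]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be passed to the fine-tuned generator (e.g. LLaMA or Qwen2.5-VL-3B).
print(prompt)
```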
ArtSeek: Deep artwork understanding via multimodal in-context reasoning and late interaction retrieval
A comprehensive Gradio-based interface for running multiple state-of-the-art Vision-Language Models (VLMs) for Optical Character Recognition (OCR) and Visual Question Answering (VQA) tasks.
A collection of mainstream and cutting-edge zero-shot, training-free stock price prediction algorithms (ARIMA, time-series foundation models, vision foundation models, and more), compared on real data to find the model best suited to A-share stocks.
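For context, a minimal sketch of the ARIMA baseline on a synthetic close-price series (the synthetic data, order (1,1,1), and 20-day horizon are assumptions, not the repository's settings):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a daily close-price series.
rng = np.random.default_rng(0)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 250)))

train, test = close[:-20], close[-20:]
model = ARIMA(train, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=len(test))

mae = np.mean(np.abs(forecast.values - test.values))
print(f"ARIMA(1,1,1) 20-day MAE: {mae:.3f}")
```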
MedScan AI is a modular multimodal medical assistant designed to streamline clinical data processing.
Understands physical common sense and generates appropriate embodied decisions. Optimized for document-level optical character recognition and long-context vision-language understanding. Built with a hand-curated dataset for text-to-image models, providing significantly more detailed descriptions and captions of given images.
[GENERIC] API for practical large models
Optimized for document-level optical character recognition and long-context vision-language understanding.
Extracts data and answers questions from images, charts, tables, and documents.
A modern Streamlit application for extracting structured data from invoice images using Qwen 2.5 VL via OpenRouter API.
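OpenRouter exposes an OpenAI-compatible endpoint, so the core call in such an app can be sketched roughly as follows (the model slug, file name, and prompt are assumptions, not the app's actual configuration):

```python
import base64
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

with open("invoice.png", "rb") as f:  # hypothetical invoice image
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen/qwen2.5-vl-72b-instruct",  # assumed model slug
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract invoice number, date, vendor, and total as JSON."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```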
Code and dataset for evaluating Multimodal LLMs on indexical, iconic, and symbolic gestures (Nishida et al., ACL 2025)