Run generative AI models on Sophgo BM1684X/BM1688.
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on a free-tier T4 GPU.
A batched implementation for efficient Qwen2.5-VL inference.
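As a rough illustration of what batching buys here, the sketch below runs one prompt over several images in a single forward pass using the Hugging Face transformers implementation of Qwen2.5-VL. The `batched_ocr` helper and its prompt are hypothetical, not this repository's actual code; only the `Qwen/Qwen2.5-VL-7B-Instruct` checkpoint and the transformers APIs are real.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
# Left padding so every sequence in the batch ends right at the
# generation boundary.
processor = AutoProcessor.from_pretrained(MODEL_ID, padding_side="left")

def batched_ocr(images, prompt="Read all the text in the image."):
    """Hypothetical helper: one prompt over a list of PIL images, one pass."""
    conversations = [
        [{"role": "user",
          "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]
        for _ in images
    ]
    texts = [processor.apply_chat_template(c, tokenize=False,
                                           add_generation_prompt=True)
             for c in conversations]
    inputs = processor(text=texts, images=images, padding=True,
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    trimmed = out[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
    return processor.batch_decode(trimmed, skip_special_tokens=True)
```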
A vision-language model tailored for tasks involving messy optical character recognition (OCR), image-to-text conversion, and math problem solving with LaTeX formatting.
A comprehensive multimodal OCR application that supports both image and video document processing using state-of-the-art vision-language models. This application provides an intuitive Gradio interface for extracting text, converting documents to markdown, and performing advanced document analysis.
A specialized optical character recognition (OCR) application built on advanced vision-language models, designed for document-level OCR, long-context understanding, and mathematical LaTeX formatting. It supports both image and video processing with multiple state-of-the-art models.
A comprehensive Gradio-based interface for running multiple state-of-the-art Vision-Language Models (VLMs) for Optical Character Recognition (OCR) and Visual Question Answering (VQA) tasks.
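In outline, a multi-model front end like this reduces to a dropdown that dispatches to per-model inference functions. The sketch below is a minimal Gradio skeleton; `run_model` is a hypothetical stand-in for real VLM dispatch (e.g. the batched helper sketched above), and the model names in the dropdown are illustrative.

```python
import gradio as gr

def run_model(model_name, image, prompt):
    # Hypothetical stand-in: a real app would route to the chosen VLM here.
    return f"[{model_name}] would answer {prompt!r} for the uploaded image."

demo = gr.Interface(
    fn=run_model,
    inputs=[
        gr.Dropdown(["Qwen2.5-VL-7B-Instruct", "Nanonets-OCR"], label="Model"),
        gr.Image(type="pil", label="Document image"),
        gr.Textbox(label="Prompt", value="OCR this page to Markdown."),
    ],
    outputs=gr.Textbox(label="Output"),
    title="Multi-VLM OCR / VQA demo",
)

if __name__ == "__main__":
    demo.launch()
```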
The Qwen2.5-VL-7B-Instruct model is a multimodal AI model developed by Alibaba Cloud that excels at understanding both text and images. It is a Vision-Language Model (VLM) designed to handle a range of visual understanding tasks, including image understanding and video analysis, with multilingual support.
A thinking/reasoning multimodal vision-language model (VLM) trained to enhance spatial reasoning.
An experimental document-focused Vision-Language Model application that provides advanced document analysis, text extraction, and multimodal understanding capabilities. This application features a streamlined Gradio interface for processing both images and videos using state-of-the-art vision-language models specialized in document understanding.
The drex-062225-exp (Document Retrieval and Extraction eXpert) model is a specialized fine-tune of docscopeocr-7b-050425-exp, optimized for document retrieval, content extraction, and analysis recognition. It is built on top of the Qwen2.5-VL architecture.
Transform your documents into intelligent conversations. This open-source RAG chatbot combines semantic search with fine-tuned language models (LLaMA, Qwen2.5VL-3B) to deliver accurate, context-aware responses from your own knowledge base. Join our community!
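The retrieval half of such a pipeline is small enough to sketch. Below is a minimal version using sentence-transformers for semantic search; the embedding model, chunk list, and prompt template are illustrative, and the retrieved context would then be fed to the fine-tuned generator (LLaMA or Qwen2.5VL-3B) much as in the inference sketch above.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative embedding model; the project's actual choice may differ.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["chunked passage 1 ...", "chunked passage 2 ..."]  # your knowledge base
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query, k=3):
    q = embedder.encode([query], normalize_embeddings=True)
    scores = (doc_vecs @ q.T).ravel()   # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

def build_prompt(query):
    context = "\n\n".join(retrieve(query))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```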
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
Understands physical common sense and generates appropriate embodied decisions. Optimized for document-level optical character recognition and long-context vision-language understanding. Built with a hand-curated dataset for text-to-image models, providing significantly more detailed descriptions or captions of given images.
A generic API for practical large models.
Optimized for document-level optical character recognition and long-context vision-language understanding.
Brings together mainstream and cutting-edge zero-shot, training-free stock price prediction algorithms, including ARIMA, large time-series models, and large vision models, and compares them on real market data to find the best forecasting model for China's A-share stocks.
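Of the methods listed, ARIMA is the simplest to demonstrate. The sketch below fits an ARIMA(1, 1, 1) to a synthetic price series with statsmodels and forecasts a few steps ahead; the order, series, and horizon are illustrative, not the repository's actual configuration.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic closing prices standing in for a real A-share series.
rng = np.random.default_rng(0)
closes = 100 + np.cumsum(rng.normal(0, 1, 250))

fit = ARIMA(closes, order=(1, 1, 1)).fit()  # fit on the full history
print(fit.forecast(steps=5))                # predict the next 5 trading days
```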
MedScan AI is a modular multimodal medical assistant designed to streamline clinical data processing.
Code and dataset for evaluating Multimodal LLMs on indexical, iconic, and symbolic gestures (Nishida et al., ACL 2025)