LAVIS - A One-stop Library for Language-Vision Intelligence
Updated Nov 18, 2024 · Jupyter Notebook
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Instruction Following Agents with Multimodal Transformers
Code for studying OpenAI's CLIP explainability
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
VTC: Improving Video-Text Retrieval with User Comments
Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation
A collection of VLMs papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
A list of research papers on knowledge-enhanced multimodal learning
Streamlit App Combining Vision, Language, and Audio AI Models
Coding a Multi-Modal vision model like GPT-4o from scratch, inspired by @hkproj and PaliGemma
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
This repository contains all the homeworks and projects from the Deep Learning course taught by Prof. Chinmay Hegde at NYU in Spring 2025.