Jasper0068/arxiv-papers-daily

[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url]

Updated on 2025.08.14

Website

You can learn directly from this page

Table of Contents
  1. Tracking
  2. HDR
  3. Low-Level
  4. Image Matching
  5. MultiModal
  6. Prompt

Tracking

Publish Date Title Authors PDF Code
2025-07-23 Giant Damping-like Torque Efficiency via Synergistic Spin Hall and enhanced Orbital Hall Effects Subhakanta Das et.al. 2507.17372 null
2025-07-21 Is Tracking really more challenging in First Person Egocentric Vision? Matteo Dunnhofer et.al. 2507.16015 null
2025-07-18 AeroThrow: An Autonomous Aerial Throwing System for Precise Payload Delivery Ziliang Li et.al. 2507.13903 null
2025-07-14 Vision-Based Anti Unmanned Aerial Technology: Opportunities and Challenges Guanghai Ding et.al. 2507.10006 null
2025-07-12 MVPinn: Integrating Milne-Eddington Inversion with Physics-Informed Neural Networks for GST/NIRIS Observations Qin Li et.al. 2507.09430 null
2025-07-08 Can We Predict Your Next Move Without Breaking Your Privacy? Arpita Soni et.al. 2507.08843 null
2025-07-11 SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2 Alen Adamyan et.al. 2507.08548 null
2025-07-10 Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking Qiangqiang Wu et.al. 2507.07483 null
2025-07-02 TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking Bingxi Liu et.al. 2507.01535 null
2025-07-01 UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions Siyuan Yao et.al. 2507.00648 null
2025-06-30 Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking Shiao Wang et.al. 2506.23783 null
2025-07-22 R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning Biao Wang et.al. 2506.21980 null
2025-06-23 Lightweight RGB-T Tracking with Mobile Vision Transformers Mahdi Falaki et.al. 2506.19154 null
2025-06-18 SOT Enabled 3D Magnetic Field Sensor with Low Offset and High Sensitivity Sebastian Zeilinger et.al. 2506.15320 null
2025-06-17 Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios Aswin Shanmugam Subramanian et.al. 2506.14204 null
2025-06-15 Learning Unpaired Image Dehazing with Physics-based Rehazy Generation Haoyou Deng et.al. 2506.12824 null
2025-06-15 SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition Yuta Hirano et.al. 2506.12672 null
2025-06-12 Joint ASR and Speaker Role Tagging with Serialized Output Training Anfeng Xu et.al. 2506.10349 null
2025-06-09 Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition Asahi Sakuma et.al. 2506.07515 null
2025-06-06 Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models Yuke Lin et.al. 2506.05796 null
2025-06-03 MVTD: A Benchmark Dataset for Maritime Visual Object Tracking Ahsan Baidar Bakht et.al. 2506.02866 null
2025-05-28 Nanoscale quantum imaging of field-free deterministic switching of a chiral antiferromagnet Jingcheng Zhou et.al. 2505.22856 null
2025-05-27 Fully Spiking Neural Networks for Unified Frame-Event Object Tracking Jingjun Yang et.al. 2505.20834 null
2025-05-28 Progressive Scaling Visual Object Tracking Jack Hong et.al. 2505.19990 null
2025-05-26 Systems of Twinned Systems: A Systematic Literature Review Feyi Adesanya et.al. 2505.19916 link
2025-05-26 Comparison of Polar Magnetic Fields Derived from MILOS and MERLIN Inversions with Hinode/SOT-SP Data Masahito Kubo et.al. 2505.19468 null
2025-05-23 Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking Cheng-Yen Yang et.al. 2505.18111 null
2025-05-19 Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach Shiao Wang et.al. 2505.12903 link
2025-05-30 Effect of crystallinity on spin-orbit torque in 5$\textit{d}$ iridium oxide IrO$_{2}$ Tetsuro Morimoto et.al. 2505.10907 null
2025-05-14 Recent progress on electron- and magnon-mediated torques Jia-Min Lai et.al. 2505.09257 null
2025-05-14 Enhanced Spin Pumping and Magnetization dynamics in Ni$_{80}$Fe$_{20}$/MoS$_2$ stack via interface modification Mahammad Tahir et.al. 2505.09248 null
2025-05-11 Nonlinear Model Predictive Control for Leaderless UAV Formation Flying with Collision Avoidance under Directed Graphs Yiming Wang et.al. 2505.06895 null
2025-05-11 Streaming Sliced Optimal Transport Khai Nguyen et.al. 2505.06835 link
2025-05-10 Nonlinearity Modulation of Auto-oscillations in Three-terminal Magnetic Tunnel Junctions Zixi Wang et.al. 2505.06547 null
2025-05-06 Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation Gabriele Rosi et.al. 2505.06280 link
2025-05-09 CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking Weihong Li et.al. 2505.05936 link
2025-05-08 A Simple Detector with Frame Dynamics is a Strong Tracker Chenxu Peng et.al. 2505.04917 link
2025-05-06 Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking Shenglan Li et.al. 2505.03507 link
2025-05-02 Current-induced Dynamics of Bloch Domain-wall Bimerons Jiwen Chen et.al. 2505.00959 null
2025-05-01 A High-resolution, Inversion-Based Synoptic Study of Solar Granulation James Crowley et.al. 2505.00826 null
2025-05-01 DARTer: Dynamic Adaptive Representation Tracker for Nighttime UAV Tracking Xuzhao Li et.al. 2505.00752 null
2025-04-24 RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network Boyue Xu et.al. 2504.17595 null
2025-04-22 SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking Yunfeng Li et.al. 2504.15609 link
2025-04-19 Adversarial Attack for RGB-Event based Visual Object Tracking Qiang Chen et.al. 2504.14423 link
2025-04-28 HyDra: SOT-CAM Based Vector Symbolic Macro for Hyperdimensional Computing Md Mizanur Rahaman Nayan et.al. 2504.14020 null
2025-04-18 FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking Ying Wang et.al. 2504.13604 link
2025-04-17 TAXI: Traveling Salesman Problem Accelerator with X-bar-based Ising Macros Powered by SOT-MRAMs and Hierarchical Clustering Sangmin Yoo et.al. 2504.13294 null
2025-04-16 Efficient spin-orbit torque driven magnetization switching of GdFe using phosphorus-implanted platinum layers Kazuki Shintaku et.al. 2504.11796 null
2025-04-15 Chiral Domain Walls Induced by Radially Magnetized Nanotube Geometry Nobuyuki Umetsu et.al. 2504.11005 null
2025-04-16 Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution Chenghao Li et.al. 2504.09566 link
2025-04-13 Sub-nanosecond in-plane magnetization switching induced by field-like spin-orbit torques from ferromagnets Hanying Zhang et.al. 2504.09431 null
2025-04-12 Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking You Wu et.al. 2504.09228 link
2025-04-11 Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions Yingqian Xu et.al. 2504.08257 null
2025-04-08 Magnetic Memory Driven by Orbital Current Jingkai Xu et.al. 2504.05780 null
2025-04-07 Dimensionality Enhanced Out-of-Plane Spin Currents in NbIrTe$_4$ for Efficient Field-Free Switching of Perpendicular Magnetization Wei Yang et.al. 2504.05280 null
2025-04-02 Shape Anisotropy Enabled Field Free Switching of Perpendicular Nanomagnets Akanksha Chouhan et.al. 2504.01634 null
2025-03-31 Symmetry Enhanced Unconventional Spin Current Anisotropy in a Collinear Antiferromagnet Pankhuri Gupta et.al. 2503.20545 null
2025-03-26 Intrinsic back-switching phenomenon in SOT-MRAM devices Kuldeep Ray et.al. 2503.19840 null
2025-03-22 MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking Haolin Qin et.al. 2503.17699 link
2025-04-07 Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID Yu-Hsi Chen et.al. 2503.17237 link
2025-03-21 Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks Haijin Zeng et.al. 2503.16930 null
2025-03-21 Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking Meng Zhou et.al. 2503.16768 null
2025-03-17 UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network Siyuan Yao et.al. 2503.12888 link
2025-03-16 Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification Myisha A. Chowdhury et.al. 2503.12616 null
2025-03-13 Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking Xinglong Sun et.al. 2503.09951 null
2025-03-09 Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking Chaocan Xue et.al. 2503.06625 link
2025-03-09 Dynamic Updates for Language Adaptation in Visual-Language Tracking Xiaohai Li et.al. 2503.06621 link
2025-03-06 High resolution spectra of the [6297-6303] and [6361-6367] Angström domains (including forbidden OI lines) of the Sun and brightest stars Jean-Marie Malherbe et.al. 2503.05832 null
2025-03-07 Separating the bulk and interface contribution of spin-orbit torque in ferromagnet-Heavy metal bilayers tuned by variation of resistivity of heavy metal Abu Bakkar Miah et.al. 2503.05341 null
2025-03-07 Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Simon A. Aytes et.al. 2503.05179 link
2025-03-02 Inefficiency of the orbit Hall effect on spin torque in transition metal/ferromagnet bilayers Yizhuo Song et.al. 2503.00910 null
2025-02-27 MITracker: Multi-View Integration for Visual Object Tracking Mengjie Xu et.al. 2502.20111 null
2025-03-08 Dynamic Degradation Decomposition Network for All-in-One Image Restoration Huiqiang Wang et.al. 2502.19068 null
2025-02-25 UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking He Wang et.al. 2502.18220 null
2025-02-24 Symmetry-breaking effects on spin-orbit torque switching in ferromagnetic semiconductors with perpendicular magnetic anisotropy Apu Kumar Jana et.al. 2502.16788 null
2025-02-17 Effects of antiferromagnetic coupling and pinning on domain wall dynamics in synthetic ferrimagnets Sougata Mallick et.al. 2502.11621 null
2025-02-13 Modelling spin-orbitronics effects at interfaces and chiral molecules Poonam Kumari et.al. 2502.09239 null
2025-02-12 Highly efficient field-free switching by orbital Hall torque in a MoS2-based device operating at room temperature Antonio Bianco et.al. 2502.08483 null
2025-02-08 Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark Shiao Wang et.al. 2502.05574 link
2025-02-06 Visualizing Field-free Deterministic Magnetic Switching of all-van der Waals Spin-Orbit Torque System Using Spin Ensembles in Hexagonal Boron Nitride Xi Zhang et.al. 2502.04561 null
2025-01-27 Investigation of Sub-configurations Reveals Stable Spin-Orbit Torque Switching Polarity in Polycrystalline Mn3Sn Boyu Zhao et.al. 2501.15815 null
2025-01-25 Thermal Stability and Depinning Currents of Domain Wall-Based Artificial Synapses Guntas Kaur et.al. 2501.15102 null
2025-02-16 Enhancing Unconventional Spin-Orbit Torque Efficiency: Numerical Study on the Influence of Crystallographic Texture and Polycrystalline Effects on Low-Symmetry Materials Yifei Yang et.al. 2501.14200 null
2025-01-22 Enhanced Field-Free Perpendicular Magnetization Switching via spin splitting torque in Altermagnetic RuO2-based Heterostructures Badsha Sekh et.al. 2501.12593 null
2025-01-18 Multilayered MXenes for future two-dimensional nonvolatile magnetic memories P. Kumar et.al. 2501.10678 null
2025-01-13 Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions Xiantong Zhao et.al. 2501.07133 null
2025-01-11 ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation Xuanle Zhao et.al. 2501.06598 link
2025-01-18 BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination Zhongxuan Zhang et.al. 2501.03616 null
2025-01-05 DeTrack: In-model Latent Denoising Learning for Visual Object Tracking Xinyu Zhou et.al. 2501.02467 null
2024-12-31 Alternative harmonic detection approach for quantitative determination of spin and orbital torques Y. Xu et.al. 2501.00403 null
2024-12-30 An Experimental Study of Passive UAV Tracking with Digital Arrays and Cellular Downlink Signals Yifei Sun et.al. 2412.20788 null
2024-12-30 Spin-orbit torque in a three-fold-symmetric bilayer and its effect on magnetization dynamics Wuzhang Fang et.al. 2412.20746 null
2024-12-28 Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking You Wu et.al. 2412.20002 link
2024-12-27 Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues X. Feng et.al. 2412.19648 link
2024-12-26 Semistrong edge colorings of planar graphs Yuquan Lin et.al. 2412.19230 null
2024-12-26 SUTrack: Towards Simple and Unified Single Object Tracking Xin Chen et.al. 2412.19138 link
2024-12-24 Linear Enhancement of Spin-Orbit Torques and Absence of Bulk Rashba-Type Spin Splitting in Perpendicularly Magnetized [Pt/Co/W]n Superlattices Zhihao Yan et.al. 2412.18481 null
2024-12-24 Field-free current-induced magnetization switching of a room temperature van der Waals magnet for neuromorphic computing Chenxi Zhou et.al. 2412.18429 null
2024-12-24 All-electric mimicking synaptic plasticity based on the noncollinear antiferromagnetic device Cuimei Cao et.al. 2412.18418 null
2025-01-01 Unsupervised UAV 3D Trajectories Estimation with Sparse Point Clouds Hanfang Liang et.al. 2412.12716 link
2024-12-15 Exploring Enhanced Contextual Information for Video-Level Object Tracking Ben Kang et.al. 2412.11023 link
2024-12-13 Visual Object Tracking across Diverse Data Modalities: A Review Mengmeng Wang et.al. 2412.09991 null
2024-12-09 Magnetic Switching in Monolayer 2D Diluted Magnetic Semiconductors via Spin-to-Spin Conversion Siwei Chen et.al. 2412.06650 null
2024-12-09 Energy Efficient Stochastic Signal Manipulation in Superparamagnetic Tunnel Junctions via Voltage-Controlled Exchange Coupling Qi Jia et.al. 2412.06256 null
2024-12-03 GSOT3D: Towards Generic 3D Single Object Tracking in the Wild Yifan Jiao et.al. 2412.02129 link
2024-12-01 MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning You Wu et.al. 2412.00626 link
2024-11-29 Current-driven motion of magnetic domain-wall skyrmions Haoyang Nie et.al. 2411.19566 null
2024-11-28 Unveiling the anisotropy of linear and nonlinear charge-spin conversion in Weyl semimetal TaIrTe4 Tao Tang et.al. 2411.19062 null
2024-12-04 A Distractor-Aware Memory for Visual Object Tracking with SAM2 Jovana Videnovic et.al. 2411.17576 link
2024-11-24 MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking Chunhui Zhang et.al. 2411.15761 link
2024-11-23 How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking Xuchen Li et.al. 2411.15600 null
2024-11-23 MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking Xinqi Liu et.al. 2411.15459 null
2024-11-24 ClickTrack: Towards Real-time Interactive Single Object Tracking Kuiran Wang et.al. 2411.13183 null
2024-11-30 SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory Cheng-Yen Yang et.al. 2411.11922 link
2024-11-14 Compression Method for Solar Polarization Spectra Collected from Hinode SOT/SP Observations Jargalmaa Batmunkh et.al. 2411.09311 null
2024-11-10 Orthogonal Spin-Orbit Torque-Induced Deterministic Switching in NiO Yixiao Qiao et.al. 2411.06379 null
2024-11-08 Giant spin Hall effect with multi-directional spin components in Ni4W Yifei Yang et.al. 2411.05682 null
2024-11-04 Single-layer spin-orbit-torque magnetization switching due to spin Berry curvature generated by minute spontaneous atomic displacement in a Weyl oxide Hiroto Horiuchi et.al. 2411.01806 null
2024-11-04 ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model Yiming Sun et.al. 2411.01756 null
2024-11-03 Capping layer dependent anti-correlation between magnetic damping and spin-orbital to charge conversion Antarjami Sahoo et.al. 2411.01662 null
2024-11-01 Spin orbit torque-driven motion of quasi-Bloch domain wall in perpendicularly magnetized W/CoFeB/MgO structures Nobuyuki Umetsu et.al. 2411.00516 null
2024-10-31 Origin of line broadening in fading granule: influence of small-scale turbulence Ryohtaroh T. Ishikawa et.al. 2410.23654 null
2024-10-27 NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking Yu Liu et.al. 2410.20421 link
2024-10-25 Can Stories Help LLMs Reason? Curating Information Space Through Narrative Vahid Sadiri Javadi et.al. 2410.19221 null
2024-10-19 The Solution for Single Object Tracking Task of Perception Test Challenge 2024 Zhiqiang Zhong et.al. 2410.16329 null
2024-10-14 A stronger form of Yamamoto's theorem II -- Spectral operators Soumyashant Nayak et.al. 2410.16318 null
2024-10-03 Leveraging Event Streams with Deep Reinforcement Learning for End-to-End UAV Tracking Ala Souissi et.al. 2410.14685 null
2024-10-16 DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking Haobo Zuo et.al. 2410.12270 link
2024-10-14 SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments Khaled Gabr et.al. 2410.10409 link
2024-10-09 DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM Xuchen Li et.al. 2410.02492 null
2024-10-01 Energy-efficient picosecond spin-orbit torque magnetization switching in ferro- and ferrimagnetic films Eva Díaz et.al. 2410.00474 null
2024-09-27 Improving Visual Object Tracking through Visual Prompting Shih-Fang Chen et.al. 2409.18901 link
2024-09-27 Prompt-Driven Temporal Domain Adaptation for Nighttime UAV Tracking Changhong Fu et.al. 2409.18533 link
2024-09-26 A 5T-2MTJ STT-assisted Spin Orbit Torque based Ternary Content Addressable Memory for Hardware Accelerators Siri Narla et.al. 2409.17863 null
2024-09-26 General Compression Framework for Efficient Transformer Object Tracking Lingyi Hong et.al. 2409.17564 null
2024-09-26 Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking Pengcheng Shao et.al. 2409.17560 null
2024-09-25 Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2 Chunhui Zhang et.al. 2409.16902 link
2024-09-25 Conditional Generative Denoiser for Nighttime UAV Tracking Yucheng Wang et.al. 2409.16834 link
2024-09-25 Progressive Representation Learning for Real-Time UAV Tracking Changhong Fu et.al. 2409.16652 link
2024-09-25 Enhancing Nighttime UAV Tracking with Light Distribution Suppression Liangliang Yao et.al. 2409.16631 link
2024-09-24 Pulse Shaping Strategies for Efficient Switching of Magnetic Tunnel Junctions by Spin-Orbit Torque Marco Hoffmann et.al. 2409.16454 null
2024-09-24 CloudTrack: Scalable UAV Tracking with Cloud Semantics Yannik Blei et.al. 2409.16111 link
2024-09-20 A survey of sulfur-bearing molecular lines toward the dense cores in eleven massive protoclusters Mengyao Tang et.al. 2409.13231 null
2024-09-19 Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC Jiawen Kang et.al. 2409.12388 link
2024-09-11 Topological Spin-Orbit Torque in Ferrimagnetic Weyl Semimetal Tomonari Meguro et.al. 2409.07106 null
2024-09-09 Effects of Interfacial Oxygen Diffusion on the Magnetic Properties and Thermal Stability of Pd/CoFeB/Pd/Ta Heterostructure Saravanan Lakshmanan et.al. 2409.05783 null
2024-09-11 Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition Hao Shi et.al. 2409.00815 null
2024-08-30 Advancing Multi-talker ASR Performance with Large Language Models Mohan Shi et.al. 2408.17431 null
2024-08-30 Cross Fusion RGB-T Tracking with Bi-directional Adapter Zhirong Zeng et.al. 2408.16979 null
2024-08-23 Energy-efficient field-free unconventional spin-orbit torque magnetization switching dynamics in van der Waals heterostructures Lalit Pandey et.al. 2408.13095 null
2024-08-21 Low-Light Object Tracking: A Benchmark Pengzhi Zhong et.al. 2408.11463 link
2024-08-20 MambaEVT: Event Stream based Visual Object Tracking using State Space Model Xiao Wang et.al. 2408.10487 link
2024-08-19 Reconfigurable Spin Logics and High-density Multistate Memory in a Single Spin-orbit Torque Device Raghvendra Posti et.al. 2408.09866 null
2024-08-16 Initialization-Free Multistate Memristor: Synergy of Spin-Orbit Torque and Magnetic Fields Raghvendra Posti et.al. 2408.08641 null
2024-08-15 MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking Simiao Lai et.al. 2408.07889 null
2024-08-12 Latent Disentanglement for Low Light Image Enhancement Zhihao Zheng et.al. 2408.06245 null
2024-08-11 Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends Jeffry Victor et.al. 2408.05857 null
2024-08-05 VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking Yuxuan Lu et.al. 2408.02263 null
2024-08-04 3D Single-object Tracking in Point Clouds with High Temporal Variation Qiao Wu et.al. 2408.02049 null
2024-07-30 Strained topological insulator spin-orbit torque random access memory (STI-SOTRAM) bit cell for energy-efficient Processing in Memory Md Golam Morshed et.al. 2407.20925 null
2024-07-19 HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation Zezeng Li et.al. 2407.14419 null
2024-07-17 Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm Shiyu Liu et.al. 2407.12614 null
2024-07-15 Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss Mufeng Yao et.al. 2407.10485 link
2024-07-16 Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking Lorenzo Vaquero et.al. 2407.10151 link
2024-07-12 DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects Peng Wang et.al. 2407.09051 null
2024-07-11 Manipulating a Tetris-Inspired 3D Video Representation Mihir Godbole et.al. 2407.08885 null
2024-07-11 Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets Linh Van Ma et.al. 2407.08872 link
2024-07-11 CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks Ish Kumar Jain et.al. 2407.08817 null
2024-07-10 Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors Lei Cheng et.al. 2407.08049 null
2024-07-10 Large spin-orbit torque in a-plane $\alpha$-Fe$_{2}$O$_{3}$/Pt bilayers Igor Lyalin et.al. 2407.07731 null
2024-07-10 Spin Splitting in Altermagnetic RuO$_2$ Enables Field-free Spin-Orbit Torque Switching via Dominant Out-of-Plane Spin Polarization Zhuoyi Li et.al. 2407.07447 null
2024-07-09 Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films Shuchen Li et.al. 2407.06487 null
2024-07-07 Addressing single object tracking in satellite imagery through prompt-engineered solutions Athena Psalta et.al. 2407.05518 null
2024-07-07 Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking You Wu et.al. 2407.05383 null
2024-07-09 P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds Jiahao Nie et.al. 2407.05238 link
2024-07-05 Median Mishaps between Chirality and Spin-Orbit Torques via Asymmetric Hysteresis Minhwan Kim et.al. 2407.04624 null
2024-07-04 Serialized Output Training by Learned Dominance Ying Shi et.al. 2407.03966 null
2024-07-04 TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers Fatemeh Nourilenjan Nokabadi et.al. 2407.03946 link
2024-07-04 Out-of-Plane Polarization from Spin Reflection Induces Field-Free Spin-Orbit Torque Switching in Structures with Canted NiO Interfacial Moments Zhe Zhang et.al. 2407.03676 null

(back to top)

HDR

Publish Date Title Authors PDF Code
2025-07-23 UNICE: Training A Universal Image Contrast Enhancer Ruodai Cui et.al. 2507.17157 null
2025-07-16 Dark-EvGS: Event Camera as an Eye for Radiance Field in the Dark Jingqian Wu et.al. 2507.11931 null
2025-07-16 SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring Kaustav Chanda et.al. 2507.11910 null
2025-07-16 CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos Wei Sun et.al. 2507.11900 null
2025-07-14 MetaH2: A Snapshot Metasurface HDR Hyperspectral Camera Yuxuan Liu et.al. 2507.08282 null
2025-07-09 A hybrid dosimetry approach for remote audits in Ir-192 HDR interstitial brachytherapy: Development and pilot implementation Eleftherios P Pappas et.al. 2507.06958 null
2025-07-09 Capturing Stable HDR Videos Using a Dual-Camera System Qianyu Zhang et.al. 2507.06593 null
2025-07-07 Clinical test cases for model-based dose calculation algorithm commissioning, QA and benchmarking, for 192Ir HDR brachytherapy of gynecologic cancers V. Peppa et.al. 2507.05144 null
2025-07-21 Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration Yuyi Zhang et.al. 2507.05108 null
2025-07-05 Towards Spatially-Varying Gain and Binning Anqi Yang et.al. 2507.04190 null
2025-07-01 EvRWKV: A RWKV Framework for Effective Event-guided Low-Light Image Enhancement WenJie Cai et.al. 2507.03184 null
2025-07-02 Enhancing Multi-Exposure High Dynamic Range Imaging with Overlapped Codebook for Improved Representation Learning Keuntek Lee et.al. 2507.01588 null
2025-07-02 DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting Worameth Chinchuthakun et.al. 2507.01305 null
2025-06-30 Event-based Tiny Object Detection: A Benchmark Dataset and Baseline Nuo Chen et.al. 2506.23575 null
2025-07-05 AFUNet: Cross-Iterative Alignment-Fusion Synergy for HDR Reconstruction via Deep Unfolding Paradigm Xinyue Li et.al. 2506.23537 null
2025-06-29 Event-based Stereo Visual-Inertial Odometry with Voxel Map Zhaoxing Zhang et.al. 2506.23078 null
2025-06-28 SPICE-HL3: Single-Photon, Inertial, and Stereo Camera dataset for Exploration of High-Latitude Lunar Landscapes David Rodríguez-Martínez et.al. 2506.22956 null
2025-06-28 ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge Yixu Chen et.al. 2506.22790 null
2025-06-27 Single-shot HDR using conventional image sensor shutter functions and optical randomization Xiang Dai et.al. 2506.22426 null
2025-06-19 Seven-Probe Fiber Detector for Time-Resolved Source Tracking in HDR-Brachytherapy: Experimental Evaluation Mathieu Gonod et.al. 2506.16124 null
2025-06-14 Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity Mohsen Jenadeleh et.al. 2506.12505 null
2025-06-13 Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning Mohammadamin Moradi et.al. 2506.11957 null
2025-06-11 Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy Tonghe Wang et.al. 2506.09805 null
2025-06-11 TRAPs, Generalisations of MZVs, Locality and Resurgence for Quantum Field Theories Pierre J. Clavier et.al. 2506.09493 null
2025-06-04 Photoreal Scene Reconstruction from an Egocentric Device Zhaoyang Lv et.al. 2506.04444 link
2025-06-04 GRAVITY+ adaptive optics (GPAO) tests in Europe Florentin Millour et.al. 2506.03721 null
2025-06-03 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation Yuanze Lin et.al. 2506.03150 null
2025-05-29 LCB-CV-UNet: Enhanced Detector for High Dynamic Range Radar Signals Yanbin Wang et.al. 2505.23454 null
2025-05-29 iHDR: Iterative HDR Imaging with Arbitrary Number of Exposures Yu Yuan et.al. 2505.22971 null
2025-05-27 HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation Bowen Chen et.al. 2505.21831 null
2025-05-26 Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting Yizhou Zhao et.al. 2505.20582 null
2025-05-28 EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction Ryosei Hara et.al. 2505.19169 null
2025-05-23 Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras Masataka Kobayashi et.al. 2505.17582 null
2025-05-22 V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation Hanyue Lou et.al. 2505.16797 link
2025-05-21 Evaluation of Mobile Environment for Vehicular Visible Light Communication Using Multiple LEDs and Event Cameras Ryota Soga et.al. 2505.15412 null
2025-05-17 NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results Sangmin Lee et.al. 2505.12089 null
2025-05-22 Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition Runduo Han et.al. 2505.12007 link
2025-05-16 Towards Navigation-Grade and Deployable Optomechanical Accelerometery Chang Ge et.al. 2505.11751 null
2025-05-16 Planar Velocity Estimation for Fast-Moving Mobile Robots Using Event-Based Optical Flow Liam Boyle et.al. 2505.11116 null
2025-05-14 Efficient Modelling of Lyman-α opacity fluctuations during late EoR Barun Maity et.al. 2505.09369 null
2025-05-13 A Survey of 3D Reconstruction with Event Cameras: From Event-based Geometry to Neural 3D Rendering Chuanzhi Xu et.al. 2505.08438 null
2025-05-12 Asynchronous Multi-Object Tracking with an Event Camera Angus Apps et.al. 2505.08126 link
2025-05-12 Towards a physically realistic computationally efficient DVS pixel model Rui Graca et.al. 2505.07386 null
2025-05-12 RealRep: Generalized SDR-to-HDR Conversion with Style Disentangled Representation Learning Gang He et.al. 2505.07322 null
2025-04-30 From Events to Enhancement: A Survey on Event-Based Imaging Technologies Yunfan Lu et.al. 2505.05488 null
2025-05-08 EDmamba: A Simple yet Effective Event Denoising Method with State Space Model Ciyu Ruan et.al. 2505.05391 null
2025-05-07 EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events Shuoyan Wei et.al. 2505.04657 link
2025-05-06 Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow Roblex Nana Tchakoute et.al. 2505.03398 null
2025-05-02 High Dynamic Range Novel View Synthesis with Single Exposure Kaixuan Zhang et.al. 2505.01212 link
2025-04-29 A Survey on Event-based Optical Marker Systems Nafiseh Jabbari Tofighi et.al. 2504.20736 null
2025-05-12 Spike Imaging Velocimetry: Dense Motion Estimation of Fluids Using Spike Cameras Yunzhong Zhang et.al. 2504.18864 null
2025-04-25 Boxi: Design Decisions in the Context of Algorithmic Performance for Robotics Jonas Frey et.al. 2504.18500 null
2025-04-25 BiasBench: A reproducible benchmark for tuning the biases of event cameras Andreas Ziegler et.al. 2504.18235 null
2025-04-25 Post-Transfer Learning Statistical Inference in High-Dimensional Regression Nguyen Vu Khai Tam et.al. 2504.18212 null
2025-04-24 CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos Shucheng Gong et.al. 2504.17728 link
2025-04-27 EHGCN: Hierarchical Euclidean-Hyperbolic Fusion via Motion-Aware GCN for Hybrid Event Stream Perception Haosheng Chen et.al. 2504.16616 null
2025-04-23 SaENeRF: Suppressing Artifacts in Event-based Neural Radiance Fields Yuanjian Wang et.al. 2504.16389 link
2025-04-20 Approaches to High Dynamic Range Imaging - Application to the ngVLA T. K. Sridharan et.al. 2504.14449 null
2025-04-17 CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework Wentao Wu et.al. 2504.12576 link
2025-04-21 Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space Kaustav Chanda et.al. 2504.12515 null
2025-04-16 Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging Tristan S. W. Stevens et.al. 2504.12154 null
2025-04-11 High Dynamic Range Modulo Imaging for Robust Object Detection in Autonomous Driving Kebin Contreras et.al. 2504.11472 null
2025-04-17 GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR Christophe Bolduc et.al. 2504.10809 null
2025-04-14 Minimal Sensing for Orienting a Solar Panel Jeremy Klotz et.al. 2504.10765 null
2025-04-13 Low-Light Image Enhancement using Event-Based Illumination Estimation Lei Sun et.al. 2504.09379 null
2025-04-10 S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion Yujin Wang et.al. 2504.07667 null
2025-04-08 Orthogonal Matching Pursuit based Reconstruction for Modulo Hysteresis Operators Matthias Beckmann et.al. 2504.05895 null
2025-04-08 Inter-event Interval Microscopy for Event Cameras Changqing Su et.al. 2504.04924 null
2025-04-06 eKalibr-Stereo: Continuous-Time Spatiotemporal Calibration for Event-Based Stereo Visual Systems Shuolong Chen et.al. 2504.04451 link
2025-04-05 Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications Brayan Monroy et.al. 2504.04228 null
2025-04-03 Brightness Perceiving for Recursive Low-Light Image Enhancement Haodian Wang et.al. 2504.02362 link
2025-04-02 Anomaly Detection for Hybrid Butterfly Subspecies via Probability Filtering Bo-Kai Ruan et.al. 2504.01671 link
2025-03-31 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting Seungjun Lee et.al. 2503.24210 null
2025-03-29 SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry Peiyu Chen et.al. 2503.22963 link
2025-03-28 Enhancing Celestial Imaging: High Dynamic Range with Neuromorphic Cameras Satyapreet Singh Yadav et.al. 2503.22814 null
2025-03-26 SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams Hanwen Liang et.al. 2503.20315 null
2025-03-26 A Survey on Event-driven 3D Reconstruction: Development under Different Categories Chuanzhi Xu et.al. 2503.19753 null
2025-03-25 Maximum Likelihood Estimation Based Complex-Valued Robust Chinese Remainder Theorem and Its Fast Algorithm Xiaoping Li et.al. 2503.18625 null
2025-03-21 Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras Shuang Guo et.al. 2503.17262 link
2025-03-20 Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits Satyapreet Singh Yadav et.al. 2503.15883 null
2025-03-19 Boosting HDR Image Reconstruction via Semantic Knowledge Transfer Qingsen Yan et.al. 2503.15361 null
2025-03-20 VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention Mingzhe Zheng et.al. 2503.15138 null
2025-03-18 Weakly Supervised Spatial Implicit Neural Representation Learning for 3D MRI-Ultrasound Deformable Image Registration in HDR Prostate Brachytherapy Jing Wang et.al. 2503.14395 null
2025-03-17 UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks Yuanbin Qian et.al. 2503.12905 link
2025-03-17 Stereo Event-based, 6-DOF Pose Tracking for Uncooperative Spacecraft Zibin Liu et.al. 2503.12732 link
2025-03-16 EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera Luming Wang et.al. 2503.12419 link
2025-03-14 Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP Trevor D. Canham et.al. 2503.11883 null
2025-03-13 GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping Jinfeng Liu et.al. 2503.10143 null
2025-03-10 Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion Haowen Bai et.al. 2503.07235 null
2025-03-08 Optimization models for needle placement in 3D-printed masks for high dose rate brachytherapy Nasim Mirzavand Boroujeni et.al. 2503.06000 null
2025-03-16 DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features Jianqi Yan et.al. 2503.03799 link
2025-03-05 BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation Gangwei Xu et.al. 2503.03256 null
2025-03-04 ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement Xuejian Guo et.al. 2503.02484 link
2025-03-03 S-R2D2: a spherical extension of the R2D2 deep neural network series paradigm for wide-field radio-interferometric imaging A. Tajja et.al. 2503.01462 null
2025-03-03 Adaptive cold-atom magnetometry mitigating the trade-off between sensitivity and dynamic range Zhu Ma et.al. 2503.01211 null
2025-03-01 High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm Zhaoyi Tian et.al. 2503.00410 link
2025-03-01 Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach Guixu Lin et.al. 2503.00377 null
2025-02-28 EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration Kuangyi Chen et.al. 2503.00167 link
2025-02-28 SEE: See Everything Every Time -- Adaptive Brightness Adjustment for Broad Light Range Images via Events Yunfan Lu et.al. 2502.21120 null
2025-02-18 Fast Antibiotic resistance-Based gene editing of mammalian cells with CRISPR-Cas9 (FAB-CRISPR) Petia Adarska et.al. 2502.12675 null
2025-02-14 Quantifying Phase Magnitudes of Open-Source Focused-Probe 4D-STEM Ptychography Reconstructions Toma Susi et.al. 2502.09938 link
2025-02-10 Indoor Light and Heat Estimation from a Single Panorama Guanzhou Ji et.al. 2502.06973 null
2025-02-09 Compressed sensing enabled high-bandwidth and large dynamic range magnetic sensing Galya Haim et.al. 2502.06070 null
2025-02-09 Energy-Efficient Autonomous Aerial Navigation with Dynamic Vision Sensors: A Physics-Guided Neuromorphic Approach Sourav Sanyal et.al. 2502.05938 null
2025-02-07 Differentiable Mobile Display Photometric Stereo Gawoon Ban et.al. 2502.05055 null
2025-02-05 Deep Learning-based Event Data Coding: A Joint Spatiotemporal and Polarity Solution Abdelrahman Seleem et.al. 2502.03285 null
2025-02-04 Event-aided Semantic Scene Completion Shangwei Guo et.al. 2502.02334 link
2025-01-23 HP2 Survey V. Ophiuchus: Filament formation in a dispersing cloud complex João Alves et.al. 2501.13931 null
2025-01-22 DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning Wenhao Gu et.al. 2501.12898 null
2025-01-20 UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion Zixuan Chen et.al. 2501.11515 null
2025-01-10 eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events Shuolong Chen et.al. 2501.05688 link
2025-01-07 AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene Chaoran Feng et.al. 2501.02807 null
2024-12-26 Learning Monocular Depth from Events via Egomotion Compensation Haitao Meng et.al. 2412.19067 null
2024-12-25 HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis Mohammed Hamdan et.al. 2412.18981 null
2024-12-20 High-Dynamic Range Broadband Terahertz Time-Domain Spectrometer Based on Organic Crystal MNA Samira Mansourzadeh et.al. 2412.15718 null
2024-12-19 Event-assisted 12-stop HDR Imaging of Dynamic Scene Shi Guo et.al. 2412.14705 null
2025-01-06 LEDiff: Latent Exposure Diffusion for HDR Generation Chao Wang et.al. 2412.14456 null
2024-12-18 Development of a High-Resolution, High-Dynamic-Range Charge Detector for Ion Beam Monitoring O. Adriani et.al. 2412.13934 null
2024-12-18 Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode Xin Su et.al. 2412.13749 link
2024-12-17 Transforming Single Photon Camera Images to Color High Dynamic Range Images Sumit Sharma et.al. 2412.12942 null
2024-12-17 Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks Xiaxin Zhu et.al. 2412.12843 null
2024-12-17 Compressed Sensing Based Residual Recovery Algorithms and Hardware for Modulo Sampling Shaik Basheeruddin Shah et.al. 2412.12724 null
2024-12-16 Towards Physically-Based Sky-Modeling Ian J. Maquignaz et.al. 2412.11883 null
2024-12-16 High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit Sonia Rani et.al. 2412.11859 null
2024-12-16 Predicting the Original Appearance of Damaged Historical Documents Zhenhua Yang et.al. 2412.11634 link
2024-12-16 Event-based Detectors for Laser Guide Star Tip-Tilt Sensing Monique Cockram et.al. 2412.11436 null
2024-12-12 Continuous Gaussian Process Pre-Optimization for Asynchronous Event-Inertial Odometry Zhixiang Wang et.al. 2412.08909 null
2024-12-10 EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering Toshiya Yura et.al. 2412.07293 null
2024-12-09 Fitting Spherical Gaussians to Dynamic HDRI Sequences Pascal Clausen et.al. 2412.06511 null
2024-12-09 Event fields: Capturing light fields at high speed, resolution, and dynamic range Ziyuan Qu et.al. 2412.06191 null
2024-12-07 On an Analytical Inversion Formula for the Modulo Radon Transform Matthias Beckmann et.al. 2412.05711 null
2024-12-05 DHOST theories as disformal gravity: From black holes to radiative spacetimes Jibril Ben Achour et.al. 2412.04135 null
2024-12-05 High-power single-cycle THz emission from large-area photoconductive emitters at 400 kHz Mohsen Khalili et.al. 2412.04004 null
2024-12-05 Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization Tianyu Chen et.al. 2412.03941 null
2024-12-04 Accelerating HI density predictions during the Epoch of Reionization using a GPR-based emulator on N-body simulations Gaurav Pundir et.al. 2412.03485 null
2024-12-03 EvRT-DETR: The Surprising Effectiveness of DETR-based Detection for Event Cameras Dmitrii Torbunov et.al. 2412.02890 link
2024-12-02 Learning Differential Pyramid Representation for Tone Mapping Qirui Yang et.al. 2412.01463 null
2024-11-28 Event-based Tracking of Any Point with Motion-Robust Correlation Features Friedhelm Hamann et.al. 2412.00133 link
2024-11-25 CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain Jingchao Peng et.al. 2411.16327 null
2024-11-22 High-dynamic-range atomic clocks with dual Heisenberg-limited precision scaling Jungeng Zhou et.al. 2411.14944 null
2024-11-20 Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding David Mascareñas et.al. 2411.13108 null
2024-11-18 Noise Filtering Benchmark for Neuromorphic Satellites Observations Sami Arja et.al. 2411.11233 link
2024-11-16 Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion Kepeng Xu et.al. 2411.10775 null
2024-11-15 CaLES: A GPU-accelerated solver for large-eddy simulation of wall-bounded flows Maochao Xiao et.al. 2411.09364 link
2024-11-11 Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models NVIDIA et.al. 2411.07126 null
2024-11-25 Increasing the scalability of graph convolution for FPGA-implemented event-based vision Piotr Wzorek et.al. 2411.04269 null
2024-11-13 DEIO: Deep Event Inertial Odometry Weipeng Guan et.al. 2411.03928 link
2024-11-05 Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor Anish Bhattacharya et.al. 2411.03303 null
2024-11-05 Learning-based Lossless Event Data Compression Ahmadreza Sezavar et.al. 2411.03010 null
2024-10-30 Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem Jin Huang et.al. 2410.22657 null
2024-10-29 EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data Zhonghua Yi et.al. 2410.21743 link
2024-10-28 NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments Taiyi Pan et.al. 2410.21615 link
2024-10-27 BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events Yijin Li et.al. 2410.20451 null
2024-10-26 Unleashing Dynamic Range and Resolution in Unlimited Sensing Framework via Novel Hardware Yuliang Zhu et.al. 2410.20193 null
2024-10-21 Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes Yuma Kinoshita et.al. 2410.19839 null
2024-10-24 Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions Antonio D'Orazio et.al. 2410.18622 null
2024-10-23 Frequency-dependent amplitude correction to free-precession scalar magnetometers M. E. Limes et.al. 2410.18224 null
2024-10-22 SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition Jiaqi Chen et.al. 2410.16746 link
2024-10-19 A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Translation Hrishav Bakul Barua et.al. 2410.15068 link
2024-10-17 360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers Jack Hilliard et.al. 2410.13566 null
2024-10-17 On Quantum Programming Languages Benoît Valiron et.al. 2410.13337 null
2024-10-16 An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors Qinghang Zhao et.al. 2410.12423 null
2024-10-10 DifFRelight: Diffusion-Based Facial Performance Relighting Mingming He et.al. 2410.08188 null
2024-10-18 IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera Jian Huang et.al. 2410.08107 link
2024-10-09 Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras Friedhelm Hamann et.al. 2410.06698 null
2024-10-03 Spiking Neural Network as Adaptive Event Stream Slicer Jiahang Cao et.al. 2410.02249 link
2024-10-03 Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves Arvin Tashakori et.al. 2410.02221 link
2024-10-01 Signatures of Black Hole Spin and Plasma Acceleration in Jet Polarimetry Zachary Gelles et.al. 2410.00954 null
2024-10-04 VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models Jiapeng Wang et.al. 2410.00741 null
2024-09-26 Photon Inhibition for Energy-Efficient Single-Photon Imaging Lucas J. Koerner et.al. 2409.18337 null
2024-09-26 Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions Weng Fei Low et.al. 2409.17988 null
2024-09-26 Unsupervised Learning Based Multi-Scale Exposure Fusion Chaobing Zheng et.al. 2409.17830 null
2024-09-26 Event-based Stereo Depth Estimation: A Survey Suman Ghosh et.al. 2409.17680 null
2024-09-26 Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking Pengcheng Shao et.al. 2409.17560 null
2024-09-25 EventHDR: from Event to High-Speed HDR Videos and Beyond Yunhao Zou et.al. 2409.17029 null
2024-09-25 Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training Kun Song et.al. 2409.16767 null
2024-09-24 Sub-Nyquist USF Spectral Estimation: $K$ Frequencies with $6K + 4$ Modulo Samples Ruiming Guo et.al. 2409.16472 null
2024-09-24 Neuromorphic Drone Detection: an Event-RGB Multimodal Approach Gabriele Magrini et.al. 2409.16099 link
2024-09-24 Deep chroma compression of tone-mapped images Xenios Milidonis et.al. 2409.16032 link
2024-09-23 Mixing Data-driven and Geometric Models for Satellite Docking Port State Estimation using an RGB or Event Camera Cedric Le Gentil et.al. 2409.15581 null
2024-09-23 SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream Jinze Yu et.al. 2409.15176 link
2024-09-21 Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking Kai Tang et.al. 2409.13971 null
2024-09-20 Intrinsic Single-Image HDR Reconstruction Sebastian Dille et.al. 2409.13803 link
2024-09-20 Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors Zixin Zhang et.al. 2409.13392 null
2024-09-18 EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning Yukun Tian et.al. 2409.11813 null
2024-09-18 Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network Jiale Wang et.al. 2409.11677 null
2024-09-16 Programmable multifunctional integrated microwave photonic circuit on thin-film lithium niobate Chuangchuang Wei et.al. 2409.10227 null
2024-09-15 SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux Rui Graca et.al. 2409.09648 null
2024-09-13 Integration of high-performance compact interferometric sensors in a suspended interferometer Alexandra Mitchell et.al. 2409.08843 null
2024-09-13 Adaptive Robust High-Precision Atomic Gravimetry Jinye Wei et.al. 2409.08550 null
2024-09-07 Neural Augmentation Based Panoramic High Dynamic Range Stitching Chaobing Zheng et.al. 2409.04679 null
2024-09-05 MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice Friedhelm Hamann et.al. 2409.03358 link
2024-09-03 Gradient events: improved acquisition of visual information in event cameras Eero Lehtonen et.al. 2409.01764 null
2024-09-02 SoK: Security of the Image Processing Pipeline in Autonomous Vehicles Michael Kühr et.al. 2409.01234 link
2024-08-30 Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms Marcus Märtens et.al. 2408.16971 null
2024-08-29 EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More Kanghao Chen et.al. 2408.16254 null
2024-08-28 ES-PTAM: Event-based Stereo Parallel Tracking and Mapping Suman Ghosh et.al. 2408.15605 link
2024-08-27 Towards Real-world Event-guided Low-light Video Enhancement and Deblurring Taewoo Kim et.al. 2408.14916 link
2024-08-27 Recent Event Camera Innovations: A Survey Bharatesh Chakravarthi et.al. 2408.13627 link
2024-08-24 Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation Yuxuan Zhou et.al. 2408.13586 link
2024-08-22 ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes Zhenyi Liu et.al. 2408.12048 link
2024-08-20 Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm Xiao Wang et.al. 2408.10488 link
2024-08-20 MambaEVT: Event Stream based Visual Object Tracking using State Space Model Xiao Wang et.al. 2408.10487 link
2024-08-19 Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms Xiao Wang et.al. 2408.09764 link
2024-08-19 Phase-Separated Charge Order and Twinning Across Length Scales in CsV$_3$Sb$_5$ Jayden Plumb et.al. 2408.08842 null
2024-08-16 CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving Shihan Peng et.al. 2408.08500 null
2024-08-13 MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation JunYong Choi et.al. 2408.06707 null
2024-08-13 HDRGS: High Dynamic Range Gaussian Splatting Jiahao Wu et.al. 2408.06543 link
2024-08-12 Rethinking Video with a Universal Event-Based Representation Andrew Freeman et.al. 2408.06248 null
2024-08-10 EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency Junjie Jiang et.al. 2408.05452 null
2024-08-06 Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera Zibin Liu et.al. 2408.03225 link
2024-07-31 Exploiting Change Blindness for Video Coding: Perspectives from a Less Promising User Study Mitra Amiri et.al. 2408.00052 null
2024-07-23 HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images Shreyas Singh et.al. 2407.16503 link
2024-07-23 SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging Lingtong Kong et.al. 2407.16308 link
2024-07-24 SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams Liangyan Jiang et.al. 2407.15708 link
2024-08-04 Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering Jiahao Cui et.al. 2407.13309 link
2024-07-18 Learned HDR Image Compression for Perceptually Optimal Storage and Display Peibei Cao et.al. 2407.13179 null
2024-07-17 Nonlinear tomographic reconstruction via nonsmooth optimization Vasileios Charisopoulos et.al. 2407.12984 null
2024-07-16 VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos Devesh Walawalkar et.al. 2407.12214 null
2024-07-16 I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM Gwangtak Bae et.al. 2407.11347 null
2024-07-15 Temporal Event Stereo via Joint Learning with Stereoscopic Flow Hoonhee Cho et.al. 2407.10831 link
2024-07-15 Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation Yuhwan Jeong et.al. 2407.10703 link
2024-07-15 Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction Lin Zhu et.al. 2407.10636 null
2024-07-18 Efficient hybrid technique for generating sub-grid haloes in reionization simulations Ankur Barsode et.al. 2407.10585 null
2024-07-12 Radiance Fields from Photons Sacha Jungerman et.al. 2407.09386 null
2024-07-11 Event-based vision on FPGAs -- a survey Tomasz Kryjak et.al. 2407.08356 null
2024-07-12 Dynamic phase transition into a mixed-CDW state in 1$T$-TaS$_2$ via a thermal quench A. de la Torre et.al. 2407.07953 null
2024-07-08 PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes Mohammad Reza Karimi Dastjerdi et.al. 2407.06150 null
2024-07-08 Neuromorphic Imaging with Super-Resolution Pei Zhang et.al. 2407.05764 null

(back to top)

Low-Level

Publish Date Title Authors PDF Code
2025-07-23 DFDNet: Dynamic Frequency-Guided De-Flare Network Minglong Xue et.al. 2507.17489 null
2025-07-23 Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging Farnaz Khun Jush et.al. 2507.17412 null
2025-07-23 PolarAnything: Diffusion-based Polarimetric Image Synthesis Kailong Zhang et.al. 2507.17268 null
2025-07-23 UNICE: Training A Universal Image Contrast Enhancer Ruodai Cui et.al. 2507.17157 null
2025-07-22 A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis Jinquan Guan et.al. 2507.16360 null
2025-07-21 SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement Hanting Li et.al. 2507.15520 null
2025-07-20 EBA-AI: Ethics-Guided Bias-Aware AI for Efficient Underwater Image Enhancement and Coral Reef Monitoring Lyes Saad Saoud et.al. 2507.15036 null
2025-07-20 U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs Xiaojie Li et.al. 2507.14902 null
2025-07-20 Exploring Scalable Unified Modeling for General Low-Level Vision Xiangyu Chen et.al. 2507.14801 null
2025-07-18 Global Modeling Matters: A Fast, Lightweight and Effective Baseline for Efficient Image Restoration Xingyu Jiang et.al. 2507.13663 null
2025-07-17 FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval Jeong-Woo Park et.al. 2507.12823 null
2025-07-17 MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval Jeong-Woo Park et.al. 2507.12819 null
2025-07-16 QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval Jaehyun Kwak et.al. 2507.12416 null
2025-07-16 Wavelet-based Decoupling Framework for low-light Stereo Image Enhancement Shuangli Du et.al. 2507.12188 null
2025-07-16 Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement Junyu Lou et.al. 2507.12135 null
2025-07-16 Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints Jiahao Xia et.al. 2507.11985 null
2025-07-16 A Spatial-Physics Informed Model for 3D Spiral Sample Scanned by SQUID Microscopy J. Senthilnath et.al. 2507.11853 null
2025-07-14 CWNet: Causal Wavelet Network for Low-Light Image Enhancement Tongshun Zhang et.al. 2507.10689 null
2025-07-14 GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space David G. Shatwell et.al. 2507.10473 null
2025-07-14 RefSTAR: Blind Facial Image Restoration with Reference Selection, Transfer, and Reconstruction Zhicun Yin et.al. 2507.10470 null
2025-07-14 Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources Daniele Rege Cambrin et.al. 2507.10403 null
2025-07-14 On a class of forward-backward reaction-diffusion systems with local and nonlocal coupling for image restoration Yihui Tong et.al. 2507.10393 null
2025-07-13 A New Wireless Image Transmission System Using Code Index Modulation and Image Enhancement for High-Rate Next Generation Networks Burak Ahmet Ozden et.al. 2507.09713 null
2025-07-11 RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features Inye Na et.al. 2507.08546 null
2025-07-11 Deep Hashing with Semantic Hash Centers for Image Retrieval Li Chen et.al. 2507.08404 null
2025-07-11 Single-Step Latent Diffusion for Underwater Image Restoration Jiayi Wu et.al. 2507.07878 null
2025-07-10 IRAF-SLAM: An Illumination-Robust and Adaptive Feature-Culling Front-End for Visual SLAM in Challenging Environments Thanh Nguyen Canh et.al. 2507.07752 null
2025-07-10 Degradation-Agnostic Statistical Facial Feature Transformation for Blind Face Restoration in Adverse Weather Conditions Chang-Hwan Son et.al. 2507.07464 null
2025-07-08 FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval François Gardères et.al. 2507.07135 null
2025-07-09 HVI-CIDNet+: Beyond Extreme Darkness for Low-Light Image Enhancement Qingsen Yan et.al. 2507.06814 null
2025-07-09 Residual Prior-driven Frequency-aware Network for Image Fusion Guan Zheng et.al. 2507.06735 null
2025-07-09 Enhancing Diffusion Model Stability for Image Restoration via Gradient Management Hongjie Wu et.al. 2507.06656 null
2025-07-09 MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval Naoya Sogi et.al. 2507.06654 null
2025-07-09 Capturing Stable HDR Videos Using a Dual-Camera System Qianyu Zhang et.al. 2507.06593 null
2025-07-08 Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval Haiwen Li et.al. 2507.05970 null
2025-07-08 OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval Zhiwei Chen et.al. 2507.05631 null
2025-07-08 Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration Yuyang Hu et.al. 2507.05604 null
2025-07-07 Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception Moseli Mots'oehli et.al. 2507.05536 null
2025-07-07 Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model Mengyao Xu et.al. 2507.05513 null
2025-07-07 An analysis of vision-language models for fabric retrieval Francesco Giuliari et.al. 2507.04735 null
2025-07-06 Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions Xiao Zhang et.al. 2507.04377 null
2025-07-06 Towards Lightest Low-Light Image Enhancement Architecture for Mobile Devices Guangrui Bai et.al. 2507.04277 null
2025-07-06 Quick Bypass Mechanism of Zero-Shot Diffusion-Based Image Restoration Yu-Shan Tai et.al. 2507.04207 null
2025-07-05 EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems Hyunwoo Cho et.al. 2507.03937 null
2025-07-03 IGDNet: Zero-Shot Robust Underexposed Image Enhancement via Illumination-Guided and Denoising Hailong Yan et.al. 2507.02445 null
2025-07-03 MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement Fanghai Yi et.al. 2507.02270 null
2025-07-03 SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement Zeyu Lei et.al. 2507.02252 null
2025-07-02 MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices Hailong Yan et.al. 2507.01838 null
2025-07-02 DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal Wenjie Liu et.al. 2507.01422 null
2025-07-01 UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection Wei Li et.al. 2507.00849 null
2025-07-04 LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling Huaqiu Li et.al. 2507.00790 null
2025-07-01 Laplace-Mamba: Laplace Frequency Prior-Guided Mamba-CNN Fusion Network for Image Dehazing Yongzhen Wang et.al. 2507.00501 null
2025-06-30 Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions Jiwon Kim et.al. 2506.23547 null
2025-06-29 Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement Siyuan Chai et.al. 2506.23353 null
2025-06-29 Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction Hanlin Dong et.al. 2506.23053 null
2025-06-28 Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data Ghufran A. Omran et.al. 2506.22939 null
2025-06-28 Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval Li-Cheng Shen et.al. 2506.22864 null
2025-06-28 UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments Dayong Su et.al. 2506.22736 null
2025-06-27 EAMamba: Efficient All-Around Vision State Space Model for Image Restoration Yu-Cheng Lin et.al. 2506.22246 null
2025-06-27 ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning Ming Zhao et.al. 2506.22216 null
2025-06-26 Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration Xin Lu et.al. 2506.21722 null
2025-06-26 Wild refitting for black box prediction Martin J. Wainwright et.al. 2506.21460 null
2025-06-26 Learning to See in the Extremely Dark Hai Jiang et.al. 2506.21132 null
2025-06-25 On the Burstiness of Faces in Set Jiong Wang et.al. 2506.20312 null
2025-06-25 TDiR: Transformer based Diffusion for Image Restoration Tasks Abbas Anwar et.al. 2506.20302 null
2025-06-24 A Comparative Study of NAFNet Baselines for Image Restoration Vladislav Esaulov et.al. 2506.19845 null
2025-06-24 NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs Khuram Naveed et.al. 2506.19387 null
2025-06-24 jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval Michael Günther et.al. 2506.18902 null
2025-06-23 Enhancing Image Restoration Transformer via Adaptive Translation Equivariance JiaKui Hu et.al. 2506.18520 null
2025-06-23 BSMamba: Brightness and Semantic Modeling for Long-Range Interaction in Low-Light Image Enhancement Tongshun Zhang et.al. 2506.18346 null
2025-06-23 A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement Muhammad Azeem Aslam et.al. 2506.18323 null
2025-06-23 Attention-Based Ensemble Learning for Crop Classification Using Landsat 8-9 Fusion Zeeshan Ramzan et.al. 2506.18321 null
2025-06-26 Referring Expression Instance Retrieval and A Strong End-to-End Baseline Xiangzhao Hao et.al. 2506.18246 null
2025-06-22 CmFNet: Cross-modal Fusion Network for Weakly-supervised Segmentation of Medical Images Dongdong Meng et.al. 2506.18042 null
2025-06-20 Reversing Flow for Image Restoration Haina Qin et.al. 2506.16961 null
2025-06-20 Visual-Instructed Degradation Diffusion for All-in-One Image Restoration Wenyang Luo et.al. 2506.16960 link
2025-06-20 Temperature calibration of surface emissivities with an improved thermal image enhancement network Ning Chu et.al. 2506.16803 null
2025-06-23 RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought Junbo Qiao et.al. 2506.16796 link
2025-06-20 TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration Xiaoyu Shi et.al. 2506.16784 link
2025-06-20 Infrared and Visible Image Fusion Based on Implicit Neural Representations Shuchen Sun et.al. 2506.16773 null
2025-06-20 Class Agnostic Instance-level Descriptor for Visual Instance Search Qi-Ying Sun et.al. 2506.16745 null
2025-06-20 TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion Mingrui Zhu et.al. 2506.16730 null
2025-06-19 MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval Chao He et.al. 2506.16353 link
2025-06-19 Fine-grained Image Retrieval via Dual-Vision Adaptation Xin Jiang et.al. 2506.16273 null
2025-06-18 DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder Dan He et.al. 2506.15218 link
2025-06-18 ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections Ziling Huang et.al. 2506.15180 null
2025-06-17 HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search Qian Xu et.al. 2506.14707 null
2025-06-17 Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits Taisei Kato et.al. 2506.14624 null
2025-06-17 Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching Giacomo Meanti et.al. 2506.14605 link
2025-06-17 Exploring Diffusion with Test-Time Training on Efficient Image Restoration Rongchang Lu et.al. 2506.14541 null
2025-06-17 GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion Huan Kang et.al. 2506.14384 null
2025-06-18 DREAM: On hallucinations in AI-generated content for nuclear medicine imaging Menghua Xia et.al. 2506.13995 null
2025-06-16 Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks Haoqing Li et.al. 2506.13733 null
2025-06-16 Exploiting the Exact Denoising Posterior Score in Training-Free Guidance of Diffusion Models Gregory Bellchambers et.al. 2506.13614 null
2025-06-16 A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation Xiaoyang Wei et.al. 2506.13509 null
2025-06-17 Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval Kshitij Kavimandan et.al. 2506.13496 null
2025-06-16 EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition Bingxi Liu et.al. 2506.13133 null
2025-06-15 Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution Hang Xu et.al. 2506.12738 null
2025-06-14 An Iterative PDE Based Illumination Restoration Scheme for Image Enhancement Dragos-Patru Covei et.al. 2506.12560 null
2025-06-14 UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers Yuantao Wang et.al. 2506.12324 null
2025-06-11 Towards a general-purpose foundation model for fMRI analysis Cheng Wang et.al. 2506.11167 null
2025-06-10 Adaptive Object Detection with ESRGAN-Enhanced Resolution & Faster R-CNN Divya Swetha K et.al. 2506.11122 null
2025-06-12 FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion Tianpei Zhang et.al. 2506.10366 link
2025-06-11 Improving Personalized Search with Regularized Low-Rank Parameter Updates Fiona Ryan et.al. 2506.10182 link
2025-06-10 Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment Tianyu Chen et.al. 2506.10030 link
2025-06-11 Text-Aware Image Restoration with Diffusion Models Jaewon Min et.al. 2506.09993 null
2025-06-11 Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints Xiangkai Zhang et.al. 2506.09748 null
2025-06-11 Beyond Calibration: Physically Informed Learning for Raw-to-Raw Mapping Peter Grönquist et.al. 2506.08650 null
2025-06-09 PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement Teng Hu et.al. 2506.07848 null
2025-06-09 M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration Yongzhen Wang et.al. 2506.07814 null
2025-06-09 Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods Beining Xu et.al. 2506.07779 null
2025-06-08 Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI Aditya Chakravarty et.al. 2506.07286 null
2025-06-08 A PDE-Based Image Restoration Method: Mathematical Analysis and Implementation Dragos-Patru Covei et.al. 2506.07132 null
2025-06-07 Zero Shot Composed Image Retrieval Santhosh Kakarla et.al. 2506.06602 null
2025-06-06 A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance Anees Nashath Shaik et.al. 2506.06578 null
2025-06-06 GenIR: Generative Visual Feedback for Mental Image Retrieval Diji Yang et.al. 2506.06220 null
2025-06-06 Bidirectional Image-Event Guided Low-Light Image Enhancement Zhanwen Liu et.al. 2506.06120 null
2025-06-06 NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces Pierluigi Zama Ramirez et.al. 2506.05815 null
2025-06-05 UniRes: Universal Image Restoration for Complex Degradations Mo Zhou et.al. 2506.05599 null
2025-06-05 OpenRR-5k: A Large-Scale Benchmark for Reflection Removal in the Wild Jie Cai et.al. 2506.05482 null
2025-06-05 Degradation-Aware Image Enhancement via Vision-Language Classification Jie Cai et.al. 2506.05450 null
2025-06-05 SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Jianyi Wang et.al. 2506.05301 null
2025-06-05 Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement Niki Martinel et.al. 2506.04753 null
2025-06-04 A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement Isha Rao et.al. 2506.04470 null
2025-06-04 WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion Tianpei Zhang et.al. 2506.03555 null
2025-06-03 NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results Xiaohong Liu et.al. 2506.02875 null
2025-06-03 ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration Cheng Yang et.al. 2506.02633 null
2025-06-02 Entity Image and Mixed-Modal Image Retrieval Datasets Cristian-Ioan Blaga et.al. 2506.02291 null
2025-06-04 NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution Marcos V. Conde et.al. 2506.02197 null
2025-06-02 RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report Marcos V. Conde et.al. 2506.01947 null
2025-06-02 NTIRE 2025 the 2nd Restore Any Image Model (RAIM) in the Wild Challenge Jie Liang et.al. 2506.01394 null
2025-06-01 Quantization-based Bounds on the Wasserstein Metric Jonathan Bobrutsky et.al. 2506.00976 null
2025-05-31 Image Restoration Learning via Noisy Supervision in the Fourier Domain Haosen Liu et.al. 2506.00564 null
2025-05-30 RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement Raman Jha et.al. 2505.24705 link
2025-05-30 Model-Guided Network with Cluster-Based Operators for Spatio-Spectral Super-Resolution Ivan Pereira-Sánchez et.al. 2505.24605 link
2025-05-30 SORCE: Small Object Retrieval in Complex Environments Chunxu Liu et.al. 2505.24441 link
2025-05-30 IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models Hanting Wang et.al. 2505.24406 link
2025-05-30 Boosting All-in-One Image Restoration via Self-Improved Privilege Learning Gang Wu et.al. 2505.24207 link
2025-05-29 Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch Aneeshan Sain et.al. 2505.23763 null
2025-05-29 Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging Ping Wang et.al. 2505.23180 link
2025-05-29 CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing Yuka Ogino et.al. 2505.23102 null
2025-05-29 URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration Rui Xu et.al. 2505.23068 link
2025-05-29 Vision-Based Assistive Technologies for People with Cerebral Visual Impairment: A Review and Focus Study Bhanuka Gamage et.al. 2505.22983 null
2025-05-29 EquiReg: Equivariance Regularized Diffusion for Inverse Problems Bahareh Tolooshams et.al. 2505.22973 null
2025-05-28 From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration Junyu Fan et.al. 2505.22284 null
2025-05-28 UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images Junhuan Liu et.al. 2505.22098 null
2025-05-28 Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule San Jiang et.al. 2505.22089 null
2025-05-28 GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement Zhihong Tang et.al. 2505.22021 null
2025-05-28 Reference-Guided Identity Preserving Face Restoration Mo Zhou et.al. 2505.21905 null
2025-05-28 Broadening Our View: Assistive Technology for Cerebral Visual Impairment Bhanuka Gamage et.al. 2505.21875 null
2025-05-27 QuARI: Query Adaptive Retrieval Improvement Eric Xing et.al. 2505.21647 null
2025-05-27 BaryIR: Learning Multi-Source Unified Representation in Continuous Barycenter Space for Generalizable All-in-One Image Restoration Xiaole Tang et.al. 2505.21637 null
2025-05-27 Causality-Driven Infrared and Visible Image Fusion Linli Ma et.al. 2505.20830 null
2025-05-27 ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval Eric Xing et.al. 2505.20764 link
2025-05-28 See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction Yuan Wu et.al. 2505.20641 link
2025-05-28 PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy Shuhao Guan et.al. 2505.20429 null
2025-05-26 Visualized Text-to-Image Retrieval Di Wu et.al. 2505.20291 link
2025-05-26 Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval Rong-Cheng Tu et.al. 2505.19952 null
2025-05-26 Can Visual Encoder Learn to See Arrows? Naoyuki Terashita et.al. 2505.19944 null
2025-05-26 Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement Afrah Shaahid et.al. 2505.19895 null
2025-05-26 A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking Zixiang Zhao et.al. 2505.19858 null
2025-05-26 A Regularization-Guided Equivariant Approach for Image Restoration Yulu Bai et.al. 2505.19799 link
2025-05-26 MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval Rong-Cheng Tu et.al. 2505.19707 null
2025-05-25 Improving Novel view synthesis of 360 $^\circ$ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images Guangan Chen et.al. 2505.19264 link
2025-05-25 Benchmarking Laparoscopic Surgical Image Restoration and Beyond Jialun Pei et.al. 2505.19161 link
2025-05-25 Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition Xiaoyang Liu et.al. 2505.19120 link
2025-05-24 Manifold-aware Representation Learning for Degradation-agnostic Image Restoration Bin Ren et.al. 2505.18679 null
2025-05-23 RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration Sudarshan Rajagopalan et.al. 2505.18047 null
2025-05-23 DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval Yuxin Yang et.al. 2505.17796 null
2025-05-23 MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery Hainuo Wang et.al. 2505.17581 link
2025-05-23 Dual Ascent Diffusion for Inverse Problems Minseo Kim et.al. 2505.17353 null
2025-05-22 Forward-only Diffusion Probabilistic Models Ziwei Luo et.al. 2505.16733 link
2025-05-22 Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration Yuetong Liu et.al. 2505.16479 null
2025-05-22 NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment Shuhao Han et.al. 2505.16314 null
2025-05-22 Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey Liyan Wang et.al. 2505.16161 link
2025-05-22 Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention Yuang Ai et.al. 2505.16157 null
2025-05-21 Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval Siting Li et.al. 2505.15877 null
2025-05-21 SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval Nikolaos Chaidos et.al. 2505.15867 link
2025-05-22 Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives Yisi Luo et.al. 2505.15222 link
2025-05-20 UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache Pu Wang et.al. 2505.14010 null
2025-05-20 Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models Kiarash Naghavi Khanghah et.al. 2505.13828 null
2025-05-19 Adaptive Image Restoration for Video Surveillance: A Real-Time Approach Muhammad Awais Amin et.al. 2505.13130 null
2025-05-19 LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration Di You et.al. 2505.12935 null
2025-05-19 Towards a Universal Image Degradation Model via Content-Degradation Disentanglement Wenbo Yang et.al. 2505.12860 null
2025-05-19 Degradation-Aware Feature Perturbation for All-in-One Image Restoration Xiangpeng Tian et.al. 2505.12630 link
2025-05-18 Trustworthy Image Super-Resolution via Generative Pseudoinverse Andreas Floros et.al. 2505.12375 link
2025-05-18 SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis Haozhe Xiang et.al. 2505.12251 null
2025-05-17 Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model Jian Zhu et.al. 2505.11800 link
2025-05-16 Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization Aaron Wilhelm et.al. 2505.11620 null
2025-05-16 Diff-Unfolding: A Model-Based Score Learning Framework for Inverse Problems Yuanhao Wang et.al. 2505.11393 null
2025-05-16 Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement Nirjhor Datta et.al. 2505.11246 link
2025-05-16 Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing Mathis Jürgen Adler et.al. 2505.11121 null
2025-05-15 torchmfbd: a flexible multi-object multi-frame blind deconvolution code A. Asensio Ramos et.al. 2505.10639 link
2025-05-19 Super-Resolution Generative Adversarial Networks based Video Enhancement Kağan ÇETİN et.al. 2505.10589 null
2025-05-14 PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement Tong Li et.al. 2505.09196 null
2025-05-13 Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations Petrus H. Zwart et.al. 2505.08176 null
2025-05-12 Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation Dragos-Patru Covei et.al. 2505.07699 null
2025-05-12 Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework Jun Li et.al. 2505.07165 null
2025-05-11 Bi-directional Self-Registration for Misaligned Infrared-Visible Image Fusion Timing Li et.al. 2505.06920 null
2025-05-10 UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration Chunming He et.al. 2505.06683 null
2025-05-10 MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning Zixian Zhao et.al. 2505.06665 null
2025-05-09 A review of advancements in low-light image enhancement using deep learning Fangxue Liu et.al. 2505.05759 null
2025-05-08 Semantic Style Transfer for Enhancing Animal Facial Landmark Detection Anadil Hussein et.al. 2505.05640 null
2025-05-08 A Preliminary Study for GPT-4o on Image Restoration Hao Yang et.al. 2505.05621 link
2025-05-07 Image Restoration via Multi-domain Learning Xingyu Jiang et.al. 2505.05504 link
2025-05-08 SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation Yonwoo Choi et.al. 2505.05475 link
2025-05-08 EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution Haizhen Xie et.al. 2505.05209 null
2025-05-08 ADNP-15: An Open-Source Histopathological Dataset for Neuritic Plaque Segmentation in Human Brain Whole Slide Images with Frequency Domain Image Enhancement for Stain Normalization Chenxi Zhao et.al. 2505.05041 null
2025-05-07 DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once Qi Zhou et.al. 2505.04526 link
2025-05-08 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Teng Hu et.al. 2505.04512 null
2025-05-07 TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement Yi Li et.al. 2505.04281 link
2025-05-07 Regional chemical potential analysis for material surfaces Masahiro Fukuda et.al. 2505.04053 null
2025-05-04 OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery Chongsheng Zhang et.al. 2505.03836 link
2025-05-06 DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation Shanshan Song et.al. 2505.03401 link
2025-05-06 Seeing the Abstract: Translating the Abstract Language for Vision Language Models Davide Talon et.al. 2505.03242 link
2025-05-05 MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection Jiaqi Zhang et.al. 2505.02441 link
2025-05-05 Quaternion Multi-focus Color Image Fusion Weihua Yang et.al. 2505.02365 null
2025-05-05 Quaternion Infrared Visible Image Fusion Weihua Yang et.al. 2505.02364 null
2025-05-04 HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement Xiaorui Zhao et.al. 2505.02134 null
2025-05-03 ImageR: Enhancing Bug Report Clarity by Screenshots Xuchen Tan et.al. 2505.01925 null
2025-05-03 Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement Haofan Wu et.al. 2505.01831 null
2025-05-02 Deblurring fission fragment mass distributions Pierre Nzabahimana et.al. 2505.01294 null
2025-05-02 RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement Kui Jiang et.al. 2505.01224 link
2025-05-01 GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution Aditya Arora et.al. 2505.00687 null
2025-04-30 DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration Hebaixu Wang et.al. 2504.21487 link
2025-04-30 VR-FuseNet: A Fusion of Heterogeneous Fundus Data and Explainable Deep Network for Diabetic Retinopathy Classification Shamim Rahim Refat et.al. 2504.21464 null
2025-04-29 Spatial-enhanced Reflective Coded Aperture Snapshot Spectral Imaging Jiayu Di et.al. 2504.20516 null
2025-04-29 TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots Qinhua Xie et.al. 2504.20362 null
2025-04-27 FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement Kangbiao Shi et.al. 2504.19295 null
2025-04-27 Marine Snow Removal Using Internally Generated Pseudo Ground Truth Alexandra Malyugina et.al. 2504.19289 null
2025-04-27 Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting Xiaofeng Jin et.al. 2504.19261 null
2025-04-27 Adaptive Dual-domain Learning for Underwater Image Enhancement Lingtao Peng et.al. 2504.19198 link
2025-04-27 DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning Jialang Lu et.al. 2504.19127 null
2025-04-25 From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval Yabing Wang et.al. 2504.17990 null
2025-04-24 Dual Prompting Image Restoration with Diffusion Transformers Dehong Kong et.al. 2504.17825 null
2025-04-24 DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model Zhanwen Liu et.al. 2504.17732 null
2025-04-24 Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems Jaegang Jo et.al. 2504.17368 null
2025-04-24 I-INR: Iterative Implicit Neural Representations Ali Haider et.al. 2504.17364 null
2025-04-23 Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval Xin Jiang et.al. 2504.16691 null
2025-04-23 RouteWinFormer: A Route-Window Transformer for Middle-range Attention in Image Restoration Qifan Li et.al. 2504.16637 null
2025-04-23 Cross Paradigm Representation and Alignment Transformer for Image Deraining Shun Zou et.al. 2504.16455 null
2025-04-22 Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs Merve Cerit et.al. 2504.16323 link
2025-04-22 AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization Jinda Lu et.al. 2504.15619 null
2025-04-22 SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking Yunfeng Li et.al. 2504.15609 link
2025-04-22 InstaRevive: One-Step Image Enhancement via Dynamic Score Matching Yixuan Zhu et.al. 2504.15513 null
2025-04-21 Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration Junyuan Deng et.al. 2504.15159 null
2025-04-21 Structure-guided Diffusion Transformer for Low-Light Image Enhancement Xiangchen Yin et.al. 2504.15054 null
2025-04-21 Distribution-aware Dataset Distillation for Efficient Image Restoration Zhuoran Zheng et.al. 2504.14826 null
2025-04-19 A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling Kyle Buettner et.al. 2504.14359 null
2025-04-19 Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation Bin Ren et.al. 2504.14249 null
2025-04-18 Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design Wei Dong et.al. 2504.14075 link
2025-04-18 Zebrafish Counting Using Event Stream Data Qianghua Chen et.al. 2504.13692 null
2025-04-21 Circular Image Deturbulence using Quasi-conformal Geometry Chu Chen et.al. 2504.13432 null
2025-04-17 SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs Haoxuan Li et.al. 2504.13172 null
2025-04-17 Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal Inzamamul Alam et.al. 2504.12809 link
2025-04-17 AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting Xin Su et.al. 2504.12605 null
2025-04-16 Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling Zhihua Wang et.al. 2504.12204 link
2025-04-16 Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging Tristan S. W. Stevens et.al. 2504.12154 null
2025-04-16 Generalized Visual Relation Detection with Diffusion Models Kaifeng Gao et.al. 2504.12100 null
2025-04-16 R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors Haoyang Wang et.al. 2504.11946 null
2025-04-16 Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement Xingxing Yang et.al. 2504.11896 null
2025-04-16 HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration Chia-Hsiang Lin et.al. 2504.11782 null
2025-04-15 Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain Pengcheng Zheng et.al. 2504.11286 null
2025-04-15 Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach Xiaoxiao Ma et.al. 2504.11262 null
2025-04-15 Visual Re-Ranking with Non-Visual Side Information Gustav Hanning et.al. 2504.11134 link
2025-04-15 UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques Pedro Diaz-Garcia et.al. 2504.11063 null
2025-04-15 TMCIR: Token Merge Benefits Composed Image Retrieval Chaoyang Wang et.al. 2504.10995 null
2025-04-15 AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent Pu Wang et.al. 2504.10978 null
2025-04-15 An Efficient and Mixed Heterogeneous Model for Image Restoration Yubin Gu et.al. 2504.10967 link
2025-04-15 DAAF: Degradation-Aware Adaptive Fusion Framework for Robust Infrared and Visible Images Fusion Tianpei Zhang et.al. 2504.10871 null
2025-04-14 PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems Maud Biquard et.al. 2504.10375 null
2025-04-14 Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis Kaiwen Zheng et.al. 2504.10351 null
2025-04-14 VibrantLeaves: A principled parametric image generator for training deep restoration models Raphael Achddou et.al. 2504.10201 link
2025-04-14 Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction Yucheng Lu et.al. 2504.10080 null
2025-04-14 Progressive Transfer Learning for Multi-Pass Fundus Image Restoration Uyen Phan et.al. 2504.10025 null
2025-04-14 Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration Gang Wu et.al. 2504.09973 link
2025-04-14 Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition Changwei Wang et.al. 2504.09881 link
2025-04-13 Computationally iterative methods for salt-and-pepper denoising Jianwei Ke et.al. 2504.09408 null
2025-04-13 Low-Light Image Enhancement using Event-Based Illumination Estimation Lei Sun et.al. 2504.09379 null
2025-04-12 Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers Jiawei Wu et.al. 2504.09377 link
2025-04-11 Hypergraph Vision Transformers: Images are More than Nodes, More than Edges Joshua Fixelle et.al. 2504.08710 null
2025-04-11 ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration Yongsheng Yu et.al. 2504.08591 null
2025-04-11 FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations Cheng-Yu Hsieh et.al. 2504.08368 null
2025-04-11 DreamFuse: Adaptive Image Fusion with Diffusion Transformer Junjia Huang et.al. 2504.08291 null
2025-04-11 VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions Ziyan Liu et.al. 2504.08219 null
2025-04-10 Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement Daniel Torres et.al. 2504.07810 null
2025-04-10 Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval Zehong Ma et.al. 2504.07718 null
2025-04-10 Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying Shichen Li et.al. 2504.07465 null
2025-04-10 Synthetic CT Generation from Time-of-Flight Non-Attenutaion-Corrected PET for Whole-Body PET Attenuation Correction Weijie Chen et.al. 2504.07450 null
2025-04-09 Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model Yingjie Zhou et.al. 2504.07148 null
2025-04-09 Distilling Textual Priors from LLM to Efficient Image Fusion Ran Zhang et.al. 2504.07029 link
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-09 Rethinking LayerNorm in Image Restoration Transformers MinKyu Lee et.al. 2504.06629 null
2025-04-08 AstroClearNet: Deep image prior for multi-frame astronomical image restoration Yashil Sukurdeep et.al. 2504.06463 null
2025-04-09 Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions Hao Zhang et.al. 2504.05795 null
2025-04-07 Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion Xingyu Hu et.al. 2504.05164 null
2025-04-07 DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration Jiamei Xiong et.al. 2504.05135 null
2025-04-08 Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision Yuandong Pu et.al. 2504.04903 null
2025-04-07 Content-Aware Transformer for All-in-one Image Restoration Gang Wu et.al. 2504.04869 link
2025-04-07 Inland Waterway Object Detection in Multi-environment: Dataset and Approach Shanshan Wang et.al. 2504.04835 null
2025-04-06 NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval Peng Gao et.al. 2504.04339 null
2025-04-05 JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration Yunlong Lin et.al. 2504.04158 null
2025-04-04 Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal Yuyang Hu et.al. 2504.03607 null
2025-04-04 REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval Shabnam Choudhury et.al. 2504.03169 null
2025-04-04 Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning Lucas Choi et.al. 2504.03168 null
2025-04-03 RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models ZhongLi Fang et.al. 2504.02640 null
2025-04-03 Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement Hesong Li et.al. 2504.02555 link
2025-04-03 HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement Hantang Li et.al. 2504.02373 null
2025-04-03 Brightness Perceiving for Recursive Low-Light Image Enhancement Haodian Wang et.al. 2504.02362 link
2025-04-03 SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW Masakazu Yoshimura et.al. 2504.02345 null
2025-04-02 Bridge the Gap between SNN and ANN for Image Restoration Xin Su et.al. 2504.01755 null
2025-04-02 Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval Yuji Nozawa et.al. 2504.01348 null
2025-04-01 IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval Bangwei Liu et.al. 2504.00954 null
2025-04-01 Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data Yiqun Duan et.al. 2504.00812 null
2025-04-01 Deconver: A Deconvolutional Network for Medical Image Segmentation Pooya Ashtari et.al. 2504.00302 link
2025-03-31 InstructRestore: Region-Customized Image Restoration with Human Instructions Shuaizheng Liu et.al. 2503.24357 link
2025-03-31 CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization Yingrui Ji et.al. 2503.24182 null
2025-03-31 3D Dental Model Segmentation with Geometrical Boundary Preserving Shufan Xi et.al. 2503.23702 link
2025-03-30 Multiview Image-Based Localization Cameron Fiore et.al. 2503.23577 null
2025-03-30 ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts Linfeng Tang et.al. 2503.23356 null
2025-03-30 DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance Linfeng Tang et.al. 2503.23355 null
2025-03-29 A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery Pengyu Chen et.al. 2503.23200 null
2025-03-29 indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy Ashesh Ashesh et.al. 2503.22983 null
2025-03-28 RELD: Regularization by Latent Diffusion Models for Image Restoration Pasquale Cascarano et.al. 2503.22563 null
2025-03-27 Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration Yujie Chen et.al. 2503.21970 null
2025-03-27 LOCORE: Image Re-ranking with Long-Context Sequence Modeling Zilin Xiao et.al. 2503.21772 link
2025-03-27 Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck Adrian Bulat et.al. 2503.21757 null
2025-03-27 Invert2Restore: Zero-Shot Degradation-Blind Image Restoration Hamadi Chihaoui et.al. 2503.21486 null
2025-03-27 Diffusion Image Prior Hamadi Chihaoui et.al. 2503.21410 null
2025-03-27 FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval Zixu Li et.al. 2503.21309 link
2025-03-27 Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing Shuai Li et.al. 2503.21236 null
2025-03-26 Underwater Image Enhancement by Convolutional Spiking Neural Networks Vidya Sudevan et.al. 2503.20485 link
2025-03-26 Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration Shihao Zhou et.al. 2503.20174 null
2025-03-25 CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh et.al. 2503.19910 link
2025-03-25 LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset Manjushree Aithal et.al. 2503.19804 null
2025-03-25 Scene-agnostic Pose Regression for Visual Localization Junwei Zheng et.al. 2503.19543 null
2025-03-25 From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting Zhiwei Huang et.al. 2503.19358 null
2025-03-25 Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval Haoqiang Lin et.al. 2503.19296 link
2025-03-24 LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment Haoran Wang et.al. 2503.18640 null
2025-03-24 OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning Hui Li et.al. 2503.18635 null
2025-03-24 Dig2DIG: Dig into Diffusion Information Gains for Image Fusion Bing Cao et.al. 2503.18627 null
2025-03-24 Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model Tianpei Zhang et.al. 2503.18378 null
2025-03-23 LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space Zhangyu Wang et.al. 2503.18142 null
2025-03-23 Deep Learning Assisted Denoising of Experimental Micrographs Owais Ahmad et.al. 2503.17945 null
2025-03-23 Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach Zhi Zhang et.al. 2503.17937 null
2025-03-23 Cat-AIR: Content and Task-Aware All-in-One Image Restoration Jiachen Jiang et.al. 2503.17915 null
2025-03-23 What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images Dongheng Lin et.al. 2503.17899 null
2025-03-22 good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval Pranavi Kolouju et.al. 2503.17871 null
2025-03-21 Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2503.17109 link
2025-03-21 Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks Haijin Zeng et.al. 2503.16930 null
2025-03-20 Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems Teresa Klatzer et.al. 2503.16222 null
2025-03-20 3-D Image-to-Image Fusion in Lightsheet Microscopy by Two-Step Adversarial Network: Contribution to the FuseMyCells Challenge Marek Wodzinski et.al. 2503.16075 null
2025-03-20 PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval Qiang Zou et.al. 2503.16064 link
2025-03-20 Automating 3D Dataset Generation with Neural Radiance Fields P. Schulz et.al. 2503.15997 link
2025-03-20 DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration Suraj Singh et.al. 2503.15984 null
2025-03-21 UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations Debabrata Mandal et.al. 2503.15868 null
2025-03-19 Image Restoration Models with Optimal Transport and Total Variation Regularization Weijia Huang et.al. 2503.14947 null
2025-03-19 MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance Zihan Cao et.al. 2503.14944 null
2025-03-19 Degradation Alchemy: Self-Supervised Unknown-to-Known Transformation for Blind Hyperspectral Image Fusion He Huang et.al. 2503.14892 null
2025-03-18 Revisiting Image Fusion for Multi-Illuminant White-Balance Correction David Serrano-Lozano et.al. 2503.14774 null
2025-03-18 SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model Yucheng Mao et.al. 2503.14463 null
2025-03-18 AI-Driven Diabetic Retinopathy Diagnosis Enhancement through Image Processing and Salp Swarm Algorithm-Optimized Ensemble Network Saif Ur Rehman Khan et.al. 2503.14209 null
2025-03-18 Towards properties of adversarial image perturbations Egor Kuznetsov et.al. 2503.14111 null
2025-03-18 Intra and Inter Parser-Prompted Transformers for Effective Image Restoration Cong Wang et.al. 2503.14037 link
2025-03-17 Scale Efficient Training for Large Datasets Qing Zhou et.al. 2503.13385 link
2025-03-17 From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective Chen Zhao et.al. 2503.13165 null
2025-03-17 All You Need to Know About Training Image Retrieval Models Gabriele Berton et.al. 2503.13045 link
2025-03-17 Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion Yidi Liu et.al. 2503.12764 null
2025-03-16 DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement Han Mei et.al. 2503.12470 link
2025-03-16 Pathology Image Restoration via Mixture of Prompts Jiangdong Cai et.al. 2503.12399 link
2025-03-14 Advancements in Real-Time Oncology Diagnosis: Harnessing AI and Image Fusion Techniques Leila Bagheriye et.al. 2503.11332 null
2025-03-14 Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking Andong Lu et.al. 2503.11247 null
2025-03-14 Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption Du Chen et.al. 2503.11221 null
2025-03-14 InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences Hongkai Zheng et.al. 2503.11043 null
2025-03-13 ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning Pengfei Luo et.al. 2503.10166 link
2025-03-13 Hybrid Agents for Image Restoration Bingchen Li et.al. 2503.10120 null
2025-03-13 Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion Xingxin Xu et.al. 2503.10109 null
2025-03-12 FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification Shoaib Meraj Sami et.al. 2503.09873 null
2025-03-12 Multi-Agent Image Restoration Xu Jiang et.al. 2503.09403 null
2025-03-12 Revisiting Medical Image Retrieval via Knowledge Consolidation Yang Nan et.al. 2503.09370 null
2025-03-12 MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration Zhehui Wu et.al. 2503.09131 link
2025-03-12 Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal Rongxin Liao et.al. 2503.09013 link
2025-03-11 QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution Siddhant Dutta et.al. 2503.08759 null
2025-03-11 Language-Depth Navigated Thermal and Visible Image Fusion Jinchang Zhang et.al. 2503.08676 null
2025-03-11 PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net Jun Yin et.al. 2503.08276 null
2025-03-11 TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement Miao Zhang et.al. 2503.08168 null
2025-03-11 Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features Hanbyul Lee et.al. 2503.08148 null
2025-03-11 Deep Perceptual Enhancement for Medical Image Analysis S M A Sharif et.al. 2503.08027 link
2025-03-10 GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts Minwen Liao et.al. 2503.07417 null
2025-03-10 Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion Haowen Bai et.al. 2503.07235 null
2025-03-11 Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios Chenglu Pan et.al. 2503.07232 null
2025-03-10 Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization Michael Green et.al. 2503.07038 null
2025-03-10 Zero-Shot Hashing Based on Reconstruction With Part Alignment Yan Jiang et.al. 2503.07037 null
2025-03-10 Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion Haolong Ma et.al. 2503.07033 null
2025-03-10 MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement Shrutika Vishal Thengane et.al. 2503.06953 link
2025-03-09 RoboDesign1M: A Large-scale Dataset for Robot Design Understanding Tri Le et.al. 2503.06796 null
2025-03-09 StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition Yanqing Shen et.al. 2503.06601 link
2025-03-07 Data-Efficient Generalization for Zero-shot Composed Image Retrieval Zining Chen et.al. 2503.05204 null
2025-03-06 RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining Tengfei Zhang et.al. 2503.04653 null
2025-03-06 Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior Haitao Wu et.al. 2503.04207 link
2025-03-05 An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation Yuezhe Tian et.al. 2503.03640 null
2025-03-05 Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks Samuel Repka et.al. 2503.03507 null
2025-03-05 Two-Stream Thermal Imaging Fusion for Enhanced Time of Birth Detection in Neonatal Care Jorge García-Torres et.al. 2503.03244 null
2025-03-03 Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications Yuchen Xiang et.al. 2503.02908 null
2025-03-04 ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement Xuejian Guo et.al. 2503.02484 link
2025-03-04 Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration Pengchen Liang et.al. 2503.02321 null
2025-03-03 MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting Mojtaba Safari et.al. 2503.01576 link
2025-03-03 Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions Zihan Shen et.al. 2503.01339 null
2025-03-03 Composed Multi-modal Retrieval: A Survey of Approaches and Applications Kun Zhang et.al. 2503.01334 link
2025-03-03 Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual Chong Wang et.al. 2503.01288 link
2025-03-03 Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond Guanyao Wu et.al. 2503.01210 null
2025-03-02 Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion Daiki Nishiyama et.al. 2503.00925 null
2025-03-01 Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement Aupendu Kar et.al. 2503.00642 link
2025-03-01 Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning Songlin Dong et.al. 2503.00515 null
2025-02-28 SEE: See Everything Every Time -- Adaptive Brightness Adjustment for Broad Light Range Images via Events Yunfan Lu et.al. 2502.21120 null
2025-02-28 CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval Zelong Sun et.al. 2502.20826 null
2025-02-28 Diffusion Restoration Adapter for Real-World Image Restoration Hanbang Liang et.al. 2502.20679 null
2025-02-28 HVI: A New Color Space for Low-light Image Enhancement Qingsen Yan et.al. 2502.20272 link
2025-02-27 Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps Tianxiao Gao et.al. 2502.20054 null
2025-02-27 Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement Nan An et.al. 2502.19867 null
2025-02-27 One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion Chunyang Cheng et.al. 2502.19854 link
2025-02-26 ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images Kaveen Perera et.al. 2502.19456 null
2025-02-27 On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation Ruben T. Lucassen et.al. 2502.19285 null
2025-02-26 Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems Bernardin Tamo Amougou et.al. 2502.19194 null
2025-02-26 Multi-level Attention-guided Graph Neural Network for Image Restoration Jiatao Jiang et.al. 2502.19181 null
2025-02-27 RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images Yuhan Tang et.al. 2502.19153 null
2025-02-26 Dynamic Degradation Decomposition Network for All-in-One Image Restoration Huiqiang Wang et.al. 2502.19068 null
2025-02-25 Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions Alessandro Ascani Orsini et.al. 2502.18646 link
2025-02-24 Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems Fuqun Han et.al. 2502.16773 link
2025-02-23 Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries Yin Wu et.al. 2502.16636 link
2025-02-21 Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement Uche A. Nnolim et.al. 2502.15986 null
2025-02-21 ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval Guanqi Zhan et.al. 2502.15682 null
2025-02-21 LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement Namrah Siddiqua et.al. 2502.15186 null
2025-02-21 Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization Ach Khozaimi et.al. 2502.15156 null
2025-02-20 Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications Maha Ezzelarab et.al. 2502.14995 null
2025-02-20 CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond Yukai Shi et.al. 2502.14493 null
2025-02-20 EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement Wenhui Zhu et.al. 2502.14260 null
2025-02-19 RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior Ching-Hua Lee et.al. 2502.13574 null
2025-02-18 Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization Shuo Xing et.al. 2502.13146 link
2025-02-18 Local Flaw Detection with Adaptive Pyramid Image Fusion Across Spatial Sampling Resolution for SWRs Siyu You et.al. 2502.12512 null
2025-02-17 Descriminative-Generative Custom Tokens for Vision-Language Models Pramuditha Perera et.al. 2502.12095 null
2025-02-17 ILIAS: Instance-Level Image retrieval At Scale Giorgos Kordopatis-Zilos et.al. 2502.11748 null
2025-02-17 Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics Francesco Croce et.al. 2502.11725 link
2025-02-17 Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization Yuanze Xu et.al. 2502.11408 null
2025-02-12 E2LVLM: Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection Junjie Wu et.al. 2502.10455 null
2025-02-19 Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal Jinpei Guo et.al. 2502.09873 link
2025-02-13 Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring C. K. Tam et.al. 2502.09478 null
2025-02-13 ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Rotem Shalev-Arkushin et.al. 2502.09411 null
2025-02-12 Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions Prajwal Gatti et.al. 2502.08438 null
2025-02-13 MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers Ao Li et.al. 2502.07856 null
2025-02-11 Captured by Captions: On Memorization and its Mitigation in CLIP Models Wenhao Wang et.al. 2502.07830 null
2025-02-11 Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems Ai Chen et.al. 2502.07351 link
2025-02-11 Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos Haowen Gao et.al. 2502.07327 null
2025-02-11 PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval Osman Tursun et.al. 2502.07215 null
2025-02-10 AstroLoc: Robust Space to Ground Image Localizer Gabriele Berton et.al. 2502.07003 null
2025-02-10 UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis Zemin Yang et.al. 2502.06324 null
2025-02-09 A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement Muhammad Turab et.al. 2502.05995 null
2025-02-09 Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education Yanhao Jia et.al. 2502.05863 null
2025-02-11 UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control Kaizhen Zhu et.al. 2502.05749 link
2025-02-07 Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems Jasper M. Everink et.al. 2502.05127 null
2025-02-07 Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition S Sreehari et.al. 2502.04680 null
2025-02-07 HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion Mengting Ma et.al. 2502.04623 null
2025-02-06 Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion Marco Mistretta et.al. 2502.04263 link
2025-02-05 All-in-One Image Compression and Restoration Huimin Zeng et.al. 2502.03649 link
2025-02-05 Efficient Image Restoration via Latent Consistency Flow Matching Elad Cohen et.al. 2502.03500 null
2025-02-05 Human-Aligned Image Models Improve Visual Decoding from the Brain Nona Rajabi et.al. 2502.03081 null
2025-02-04 Blind Visible Watermark Removal with Morphological Dilation Preston K. Robinette et.al. 2502.02676 null
2025-02-04 MATCNN: Infrared and Visible Image Fusion Method Based on Multi-scale CNN with Attention Transformer Jingjing Liu et.al. 2502.01959 link
2025-02-03 Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis Haowen Bai et.al. 2502.01467 null
2025-02-03 Human Body Restoration with One-Step Diffusion Model and A New Benchmark Jue Gong et.al. 2502.01411 null
2025-02-03 ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies Costin F. Ciusdel et.al. 2502.01335 null
2025-02-04 Compressed Image Generation with Denoising Diffusion Codebook Models Guy Ohayon et.al. 2502.01189 null
2025-02-01 A framework for river connectivity classification using temporal image processing and attention based neural networks Timothy James Becker et.al. 2502.00474 null
2025-02-01 Shape from Semantics: 3D Shape Generation from Multi-View Semantics Liangchen Li et.al. 2502.00360 null
2025-01-31 Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer Surochita Pal et.al. 2502.00078 null
2025-01-30 Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration Kyusu Ahn et.al. 2501.18517 null
2025-01-31 MatIR: A Hybrid Mamba-Transformer Image Restoration Model Juan Wen et.al. 2501.18401 link
2025-01-30 Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers Malte Tölle et.al. 2501.18237 null
2025-01-29 Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment Zixue Zeng et.al. 2501.17690 link
2025-01-28 Text-to-Image Generation for Vocabulary Learning Using the Keyword Method Nuwan T. Attygalle et.al. 2501.17099 null
2025-01-27 Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration Long Peng et.al. 2501.16583 null
2025-01-27 UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images Tatiana Taís Schein et.al. 2501.16211 link
2025-01-27 Freestyle Sketch-in-the-Loop Image Segmentation Subhadeep Koley et.al. 2501.16022 null
2025-01-27 CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference Zhengyang Lu et.al. 2501.15852 link
2025-01-26 Universal Image Restoration Pre-training via Degradation Classification JiaKui Hu et.al. 2501.15510 link
2025-01-26 Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations Zijun Long et.al. 2501.15379 null
2025-01-24 Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders Zaheer Ahmad et.al. 2501.14709 null
2025-01-24 Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement Guoxi Huang et.al. 2501.14265 link
2025-01-24 CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image Xiaojun Tang et.al. 2501.14264 null
2025-01-23 Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models Jakob Krogh Petersen et.al. 2501.14051 link
2025-01-23 INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration Di You et.al. 2501.14014 null
2025-01-23 Binary Diffusion Probabilistic Model Vitaliy Kinakh et.al. 2501.13915 null
2025-01-23 Where Do You Go? Pedestrian Trajectory Prediction using Scene Features Mohammad Ali Rezaei et.al. 2501.13848 null
2025-01-22 UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior I-Hsiang Chen et.al. 2501.13134 null
2025-01-22 Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects Louis Aberdeen et.al. 2501.13009 null
2025-01-22 UniUIR: Considering Underwater Image Restoration as An All-in-One Learner Xu Zhang et.al. 2501.12981 null
2025-01-22 FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration Ruicheng Zhang et.al. 2501.12832 link
2025-01-21 Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping Hongxu Yang et.al. 2501.12245 null
2025-01-21 DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains Junyu Xia et.al. 2501.12235 null
2025-01-21 Proxies for Distortion and Consistency with Applications for Real-World Image Restoration Sean Man et.al. 2501.12102 null
2025-01-20 SILO: Solving Inverse Problems with Latent Operators Ron Raphaeli et.al. 2501.11746 null
2025-01-19 Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection Zhipeng Yu et.al. 2501.11063 link
2025-01-19 Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation Zhengwen Shen et.al. 2501.10958 null
2025-01-18 Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption Jinyuan Liu et.al. 2501.10761 link
2025-01-18 A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval Weihang Zhang et.al. 2501.10638 null
2025-01-17 DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration Huiyun Cao et.al. 2501.10325 null
2025-01-16 FLOL: Fast Baselines for Real-World Low-Light Enhancement Juan C. Benito et.al. 2501.09718 link
2025-01-16 Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression Yongheng Zhang et.al. 2501.09321 null
2025-01-16 Knowledge Distillation for Image Restoration: Simultaneous Learning from Degraded and Clean Images Yongheng Zhang et.al. 2501.09268 null
2025-01-15 Vision Foundation Models for Computed Tomography Suraj Pai et.al. 2501.09001 link
2025-01-12 SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval Bhavin Jawade et.al. 2501.08347 null
2025-01-14 AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring Sanjida Afrin Mou et.al. 2501.08266 link
2025-01-13 Depth and Image Fusion for Road Obstacle Detection Using Stereo Camera Oleg Perezyabov et.al. 2501.07245 null
2025-01-12 Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation Zhenyang Feng et.al. 2501.06749 null
2025-01-11 Natural Language Supervision for Low-light Image Enhancement Jiahui Tang et.al. 2501.06546 null
2025-01-10 Underwater Image Enhancement using Generative Adversarial Networks: A Survey Kancharagunta Kishan Babu et.al. 2501.06273 null
2025-01-09 HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction Shaurya Singh Rathore et.al. 2501.05195 null
2025-01-09 ResPanDiff: Diffusion Model with Disentangled Modulations for Image Fusion Shiqi Cao et.al. 2501.05091 null
2025-01-09 IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation Qi Chen et.al. 2501.04995 link
2025-01-08 Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration Laibin Chang et.al. 2501.04740 null
2025-01-14 HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion Chia-Ming Lee et.al. 2501.04665 null
2025-01-08 FrontierNet: Learning Visual Cues to Explore Boyang Sun et.al. 2501.04597 link
2025-01-08 MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration Zhi Jin et.al. 2501.04486 link
2025-01-08 Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization Seitaro Ono et.al. 2501.04210 null
2025-01-07 Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications L. Berlyand et.al. 2501.04182 null
2025-01-07 Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications Yodai Suzuki et.al. 2501.03780 link
2025-01-06 ImageMM: Joint multi-frame image restoration and super-resolution Yashil Sukurdeep et.al. 2501.03002 null
2025-01-06 Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI Xujin Li et.al. 2501.02841 null
2025-01-06 Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis Xiaojiao Guo et.al. 2501.02701 link
2025-01-03 iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings Shuhei Tomoshige et.al. 2501.01642 null
2025-01-02 Domain-invariant feature learning in brain MR imaging for content-based image retrieval Shuya Tobari et.al. 2501.01326 null
2025-01-03 Conditional Consistency Guided Image Translation and Enhancement Amil Bhagat et.al. 2501.01223 link
2025-01-02 Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion Dong Zhang et.al. 2501.01114 null
2024-12-30 Text-to-Image GAN with Pretrained Representations Xiaozhou You et.al. 2501.00116 null
2024-12-30 Varformer: Adapting VAR's Generative Prior for Image Restoration Siyang Wang et.al. 2412.21063 link
2024-12-30 Low-Light Image Enhancement via Generative Perceptual Priors Han Zhou et.al. 2412.20916 link
2024-12-29 Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) Tomer Garber et.al. 2412.20596 link
2024-12-28 Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems Wen-Dong Jiang et.al. 2412.20201 null
2024-12-28 UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity Jingbo Lin et.al. 2412.20157 link
2024-12-28 MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration Boyun Li et.al. 2412.20066 link
2024-12-28 An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models Yuang Wang et.al. 2412.19992 null
2024-12-27 Generative Adversarial Network on Motion-Blur Image Restoration Zhengdong Li et.al. 2412.19479 null
2024-12-25 FOR: Finetuning for Object Level Open Vocabulary Image Retrieval Hila Levi et.al. 2412.18806 null
2024-12-24 Underwater Image Restoration via Polymorphic Large Kernel CNNs Xiaojiao Guo et.al. 2412.18459 link
2024-12-24 UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections Lingxiao Yin et.al. 2412.18276 null
2024-12-24 SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos Zhen Zhang et.al. 2412.18214 link
2024-12-24 ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval Le Dong et.al. 2412.18136 link
2024-12-22 Where am I? Cross-View Geo-localization with Natural Language Descriptions Junyan Ye et.al. 2412.17007 null
2024-12-21 Optoelectronic generative adversarial networks Jumin Qiu et.al. 2412.16672 link
2024-12-21 Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising Yuchen Wang et.al. 2412.16645 null
2024-12-24 Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling Daichi Yashima et.al. 2412.16576 link
2024-12-21 Rethinking Model Redundancy for Low-light Image Enhancement Tong Li et.al. 2412.16459 null
2024-12-20 SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild Jannik Elsäßer et.al. 2412.16147 null
2024-12-20 NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images Yue Guo et.al. 2412.15890 null
2024-12-20 Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation Aiwen Jiang et.al. 2412.15845 link
2024-12-20 A New Method to Capturing Compositional Knowledge in Linguistic Space Jiahe Wan et.al. 2412.15632 null
2024-12-20 Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation Samantha J Alloo et.al. 2412.15513 null
2024-12-19 Learning Visual Composition through Improved Semantic Guidance Austin Stone et.al. 2412.15396 null
2024-12-19 Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model Minglong Xue et.al. 2412.14630 link
2024-12-19 MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Junjie Zhou et.al. 2412.14475 null
2024-12-18 Personalized Generative Low-light Image Denoising and Enhancement Xijun Wang et.al. 2412.14327 null
2024-12-18 Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing Le-Anh Tran et.al. 2412.14220 link
2024-12-18 Adversarial Hubness in Multi-Modal Retrieval Tingwei Zhang et.al. 2412.14113 link
2024-12-18 Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval Giacomo Pacini et.al. 2412.13834 null
2024-12-18 Fed-AugMix: Balancing Privacy and Utility via Data Augmentation Haoyang Li et.al. 2412.13818 null
2024-12-18 Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode Xin Su et.al. 2412.13749 link
2024-12-18 VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement Chen Zhao et.al. 2412.13655 link
2024-12-18 DarkIR: Robust Low-Light Image Restoration Daniel Feijoo et.al. 2412.13443 link
2024-12-18 Zero-Shot Low Light Image Enhancement with Diffusion Prior Joshua Cho et.al. 2412.13401 link
2024-12-17 Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration Xinlong Cheng et.al. 2412.12550 null
2024-12-17 Three Things to Know about Deep Metric Learning Yash Patel et.al. 2412.12432 null
2024-12-16 Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD) Ki-Hwan Oh et.al. 2412.12238 link
2024-12-16 Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning Xingchi Chen et.al. 2412.11685 null
2024-12-16 CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution Bingwen Hu et.al. 2412.11609 null
2024-12-15 Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval Zelong Sun et.al. 2412.11087 null
2024-12-15 Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2412.11077 link
2024-12-15 Towards Context-aware Convolutional Network for Image Restoration Fangwei Hao et.al. 2412.11008 null
2024-12-14 Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification Yucong Meng et.al. 2412.10776 null
2024-12-16 Matrix Completion via Residual Spectral Matching Ziyuan Chen et.al. 2412.10005 null
2024-12-13 $\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion Jiawei Li et.al. 2412.09954 link
2024-12-12 OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs Yuanzhi Zhu et.al. 2412.09465 link
2024-12-13 Are Conditional Latent Diffusion Models Effective for Image Restoration? Yunchen Yuan et.al. 2412.09324 null
2024-12-13 MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition Qiwen Gu et.al. 2412.09199 null
2024-12-12 ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring Zhongbao Yang et.al. 2412.09193 null
2024-12-12 Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration Yunshuai Zhou et.al. 2412.08939 link
2024-12-12 A Flexible Plug-and-Play Module for Generating Variable-Length Liyang He et.al. 2412.08922 link
2024-12-11 Image Retrieval Methods in the Dissimilarity Space Madhu Kiran et.al. 2412.08618 null
2024-12-11 Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm Marien Renaud et.al. 2412.08262 null
2024-12-11 Visible and Infrared Image Fusion Using Encoder-Decoder Network Ferhat Can Ataman et.al. 2412.08073 link
2024-12-11 BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion Huafeng Li et.al. 2412.08050 link
2024-12-10 Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance Wanwen Chen et.al. 2412.07741 null
2024-12-10 Leveraging Content and Context Cues for Low-Light Image Enhancement Igor Morawski et.al. 2412.07693 link
2024-12-10 Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement Axel Martinez et.al. 2412.07659 null
2024-12-10 Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE) Tu Vo et.al. 2412.07527 null
2024-12-10 Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring Yuzhi Zhao et.al. 2412.07256 link
2024-12-10 EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization Yuhan He et.al. 2412.07225 null
2024-12-10 A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing Yujie Feng et.al. 2412.07195 null
2024-12-09 InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention Howard Zhang et.al. 2412.06753 null
2024-12-09 EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume Deepthy Rose Jose et.al. 2412.06271 null
2024-12-08 A Review on Multisensor Data Fusion for Wearable Health Monitoring Arlene John et.al. 2412.05895 null
2024-12-07 Compositional Image Retrieval via Instruction-Aware Contrastive Learning Wenliang Zhong et.al. 2412.05756 link
2024-12-07 Enhancing Sample Generation of Diffusion Models using Noise Level Correction Abulikemu Abuduweili et.al. 2412.05488 null
2024-12-06 Equivariant Denoisers for Image Restoration Marien Renaud et.al. 2412.05343 null
2024-12-06 ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration Chi-Wei Hsiao et.al. 2412.05043 null
2024-12-06 DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection Yishuo Chen et.al. 2412.04931 link
2024-12-06 DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification Ying Jin et.al. 2412.04828 null
2024-12-06 Modality Decoupling is All You Need: A Simple Solution for Unsupervised Hyperspectral Image Fusion Songcheng Du et.al. 2412.04802 link
2024-12-05 Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise Brayan Monroy et.al. 2412.04648 link
2024-12-05 MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers Byeonghyeon Lee et.al. 2412.04591 null
2024-12-05 Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image Shuang Xu et.al. 2412.04201 null
2024-12-05 Deep priors for satellite image restoration with accurate uncertainties Biquard Maud et.al. 2412.04130 null
2024-12-05 Blind Underwater Image Restoration using Co-Operational Regressor Networks Ozer Can Devecioglu et.al. 2412.03995 null
2024-12-05 LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model Yuan Xue et.al. 2412.03841 null
2024-12-05 Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration Yuzhen Du et.al. 2412.03814 null
2024-12-04 Composed Image Retrieval for Training-Free Domain Conversion Nikos Efthymiadis et.al. 2412.03297 link
2024-12-04 Task-driven Image Fusion with Learnable Fusion Loss Haowen Bai et.al. 2412.03240 null
2024-12-04 Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution Jiahua Xiao et.al. 2412.02960 null
2024-12-03 Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval Leah Bar et.al. 2412.02310 link
2024-12-03 Relaxed and Inertial Nonlinear Forward-Backward with Momentum Fernando Roldán et.al. 2412.02045 link
2024-12-02 Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features MD Shaikh Rahman et.al. 2412.01555 null
2024-12-02 Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond MD Raqib Khan et.al. 2412.01456 link
2024-12-02 FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration Hao Li et.al. 2412.01427 null
2024-12-02 Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models Yi Liao et.al. 2412.01202 null
2024-12-01 Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration Haoze Sun et.al. 2412.00878 null
2024-12-01 DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement Tongshun Zhang et.al. 2412.00683 link
2024-12-01 MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning You Wu et.al. 2412.00626 link
2024-11-30 Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion Michail Dontas et.al. 2412.00557 null
2024-11-29 Self-Supervised Denoiser Framework Emilien Valat et.al. 2411.19593 null
2024-11-27 Optimizing Image Retrieval with an Extended b-Metric Space Abdelkader Belhenniche et.al. 2411.18800 null
2024-11-27 Hierarchical Information Flow for Generalized Efficient Image Restoration Yawei Li et.al. 2411.18588 null
2024-11-27 Complexity Experts are Task-Discriminative Learners for Any Image Restoration Eduard Zamfir et.al. 2411.18466 null
2024-11-27 Adaptive Blind All-in-One Image Restoration David Serrano-Lozano et.al. 2411.18412 link
2024-11-29 HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning Zengxi Zhang et.al. 2411.18296 link
2024-11-27 TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution Linwei Dong et.al. 2411.18263 link
2024-12-02 Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision Jinnyeong Kim et.al. 2411.18025 null
2024-11-26 Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation Sudarshan Rajagopalan et.al. 2411.17814 null
2024-11-26 GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration Sudarshan Rajagopalan et.al. 2411.17687 null
2024-11-26 Learning Visual Hierarchies with Hyperbolic Embeddings Ziwei Wang et.al. 2411.17490 null
2024-11-26 Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions Nicolai Hermann et.al. 2411.17489 null
2024-11-26 MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers Ruoxi Zhu et.al. 2411.17226 link
2024-11-25 Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding Yubin Gu et.al. 2411.16217 null
2024-11-25 U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields Vinayak Gupta et.al. 2411.16172 null
2024-11-25 Image Generation Diversity Issues and How to Tame Them Mischa Dombrowski et.al. 2411.16171 link
2024-11-24 PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation Chia-Ming Lee et.al. 2411.15922 link
2024-11-24 MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking Chunhui Zhang et.al. 2411.15761 link
2024-11-24 LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration Gaojing Zhang et.al. 2411.15740 null
2024-11-22 Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration Darshan Thaker et.al. 2411.15295 null
2024-11-22 MambaIRv2: Attentive State Space Restoration Hang Guo et.al. 2411.15269 link
2024-11-22 Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval Zengbao Sun et.al. 2411.14704 link
2024-11-21 Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection Ali Awad et.al. 2411.14626 link
2024-11-21 Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion Jinhong He et.al. 2411.13961 link
2024-11-20 Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms Matthieu Kowalski et.al. 2411.13276 null
2024-11-20 Globally Correlation-Aware Hard Negative Generation Wenjie Peng et.al. 2411.13145 link
2024-11-19 Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution Yang Zou et.al. 2411.12530 link
2024-11-19 Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models Jun Xiao et.al. 2411.12450 null
2024-11-19 Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images Zheng Gong et.al. 2411.12278 null
2024-11-16 GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding Yue Zhou et.al. 2411.11904 link
2024-11-18 Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion Meng Zhou et.al. 2411.11799 link
2024-11-18 Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment Zhendong Liu et.al. 2411.11543 null
2024-11-17 Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method Yan Zheng et.al. 2411.11135 null
2024-11-19 TSFormer: A Robust Framework for Efficient UHD Image Restoration Xin Su et.al. 2411.10951 null
2024-11-16 AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations Jiawei Mao et.al. 2411.10708 null
2024-11-16 Underwater Image Enhancement with Cascaded Contrastive Learning Yi Liu et.al. 2411.10682 link
2024-11-16 SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds Huan Kang et.al. 2411.10679 null
2024-11-15 Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence Guodong Sun et.al. 2411.10321 null
2024-11-15 Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting Ziqi Xie et.al. 2411.10309 link
2024-11-15 Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion Dan He et.al. 2411.10036 null
2024-11-14 Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks Zengyi Yang et.al. 2411.09387 null
2024-11-13 Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval Saul Santos et.al. 2411.08590 link
2024-11-13 Saliency Map-based Image Retrieval using Invariant Krawtchouk Moments Ashkan Nejad et.al. 2411.08567 link
2024-11-12 CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising Linxuan Li et.al. 2411.07930 link
2024-11-12 Joint multi-dimensional dynamic attention and transformer for general image restoration Huan Zhang et.al. 2411.07893 link
2024-11-12 All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model Yuanbo Wen et.al. 2411.07445 null
2024-11-11 Multi-scale Frequency Enhancement Network for Blind Image Deblurring Yawen Xiang et.al. 2411.06893 null
2024-11-10 Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration Chen Wu et.al. 2411.06456 null
2024-11-08 A Modular Conditional Diffusion Framework for Image Reconstruction Magauiya Zhussip et.al. 2411.05993 null
2024-11-05 From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing Xintian Sun et.al. 2411.05826 null
2024-11-07 Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion Yiming Sun et.al. 2411.04697 link
2024-11-07 l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion Gargi Panda et.al. 2411.04519 null
2024-11-05 Test-Time Dynamic Image Fusion Bing Cao et.al. 2411.02840 link
2024-11-05 ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing Yuka Ogino et.al. 2411.02799 null
2024-11-04 TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives Maitreya Patel et.al. 2411.02545 null
2024-11-11 INQUIRE: A Natural World Text-to-Image Retrieval Benchmark Edward Vendrow et.al. 2411.02537 link
2024-11-04 Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models Sharat Agarwal et.al. 2411.01925 null
2024-11-03 Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration Xiaole Tang et.al. 2411.01656 link
2024-11-03 Conditional Controllable Image Fusion Bing Cao et.al. 2411.01573 link
2024-11-03 Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification MD Shaikh Rahman et.al. 2411.01473 null
2024-11-03 TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement Xuanzhao Dong et.al. 2411.01403 link
2024-11-02 Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization Sohrab Namazi Nia et.al. 2411.01373 null
2024-11-01 Identifying Implicit Social Biases in Vision-Language Models Kimia Hamidieh et.al. 2411.00997 null
2024-10-31 Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes Shaohua Liu et.al. 2411.00239 null
2024-10-31 Chasing Better Deep Image Priors between Over- and Under-parameterization Qiming Wu et.al. 2410.24187 link
2024-10-31 Nearest Neighbor Normalization Improves Multimodal Retrieval Neil Chowdhury et.al. 2410.24114 link
2024-10-31 Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation Yihang Zhou et.al. 2410.23962 null
2024-10-31 Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model Hao Zhang et.al. 2410.23905 link
2024-10-31 MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval Haiwen Li et.al. 2410.23736 null
2024-10-31 Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data Yucun Hou et.al. 2410.23628 null
2024-10-31 MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction Ziqi Gao et.al. 2410.23577 link
2024-10-30 Decoupling Semantic Similarity from Spatial Alignment for Neural Networks Tassilo Wald et.al. 2410.23107 link
2024-10-30 EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models Shangquan Sun et.al. 2410.22959 link
2024-10-30 SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion Kun Hu et.al. 2410.22837 link
2024-10-30 Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement Sahil Ali Akbar et.al. 2410.21946 link
2024-10-29 Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications Monica Riedler et.al. 2410.21943 link
2024-10-28 Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework Vladimir Arkhipkin et.al. 2410.21061 link
2024-10-27 Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement Junhao Tan et.al. 2410.20314 link
2024-10-27 Deep Learning, Machine Learning -- Digital Signal and Image Processing: From Theory to Application Weiche Hsieh et.al. 2410.20304 null
2024-10-24 HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision Burak Ercan et.al. 2410.19164 null
2024-10-24 ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval Zijia Zhao et.al. 2410.18715 link
2024-10-29 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Yuang Ai et.al. 2410.18666 link
2024-10-23 DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection Qingpeng Li et.al. 2410.17822 link
2024-10-23 An Intelligent Agentic System for Complex Image Restoration Problems Kaiwen Zhu et.al. 2410.17809 link
2024-10-23 A variational approach to nonlocal image restoration flows Harsh Prasad et.al. 2410.17649 null
2024-10-23 Diffusion Priors for Variational Likelihood Estimation and Image Denoising Jun Cheng et.al. 2410.17521 link
2024-10-22 Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2410.17393 null
2024-10-20 LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration Yuang Ai et.al. 2410.15385 link
2024-10-20 GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning Haiwen Diao et.al. 2410.15266 link
2024-10-19 A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends Junjun Jiang et.al. 2410.15067 link
2024-10-19 Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway's Digitised Book Collection Marie Roald et.al. 2410.14969 link
2024-10-16 Development of Image Collection Method Using YOLO and Siamese Network Chan Young Shin et.al. 2410.12561 null
2024-10-16 Towards Flexible and Efficient Diffusion Low Light Enhancer Guanzhou Lan et.al. 2410.12346 null
2024-10-16 Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond Pengwei Liang et.al. 2410.12274 null
2024-10-15 Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos Zhouxia Wang et.al. 2410.11828 null
2024-10-15 LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images Yuzhou Cheng et.al. 2410.11505 null
2024-10-13 Fusion Based Hand Geometry Recognition Using Dempster-Shafer Theory Asish Bera et.al. 2410.09842 null
2024-10-13 LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond Md Tanvir Islam et.al. 2410.09831 link
2024-10-14 LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection Mingjia Li et.al. 2410.08810 link
2024-10-11 Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers Jin Cao et.al. 2410.08688 link
2024-10-16 Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP Eunji Kim et.al. 2410.08469 null
2024-10-11 A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification Eugene P. W. Ang et.al. 2410.08456 null
2024-10-10 TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration Hsing-Hua Wang et.al. 2410.08177 link
2024-10-10 A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks Hoin Jung et.al. 2410.07593 link
2024-10-09 Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval Mohammad Omama et.al. 2410.07022 null
2024-10-09 Rethinking the Evaluation of Visible and Infrared Image Fusion Dayan Guan et.al. 2410.06811 link
2024-10-09 InstantIR: Blind Image Restoration with Instant Generative Reference Jen-Yuan Huang et.al. 2410.06551 null
2024-10-09 MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging Noel C. F. Codella et.al. 2410.06542 null
2024-10-08 Temporal Image Caption Retrieval Competition -- Description and Results Jakub Pokrywka et.al. 2410.06314 null
2024-10-08 GSLoc: Visual Localization with 3D Gaussian Splatting Kazii Botashev et.al. 2410.06165 null
2024-10-08 Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning Ayush Singh et.al. 2410.05928 null
2024-10-08 ReFIR: Grounding Large Restoration Models with Retrieval Augmentation Hang Guo et.al. 2410.05601 link
2024-10-09 LoTLIP: Improving Language-Image Pre-training for Long Text Understanding Wei Wu et.al. 2410.05249 null
2024-10-07 Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration Zhiyu Zhu et.al. 2410.04811 link
2024-10-06 Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli Valentyn Piskovskyi et.al. 2410.04497 null
2024-10-06 SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems Ismail Alkhouri et.al. 2410.04479 link
2024-10-05 Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model Keda Tao et.al. 2410.04161 null
2024-10-04 Diffusion State-Guided Projected Gradient for Inverse Problems Rayhan Zirvi et.al. 2410.03463 link
2024-10-03 PnP-Flow: Plug-and-Play Image Restoration with Flow Matching Ségolène Martin et.al. 2410.02423 link
2024-10-03 Can Capacitive Touch Images Enhance Mobile Keyboard Decoding? Piyawat Lertvittayakumjorn et.al. 2410.02264 link
2024-10-02 Posterior sampling via Langevin dynamics based on generative priors Vishal Purohit et.al. 2410.02078 null
2024-10-03 EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections Francesc Net et.al. 2410.01536 link
2024-10-04 CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment Safouane El Ghazouali et.al. 2410.01411 link
2024-10-01 Three-Operator Splitting Method with Two-Step Inertial Extrapolation Olaniyi S. Iyiola et.al. 2410.01099 null
2024-10-01 GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer Youngho Yoon et.al. 2410.00672 link
2024-10-01 Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Guy Ohayon et.al. 2410.00418 link
2024-10-01 GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction Zaid Ilyas et.al. 2410.00380 null
2024-09-30 Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation Aleyna Kütük et.al. 2410.00266 null
2024-09-30 A Survey on Diffusion Models for Inverse Problems Giannis Daras et.al. 2410.00083 null
2024-09-30 UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation Cheng Zhang et.al. 2409.20197 link
2024-09-29 Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation Xiaofeng Cong et.al. 2409.19685 link
2024-09-28 Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration Chu-Jie Qin et.al. 2409.19403 link
2024-09-28 VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition Ahmad Khaliq et.al. 2409.19293 link
2024-09-28 PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution Song Zhang et.al. 2409.19269 link
2024-09-28 Extending Depth of Field for Varifocal Multiview Images Zhilong Li et.al. 2409.19220 null
2024-09-27 MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion Bardienus Duisterhof et.al. 2409.19152 null
2024-09-27 Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors Yunlong Lin et.al. 2409.18899 null
2024-09-26 Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval Mankeerat Sidhu et.al. 2409.18733 null
2024-09-27 Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification Salma Hassan et.al. 2409.18715 null
2024-09-27 Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models Nguyen Gia Bach et.al. 2409.18476 link
2024-09-27 SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement Yunkui Pang et.al. 2409.18355 link
2024-09-26 Toward Efficient Deep Blind RAW Image Restoration Marcos V. Conde et.al. 2409.18204 link
2024-09-26 Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs Qinpeng Cui et.al. 2409.17778 link
2024-09-25 Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement Yihao Zhou et.al. 2409.16661 null
2024-09-25 Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement Guanlin Li et.al. 2409.16604 link
2024-09-24 Proactive Schemes: A Survey of Adversarial Attacks for Social Good Vishal Asnani et.al. 2409.16491 null
2024-09-24 Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan James Wiley et.al. 2409.16263 null
2024-09-23 PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Weifeng Lin et.al. 2409.15278 link
2024-09-23 FusionRF: High-Fidelity Satellite Neural Radiance Fields from Multispectral and Panchromatic Acquisitions Michael Sprintson et.al. 2409.15132 null
2024-09-22 Low-Light Enhancement Effect on Classification and Detection: An Empirical Study Xu Wu et.al. 2409.14461 null
2024-09-22 Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement Cameron Khanpour et.al. 2409.14334 null
2024-09-20 Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval Morris Florek et.al. 2409.13513 link
2024-09-19 Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus Imaging Philippe Zhang et.al. 2409.12854 null
2024-09-19 Fundus image enhancement through direct diffusion bridges Sehui Kim et.al. 2409.12377 link
2024-09-18 Denoising diffusion models for high-resolution microscopy image restoration Pamela Osuna-Vargas et.al. 2409.12078 null
2024-09-18 DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion Jian Xu et.al. 2409.11642 link
2024-09-17 Ultrasound Image Enhancement with the Variance of Diffusion Models Yuxin Zhang et.al. 2409.11380 link
2024-09-17 Improving the Efficiency of Visually Augmented Language Models Paula Ontalvilla et.al. 2409.11148 link
2024-09-17 CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement Xuanzhao Dong et.al. 2409.10966 link
2024-09-16 Taming Diffusion Models for Image Restoration: A Review Ziwei Luo et.al. 2409.10353 null
2024-09-17 Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation Yuchen Guo et.al. 2409.10328 null
2024-09-16 Garment Attribute Manipulation with Multi-level Attention Vittorio Casula et.al. 2409.10206 null
2024-09-16 DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion Yuchen Guo et.al. 2409.10080 null
2024-09-15 Underwater Image Enhancement via Dehazing and Color Restoration Chengqin Wu et.al. 2409.09779 null
2024-09-15 Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network with Spatial-Spectral Manifold Learning He Wang et.al. 2409.09670 link
2024-09-14 Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval Amirreza Mahbod et.al. 2409.09430 link
2024-09-14 Infrared and Visible Image Fusion with Hierarchical Human Perception Guang Yang et.al. 2409.09291 null
2024-09-12 Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement Vamsi Krishna Vasa et.al. 2409.07862 null
2024-09-12 Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction Yu Guo et.al. 2409.07797 null
2024-09-11 FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process Yang Luo et.al. 2409.07451 null
2024-09-11 Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement Xianmin Chen et.al. 2409.07040 link
2024-09-11 PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening RuoCheng Wu et.al. 2409.06980 null
2024-09-10 Modeling Image Tone Dichotomy with the Power Function Axel Martinez et.al. 2409.06764 null
2024-09-10 Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer Li Ke et.al. 2409.06590 null
2024-09-10 Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models Siyu Zhai et.al. 2409.06420 null
2024-09-10 A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions Zhicong Wu et.al. 2409.06381 null
2024-09-10 Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement Yang Wen et.al. 2409.06334 null
2024-09-10 AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration Hongyi Cai et.al. 2409.06206 null
2024-09-09 Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding Bram Willemsen et.al. 2409.05721 link
2024-09-09 Open-World Dynamic Prompt and Continual Visual Representation Learning Youngeun Kim et.al. 2409.05312 null
2024-09-09 Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement Shyang-En Weng et.al. 2409.05274 link
2024-09-07 Training-free ZS-CIR via Weighted Modality Fusion and Similarity Ren-Di Wu et.al. 2409.04918 link
2024-09-07 Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines Sai Yang et.al. 2409.04812 link
2024-09-06 Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models Saghir Alfasly et.al. 2409.04631 null
2024-09-06 Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior Charlesquin Kemajou Mbakam et.al. 2409.04384 null
2024-09-06 RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement Hao Luo et.al. 2409.04363 link
2024-09-06 Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks Hangcheng Cao et.al. 2409.04133 null
2024-09-05 Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration Pei Wang et.al. 2409.03455 null
2024-09-05 KAN See In the Dark Aoxiang Ning et.al. 2409.03404 link
2024-09-05 Multiple weather images restoration using the task transformer and adaptive mixup strategy Yang Wen et.al. 2409.03249 null
2024-09-05 Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion Chenguang Zhu et.al. 2409.03223 null
2024-09-05 Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem Qiwen Zhu et.al. 2409.03179 link
2024-09-04 Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications Abby Stylianou et.al. 2409.03012 null
2024-09-04 Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening Ivan Pereira-Sánchez et.al. 2409.02675 link
2024-09-04 NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval Sepanta Zeighami et.al. 2409.02343 link
2024-09-03 Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models Jiaqi Xu et.al. 2409.02101 link
2024-09-03 F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring Subhajit Paul et.al. 2409.02056 null
2024-09-03 AllWeatherNet: Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions Chenghao Qian et.al. 2409.02045 link
2024-09-03 Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment Konstantin Schall et.al. 2409.01936 link
2024-09-03 Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion Ke Cao et.al. 2409.01728 null
2024-09-03 Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement Kun Zhou et.al. 2409.01641 link
2024-09-03 GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting Zixuan Guo et.al. 2409.01581 null
2024-09-02 A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches Kim Jinwoo et.al. 2409.01219 null
2024-08-30 Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method Yuji Lin et.al. 2408.17339 link
2024-09-02 RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance Avideep Mukherjee et.al. 2408.17095 null
2024-08-30 Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL Haiyang Zhao et.al. 2408.17060 null
2024-08-29 GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content Lebin Zhou et.al. 2408.16866 null
2024-09-02 A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising Shuaiyu Yuan et.al. 2408.16481 null
2024-08-29 Enhanced Control for Diffusion Bridge in Image Restoration Conghan Yue et.al. 2408.16303 link
2024-08-29 Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models Kengo Nakata et.al. 2408.16296 null
2024-08-29 LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement Ye Yu et.al. 2408.16235 link
2024-08-28 Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration Xu Zhang et.al. 2408.15994 null
2024-08-28 MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion Yanglin Deng et.al. 2408.15641 link
2024-08-28 Temporal Attention for Cross-View Sequential Image Localization Dong Yuan et.al. 2408.15569 link
2024-08-27 A Preliminary Exploration Towards General Image Restoration Xiangtao Kong et.al. 2408.15143 null
2024-08-27 Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild Tianqi Wei et.al. 2408.14723 null
2024-08-26 FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation Daixun Li et.al. 2408.13980 null
2024-08-25 LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task Ali Asgarov et.al. 2408.13909 link
2024-08-23 O-Mamba: O-shape State-Space Model for Underwater Image Enhancement Chenyu Dong et.al. 2408.12816 link
2024-08-22 CODE: Confident Ordinary Differential Editing Bastien van Delft et.al. 2408.12418 link
2024-08-22 Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement Lingyu Zhu et.al. 2408.12316 link
2024-08-21 Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations Lintong Zhang et.al. 2408.11966 null
2024-08-21 OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal Qiao Mo et.al. 2408.11480 link
2024-08-21 UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation Xiangyu Zhao et.al. 2408.11305 link
2024-08-21 Taming Generative Diffusion for Universal Blind Image Restoration Siwei Tu et.al. 2408.11287 null
2024-08-20 Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement Satoshi Kosugi et.al. 2408.11055 link
2024-08-20 SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement Linlin Hu et.al. 2408.10934 null
2024-08-20 UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement Yingtie Lei et.al. 2408.10653 link
2024-08-19 BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval Zhenyu Lu et.al. 2408.10383 null
2024-08-19 Multi-Scale Representation Learning for Image Restoration with State-Space Model Yuhong He et.al. 2408.10145 null
2024-08-19 Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration Alik Pramanick et.al. 2408.09912 link
2024-08-19 Fashion Image-to-Image Translation for Complementary Item Retrieval Matteo Attimonelli et.al. 2408.09847 link
2024-08-19 ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement Eashan Adhikarla et.al. 2408.09650 link
2024-08-17 Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration Xin Lin et.al. 2408.09241 link
2024-08-16 DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer's Case Studies Mohammad Hossein Najafi et.al. 2408.08489 null
2024-08-15 Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks Jiawei Wu et.al. 2408.08149 link
2024-08-15 HAIR: Hypernetworks-based All-in-One Image Restoration Jin Cao et.al. 2408.08091 link
2024-08-15 DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions Ryosuke Korekata et.al. 2408.07910 null
2024-08-13 Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method Xin Su et.al. 2408.06709 null
2024-08-12 Wavelet based inpainting detection Barglazan Adrian-Alin et.al. 2408.06429 null
2024-08-12 Latent Disentanglement for Low Light Image Enhancement Zhihao Zheng et.al. 2408.06245 null
2024-08-10 Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network Junyan Ye et.al. 2408.05475 link
2024-08-10 Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration Wenli Wang et.al. 2408.05444 null
2024-08-08 Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration Ziran Zhang et.al. 2408.04227 null
2024-08-08 MultiColor: Image Colorization by Learning from Multiple Color Spaces Xiangcheng Du et.al. 2408.04172 null
2024-08-06 AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval Pavel Suma et.al. 2408.03282 link
2024-08-05 Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models Tongtong Feng et.al. 2408.02408 null
2024-08-02 On Validation of Search & Retrieval of Tissue Images in Digital Pathology H. R. Tizhoosh et.al. 2408.01570 null
2024-08-02 Underwater Object Detection Enhancement via Channel Stabilization Muhammad Ali et.al. 2408.01293 link
2024-08-02 Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement Wenbin Zou et.al. 2408.01276 link
2024-08-02 Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration Donwon Park et.al. 2408.01099 null
2024-08-02 FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs Hesong Li et.al. 2408.01080 null
2024-08-01 A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition Qi Xiong et.al. 2408.00210 null
2024-07-30 UniProcessor: A Text-induced Unified Low-level Image Processor Huiyu Duan et.al. 2407.20928 link
2024-07-27 Inverse Problems with Diffusion Models: A MAP Estimation Perspective Sai bharath chandra Gutha et.al. 2407.20784 link
2024-07-29 ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement Ezequiel Perez-Zarate et.al. 2407.19708 link
2024-07-31 Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint Song Zhang et.al. 2407.19248 null
2024-07-27 Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration Xiaoyan Yu et.al. 2407.19139 link
2024-07-26 Dilated Strip Attention Network for Image Restoration Fangwei Hao et.al. 2407.18613 null
2024-07-25 RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models Haoyu Chen et.al. 2407.18035 null
2024-07-25 Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography Kailai Zhou et.al. 2407.17996 link
2024-07-23 S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks Neha A S et.al. 2407.17587 null
2024-07-24 Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation Yongqi Li et.al. 2407.17274 null
2024-07-23 CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction Liang Zhao et.al. 2407.16204 null
2024-07-23 Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems Sojin Lee et.al. 2407.16125 link
2024-07-20 Deep Learning CT Image Restoration using System Blur and Noise Models Yijie Yuan et.al. 2407.14983 null
2024-07-23 AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement Yunlong Lin et.al. 2407.14900 null
2024-07-20 Dual High-Order Total Variation Model for Underwater Image Restoration Yuemei Li et.al. 2407.14868 link
2024-07-19 Adaptive Frequency Enhancement Network for Single Image Deraining Fei Yan et.al. 2407.14292 null
2024-07-19 Double-Shot 3D Shape Measurement with a Dual-Branch Network Mingyang Lei et.al. 2407.14198 null
2024-07-19 TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion Xin Tian et.al. 2407.14188 link
2024-07-18 Visual Haystacks: Answering Harder Questions About Sets of Images Tsung-Han Wu et.al. 2407.13766 link
2024-07-18 Any Image Restoration with Efficient Automatic Degradation Adaptation Bin Ren et.al. 2407.13372 link
2024-07-18 Training-Free Large Model Priors for Multiple-in-One Image Restoration Xuanhua He et.al. 2407.13181 null
2024-07-18 Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement Eashan Adhikarla et.al. 2407.13170 null
2024-07-21 HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration Shuchang Zhang et.al. 2407.13120 link
2024-07-17 Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations Tomáš Chobola et.al. 2407.12511 link
2024-07-17 GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval Han Zhou et.al. 2407.12431 link
2024-07-17 Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM Markus Weißflog et.al. 2407.12408 null
2024-07-17 GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity Shuo Cao et.al. 2407.12273 null
2024-07-16 Haze-Aware Attention Network for Single-Image Dehazing Lihan Tong et.al. 2407.11505 null
2024-07-16 EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis Ruijie Yang et.al. 2407.11401 null
2024-07-15 No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations Walter Simoncini et.al. 2407.10964 link
2024-07-15 In-Loop Filtering via Trained Look-Up Tables Zhuoyuan Li et.al. 2407.10926 null
2024-07-15 MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration Yulin Ren et.al. 2407.10833 null
2024-07-15 DINO Pre-training for Vision-based End-to-end Autonomous Driving Shubham Juneja et.al. 2407.10803 null
2024-07-15 Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval Youngsun Lim et.al. 2407.10683 null
2024-07-15 An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments J. J. Cabrera et.al. 2407.10536 null

(back to top)

Image Matching

Publish Date Title Authors PDF Code
2025-07-22 A Single-step Accurate Fingerprint Registration Method Based on Local Feature Matching Yuwei Jia et.al. 2507.16201 null
2025-07-09 Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching Yafei Zhang et.al. 2507.06744 null
2025-07-05 From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM Xinyi Wu et.al. 2507.03868 null
2025-07-02 What does really matter in image goal navigation? Gianluca Monaci et.al. 2507.01667 null
2025-06-30 Efficient and Accurate Image Provenance Analysis: A Scalable Pipeline for Large-scale Images Jiewei Lai et.al. 2506.23707 null
2025-06-29 Dynamic Contrastive Learning for Hierarchical Retrieval: A Case Study of Distance-Aware Cross-View Geo-Localization Suofei Zhang et.al. 2506.23077 null
2025-06-27 MatChA: Cross-Algorithm Matching with Feature Augmentation Paula Carbó Cubero et.al. 2506.22336 null
2025-07-22 Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs Shaojie Zhang et.al. 2506.22139 null
2025-06-27 ZeroReg3D: A Zero-shot Registration Pipeline for 3D Consecutive Histopathology Image Reconstruction Juming Xiong et.al. 2506.21923 null
2025-06-25 Fast entropy-regularized SDP relaxations for permutation synchronization Michael Lindsey et.al. 2506.20191 null
2025-06-18 ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections Ziling Huang et.al. 2506.15180 null
2025-06-16 EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition Bingxi Liu et.al. 2506.13133 null
2025-06-12 RealKeyMorph: Keypoints in Real-world Coordinates for Resolution-agnostic Image Registration Mina C. Moghadam et.al. 2506.10344 null
2025-06-11 Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints Xiangkai Zhang et.al. 2506.09748 null
2025-06-11 ScaleLSD: Scalable Deep Line Segment Detection Streamlined Zeran Ke et.al. 2506.09369 link
2025-05-21 Anti-interrupted sampling repeater jamming via linear canonical Wigner distribution lightweight LFM detection Jia-Mian Li et.al. 2506.06302 null
2025-06-05 Vanishing arcs for isolated plane curve singularities Hanwool Bae et.al. 2506.04917 null
2025-06-05 Deep Learning Reforms Image Matching: A Survey and Outlook Shihua Zhang et.al. 2506.04619 null
2025-06-20 SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping Mingxu Zhang et.al. 2505.24305 null
2025-06-05 Universal Domain Adaptation for Semantic Segmentation Seun-An Choe et.al. 2505.22458 null
2025-05-23 To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models Simone Gaisbauer et.al. 2505.17973 link
2025-05-16 Multi-view dense image matching with similarity learning and geometry priors Mohamed Ali Chebbi et.al. 2505.11264 null
2025-05-12 Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection Yuqi Cheng et.al. 2505.07375 link
2025-05-04 OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery Chongsheng Zhang et.al. 2505.03836 link
2025-05-06 LiftFeat: 3D Geometry-Aware Local Feature Matching Yepeng Liu et.al. 2505.03422 link
2025-05-04 Focus What Matters: Matchability-Based Reweighting for Local Feature Matching Dongyue Li et.al. 2505.02161 null
2025-05-15 Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective Taoyu Su et.al. 2504.19458 link
2025-04-28 Dynamic Arthroscopic Navigation System for Anterior Cruciate Ligament Reconstruction Based on Multi-level Memory Architecture Shuo Wang et.al. 2504.19398 null
2025-04-23 Road Similarity-Based BEV-Satellite Image Matching for UGV Localization Zhenping Sun et.al. 2504.16346 null
2025-04-18 Outlier-Robust Multi-Model Fitting on Quantum Annealers Saurabh Pandey et.al. 2504.13836 null
2025-04-11 Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models Josef Bengtson et.al. 2504.08348 null
2025-04-10 Image registration of 2D optical thin sections in a 3D porous medium: Application to a Berea sandstone digital rock image Jaehong Chung et.al. 2504.06604 link
2025-04-22 To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition Davide Sferrazza et.al. 2504.06116 link
2025-04-10 Learning Affine Correspondences by Integrating Geometric Constraints Pengju Sun et.al. 2504.04834 link
2025-04-01 Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data Yiqun Duan et.al. 2504.00812 null
2025-03-31 CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching Zizhuo Li et.al. 2503.23925 null
2025-03-28 Pairwise Matching of Intermediate Representations for Fine-grained Explainability Lauren Shrack et.al. 2503.22881 link
2025-03-26 Multimodal Image Matching based on Frequency-domain Information of Local Energy Response Meng Yang et.al. 2503.20827 null
2025-03-22 Normalized Matching Transformer Abtin Pourhadi et.al. 2503.17715 link
2025-03-20 Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors Tian Yi Lim et.al. 2503.16275 null
2025-03-20 MapGlue: Multimodal Remote Sensing Image Matching Peihao Wu et.al. 2503.16185 link
2025-03-19 PAPI-Reg: Patch-to-Pixel Solution for Efficient Cross-Modal Registration between LiDAR Point Cloud and Camera Image Yuanchao Yue et.al. 2503.15285 null
2025-04-07 Less Biased Noise Scale Estimation for Threshold-Robust RANSAC Johan Edstedt et.al. 2503.13433 null
2025-03-17 SatDepth: A Novel Dataset for Satellite Image Matching Rahul Deshmukh et.al. 2503.12706 link
2025-03-14 Refining Image Edge Detection via Linear Canonical Riesz Transforms Shuhui Yang et.al. 2503.11148 null
2025-03-13 Speedy MASt3R Jingxing Li et.al. 2503.10017 null
2025-03-11 Keypoint Detection and Description for Raw Bayer Images Jiakai Lin et.al. 2503.08673 null
2025-03-06 Learning 3D Medical Image Models From Brain Functional Connectivity Network Supervision For Mental Disorder Diagnosis Xingcan Hu et.al. 2503.04205 null
2025-03-07 Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration Qianliang Wu et.al. 2503.04127 null
2025-03-05 JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba Xiaoyong Lu et.al. 2503.03437 null
2025-02-28 CNSv2: Probabilistic Correspondence Encoded Neural Image Servo Anzhe Chen et.al. 2503.00132 null
2025-02-27 A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization Yejun Zhang et.al. 2502.20036 link
2025-02-27 RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges Thibaut Loiseau et.al. 2502.19955 null
2025-02-26 BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure Haoxin Cai et.al. 2502.19242 link
2025-02-25 PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching Han Nie et.al. 2502.18104 link
2025-02-25 Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking Xin Tong et.al. 2502.17766 null
2025-03-04 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model Yaxuan Huang et.al. 2502.16779 null
2025-02-16 FeaKM: Robust Collaborative Perception under Noisy Pose Conditions Jiuwu Hao et.al. 2502.11003 link
2025-02-24 Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation Emanuele Mule et.al. 2502.06288 link
2025-02-04 Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications William O'Donnell et.al. 2502.02624 null
2025-02-01 MambaGlue: Fast and Robust Local Feature Matching With Mamba Kihwan Ryoo et.al. 2502.00462 link
2025-01-24 Dense-SfM: Structure from Motion with Dense Consistent Matching JongMin Lee et.al. 2501.14277 null
2025-01-20 MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching Yepeng Liu et.al. 2501.11299 null
2025-01-13 MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training Xingyi He et.al. 2501.07556 null
2025-01-13 Matching Free Depth Recovery from Structured Light Zhuohang Yu et.al. 2501.07113 null
2025-01-02 Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views Yulun Wu et.al. 2501.01196 null
2024-12-31 Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights Bharath Kumar Agnur et.al. 2412.20210 null
2024-12-27 MINIMA: Modality Invariant Image Matching Xingyu Jiang et.al. 2412.19412 link
2024-12-24 GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network Xianfeng Song et.al. 2412.18221 link
2024-12-17 Bringing Multimodality to Amazon Visual Search System Xinliang Zhu et.al. 2412.13364 null
2024-12-04 Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis Siyoon Jin et.al. 2412.03150 null
2024-11-20 DT-LSD: Deformable Transformer-based Line Segment Detection Sebastian Janampa et.al. 2411.13005 link
2024-11-15 Image Matching Filtering and Refinement by Planes and Beyond Fabio Bellavia et.al. 2411.09484 link
2024-11-11 XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration Ismail Can Yagmur et.al. 2411.07430 link
2024-11-07 The Impact of Semi-Supervised Learning on Line Segment Detection Johanna Engman et.al. 2411.04596 link
2024-11-04 Silver medal Solution for Image Matching Challenge 2024 Yian Wang et.al. 2411.01851 null
2024-10-30 Variable Resolution Sampling and Deep Learning Image Recovery for Accelerated Multi-Spectral MRI Near Metal Implants Azadeh Sharafi et.al. 2410.23329 null
2024-11-05 RelationBooth: Towards Relation-Aware Customized Object Generation Qingyu Shi et.al. 2410.23280 null
2024-10-31 ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses Junjie Ni et.al. 2410.22733 null
2024-10-30 LoFLAT: Local Feature Matching using Focused Linear Attention Transformer Naijian Cao et.al. 2410.22710 null
2024-10-26 Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification Yue Su et.al. 2410.20097 null
2024-10-01 A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference Yuan Li et.al. 2410.11848 null
2024-10-15 LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images Yuzhou Cheng et.al. 2410.11505 null
2024-10-12 Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence Felipe Cadar et.al. 2410.09533 link
2024-09-27 Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras Yipeng Lu et.al. 2409.18673 null
2024-09-25 Game4Loc: A UAV Geo-Localization Benchmark from Game Data Yuxiang Ji et.al. 2409.16925 link
2024-09-24 Automatic Registration of SHG and H&E Images with Feature-based Initial Alignment and Intensity-based Instance Optimization: Contribution to the COMULIS Challenge Marek Wodzinski et.al. 2409.15931 null
2024-09-10 Weakly-supervised Camera Localization by Ground-to-satellite Image Registration Yujiao Shi et.al. 2409.06471 link
2024-09-05 Enabling Practical and Privacy-Preserving Image Processing Chao Wang et.al. 2409.03568 null
2024-09-20 A General Albedo Recovery Approach for Aerial Photogrammetric Images through Inverse Rendering Shuang Song et.al. 2409.03032 link
2024-08-29 Super-Resolution works for coastal simulations Zhi-Song Liu et.al. 2408.16553 null
2024-09-15 Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks Sierra Bonilla et.al. 2408.16445 link
2024-08-26 Affine steerers for structured keypoint description Georg Bökman et.al. 2408.14186 link
2024-08-25 TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers Chuanrui Zhang et.al. 2408.13770 null
2024-09-11 Coarse-to-fine Alignment Makes Better Speech-image Retrieval Lifeng Zhou et.al. 2408.13119 null
2024-08-19 BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval Zhenyu Lu et.al. 2408.10383 null
2024-08-14 RSD-DOG: A New Image Descriptor based on Second Order Derivatives Darshan Venkatrayappa et.al. 2408.07687 null
2024-08-09 One Shot is Enough for Sequential Infrared Small Target Segmentation Bingbing Dan et.al. 2408.04823 link
2024-08-07 PRISM: PRogressive dependency maxImization for Scale-invariant image Matching Xudong Cai et.al. 2408.03598 null
2024-08-05 ConDL: Detector-Free Dense Image Matching Monika Kwiatkowski et.al. 2408.02766 null
2024-08-04 Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image Xinlin Ren et.al. 2408.02079 link
2024-07-29 Image-text matching for large-scale book collections Artemis Llabrés et.al. 2407.19812 link
2024-07-26 PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis Sohyeong Kim et.al. 2407.18695 null
2024-07-22 RADA: Robust and Accurate Feature Learning with Domain Adaptation Jingtai He et.al. 2407.15791 null
2024-07-17 GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection Jingwen Yu et.al. 2407.11736 link
2024-07-16 REMM: Rotation-Equivariant Framework for End-to-End Multimodal Image Matching Han Nie et.al. 2407.11637 link
2024-07-16 A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation Chengsheng Li et.al. 2407.11287 null
2024-07-14 Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching Xiaoyong Lu et.al. 2407.07789 null
2024-07-10 Mutual Information calculation on different appearances Jiecheng Liao et.al. 2407.07410 null
2024-07-15 SfM on-the-fly: Get better 3D from What You Capture Zongqian Zhan et.al. 2407.03939 null
2024-07-03 IMC 2024 Methods & Solutions Review Shyam Gupta et.al. 2407.03172 null
2024-06-21 High Resolution Surface Reconstruction of Cultural Heritage Objects Using Shape from Polarization Method F. S. Mortazavi et.al. 2406.15121 null
2024-06-16 Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models Yikai Zhang et.al. 2406.10902 link
2024-06-14 Grounding Image Matching in 3D with MASt3R Vincent Leroy et.al. 2406.09756 link

(back to top)

MultiModal

Publish Date Title Authors PDF Code
2025-07-23 See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering Junjie Wang et.al. 2507.17659 null
2025-07-23 Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning Xinyao Liu et.al. 2507.17539 null
2025-07-23 ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents Chang Nie et.al. 2507.17462 null
2025-07-23 HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs Zhaolin Cai et.al. 2507.17394 null
2025-07-23 A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model Zhe Xu et.al. 2507.17303 null
2025-07-23 Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation Zixuan Wang et.al. 2507.17204 null
2025-07-22 Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models Tz-Ying Wu et.al. 2507.17050 null
2025-07-22 HOComp: Interaction-Aware Human-Object Composition Dong Liang et.al. 2507.16813 null
2025-07-22 Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation Yiguo He et.al. 2507.16716 null
2025-07-22 Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs Yujin Han et.al. 2507.16663 null
2025-07-22 Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models Mohamad Ballout et.al. 2507.16572 null
2025-07-22 Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models Xiaoyan Wang et.al. 2507.16524 null
2025-07-22 C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning Xiuwei Chen et.al. 2507.16518 null
2025-07-22 MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing Shreelekha Revankar et.al. 2507.16228 null
2025-07-21 True Multimodal In-Context Learning Needs Attention to the Visual Context Shuo Chen et.al. 2507.15807 null
2025-07-21 Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions Meng Chen et.al. 2507.15692 null
2025-07-21 Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models Haoran Zhou et.al. 2507.15652 null
2025-07-21 DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding Xiaoyi Bao et.al. 2507.15569 null
2025-07-21 FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers Yanbing Zhang et.al. 2507.15249 null
2025-07-20 Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction Ce Zhang et.al. 2507.15130 null
2025-07-20 Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression Roy H. Jennings et.al. 2507.14997 null
2025-07-20 Open-set Cross Modal Generalization via Multimodal Unified Representation Hai Huang et.al. 2507.14935 null
2025-07-20 U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs Xiaojie Li et.al. 2507.14902 null
2025-07-20 LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering Xinxin Dong et.al. 2507.14784 null
2025-07-18 Moodifier: MLLM-Enhanced Emotion-Driven Image Editing Jiarong Ye et.al. 2507.14024 null
2025-07-17 "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Jing Gu et.al. 2507.13428 null
2025-07-17 Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark Junsu Kim et.al. 2507.13314 null
2025-07-17 Automating Steering for Safe Multimodal Large Language Models Lyucheng Wu et.al. 2507.13255 null
2025-07-17 Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications Yucheng Tang et.al. 2507.12945 null
2025-07-17 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Yiming Ren et.al. 2507.12841 null
2025-07-17 MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval Jeong-Woo Park et.al. 2507.12819 null
2025-07-17 DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment Junjie Gao et.al. 2507.12796 null
2025-07-17 City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning Penglei Sun et.al. 2507.12795 null
2025-07-16 InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion Ananya Raghu et.al. 2507.12669 null
2025-07-16 Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models Gen Luo et.al. 2507.12566 null
2025-07-16 Mitigating Object Hallucinations via Sentence-Level Early Intervention Shangpin Peng et.al. 2507.12455 null
2025-07-16 Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker Rachna Saxena et.al. 2507.12378 null
2025-07-16 Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection Tairan Huang et.al. 2507.11997 null
2025-07-16 Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation Sahid Hossain Mustakim et.al. 2507.11968 null
2025-07-16 Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs Mohammad Shahab Sepehri et.al. 2507.11932 null
2025-07-16 MNO: A Multi-modal Neural Operator for Parametric Nonlinear BVPs Vamshi C. Madala et.al. 2507.11870 null
2025-07-15 Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification Moises Andrade et.al. 2507.11662 null
2025-07-15 MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering Varun Srivastava et.al. 2507.11625 null
2025-07-15 NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models X. Feng et.al. 2507.11245 null
2025-07-15 How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study Che Liu et.al. 2507.11200 null
2025-07-15 KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model Jie Yang et.al. 2507.11102 null
2025-07-15 Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation Yanbo Wang et.al. 2507.11001 null
2025-07-14 Warehouse Spatial Question Answering with LLM Agent Hsiang-Wei Huang et.al. 2507.10778 null
2025-07-14 Vision Language Action Models in Robotic Manipulation: A Systematic Review Muhayy Ud Din et.al. 2507.10672 null
2025-07-14 Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Jiangkai Wu et.al. 2507.10510 null
2025-07-16 Text-Visual Semantic Constrained AI-Generated Image Quality Assessment Qiang Li et.al. 2507.10432 null
2025-07-14 DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs Jiahe Zhao et.al. 2507.10302 null
2025-07-14 FaceLLM: A Multimodal Large Language Model for Face Understanding Hatef Otroshi Shahreza et.al. 2507.10300 null
2025-07-14 Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection Jinglun Li et.al. 2507.10225 null
2025-07-14 A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images Jaeseong Lee et.al. 2507.10202 null
2025-07-14 FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text Bingchao Wang et.al. 2507.10095 null
2025-07-14 ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism Zedong Liu et.al. 2507.10069 null
2025-07-14 The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents Lixu Wang et.al. 2507.10016 null
2025-07-14 Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning Zijun Chen et.al. 2507.10007 null
2025-07-11 ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way Rajarshi Roy et.al. 2507.08679 null
2025-07-11 Introspection of Thought Helps AI Agents Haoran Sun et.al. 2507.08664 null
2025-07-11 DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images Haoran Sun et.al. 2507.08648 null
2025-07-11 Visual Semantic Description Generation with MLLMs for Image-Text Matching Junyu Chen et.al. 2507.08590 null
2025-07-11 Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation Liu He et.al. 2507.08513 null
2025-07-14 Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R Pablo Robin Guerrero et.al. 2507.08505 null
2025-07-11 Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models Shijun Yang et.al. 2507.08410 null
2025-07-11 MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion Jihao Gu et.al. 2507.08344 null
2025-07-11 Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency Yupu Liang et.al. 2507.08309 null
2025-07-11 M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning Inclusion AI et.al. 2507.08306 null
2025-07-10 PyVision: Agentic Vision with Dynamic Tooling Shitian Zhao et.al. 2507.07998 null
2025-07-10 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding JingLi Lin et.al. 2507.07984 null
2025-07-10 MIRA: A Novel Framework for Fusing Modalities in Medical RAG Jinhong Wang et.al. 2507.07902 null
2025-07-10 SpatialViz-Bench: Automatically Generated Spatial Visualization Reasoning Tasks for MLLMs Siting Wang et.al. 2507.07610 null
2025-07-10 Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation Yupu Liang et.al. 2507.07572 null
2025-07-11 StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley Weihao Tan et.al. 2507.07445 null
2025-07-10 Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning Jingjing Jiang et.al. 2507.07424 null
2025-07-09 Robust Multimodal Large Language Models Against Modality Conflict Zongmeng Zhang et.al. 2507.07151 null
2025-07-09 Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor Vatsal Agarwal et.al. 2507.07106 null
2025-07-09 Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs Yahan Yu et.al. 2507.06999 null
2025-07-09 Omni-Video: Democratizing Unified Video Understanding and Generation Zhiyu Tan et.al. 2507.06119 null
2025-07-08 Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration Maximilian Tschuchnig et.al. 2507.06067 null
2025-07-08 BlueLM-2.5-3B Technical Report Baojiao Xiong et.al. 2507.05934 null
2025-07-08 From ID-based to ID-free: Rethinking ID Effectiveness in Multimodal Collaborative Filtering Recommendation Guohao Li et.al. 2507.05715 null
2025-07-08 MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models Wei Zhang et.al. 2507.05591 null
2025-07-07 MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents Ming Gong et.al. 2507.05330 null
2025-07-07 Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing Chun-Hsiao Yeh et.al. 2507.05259 null
2025-07-07 Spatio-Temporal LLM: Reasoning about Environments and Actions Haozhen Zheng et.al. 2507.05258 null
2025-07-07 Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Yana Wei et.al. 2507.05255 null
2025-07-07 Differential Attention for Multimodal Crisis Event Analysis Nusrat Munia et.al. 2507.05165 null
2025-07-07 Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport Qinkai Yu et.al. 2507.04999 null
2025-07-07 ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation Chenchen Zhang et.al. 2507.04952 null
2025-07-07 ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding Jianjiang Yang et.al. 2507.04943 null
2025-07-07 HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding Yuxuan Cai et.al. 2507.04909 null
2025-07-07 Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos Davide Berghi et.al. 2507.04845 null
2025-07-07 From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection Zexi Jia et.al. 2507.04769 null
2025-07-03 Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation Jiaer Xia et.al. 2507.02859 null
2025-07-03 Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection Ziqi Miao et.al. 2507.02844 null
2025-07-03 Multimodal Mathematical Reasoning with Diverse Solving Perspective Wenhao Shi et.al. 2507.02804 null
2025-07-03 AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models Ziyin Zhou et.al. 2507.02664 null
2025-07-03 VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning Siran Chen et.al. 2507.02626 null
2025-07-03 AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding Weili Xu et.al. 2507.02591 null
2025-07-03 LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models Juntao Liu et.al. 2507.02279 null
2025-07-03 SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement Zeyu Lei et.al. 2507.02252 null
2025-07-02 Kwai Keye-VL Technical Report Kwai Keye Team et.al. 2507.01949 null
2025-07-02 Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning Qingdong He et.al. 2507.01908 null
2025-07-02 TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types Yuhao Lin et.al. 2507.01857 null
2025-07-02 Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach Hao Wei et.al. 2507.01728 null
2025-07-02 AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness Zixin Chen et.al. 2507.01702 null
2025-07-02 SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement Weijie Yin et.al. 2507.01643 null
2025-07-02 SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism Beitao Chen et.al. 2507.01513 null
2025-07-02 AVC-DPO: Aligned Video Captioning via Direct Preference Optimization Jiyang Tang et.al. 2507.01492 null
2025-07-02 AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing Yinwang Ren et.al. 2507.01376 null
2025-07-02 Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations Bohao Wang et.al. 2507.01337 null
2025-07-01 Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives Sixun Dong et.al. 2506.24124 null
2025-06-30 DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World Xiangtai Li et.al. 2506.24102 null
2025-06-30 A Survey on Vision-Language-Action Models for Autonomous Driving Sicong Jiang et.al. 2506.24044 null
2025-07-01 Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs Yang Dai et.al. 2506.23940 null
2025-06-30 VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation Peng Huang et.al. 2506.23641 null
2025-06-30 Unified Multimodal Understanding via Byte-Pair Visual Encoding Wanpeng Zhang et.al. 2506.23639 null
2025-06-30 PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum Shiqi Zhang et.al. 2506.23607 null
2025-06-30 MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI Huanjin Yao et.al. 2506.23563 null
2025-06-30 Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably Zhihao Zhang et.al. 2506.23508 null
2025-06-30 Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks Xian Zhang et.al. 2506.23481 null
2025-06-27 Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs Shaojie Zhang et.al. 2506.22139 null
2025-06-27 Towards Scalable and Robust White Matter Lesion Localization via Multimodal Deep Learning Julia Machnio et.al. 2506.22041 null
2025-06-27 R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning Biao Wang et.al. 2506.21980 null
2025-06-27 Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning Tzu-Chun Chien et.al. 2506.21873 null
2025-06-27 DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE Hang Shao et.al. 2506.21864 null
2025-06-26 FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering Liangyu Zhong et.al. 2506.21710 null
2025-06-26 APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization Minjie Hong et.al. 2506.21655 null
2025-06-26 Exploring the Design Space of 3D MLLMs for CT Report Generation Mohammed Baharoon et.al. 2506.21535 null
2025-06-26 TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding Junwen Zhang et.al. 2506.21393 null
2025-06-26 SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Melanie Rieff et.al. 2506.21355 null
2025-06-27 FairyGen: Storied Cartoon Video from a Single Child-Drawn Character Jiayi Zheng et.al. 2506.21272 null
2025-06-26 Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents Tianyi Men et.al. 2506.21252 null
2025-06-26 Task-Aware KV Compression For Cost-Effective Long Video Understanding Minghao Qin et.al. 2506.21184 null
2025-06-26 OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography Caoshuo Li et.al. 2506.21101 null
2025-06-26 V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling Junwei You et.al. 2506.21041 null
2025-06-26 Evidence-based diagnostic reasoning with multi-agent copilot for human pathology Chengkuan Chen et.al. 2506.20964 null
2025-06-26 E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs Van-Hoang Phan et.al. 2506.20944 null
2025-06-25 UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation Yanzhe Chen et.al. 2506.20214 null
2025-06-25 BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos Jiahao Lin et.al. 2506.20103 null
2025-06-24 MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection Zhengxiang Huang et.al. 2506.19884 null
2025-06-24 Multimodal large language models and physics visual tasks: comparative analysis of performance and costs Giulia Polverini et.al. 2506.19662 null
2025-06-24 Surgery-R1: Advancing Surgical-VQLA with Reasoning Multimodal Large Language Model via Reinforcement Learning Pengfei Hao et.al. 2506.19469 null
2025-06-24 Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System Lixuan He et.al. 2506.19433 null
2025-06-24 Memory-Augmented Incomplete Multimodal Survival Prediction via Cross-Slide and Gene-Attentive Hypergraph Learning Mingcheng Qu et.al. 2506.19324 null
2025-06-24 MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models Yinan Xia et.al. 2506.19257 null
2025-06-24 Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification Minghao Qin et.al. 2506.19225 null
2025-06-24 MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports Sunggu Kyung et.al. 2506.19217 null
2025-06-23 MOSCARD -- Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events Jialu Pi et.al. 2506.19174 null
2025-06-23 Universal Video Temporal Grounding with Generative Multi-modal Large Language Models Zeqian Li et.al. 2506.18883 null
2025-06-23 TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting Zhongbin Guo et.al. 2506.18862 null
2025-06-23 SIM-Net: A Multimodal Fusion Network Using Inferred 3D Object Shape Point Clouds from RGB Images for 2D Classification Youcef Sklab et.al. 2506.18683 null
2025-06-24 Object-aware Sound Source Localization via Audio-Visual Scene Understanding Sung Jin Um et.al. 2506.18557 null
2025-06-23 MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis Yuting Zhang et.al. 2506.18512 null
2025-06-23 Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey Xinyao Li et.al. 2506.18504 null
2025-06-23 AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction Gengyuan Zhang et.al. 2506.18472 null
2025-06-23 What You Think Is What You Get: Bridge User Intent and Transfer Function Design through Multimodal Large Language Models Yiyao Wang et.al. 2506.18407 null
2025-06-23 RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models Yeongtak Oh et.al. 2506.18369 null
2025-06-24 Multimodal Fusion SLAM with Fourier Attention Youjie Zhou et.al. 2506.18204 null
2025-06-20 MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models Xiaolong Wang et.al. 2506.17046 null
2025-06-20 MM-AttacKG: A Multimodal Approach to Attack Graph Construction with Large Language Models Yongheng Zhang et.al. 2506.16968 null
2025-06-20 Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs Haoran Sun et.al. 2506.16962 link
2025-06-20 LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models Fanfei Li et.al. 2506.16950 null
2025-06-20 Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning Jiaqi Chen et.al. 2506.16931 null
2025-06-20 With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You Fabian Gröger et.al. 2506.16895 null
2025-06-20 IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification Eion Tyacke et.al. 2506.16744 null
2025-06-19 How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? Giuseppe Lando et.al. 2506.16450 null
2025-06-19 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Yi Chen et.al. 2506.16141 link
2025-06-18 Demystifying the Visual Quality Paradox in Multimodal Large Language Models Shuo Xing et.al. 2506.15645 null
2025-06-18 Creating User-steerable Projections with Interactive Semantic Mapping Artur André Oliveira et.al. 2506.15479 null
2025-06-18 Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning Chunlei Li et.al. 2506.15477 null
2025-06-18 Understanding GUI Agent Localization Biases through Logit Sharpness Xingjian Tao et.al. 2506.15425 null
2025-06-18 MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering Xinqi Fan et.al. 2506.15298 null
2025-06-18 From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem Yanxu Mao et.al. 2506.15170 null
2025-06-17 ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM Yujun Wang et.al. 2506.14766 null
2025-06-17 Exploring MLLMs Perception of Network Visualization Principles Jacob Miller et.al. 2506.14611 null
2025-06-17 M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models Can Zheng et.al. 2506.14532 null
2025-06-17 LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops Jiyuan Fu et.al. 2506.14493 null
2025-06-17 GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies Jingqi Yang et.al. 2506.14477 link
2025-06-17 Dense360: Dense Understanding from Omnidirectional Panoramas Yikang Zhou et.al. 2506.14471 null
2025-06-17 Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval Ruofan Hu et.al. 2506.14445 null
2025-06-17 From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models Xinyang Li et.al. 2506.14224 null
2025-06-17 A multi-stage augmented multimodal interaction network for fish feeding intensity quantification Shulong Zhang et.al. 2506.14170 null
2025-06-17 SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability Juho Bai et.al. 2506.14144 null
2025-06-16 Discrete Diffusion in Large Language and Multimodal Models: A Survey Runpeng Yu et.al. 2506.13759 link
2025-06-16 TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning Junru Zhang et.al. 2506.13705 link
2025-06-16 DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models Yunnong Chen et.al. 2506.13663 null
2025-06-16 Omni-AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented for Efficient Long Video Understanding Zhucun Xue et.al. 2506.13589 null
2025-06-16 RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis Pengzuo Wu et.al. 2506.13405 null
2025-06-16 VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation Bo Pan et.al. 2506.13326 link
2025-06-16 ZINA: Multimodal Fine-grained Hallucination Detection and Editing Yuiga Wada et.al. 2506.13130 null
2025-06-16 Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning Haibo Qiu et.al. 2506.13056 null
2025-06-16 CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model Jiangtong Li et.al. 2506.13055 null
2025-06-15 SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models Xinyi Zhao et.al. 2506.12992 link
2025-06-16 VGR: Visual Grounded Reasoning Jiacong Wang et.al. 2506.11991 null
2025-06-13 Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks? Simeon Junker et.al. 2506.11807 null
2025-06-13 Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization Wenqi Liu et.al. 2506.11712 null
2025-06-13 Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning Chendi Ge et.al. 2506.11672 null
2025-06-13 VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories? Jiachen Yu et.al. 2506.11571 null
2025-06-13 DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs Bo-Cheng Chiu et.al. 2506.11558 null
2025-06-13 Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models Jinming Wen et.al. 2506.11521 null
2025-06-13 Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs Xiao Xu et.al. 2506.11515 null
2025-06-13 Stop learning it all to mitigate visual hallucination, Focus on the hallucination target Dokyoon Yoon et.al. 2506.11417 null
2025-06-12 Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education Conrad Borchers et.al. 2506.11326 null
2025-06-12 Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs Qizhe Zhang et.al. 2506.10967 link
2025-06-12 Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification? Fei Lin et.al. 2506.10912 null
2025-06-12 VideoDeepResearch: Long Video Understanding With Agentic Tool Using Huaying Yuan et.al. 2506.10821 link
2025-06-13 Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Yuhao Zhou et.al. 2506.10521 null
2025-06-12 MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models Yu Huang et.al. 2506.10465 null
2025-06-12 Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts Guowei Zhong et.al. 2506.10452 link
2025-06-12 MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment Shuo Wang et.al. 2506.10430 null
2025-06-12 Can Sound Replace Vision in LLaVA With Token Substitution? Ali Vosoughi et.al. 2506.10416 null
2025-06-12 Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? Yingjin Song et.al. 2506.10415 null
2025-06-12 Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series Ching Chang et.al. 2506.10412 null
2025-06-11 OctoNav: Towards Generalist Embodied Navigation Chen Gao et.al. 2506.09839 null
2025-06-11 MMME: A Spontaneous Multi-Modal Micro-Expression Dataset Enabling Visual-Physiological Fusion Chuang Maa et.al. 2506.09834 link
2025-06-11 Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning Yuting Li et.al. 2506.09736 link
2025-06-11 HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding Yanzhao Shi et.al. 2506.09634 null
2025-06-11 AD^2-Bench: A Hierarchical CoT Benchmark for MLLM in Autonomous Driving under Adverse Conditions Zhaoyang Wei et.al. 2506.09557 null
2025-06-10 BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models Amina Mollaysa et.al. 2506.08936 null
2025-06-10 What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities Wendong Bu et.al. 2506.08933 null
2025-06-10 Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment Maximilian Tschuchnig et.al. 2506.08716 null
2025-06-10 From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge Agnese Taluzzi et.al. 2506.08553 null
2025-06-09 Serendipitous Recommendation with Multimodal LLM Haoting Wang et.al. 2506.08283 null
2025-06-09 Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain Subba Reddy Oota et.al. 2506.08277 link
2025-06-09 GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior Penghao Wu et.al. 2506.08012 null
2025-06-09 Play to Generalize: Learning to Reason Through Game Play Yunfei Xie et.al. 2506.08011 link
2025-06-09 CyberV: Cybernetics for Test-time Scaling in Video Understanding Jiahao Meng et.al. 2506.07971 link
2025-06-09 SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence Ziyang Gong et.al. 2506.07966 link
2025-06-09 WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning Jie Yang et.al. 2506.07905 link
2025-06-09 PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement Teng Hu et.al. 2506.07848 null
2025-06-09 HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains Shijie Wang et.al. 2506.07837 link
2025-06-09 WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code Zhiyu Lin et.al. 2506.07818 link
2025-06-09 Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests Arnau Igualde Sáez et.al. 2506.07418 null
2025-06-08 Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification Tianyi Bai et.al. 2506.07235 null
2025-06-06 DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation Jingyu Xiao et.al. 2506.06251 link
2025-06-06 VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning Zikang Wang et.al. 2506.06097 null
2025-06-06 MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems? Zhitao He et.al. 2506.06034 null
2025-06-06 Object Navigation with Structure-Semantic Reasoning-Based Multi-level Map and Multimodal Decision-Making LLM Chongshang Yan et.al. 2506.05896 null
2025-06-06 Human-AI Alignment of Multimodal Large Language Models with Speech-Language Pathologists in Parent-Child Interactions Weiyan Shi et.al. 2506.05879 null
2025-06-09 Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling Yihan Xie et.al. 2506.05831 null
2025-06-06 Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models Hugues Thomas et.al. 2506.05689 null
2025-06-05 MLLM-CL: Continual Learning for Multimodal Large Language Models Hongbo Zhao et.al. 2506.05453 null
2025-06-05 SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs Jiahui Wang et.al. 2506.05344 link
2025-06-05 AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Lidong Lu et.al. 2506.05328 null
2025-06-05 EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? Yuqian Yuan et.al. 2506.05287 null
2025-06-05 MokA: Multimodal Low-Rank Adaptation for MLLMs Yake Wei et.al. 2506.05191 null
2025-06-05 On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools Shivani Upadhyay et.al. 2506.05182 link
2025-06-05 The NTNU System at the S&I Challenge 2025 SLA Open Track Hong-Yun Lin et.al. 2506.05121 null
2025-06-05 FinMultiTime: A Four-Modal Bilingual Dataset for Financial Time-Series Analysis Wenyan Xu et.al. 2506.05019 link
2025-06-05 TextVidBench: A Benchmark for Long Video Scene Text Understanding Yangyang Zhong et.al. 2506.04983 null
2025-06-05 APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval Hong Gao et.al. 2506.04953 null
2025-06-05 From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes Tianxu Wang et.al. 2506.04897 null
2025-06-04 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Shuang Chen et.al. 2506.04207 null
2025-06-04 MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos Kejian Zhu et.al. 2506.04141 null
2025-06-04 Multimodal Tabular Reasoning with Privileged Structured Information Jun-Peng Jiang et.al. 2506.04088 null
2025-06-04 Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample Ze Feng et.al. 2506.03928 null
2025-06-04 HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models Zhaolu Kang et.al. 2506.03922 link
2025-06-04 ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning Feng Han et.al. 2506.03596 link
2025-06-04 Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts Jiaxing Zhang et.al. 2506.03591 null
2025-06-04 WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion Tianpei Zhang et.al. 2506.03555 null
2025-06-05 Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos Tanqiu Qiao et.al. 2506.03440 null
2025-06-03 A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation Zihui Ma et.al. 2506.03360 link
2025-06-03 MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Wei Chow et.al. 2506.03144 null
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126 null
2025-06-03 Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models Shizhan Gong et.al. 2506.02557 null
2025-06-03 VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning Hao Yan et.al. 2506.02537 null
2025-06-03 Minos: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text Junzhe Zhang et.al. 2506.02494 null
2025-06-02 From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models Yihong Tang et.al. 2506.02242 null
2025-06-02 MLLMs Need 3D-Aware Representation Supervision for Scene Understanding Xiaohu Huang et.al. 2506.01946 null
2025-06-02 Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency Hongyu Li et.al. 2506.01908 link
2025-06-02 MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs Wayner Barrios et.al. 2506.01850 null
2025-06-02 FaceCoT: A Benchmark Dataset for Face Anti-Spoofing with Chain-of-Thought Reasoning Honglu Zhang et.al. 2506.01783 null
2025-05-30 Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents Yaxin Luo et.al. 2505.24878 link
2025-05-30 MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning Yiqing Liang et.al. 2505.24871 null
2025-05-30 SiLVR: A Simple Language-based Video Reasoning Framework Ce Zhang et.al. 2505.24869 link
2025-05-30 FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation Junyu Luo et.al. 2505.24714 link
2025-05-30 Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors Duo Zheng et.al. 2505.24625 null
2025-05-30 Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts Xin He et.al. 2505.24541 null
2025-05-30 Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model Yuting Zhang et.al. 2505.24476 link
2025-05-30 SORCE: Small Object Retrieval in Complex Environments Chunxu Liu et.al. 2505.24441 link
2025-05-30 KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval Fanhang Man et.al. 2505.24342 null
2025-06-02 MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM Bowen Dong et.al. 2505.24238 null
2025-05-29 Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought Yunze Man et.al. 2505.23766 null
2025-05-29 MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence Sihan Yang et.al. 2505.23764 null
2025-05-29 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Diankun Wu et.al. 2505.23747 null
2025-05-29 PixelThink: Towards Efficient Chain-of-Pixel Reasoning Song Wang et.al. 2505.23727 null
2025-05-29 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Tingyu Song et.al. 2505.23693 link
2025-05-29 Human Empathy as Encoder: AI-Assisted Depression Assessment in Special Education Boning Zhao et.al. 2505.23631 null
2025-05-29 A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis Shengyuan Liu et.al. 2505.23601 null
2025-05-29 MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning Linqiang Guo et.al. 2505.23596 null
2025-05-29 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles Zifu Wang et.al. 2505.23590 link
2025-05-29 OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data Fengxiang Wang et.al. 2505.23522 null
2025-05-28 Spatial Knowledge Graph-Guided Multimodal Synthesis Yida Xue et.al. 2505.22633 null
2025-05-28 RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Yuchi Wang et.al. 2505.22613 null
2025-05-28 Multi-MLLM Knowledge Distillation for Out-of-Context News Detection Yimeng Gu et.al. 2505.22517 null
2025-05-28 A Closer Look at Multimodal Representation Collapse Abhra Chaudhuri et.al. 2505.22483 null
2025-05-28 Fostering Video Reasoning via Next-Event Prediction Haonan Wang et.al. 2505.22457 null
2025-05-28 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Lai Wei et.al. 2505.22453 link
2025-05-28 Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models Sizai Hou et.al. 2505.22447 null
2025-05-28 Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs Xudong Li et.al. 2505.22396 null
2025-05-28 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Lai Wei et.al. 2505.22334 link
2025-05-28 CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction Jiali Chen et.al. 2505.22304 null
2025-05-27 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents Han Xiao et.al. 2505.21496 link
2025-05-27 Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment Xiaojun Jia et.al. 2505.21494 link
2025-05-27 Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Muzhi Zhu et.al. 2505.21457 null
2025-05-27 AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs Xuanwen Ding et.al. 2505.21389 link
2025-05-27 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? Junhao Cheng et.al. 2505.21374 link
2025-05-27 MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Yang Shi et.al. 2505.21333 null
2025-05-27 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Jiakang Yuan et.al. 2505.21327 null
2025-05-27 SOLIDGEO: Measuring Multimodal Spatial Math Reasoning in Solid Geometry Peijie Wang et.al. 2505.21177 null
2025-05-27 IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model Yang Zhao et.al. 2505.21146 null
2025-05-27 Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts Yue Zhang et.al. 2505.21079 null
2025-05-27 MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents Ziming Wei et.al. 2505.20148 link
2025-05-26 FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities Jin Wang et.al. 2505.20147 null
2025-05-26 Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion Zheqi Lv et.al. 2505.20053 link
2025-05-26 Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain) Subba Reddy Oota et.al. 2505.20029 link
2025-05-26 ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving Xueyi Liu et.al. 2505.20024 link
2025-05-26 NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID Shihao Li et.al. 2505.20001 null
2025-05-27 Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM Peng Liu et.al. 2505.19901 null
2025-05-26 Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging Yongxian Wei et.al. 2505.19892 link
2025-05-26 Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought Chao Huang et.al. 2505.19877 link
2025-05-26 Efficient Multi-modal Long Context Learning for Training-free Adaptation Zehong Ma et.al. 2505.19812 link
2025-05-23 Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling Bryan Wong et.al. 2505.17982 null
2025-05-23 T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation Zi-Ao Ma et.al. 2505.17897 null
2025-05-23 Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities Ziwei Zhou et.al. 2505.17862 link
2025-05-23 Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM Donghwan Chi et.al. 2505.17726 null
2025-05-23 HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning Chuhao Zhou et.al. 2505.17645 null
2025-05-23 RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition Yuehan Jin et.al. 2505.17501 null
2025-05-23 The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts Yuchen Zhang et.al. 2505.17476 null
2025-05-23 FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain Suifeng Zhao et.al. 2505.17471 null
2025-05-23 FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow Haoyu Sun et.al. 2505.17399 link
2025-05-23 Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts Seon Gyeom Kim et.al. 2505.17374 null
2025-05-22 GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning Chengqi Duan et.al. 2505.17022 link
2025-05-22 Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework Chenhao Zhang et.al. 2505.17019 link
2025-05-22 SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Kaixuan Fan et.al. 2505.17018 link
2025-05-22 Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models Runsen Xu et.al. 2505.17015 null
2025-05-22 SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding Haoning Wu et.al. 2505.17012 link
2025-05-22 LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Zebin You et.al. 2505.16933 null
2025-05-22 Backdoor Cleaning without External Guidance in MLLM Fine-tuning Xuankun Rong et.al. 2505.16916 link
2025-05-22 GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent Bin Xie et.al. 2505.16827 link
2025-05-22 Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs Zeping Yu et.al. 2505.16703 null
2025-05-22 R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO Huanjin Yao et.al. 2505.16673 link
2025-05-20 UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation Rui Tian et.al. 2505.14682 null
2025-05-20 Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities Parthasaarathy Sudarsanam et.al. 2505.14562 null
2025-05-20 ModRWKV: Transformer Multimodality in Linear Time Jiale Kang et.al. 2505.14505 link
2025-05-20 Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents Pengzhou Cheng et.al. 2505.14418 null
2025-05-20 ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations Xuecheng Wu et.al. 2505.14404 null
2025-05-20 TF-Mamba: Text-enhanced Fusion Mamba with Missing Modalities for Robust Multimodal Sentiment Analysis Xiang Li et.al. 2505.14329 link
2025-05-20 Speculative Decoding Reimagined for Multimodal Large Language Models Luxi Lin et.al. 2505.14260 link
2025-05-20 UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Sule Bai et.al. 2505.14231 null
2025-05-20 Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method Xinshen Zhang et.al. 2505.14197 null
2025-05-20 Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering Wei Zhou et.al. 2505.14131 null
2025-05-19 MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Lingxiao Du et.al. 2505.13427 link
2025-05-19 FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning Zhuozhao Hu et.al. 2505.13419 link
2025-05-19 MR. Judge: Multimodal Reasoner as a Judge Renjie Pi et.al. 2505.13403 null
2025-05-19 MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers Kyeongman Park et.al. 2505.13082 null
2025-05-19 Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning Xiaoyu Yang et.al. 2505.13081 null
2025-05-19 Advancing Sequential Numerical Prediction in Autoregressive Models Xiang Fei et.al. 2505.13077 link
2025-05-19 FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models Hengxing Cai et.al. 2505.12835 link
2025-05-19 Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering Jianfeng Cai et.al. 2505.12826 null
2025-05-19 Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs Haruka Asanuma et.al. 2505.12746 null
2025-05-19 Shadow-FT: Tuning Instruct via Base Taiqiang Wu et.al. 2505.12716 link
2025-05-16 GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art Chenkai Zhang et.al. 2505.11436 link
2025-05-16 Visual Planning: Let's Think Only with Images Yi Xu et.al. 2505.11409 link
2025-05-16 EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models Bohao Xing et.al. 2505.11405 link
2025-05-19 TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs Pengju Xu et.al. 2505.11275 link
2025-05-16 A Step towards Interpretable Multimodal AI Models with MultiFIX Mafalda Malafaia et.al. 2505.11262 null
2025-05-16 CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback Yixin Wan et.al. 2505.11178 null
2025-05-16 Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans Yansheng Qiu et.al. 2505.11141 null
2025-05-16 WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild? An-Lan Wang et.al. 2505.11015 null
2025-05-16 ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications Li Qiao et.al. 2505.10946 null
2025-05-16 VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization Mingxiao Li et.al. 2505.10917 null
2025-05-15 Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis Pengfei Wang et.al. 2505.10541 link
2025-05-15 Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence Xiang He et.al. 2505.10176 link
2025-05-15 Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering Yangfu Li et.al. 2505.10118 null
2025-05-15 CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation Chenglong Wang et.al. 2505.09936 null
2025-05-15 UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs Yi Gui et.al. 2505.09904 link
2025-05-14 A Multimodal Multi-Agent Framework for Radiology Report Generation Ziruo Yi et.al. 2505.09787 null
2025-05-14 FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models Hongyang Wang et.al. 2505.09415 null
2025-05-14 Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping Yinuo Wang et.al. 2505.09252 link
2025-05-14 AMSnet 2.0: A Large AMS Database with AI Segmentation for Net Detection Yichen Shi et.al. 2505.09155 null
2025-05-13 Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction Adarsh Kumar et.al. 2505.09018 null
2025-05-14 Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology Yatai Ji et.al. 2505.08765 null
2025-05-12 Visually Interpretable Subtask Reasoning for Visual Question Answering Yu Cheng et.al. 2505.08084 null
2025-05-12 MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing Aybora Koksal et.al. 2505.07984 null
2025-05-12 Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach Ruikun Hou et.al. 2505.07902 null
2025-05-12 Multimodal Survival Modeling in the Age of Foundation Models Steven Song et.al. 2505.07683 link
2025-05-12 Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Xiaokun Wang et.al. 2505.07263 null
2025-05-11 DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models Shucheng Huang et.al. 2505.07084 link
2025-05-11 ParaView-MCP: An Autonomous Visualization Agent with Direct Tool Use Shusen Liu et.al. 2505.07064 null
2025-05-11 MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception Zhengye Zhang et.al. 2505.07007 link
2025-05-11 Visual Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models: A Case Study of Influence Maximization Jie Zhao et.al. 2505.06850 null
2025-05-11 Visual Instruction Tuning with Chain of Region-of-Interest Yixin Chen et.al. 2505.06840 null
2025-05-09 Is your multimodal large language model a good science tutor? Ming Liu et.al. 2505.06418 null
2025-05-09 NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines Chathurangi Shyalika et.al. 2505.06333 link
2025-05-09 MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills Niladri Shekhar Dutt et.al. 2505.06176 null
2025-05-09 The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review Jingguo Qu et.al. 2505.06118 null
2025-05-09 ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding Shuai Wang et.al. 2505.06020 null
2025-05-09 BMMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection Yize Zhou et.al. 2505.05763 null
2025-05-08 Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos Giulio Cesare Mastrocinque Santo et.al. 2505.05681 null
2025-05-08 Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models Aarti Ghatkesar et.al. 2505.05626 null
2025-05-08 Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding Han Xiao et.al. 2505.05446 link
2025-05-09 EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation Biao Yi et.al. 2505.05440 null
2025-05-08 Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization Sooyoung Park et.al. 2505.05343 link
2025-05-08 PADriver: Towards Personalized Autonomous Driving Genghua Kou et.al. 2505.05240 null
2025-05-08 X-Driver: Explainable Autonomous Driving with Vision-Language Models Wei Liu et.al. 2505.05098 null
2025-05-08 Learning Item Representations Directly from Multimodal Features for Effective Recommendation Xin Zhou et.al. 2505.04960 link
2025-05-07 EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning Zhenghao Xing et.al. 2505.04623 link
2025-05-07 On Path to Multimodal Generalist: General-Level and General-Bench Hao Fei et.al. 2505.04620 null
2025-05-07 M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation Qianru Zhang et.al. 2505.04445 null
2025-05-06 VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model Zuwei Long et.al. 2505.03739 link
2025-05-06 Multi-Agent System for Comprehensive Soccer Understanding Jiayuan Rao et.al. 2505.03735 null
2025-05-06 RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration Huajie Tan et.al. 2505.03673 link
2025-05-06 ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant Yifan Xiang et.al. 2505.03654 link
2025-05-06 LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs Xinyuan Zhang et.al. 2505.03460 null
2025-05-06 Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant Haonan Wang et.al. 2505.03380 null
2025-05-05 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning Yi-Fan Zhang et.al. 2505.02835 link
2025-05-06 MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation Mingcheng Li et.al. 2505.02648 null
2025-05-05 SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning Jinpeng Chen et.al. 2505.02486 link
2025-05-07 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Inclusion AI et.al. 2505.02471 link
2025-05-05 Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection Sungheon Jeong et.al. 2505.02393 link
2025-05-04 Retrieval-augmented in-context learning for multimodal large language models in disease classification Zaifu Zhan et.al. 2505.02087 null
2025-05-06 RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video Shuhang Xun et.al. 2505.02064 link
2025-05-04 R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation Meng-Hao Guo et.al. 2505.02018 null
2025-05-04 MLLM-Enhanced Face Forgery Detection: A Vision-Language Fusion Solution Siran Peng et.al. 2505.02013 null
2025-05-02 VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos Zongxia Li et.al. 2505.01481 link
2025-05-02 FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors Chenxi Li et.al. 2505.01322 null
2025-05-02 Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs Yijie Jin et.al. 2505.01068 null
2025-05-02 Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs Hari Chandana Kuchibhotla et.al. 2505.01064 null
2025-05-01 Multi-Modal Language Models as Text-to-Image Model Evaluators Jiahui Chen et.al. 2505.00759 null
2025-05-01 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-05-01 A Methodological and Structural Review of Parkinsons Disease Detection Across Diverse Data Modalities Abu Saleh Musa Miah et.al. 2505.00525 null
2025-05-01 Toward Automated Regulatory Decision-Making: Trustworthy Medical Device Risk Classification with Multimodal Transformers and Self-Training Yu Han et.al. 2505.00422 null
2025-04-30 Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals Bhanuja Ainary et.al. 2505.00153 null
2025-04-30 GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling Siqi Li et.al. 2505.00063 null
2025-04-30 COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning Xindi Wu et.al. 2504.21850 null
2025-04-30 Visual Text Processing: A Comprehensive Review and Unified Evaluation Yan Shu et.al. 2504.21682 link
2025-04-30 Rethinking Visual Layer Selection in Multimodal LLMs Haoran Chen et.al. 2504.21447 null
2025-04-30 SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding Chenkai Zhang et.al. 2504.21435 link
2025-04-30 Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing Hong Zhang et.al. 2504.21356 link
2025-04-30 UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation Linshan Wu et.al. 2504.21336 link
2025-04-30 Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models Guanghao Zhou et.al. 2504.21277 null
2025-04-29 ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification Ziqing Fan et.al. 2504.20930 link
2025-04-29 AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation Jeongsoo Choi et.al. 2504.20629 null
2025-04-29 A Summary on GUI Agents with Foundation Models Enhanced by Reinforcement Learning Jiahao Li et.al. 2504.20464 null
2025-04-29 APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech Zhicheng Lian et.al. 2504.20447 null
2025-04-29 MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation Amaan Izhar et.al. 2504.20343 link
2025-04-28 A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals Zhe Cui et.al. 2504.20178 null
2025-04-28 CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback Chenhan Jiang et.al. 2504.19860 null
2025-04-28 SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation Yulong Guo et.al. 2504.19839 null
2025-04-28 DEEMO: De-identity Multimodal Emotion Recognition and Reasoning Deng Li et.al. 2504.19549 null
2025-04-28 LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning Peijian Zeng et.al. 2504.19524 null
2025-04-26 Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation Delun Lai et.al. 2504.19002 null
2025-04-26 Advancing Face-to-Face Emotion Communication: A Multimodal Dataset (AFFEC) Meisam J. Sekiavandi et.al. 2504.18969 link
2025-04-26 Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge Junjie Zhou et.al. 2504.18961 link
2025-04-25 Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization Kesen Zhao et.al. 2504.18397 link
2025-04-25 ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding Yi-Xing Peng et.al. 2504.18152 null
2025-04-25 DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models Jianyu Liu et.al. 2504.18053 link
2025-04-27 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Xu Ma et.al. 2504.17789 null
2025-04-24 Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Tiancheng Gu et.al. 2504.17432 null
2025-04-25 TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation Ling You et.al. 2504.17365 null
2025-04-24 V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations Zhiyuan Fan et.al. 2504.16727 null
2025-04-24 Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark Hanlei Zhang et.al. 2504.16427 link
2025-04-23 EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment Lancheng Gao et.al. 2504.16405 null
2025-04-22 Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs Merve Cerit et.al. 2504.16323 link
2025-04-21 Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends Mohammad Abu Tami et.al. 2504.16134 null
2025-04-22 TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving Daocheng Fu et.al. 2504.15780 null
2025-04-22 FaceInsight: A Multimodal Large Language Model for Face Perception Jingzhi Li et.al. 2504.15624 null
2025-04-22 AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization Jinda Lu et.al. 2504.15619 null
2025-04-21 IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs David Ma et.al. 2504.15415 link
2025-04-21 Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs Chun-Hsiao Yeh et.al. 2504.15280 link
2025-04-21 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Weiye Xu et.al. 2504.15279 null
2025-04-21 A Call for New Recipes to Enhance Spatial Reasoning in MLLMs Huanyu Zhang et.al. 2504.15037 null
2025-04-21 IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification Fengyuan Nie et.al. 2504.14833 null
2025-04-20 Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens Kaihang Pan et.al. 2504.14666 null
2025-04-20 Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension Lin Li et.al. 2504.14642 null
2025-04-20 Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction Wenke Xia et.al. 2504.14588 link
2025-04-19 Towards Explainable Fake Image Detection with Multi-Modal Large Language Models Yikun Ji et.al. 2504.14245 link
2025-04-19 InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Yuhang Liu et.al. 2504.14239 link
2025-04-18 Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training Andrea Amaduzzi et.al. 2504.13995 null
2025-04-18 Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing Joowon Kim et.al. 2504.13490 null
2025-04-17 SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs Haoxuan Li et.al. 2504.13172 null
2025-04-17 Hadamard product in deep learning: Introduction, Advances and Challenges Grigorios G Chrysos et.al. 2504.13112 null
2025-04-17 EventVAD: Training-Free Event-Aware Video Anomaly Detection Yihua Shao et.al. 2504.13092 null
2025-04-18 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 link
2025-04-17 ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images Sangwook Kim et.al. 2504.13023 null
2025-04-17 EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery Wei Zhang et.al. 2504.12795 null
2025-04-17 Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration Yicheng Pan et.al. 2504.12773 link
2025-04-17 SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding Qianqian Sun et.al. 2504.12704 null
2025-04-17 GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning Liangyu Xu et.al. 2504.12597 null
2025-04-16 Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis Shravan Chaudhari et.al. 2504.12511 null
2025-04-16 Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis Miaosen Luo et.al. 2504.12151 null
2025-04-16 Instruction-augmented Multimodal Alignment for Image-Text and Element Matching Xinli Yue et.al. 2504.12018 null
2025-04-16 AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection Yuhao Chao et.al. 2504.11914 null
2025-04-16 Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation Julia Kreutzer et.al. 2504.11829 null
2025-04-15 DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis Efthymios Georgiou et.al. 2504.11082 null
2025-04-15 Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation Yan Rong et.al. 2504.11002 null
2025-04-14 CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates Ankit Kumar Shaw et.al. 2504.10738 null
2025-04-14 Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization Darryl Hannan et.al. 2504.10727 null
2025-04-14 Relation-Rich Visual Document Generator for Visual Information Extraction Zi-Han Jiang et.al. 2504.10659 link
2025-04-15 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Jinguo Zhu et.al. 2504.10479 link
2025-04-14 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Tao Zhang et.al. 2504.10465 link
2025-04-14 The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Weixian Lei et.al. 2504.10462 link
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358 null
2025-04-14 CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation Junchen Fu et.al. 2504.10307 link
2025-04-14 PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search Pengfei Hu et.al. 2504.10222 null
2025-04-14 The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance Anwesha Mohanty et.al. 2504.10179 null
2025-04-14 COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts Jiansheng Li et.al. 2504.10158 null
2025-04-14 CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography I-Sheng Fang et.al. 2504.10090 null
2025-04-15 MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework Zihan Ling et.al. 2504.10074 null
2025-04-11 Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images Boyang Deng et.al. 2504.08727 null
2025-04-10 POEM: Precise Object-level Editing via MLLM control Marco Schouten et.al. 2504.08111 null
2025-04-10 GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation Lang Lin et.al. 2504.07962 null
2025-04-10 MM-IFEngine: Towards Multimodal Instruction Following Shengyuan Ding et.al. 2504.07957 link
2025-04-10 Perception-R1: Pioneering Perception Policy with Reinforcement Learning En Yu et.al. 2504.07954 link
2025-04-10 MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation Nico Catalano et.al. 2504.07942 null
2025-04-10 VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding Henghao Zhao et.al. 2504.07519 null
2025-04-10 How Can Objects Help Video-Language Understanding? Zitian Tang et.al. 2504.07454 null
2025-04-10 Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing Chenxi Sun et.al. 2504.07424 null
2025-04-10 Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction Kyoyun Choi et.al. 2504.07415 null
2025-04-09 Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning Ashutosh Chaubey et.al. 2504.07198 null
2025-04-10 VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Xinhao Li et.al. 2504.06958 null
2025-04-09 MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking Chang Nie et.al. 2504.06863 null
2025-04-09 Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions Angela Lopez-Cardona et.al. 2504.06843 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-09 Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program Minghe Gao et.al. 2504.06606 link
2025-04-08 Mind the Gap: Evaluating Vision Systems in Small Data Applications Samuel Stevens et.al. 2504.06486 link
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Xiangxi Zheng et.al. 2504.06148 link
2025-04-08 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models Pengfei Zhou et.al. 2504.05782 link
2025-04-08 On the Suitability of Reinforcement Fine-Tuning to Visual Tasks Xiaxu Chen et.al. 2504.05682 null
2025-04-07 URECA: Unique Region Caption Anything Sangbeom Lim et.al. 2504.05305 null
2025-04-07 LiveVQA: Live Visual Knowledge Seeking Mingyang Fu et.al. 2504.05288 null
2025-04-07 Explaining Low Perception Model Competency with High-Competency Counterfactuals Sara Pohland et.al. 2504.05254 null
2025-04-07 Towards Visual Text Grounding of Multimodal Large Language Model Ming Li et.al. 2504.04974 null
2025-04-07 Video-Bench: Human-Aligned Video Generation Benchmark Hui Han et.al. 2504.04907 null
2025-04-07 OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM Jinhong Wang et.al. 2504.04801 null
2025-04-07 OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance Chaoyi Wang et.al. 2504.04781 null
2025-04-07 Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data Samarth Mishra et.al. 2504.04740 link
2025-04-07 LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts Yimu Wang et.al. 2504.04653 null
2025-04-06 Advancing Egocentric Video Question Answering with Multimodal Large Language Models Alkesh Patel et.al. 2504.04550 null
2025-04-04 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-03 Hummus: A Dataset of Humorous Multimodal Metaphor Use Xiaoyu Tong et.al. 2504.02983 link
2025-04-03 Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning Zhihan Zhang et.al. 2504.02906 link
2025-04-03 Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision Xiaofeng Han et.al. 2504.02477 null
2025-04-03 The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy Matheus Valentim et.al. 2504.02217 null
2025-04-03 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Runhui Huang et.al. 2504.01934 null
2025-04-02 Spatial-R1: Enhancing MLLMs in Video Spatial Reasoning Kun Ouyang et.al. 2504.01805 link
2025-04-02 PiCo: Jailbreaking Multimodal Large Language Models via $\textbf{Pi}$ctorial $\textbf{Co}$de Contextualization Aofan Liu et.al. 2504.01444 null
2025-04-02 Slow-Fast Architecture for Video Multi-Modal Large Language Models Min Shi et.al. 2504.01328 link
2025-04-01 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Junhao Cheng et.al. 2504.01014 link
2025-04-01 IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval Bangwei Liu et.al. 2504.00954 null
2025-04-02 Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning Ram Ramrakhya et.al. 2504.00907 null
2025-04-01 Improved Visual-Spatial Reasoning via R1-Zero-Like Training Zhenyi Liao et.al. 2504.00883 null
2025-04-01 Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights Yuchen Liu et.al. 2504.00839 null
2025-04-01 QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA Shuai Li et.al. 2504.00654 null
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 null
2025-03-31 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Yi Chen et.al. 2503.24376 link
2025-03-31 H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding Qi Wu et.al. 2503.24008 null
2025-03-31 BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation Yumeng Fu et.al. 2503.23990 null
2025-03-31 Boosting MLLM Reasoning with Text-Debiased Hint-GRPO Qihan Huang et.al. 2503.23905 null
2025-04-01 Evaluating small vision-language models as AI assistants for radio astronomical source analysis tasks S. Riggi et.al. 2503.23859 link
2025-03-31 OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training Yijie Zheng et.al. 2503.23830 null
2025-03-31 XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? Fengxiang Wang et.al. 2503.23771 null
2025-03-31 STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? Yun Li et.al. 2503.23765 null
2025-03-31 AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization Yiyang Du et.al. 2503.23733 link
2025-03-28 Q-Insight: Understanding Image Quality via Visual Reinforcement Learning Weiqi Li et.al. 2503.22679 link
2025-03-28 Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users Antonia Karamolegkou et.al. 2503.22610 null
2025-03-28 NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving Fuhao Li et.al. 2503.22436 null
2025-03-31 Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs Ziye Chen et.al. 2503.22241 null
2025-03-28 Learning to Instruct for Visual Instruction Tuning Zhihan Zhou et.al. 2503.22215 null
2025-03-28 DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos Yunming Liang et.al. 2503.22208 null
2025-03-28 EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos Yuxuan Li et.al. 2503.22152 link
2025-03-28 Tokenization of Gaze Data Tim Rolff et.al. 2503.22145 null
2025-03-28 A Survey on Remote Sensing Foundation Models: From Vision to Multimodality Ziyue Huang et.al. 2503.22081 link
2025-03-27 Video-R1: Reinforcing Video Reasoning in MLLMs Kaituo Feng et.al. 2503.21776 link
2025-03-27 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models Yuhan Zhang et.al. 2503.21745 null
2025-03-27 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Zhengxi Lu et.al. 2503.21620 link
2025-03-27 FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation Jincheng Yan et.al. 2503.21595 null
2025-03-27 FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs Xiaoqin Wang et.al. 2503.21457 link
2025-03-27 InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression Dongchen Lu et.al. 2503.21307 link
2025-03-26 ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction Yiqiao Jin et.al. 2503.20978 null
2025-03-26 MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams Yanpeng Sun et.al. 2503.20745 null
2025-03-26 Vision as LoRA Han Wang et.al. 2503.20680 link
2025-03-26 Beyond Intermediate States: Explaining Visual Redundancy through Language Dingchen Yang et.al. 2503.20540 link
2025-03-26 Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering Zehui Liao et.al. 2503.20504 null
2025-03-26 MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning Yiwei Ma et.al. 2503.20502 null
2025-03-26 From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment Yucheng Suo et.al. 2503.20472 null
2025-03-26 MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation Rongyu Zhang et.al. 2503.20384 null
2025-03-26 Dynamic Pyramid Network for Efficient Multimodal Large Language Model Hao Ai et.al. 2503.20322 null
2025-03-26 Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs Zitian Wang et.al. 2503.20309 null
2025-03-25 LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Kexian Tang et.al. 2503.19990 null
2025-03-25 CoLLM: A Large Language Model for Composed Image Retrieval Chuong Huynh et.al. 2503.19910 link
2025-03-25 Scaling Vision Pre-Training to 4K Resolution Baifeng Shi et.al. 2503.19903 null
2025-03-25 Perception-Enhanced Multitask Multimodal Semantic Communication for UAV-Assisted Integrated Sensing and Communication System Ziji Guo et.al. 2503.19594 null
2025-03-25 DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts Ling Zhong et.al. 2503.19498 null
2025-03-25 ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning Jiaqi Liao et.al. 2503.19312 null
2025-03-24 MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks Wenhao You et.al. 2503.19134 null
2025-03-24 LLaVAction: evaluating and training multi-modal large language models for action recognition Shaokai Ye et.al. 2503.18712 link
2025-03-25 Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models Yazhou Zhang et.al. 2503.18681 null
2025-03-24 Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark Bingchen Miao et.al. 2503.18665 link
2025-03-24 Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding Xiangrui Liu et.al. 2503.18478 null
2025-03-24 A Simple yet Effective Layout Token in Large Language Models for Document Understanding Zhaoqing Zhu et.al. 2503.18434 null
2025-03-23 Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering Zixin Chen et.al. 2503.18172 null
2025-03-23 MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation Jiaxin Huang et.al. 2503.18135 null
2025-03-23 MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection Yibo Yan et.al. 2503.18132 null
2025-03-23 Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models Qiao Liang et.al. 2503.18034 null
2025-03-22 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding Wenxuan Zhu et.al. 2503.17827 link
2025-03-21 LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models Jian Liang et.al. 2503.16843 null
2025-03-21 When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts Jun Seong Kim et.al. 2503.16826 null
2025-03-20 Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions Hadi Amini et.al. 2503.16585 link
2025-03-20 OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence Long Yuan et.al. 2503.16326 null
2025-03-20 Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Zijian Li et.al. 2503.16260 null
2025-03-20 CLS-RL: Image Classification with Rule-Based Reinforcement Learning Ming Li et.al. 2503.16188 link
2025-03-20 OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning Zhiyuan Liu et.al. 2503.16081 null
2025-03-20 Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models Zhihang Liu et.al. 2503.16036 link
2025-03-20 BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models Zenghui Yuan et.al. 2503.16023 null
2025-03-20 DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering Haochen Wang et.al. 2503.15887 null
2025-03-20 A Vision Centric Remote Sensing Benchmark Abduljaleel Adejumo et.al. 2503.15816 null
2025-03-19 LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning Federico Cocchi et.al. 2503.15621 link
2025-03-19 Visual Position Prompt for MLLM based Visual Grounding Wei Tang et.al. 2503.15426 link
2025-03-19 Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer Abhi Kamboj et.al. 2503.15352 null
2025-03-19 LEGION: Learning to Ground and Explain for Synthetic Image Detection Hengrui Kang et.al. 2503.15264 null
2025-03-20 Benchmarking Large Language Models for Handwritten Text Recognition Giorgia Crosilla et.al. 2503.15195 null
2025-03-19 UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation Qihui Zhang et.al. 2503.14941 null
2025-03-19 VisNumBench: Evaluating Number Sense of Multimodal Large Language Models Tengjin Weng et.al. 2503.14939 null
2025-03-19 FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding Chongjun Tu et.al. 2503.14935 null
2025-03-19 POSTA: A Go-to Framework for Customized Artistic Poster Generation Haoyu Chen et.al. 2503.14908 null
2025-03-19 Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations Shuo Li et.al. 2503.14895 null
2025-03-18 Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives Sara Sarto et.al. 2503.14604 link
2025-03-18 Aligning Multimodal LLM with Human Preference: A Survey Tao Yu et.al. 2503.14504 link
2025-03-19 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Xinyu Fang et.al. 2503.14478 link
2025-03-18 VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation Shoubin Yu et.al. 2503.14350 null
2025-03-19 DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Wei Song et.al. 2503.14324 link
2025-03-18 Towards Harmless Multimodal Assistants with Blind Preference Optimization Yongqi Li et.al. 2503.14189 null
2025-03-18 Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding Zining Wang et.al. 2503.14140 null
2025-03-18 MP-GUI: Modality Perception with MLLMs for GUI Understanding Ziwei Wang et.al. 2503.14021 link
2025-03-18 SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability Jiankang Wang et.al. 2503.13983 null
2025-03-18 Survey of Adversarial Robustness in Multimodal Large Language Models Chengze Jiang et.al. 2503.13962 null
2025-03-18 Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation Sayak Nag et.al. 2503.13947 null
2025-03-17 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research James Burgess et.al. 2503.13399 link
2025-03-17 Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning Mengyao Lyu et.al. 2503.13383 null
2025-03-17 Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning Hai-Long Sun et.al. 2503.13360 null
2025-03-17 3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o Dingning Liu et.al. 2503.13185 null
2025-03-17 MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs Erik Daxberger et.al. 2503.13111 null
2025-03-17 Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference Hao Yin et.al. 2503.13108 link
2025-03-17 ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models Hao Yin et.al. 2503.13107 link
2025-03-17 Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning Yu-Hong Shen et.al. 2503.13055 null
2025-03-17 Efficient Motion-Aware Video MLLM Zijia Zhao et.al. 2503.13016 null
2025-03-17 HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model Haiyang Guo et.al. 2503.12941 null
2025-03-14 VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity Jing Bi et.al. 2503.11557 null
2025-03-14 A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving Tin Stribor Sohn et.al. 2503.11400 null
2025-03-14 Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware Insu Jang et.al. 2503.11367 link
2025-03-14 Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space Weichen Zhan et.al. 2503.11094 link
2025-03-14 EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks Yi Zhang et.al. 2503.11089 null
2025-03-14 BannerAgency: Advertising Banner Design with Multimodal LLM Agents Heng Wang et.al. 2503.11060 null
2025-03-14 RONA: Pragmatically Diverse Image Captioning with Coherence Relations Aashish Anantha Ramakrishnan et.al. 2503.10997 link
2025-03-13 Learning to Inference Adaptively for Multimodal Large Language Models Zhuoyan Xu et.al. 2503.10905 null
2025-03-13 PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models Zilu Guo et.al. 2503.10529 null
2025-03-13 Interactive Multimodal Fusion with Temporal Modeling Jun Yu et.al. 2503.10523 null
2025-03-13 TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models Xudong Tan et.al. 2503.10501 link
2025-03-13 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models Wanhua Li et.al. 2503.10437 link
2025-03-13 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Yufan Deng et.al. 2503.10391 null
2025-03-13 A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection Heng Yim Nicole Oo et.al. 2503.10371 null
2025-03-13 IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Yuhao Wang et.al. 2503.10324 null
2025-03-13 VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Weiyun Wang et.al. 2503.10291 null
2025-03-13 LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents Boyu Chen et.al. 2503.10200 null
2025-03-13 Hybrid Agents for Image Restoration Bingchen Li et.al. 2503.10120 null
2025-03-13 BIMBA: Selective-Scan Compression for Long-Range Video Question Answering Md Mohaiminul Islam et.al. 2503.09590 link
2025-03-12 Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding Haoyu Zhang et.al. 2503.09143 null
2025-03-11 Seeing What's Not There: Spurious Correlation in Multimodal LLMs Parsa Hosseini et.al. 2503.08884 null
2025-03-11 Language-Depth Navigated Thermal and Visible Image Fusion Jinchang Zhang et.al. 2503.08676 null
2025-03-11 SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Muzhi Zhu et.al. 2503.08625 link
2025-03-11 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Xianfeng Wu et.al. 2503.08619 link
2025-03-11 HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding Shehreen Azad et.al. 2503.08585 null
2025-03-11 RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding Xichen Tan et.al. 2503.08576 null
2025-03-11 FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework Jianian Zhu et.al. 2503.08461 null
2025-03-11 KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents Hsin-Ling Hsu et.al. 2503.08452 link
2025-03-11 Embodied Crowd Counting Runling Long et.al. 2503.08367 null
2025-03-12 Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs Chongjun Tu et.al. 2503.08342 null
2025-03-11 Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework Zhuo Zhi et.al. 2503.08308 null
2025-03-10 Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts Shiu-hong Kao et.al. 2503.07503 null
2025-03-10 LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? Bangyan Li et.al. 2503.07487 null
2025-03-10 REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding Yan Tai et.al. 2503.07413 link
2025-03-10 ALLVB: All-in-One Long Video Understanding Benchmark Xichen Tan et.al. 2503.07298 null
2025-03-10 A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images Xiaoyi Liang et.al. 2503.07094 null
2025-03-10 Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning Jiazheng Liu et.al. 2503.07002 null
2025-03-10 Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs Wenzhuo Xu et.al. 2503.06989 null
2025-03-10 Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition Xinyu Xi et.al. 2503.06978 null
2025-03-10 ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks Yan Yang et.al. 2503.06885 null
2025-03-09 SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation Zisheng Chen et.al. 2503.06764 link
2025-03-11 Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Wenxuan Huang et.al. 2503.06749 link
2025-03-07 Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information Junbo Zhao et.al. 2503.05543 null
2025-03-07 Can Large Language Models Grasp Concepts in Visual Content? A Case Study on YouTube Shorts about Depression Jiaying "Lizzy" Liu et.al. 2503.05109 null
2025-03-06 FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement Ian Huang et.al. 2503.04919 null
2025-03-06 Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model Wenke Huang et.al. 2503.04543 link
2025-03-06 Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition Bin Chen et.al. 2503.04201 null
2025-03-06 MASTER: Multimodal Segmentation with Text Prompts Fuyang Liu et.al. 2503.04199 null
2025-03-06 Biological Sequence with Language Model Prompting: A Survey Jiyue Jiang et.al. 2503.04135 null
2025-03-07 Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts Xiangnan Chen et.al. 2503.04095 null
2025-03-06 RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models Wenhui Zhu et.al. 2503.03987 null
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689 link
2025-03-05 BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation Hiep Truong Cong et.al. 2503.03280 null
2025-03-05 COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence Wentao Li et.al. 2503.03215 null
2025-03-05 Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings Sneh Pillai et.al. 2503.03202 null
2025-03-04 Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs Wei-Yao Wang et.al. 2503.02597 link
2025-03-05 MCiteBench: A Benchmark for Multimodal Citation Text Generation in MLLMs Caiyu Hu et.al. 2503.02589 link
2025-03-04 A Token-level Text Image Foundation Model for Document Understanding Tongkun Guan et.al. 2503.02304 null
2025-03-03 Distilled Prompt Learning for Incomplete Multimodal Survival Prediction Yingxue Xu et.al. 2503.01653 null
2025-03-03 RemiHaven: Integrating "In-Town" and "Out-of-Town" Peers to Provide Personalized Reminiscence Support for Older Drifters Xuechen Zhang et.al. 2503.01358 null
2025-03-04 UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Hao Tang et.al. 2503.01342 link
2025-03-03 Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG Wenbin Wang et.al. 2503.01222 link
2025-03-03 Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models Tianjie Ju et.al. 2503.01208 link
2025-03-03 Scientific Reasoning: Assessment of Multimodal Generative LLMs Florian Dreyer et.al. 2503.01064 null
2025-03-02 LLM-Fusion: A Novel Multimodal Fusion Model for Accelerated Material Discovery Onur Boyar et.al. 2503.01022 null
2025-02-28 Adaptive Keyframe Sampling for Long Video Understanding Xi Tang et.al. 2502.21271 null
2025-02-28 RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Yuheng Ji et.al. 2502.21257 null
2025-02-28 Fine-Grained Retrieval-Augmented Generation for Visual Question Answering Zhengxuan Zhang et.al. 2502.20964 null
2025-02-28 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Xiao Wang et.al. 2502.20811 null
2025-03-03 MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts Peijie Wang et.al. 2502.20808 null
2025-02-28 Towards General Visual-Linguistic Face Forgery Detection (V2) Ke Sun et.al. 2502.20698 link
2025-02-27 Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection Sari Masri et.al. 2502.20573 null
2025-02-27 Protecting multimodal large language models against misleading visualizations Jonathan Tonglet et.al. 2502.20503 link
2025-02-27 VideoA11y: Method and Dataset for Accessible Video Description Chaoyu Li et.al. 2502.20480 null
2025-02-27 Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription Benjamin Gutteridge et.al. 2502.20295 link
2025-02-27 Mixture of Experts for Recognizing Depression from Interview and Reading Tasks Loukas Ilias et.al. 2502.20213 null
2025-02-27 New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration Xuzheng Yang et.al. 2502.20104 null
2025-02-27 AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs Xuyang Wei et.al. 2502.20035 link
2025-02-27 Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up Lang Huang et.al. 2502.20008 null
2025-02-27 Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents Zhenyu Liu et.al. 2502.19917 link
2025-02-27 Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy Zaijing Li et.al. 2502.19902 null
2025-02-27 Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention Weiyan Shi et.al. 2502.19877 null
2025-02-27 One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion Chunyang Cheng et.al. 2502.19854 link
2025-02-27 Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack Chenhe Gu et.al. 2502.19672 null
2025-02-26 ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models Danae Sánchez Villegas et.al. 2502.19409 null
2025-02-26 M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance Qingpei Guo et.al. 2502.18778 null
2025-02-25 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Xiangyu Zhao et.al. 2502.18411 link
2025-02-25 ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis Li Lei et.al. 2502.18180 null
2025-02-25 VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion Pei Liu et.al. 2502.18042 null
2025-02-25 MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks Hyeonjeong Ha et.al. 2502.17832 link
2025-02-25 Can Multimodal LLMs Perform Time Series Anomaly Detection? Xiongxiao Xu et.al. 2502.17812 link
2025-02-24 MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference Zhongwei Wan et.al. 2502.17599 link
2025-02-24 PosterSum: A Multimodal Benchmark for Scientific Poster Summarization Rohit Saxena et.al. 2502.17540 link
2025-02-24 Introducing Visual Perception Token into Multimodal Large Language Model Runpeng Yu et.al. 2502.17425 link
2025-02-24 MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs Jiarui Zhang et.al. 2502.17422 link
2025-02-24 HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization Zhenghao Liu et.al. 2502.17315 link
2025-02-24 Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts Zhenghao Liu et.al. 2502.17297 link
2025-02-24 Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence Wenzhe Yin et.al. 2502.17028 null
2025-02-24 Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs Himanshu Beniwal et.al. 2502.16901 link
2025-02-24 SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding Liangtao Shi et.al. 2502.16786 link
2025-02-23 AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation Rui Li et.al. 2502.16680 link
2025-02-23 Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries Yin Wu et.al. 2502.16636 link
2025-02-23 Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review Pei Fu et.al. 2502.16586 null
2025-02-21 Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models Anirudh Sundar et.al. 2502.15639 null
2025-02-21 Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs Gengyuan Zhang et.al. 2502.15457 null
2025-02-21 Research advances on fish feeding behavior recognition and intensity quantification methods in aquaculture Shulong Zhang et.al. 2502.15311 null
2025-02-21 M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment Chuan Cui et.al. 2502.15167 link
2025-02-20 Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation Yun-Wei Chu et.al. 2502.15040 null
2025-02-20 Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework Yuming Yang et.al. 2502.14864 link
2025-02-20 Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension Amir Hossein Yari et.al. 2502.14315 null
2025-02-20 Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach Yurong Wu et.al. 2502.14285 null
2025-02-21 PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Haowei Liu et.al. 2502.14282 null
2025-02-19 ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities Chanjin Zheng et.al. 2502.13832 link
2025-02-19 From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education Yi-Fan Zhang et.al. 2502.13789 null
2025-02-18 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation Bencheng Liao et.al. 2502.13145 link
2025-02-18 SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models Xianfu Cheng et.al. 2502.13059 null
2025-02-18 AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks Yurun Chen et.al. 2502.13053 null
2025-02-18 Towards Text-Image Interleaved Retrieval Xin Zhang et.al. 2502.12799 link
2025-02-18 Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning Yunhao Gou et.al. 2502.12635 null
2025-02-18 SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings Weikai Lu et.al. 2502.12562 link
2025-02-18 MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos Huaying Yuan et.al. 2502.12558 null
2025-02-18 SAFEERASER: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning Junkai Chen et.al. 2502.12520 null
2025-02-17 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation Ling Yang et.al. 2502.12148 link
2025-02-17 PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection Jinhe Bi et.al. 2502.12119 null
2025-02-17 Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications Li Qiao et.al. 2502.12096 null
2025-02-17 Unhackable Temporal Rewarding for Scalable Video MLLMs En Yu et.al. 2502.12081 null
2025-02-17 GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs Yi Fang et.al. 2502.11925 null
2025-02-17 EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models Jiamin Su et.al. 2502.11916 link
2025-02-17 MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation Haochen Xue et.al. 2502.11903 null
2025-02-17 Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities Hanbin Wang et.al. 2502.11829 link
2025-02-17 Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning Yuqi Pang et.al. 2502.11751 link
2025-02-17 Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent Junda Wu et.al. 2502.11740 null
2025-02-14 MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Yi-Fan Zhang et.al. 2502.10391 null
2025-02-14 AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search Zhengqiu Zhu et.al. 2502.09913 null
2025-02-13 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents Rui Yang et.al. 2502.09560 null
2025-02-13 A Benchmark for Crime Surveillance Video Analysis with Large Models Haoran Chen et.al. 2502.09325 null
2025-02-13 From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs Mingxiao Li et.al. 2502.09093 null
2025-02-12 FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation Yang Sun et.al. 2502.08260 link
2025-02-12 Learning Human Skill Generators at Key-Step Levels Yilu Wu et.al. 2502.08234 null
2025-02-13 Universal Adversarial Attack on Aligned Multimodal LLMs Temurbek Rahmatullaev et.al. 2502.07987 null
2025-02-11 DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities Chashi Mahiul Islam et.al. 2502.07905 null
2025-02-11 Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models Jiacong Xu et.al. 2502.07601 null
2025-02-11 MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs Qifeng Zhou et.al. 2502.07221 null
2025-02-11 Early Risk Prediction of Pediatric Cardiac Arrest from Electronic Health Records via Multimodal Fused Transformer Jiaying Lu et.al. 2502.07158 null
2025-02-09 AI-Driven HSI: Multimodality, Fusion, Challenges, and the Deep Learning Revolution David S. Bhatti et.al. 2502.06894 null
2025-02-11 CoS: Chain-of-Shot Prompting for Long Video Understanding Jian Hu et.al. 2502.06428 null
2025-02-07 Survey on AI-Generated Media Detection: From Non-MLLM to MLLM Yueying Zou et.al. 2502.05240 null
2025-02-07 Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy Yunhang Shen et.al. 2502.05177 link
2025-02-07 Multitwine: Multi-Object Compositing with Text and Layout Control Gemma Canet Tarrés et.al. 2502.05165 null
2025-02-07 Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs Rohit Saxena et.al. 2502.05092 null
2025-02-07 Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Han Zhang et.al. 2502.04976 null
2025-02-07 Cached Multi-Lora Composition for Multi-Concept Image Generation Xiandong Zou et.al. 2502.04923 link
2025-02-07 MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin Minrui Chen et.al. 2502.04794 null
2025-02-06 EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models He Hu et.al. 2502.04424 null
2025-02-05 PerPO: Perceptual Preference Optimization via Discriminative Rewarding Zining Zhu et.al. 2502.04371 link
2025-02-06 PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models? Mennatullah Siam et.al. 2502.04192 link
2025-02-06 MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation Qinhan Yu et.al. 2502.04176 link
2025-02-05 Large Language Models Are Universal Recommendation Learners Junguang Jiang et.al. 2502.03041 null
2025-02-05 Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning Yibo Yan et.al. 2502.02871 null
2025-02-04 SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency Qianhao Yuan et.al. 2502.02458 link
2025-02-04 Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment Yaling Shen et.al. 2502.02438 null
2025-02-06 LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models Tzu-Tao Chang et.al. 2502.02406 null
2025-02-04 Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Jinyang Wu et.al. 2502.02339 null
2025-02-04 Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration Younan Zhu et.al. 2502.01969 null
2025-02-04 MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving Shiju Zhao et.al. 2502.01960 null
2025-02-04 DAMO: Data- and Model-aware Alignment of Multi-modal LLMs Jinda Lu et.al. 2502.01943 link
2025-02-03 Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models Hashmat Shadab Malik et.al. 2502.01576 link
2025-02-03 Position: Empowering Time Series Reasoning with Multimodal LLMs Yaxuan Kong et.al. 2502.01477 null
2025-02-03 Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models Mingi Jung et.al. 2502.01419 null
2025-01-31 Efficient Reasoning with Hidden Thinking Xuan Shen et.al. 2501.19201 link
2025-01-31 Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs Hongliang Li et.al. 2501.19036 null
2025-01-31 Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation Bin Zhu et.al. 2501.19017 null
2025-01-30 BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos Lehao Lin et.al. 2501.18565 null
2025-01-29 Generative AI for Vision: A Comprehensive Study of Frameworks and Applications Fouad Bousetouane et.al. 2501.18033 null
2025-01-29 Topological Signatures of Adversaries in Multimodal Alignments Minh Vu et.al. 2501.18006 null
2025-01-30 Leveraging Multimodal LLM for Inspirational User Interface Search Seokhyeon Park et.al. 2501.17799 link
2025-01-29 Learning Free Token Reduction for Multi-Modal LLM Zihui Zhao et.al. 2501.17391 null
2025-01-31 Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence Lindy Gan et.al. 2501.16813 null
2025-01-28 Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding Yun Li et.al. 2501.16786 null
2025-01-28 MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark Dongyi Yi et.al. 2501.16688 null
2025-01-28 CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs Jinlan Fu et.al. 2501.16629 link
2025-01-27 AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models Zheng Lian et.al. 2501.16566 link
2025-01-27 LUCY: Linguistic Understanding and Control Yielding Early Stage of Her Heting Gao et.al. 2501.16327 link
2025-01-27 FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers Renshan Zhang et.al. 2501.16297 null
2025-01-27 Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models Jing Zhang et.al. 2501.16282 null
2025-01-27 Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? Zhiling Chen et.al. 2501.15795 null
2025-01-27 Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning Michael Xieyang Liu et.al. 2501.15727 null
2025-01-26 Ocean-OCR: Towards General OCR Application via a Vision-Language Model Song Chen et.al. 2501.15558 link
2025-01-26 Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning Xiaohan Yu et.al. 2501.15470 null
2025-01-26 Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations Zijun Long et.al. 2501.15379 null
2025-01-26 Baichuan-Omni-1.5 Technical Report Yadong Li et.al. 2501.15368 link
2025-01-25 Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink Yining Wang et.al. 2501.15269 null
2025-01-23 Pilot: Building the Federated Multimodal Instruction Tuning Framework Baochen Xiong et.al. 2501.13985 null
2025-01-23 GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration Yue Fan et.al. 2501.13896 null
2025-01-23 EventVL: Understand Event Streams via Multimodal Large Language Model Pengteng Li et.al. 2501.13707 null
2025-01-23 LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models Yizheng Sun et.al. 2501.13652 null
2025-01-23 ReasVQA: Advancing VideoQA with Imperfect Reasoning Process Jianxin Liang et.al. 2501.13536 null
2025-01-23 50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications Zewei Shi et.al. 2501.13351 link
2025-01-24 Multi-aspect Knowledge Distillation with Large Language Model Taegyeong Lee et.al. 2501.13341 link
2025-01-22 Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning Bohao Yang et.al. 2501.13042 link
2025-01-22 InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling Yi Wang et.al. 2501.12386 link
2025-01-21 VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model Xianwei Zhuang et.al. 2501.12327 link
2025-01-21 Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization Jie Zhao et.al. 2501.11968 null
2025-01-21 EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Zhili Cheng et.al. 2501.11858 link
2025-01-20 Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution Zhiyuan You et.al. 2501.11561 null
2025-01-20 EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery Guankun Wang et.al. 2501.11347 link
2025-01-20 ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction Xiangyang Hu et.al. 2501.11276 link
2025-01-20 A Survey of World Models for Autonomous Driving Tuo Feng et.al. 2501.11260 link
2025-01-19 Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation Zhengwen Shen et.al. 2501.10958 null
2025-01-18 Visual RAG: Expanding MLLM visual knowledge without fine-tuning Mirco Bonomo et.al. 2501.10834 null
2025-01-17 FaceXBench: Evaluating Multimodal LLMs on Face Understanding Kartik Narayan et.al. 2501.10360 link
2025-01-16 A Simple Aerial Detection Baseline of Multimodal Language Models Qingyun Li et.al. 2501.09720 link
2025-01-16 Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis Qize Yang et.al. 2501.09502 null
2025-01-16 Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics Yuanyuan Wei et.al. 2501.09218 null
2025-01-15 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Ruixiang Jiang et.al. 2501.09012 link
2025-01-15 The Devil is in Temporal Token: High Quality Video Reasoning Segmentation Sitong Gong et.al. 2501.08549 link
2025-01-14 LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding Hongyu Li et.al. 2501.08282 link
2025-01-14 Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness Jiaxing Zhao et.al. 2501.07978 link
2025-01-14 Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models Yifang Xu et.al. 2501.07972 null
2025-01-14 3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding Haomiao Xiong et.al. 2501.07819 link
2025-01-13 Imagine while Reasoning in Space: Multimodal Visualization-of-Thought Chengzu Li et.al. 2501.07542 null
2025-01-13 Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method Wenping Jin et.al. 2501.07496 link
2025-01-13 Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation Han Liu et.al. 2501.07110 link
2025-01-13 LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models Mozhgan Nasr Azadani et.al. 2501.06986 link
2025-01-12 X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding Wenqi Zhou et.al. 2501.06835 null
2025-01-12 GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing Ruizhe Ou et.al. 2501.06828 null
2025-01-12 MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection Kaiying Yan et.al. 2501.06764 null
2025-01-12 Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints Ming Dai et.al. 2501.06710 link
2025-01-11 ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation Xuanle Zhao et.al. 2501.06598 link
2025-01-11 Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs Shan Zhang et.al. 2501.06430 link
2025-01-10 PEACE: Empowering Geologic Map Holistic Understanding with MLLMs Yangyu Huang et.al. 2501.06184 null
2025-01-10 Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs Dabing Cheng et.al. 2501.05884 null
2025-01-10 Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models You Li et.al. 2501.05767 null
2025-01-10 TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos Korawat Charoenpitaks et.al. 2501.05733 link
2025-01-09 MECASA: Motor Execution Classification using Additive Self-Attention for Hybrid EEG-fNIRS Data Gourav Siddhad et.al. 2501.05525 null
2025-01-09 Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark Yunzhuo Hao et.al. 2501.05444 link
2025-01-09 Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration Xuyang Liu et.al. 2501.05179 link
2025-01-09 Optimizing Multitask Industrial Processes with Predictive Action Guidance Naval Kishore Mehta et.al. 2501.05108 null
2025-01-09 DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving Xuran Zheng et.al. 2501.05081 null
2025-01-09 Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency Shiji Zhao et.al. 2501.04931 null
2025-01-08 Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs Yikang Zhou et.al. 2501.04670 link
2025-01-08 InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Yuhang Liu et.al. 2501.04575 link
2025-01-08 Evidence-based multimodal fusion on structured EHRs and free-text notes for ICU outcome prediction Yucheng Ruan et.al. 2501.04389 link
2025-01-08 Multimodal Graph Constrastive Learning and Prompt for ChartQA Yue Dai et.al. 2501.04303 null
2025-01-08 H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving Siran Chen et.al. 2501.04302 null
2025-01-07 RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance Matin Mortaheb et.al. 2501.03995 null
2025-01-06 Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches Alhassan Mumuni et.al. 2501.03151 null
2025-01-07 Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild Wanpeng Hu et.al. 2501.02964 link
2025-01-06 A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation Toomas Tahves et.al. 2501.02858 null
2025-01-06 Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging? Hongyi Miao et.al. 2501.02751 null
2025-01-05 FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance Haicheng Wang et.al. 2501.02430 link
2025-01-04 What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph Yutao Jiang et.al. 2501.02268 link
2025-01-03 AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs Sanjoy Chowdhury et.al. 2501.02135 null
2025-01-03 VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Chaoyou Fu et.al. 2501.01957 link
2025-01-03 Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Yifan Du et.al. 2501.01904 link
2025-01-03 Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models Guosheng Zhang et.al. 2501.01720 null
2025-01-02 Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants Lixiong Qin et.al. 2501.01243 null
2025-01-02 Towards Interactive Deepfake Analysis Lixiong Qin et.al. 2501.01164 link
2025-01-02 EliGen: Entity-Level Controlled Image Generation with Regional Attention Hong Zhang et.al. 2501.01097 link
2025-01-02 Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs Linhao Huang et.al. 2501.01042 null
2025-01-01 Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations Yuxuan Zhang et.al. 2501.00778 null
2024-12-31 Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method Zhenpeng Huang et.al. 2501.00584 null
2024-12-31 VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Xinhao Li et.al. 2501.00574 link
2024-12-31 Fine-grained Video-Text Retrieval: A New Benchmark and Method Yifan Xu et.al. 2501.00513 null
2024-12-31 Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion Hebin Wang et.al. 2501.00330 null
2024-12-31 MLLM-as-a-Judge for Image Safety without Human Labeling Zhenting Wang et.al. 2501.00192 null
2024-12-30 GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models Shangyu Xing et.al. 2412.21036 null
2024-12-30 Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering Junxiao Xue et.al. 2412.20927 null
2024-12-28 ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming Jiedong Zhuang et.al. 2412.20105 null
2024-12-28 On the Compositional Generalization of Multimodal LLMs for Medical Imaging Zhenyang Cai et.al. 2412.20070 link
2024-12-27 Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework Jiang Liu et.al. 2412.19684 null
2024-12-27 CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs Siyu Wang et.al. 2412.19663 null
2024-12-27 MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios Jiaqi Fan et.al. 2412.19406 link
2024-12-26 Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Ziang Yan et.al. 2412.19326 link
2024-12-26 Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries Roberto Amoroso et.al. 2412.19304 null
2024-12-26 SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model Xuyang Li et.al. 2412.19237 null
2024-12-25 MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models Kaiwen Zuo et.al. 2412.18947 null
2024-12-25 RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting Yilei Jiang et.al. 2412.18826 null
2024-12-24 Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation Faraz Waseem et.al. 2412.18688 null
2024-12-24 MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning Abdelmadjid Chergui et.al. 2412.18437 link
2024-12-24 Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles Zihan Wang et.al. 2412.18416 null
2024-12-24 Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Huanjin Yao et.al. 2412.18319 link
2024-12-24 ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation Mengyang Wu et.al. 2412.18216 link
2024-12-24 Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Yucong Luo et.al. 2412.18176 null
2024-12-24 VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection Zhaohui Jin et.al. 2412.18124 null
2024-12-24 Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach Jing Bi et.al. 2412.18108 null
2024-12-24 An Ensemble Approach to Short-form Video Quality Assessment Using Multimodal LLM Wen Wen et.al. 2412.18060 null
2024-12-23 A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification Ravi Datta Rachuri et.al. 2412.17968 null
2024-12-23 Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy Priyaranjan Pattnayak et.al. 2412.17759 null
2024-12-23 HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data Ting Zhou et.al. 2412.17574 link
2024-12-23 Multimodal Preference Data Synthetic Alignment with Reward Model Robert Wijaya et.al. 2412.17417 link
2024-12-23 MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models Beibei Yu et.al. 2412.17339 null
2024-12-23 Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding Yueyang Li et.al. 2412.17337 link
2024-12-23 Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective Kaifang Long et.al. 2412.17297 null
2024-12-22 SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults Jinzhi Wang et.al. 2412.17077 null
2024-12-22 CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models Yeyuan Wang et.al. 2412.16869 link
2024-12-22 GME: Improving Universal Multimodal Retrieval by Multimodal LLMs Xin Zhang et.al. 2412.16855 null
2024-12-21 AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles Aritra Kumar Lahiri et.al. 2412.16701 null
2024-12-20 MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection Andrea Moglia et.al. 2412.15925 link
2024-12-20 Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution Wentao Tan et.al. 2412.15650 link
2024-12-20 Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM Yangyang Guo et.al. 2412.15614 null
2024-12-20 QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning Xinyang Tong et.al. 2412.15576 null
2024-12-20 Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage Saehyung Lee et.al. 2412.15484 null
2024-12-19 MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs Yuxuan Wan et.al. 2412.15310 link
2024-12-19 OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving Shuo Xing et.al. 2412.15208 link
2024-12-19 Progressive Multimodal Reasoning via Active Retrieval Guanting Dong et.al. 2412.14835 null
2024-12-19 Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models Zijun Chen et.al. 2412.14660 link
2024-12-18 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Jihan Yang et.al. 2412.14171 link
2024-12-18 InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models Cong Wei et.al. 2412.14006 link
2024-12-18 LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Yipeng Zhang et.al. 2412.13871 link
2024-12-17 Modality-Inconsistent Continual Learning of Multimodal Large Language Models Weiguo Pian et.al. 2412.13050 null
2024-12-17 ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing Yaohui Ma et.al. 2412.12821 link
2024-12-17 PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model Yuqing Wang et.al. 2412.12737 link
2024-12-17 ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding Zhenxing Zhang et.al. 2412.12718 link
2024-12-17 Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation Andong Chen et.al. 2412.12627 null
2024-12-17 FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning Seunghee Kim et.al. 2412.12567 null
2024-12-17 Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models Sina Bagheri Nezhad et.al. 2412.12500 link
2024-12-16 Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering Jinhe Bi et.al. 2412.12359 link
2024-12-16 Instruction-based Image Manipulation by Watching How Things Move Mingdeng Cao et.al. 2412.12087 null
2024-12-16 CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding Guo Chen et.al. 2412.12075 null
2024-12-16 Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning Yuti Liu et.al. 2412.11952 null
2024-12-16 A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges Yibo Yan et.al. 2412.11936 null
2024-12-16 PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension Kun Ouyang et.al. 2412.11906 null
2024-12-16 GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training Renqiu Xia et.al. 2412.11863 link
2024-12-16 IDEA-Bench: How Far are Generative Models from Professional Designing? Chen Liang et.al. 2412.11767 link
2024-12-16 From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs aligned with Multi-Modality Shixin Jiang et.al. 2412.11694 null
2024-12-16 ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models Xiechi Zhang et.al. 2412.11453 null
2024-12-15 Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal Yuhao Wang et.al. 2412.11196 null
2024-12-13 Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining Zhiqi Ge et.al. 2412.10342 null
2024-12-13 BrushEdit: All-In-One Image Inpainting and Editing Yaowei Li et.al. 2412.10316 null
2024-12-13 Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification Yifan Gao et.al. 2412.09928 null
2024-12-12 ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation Ali Athar et.al. 2412.09754 null
2024-12-12 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Zhuofan Zong et.al. 2412.09618 null
2024-12-13 Olympus: A Universal Task Router for Computer Vision Tasks Yuanze Lin et.al. 2412.09612 link
2024-12-12 SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Hao Li et.al. 2412.09604 null
2024-12-12 Do Multimodal Large Language Models See Like Humans? Jiaying Lin et.al. 2412.09603 null
2024-12-12 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Pan Zhang et.al. 2412.09596 link
2024-12-12 OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Jitesh Jain et.al. 2412.09585 link
2024-12-12 Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Zhisheng Zhong et.al. 2412.09501 link
2024-12-12 Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation Baisen Wang et.al. 2412.09428 link
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278 link
2024-12-11 LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information Ke Wang et.al. 2412.08771 null
2024-12-11 From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons Andrew Szot et.al. 2412.08442 null
2024-12-11 HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models Shiding Zhu et.al. 2412.08378 null
2024-12-11 M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis Ao Li et.al. 2412.08049 link
2024-12-10 DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Jianzong Wu et.al. 2412.07589 null
2024-12-09 SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations Zhaorun Chen et.al. 2412.06878 null
2024-12-09 ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance Chunwei Wang et.al. 2412.06673 null
2024-12-09 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation Chun-Peng Chang et.al. 2412.06613 null
2024-12-12 World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving Mingliang Zhai et.al. 2412.06324 null
2024-12-09 LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations Mingjie Xu et.al. 2412.06322 link
2024-12-09 Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness Qifan Yu et.al. 2412.06293 null
2024-12-09 ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models Bingchen Gong et.al. 2412.06292 null
2024-12-08 GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis Ashish Goswami et.al. 2412.06089 null
2024-12-08 Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Xiao Xu et.al. 2412.05939 null
2024-12-08 Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models Ma Teng et.al. 2412.05934 link
2024-12-08 [CLS] Token Tells Everything Needed for Training-free Efficient MLLMs Ao Wang et.al. 2412.05819 link
2024-12-06 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Zhe Chen et.al. 2412.05271 link
2024-12-06 CompCap: Improving Multimodal Large Language Models with Composite Captions Xiaohui Chen et.al. 2412.05243 null
2024-12-06 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Jarvis Guo et.al. 2412.05237 null
2024-12-06 LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Donald Shenaj et.al. 2412.05148 link
2024-12-06 Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models Zehao Wang et.al. 2412.04939 null
2024-12-06 EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation Yongxin Wang et.al. 2412.04903 null
2024-12-06 Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis Rui Zhou et.al. 2412.04707 null
2024-12-05 Assessing and Learning Alignment of Unimodal Vision and Language Models Le Zhang et.al. 2412.04616 null
2024-12-05 p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay Jun Zhang et.al. 2412.04449 link
2024-12-05 EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios Lu Qiu et.al. 2412.04447 null
2024-12-05 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Kaiyi Huang et.al. 2412.04440 null
2024-12-05 Grounding Descriptions in Images informs Zero-Shot Visual Recognition Shaunak Halbe et.al. 2412.04429 link
2024-12-05 Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Jiuhai Chen et.al. 2412.04424 link
2024-12-05 Liquid: Language Models are Scalable Multi-modal Generators Junfeng Wu et.al. 2412.04332 link
2024-12-05 FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Bo Tong et.al. 2412.04317 link
2024-12-04 VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding Chaoyu Li et.al. 2412.03735 null
2024-12-04 DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation Qingdong He et.al. 2412.03255 null
2024-12-04 Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges Minghao Shao et.al. 2412.03220 null
2024-12-04 ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning Zhe Xie et.al. 2412.03104 link
2024-12-03 AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Kaixiong Gong et.al. 2412.02611 null
2024-12-03 Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks Jinjin Cai et.al. 2412.02531 null
2024-12-03 VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains Pubudu L. Indrasiri et.al. 2412.02283 null
2024-12-03 Personalized Multimodal Large Language Models: A Survey Junda Wu et.al. 2412.02142 null
2024-12-03 WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image Yuci Liang et.al. 2412.02141 null
2024-12-03 Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey Yunkai Dang et.al. 2412.02104 null
2024-12-02 PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving Xuewen Luo et.al. 2412.02025 null
2024-12-02 MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models Xiaomin Li et.al. 2412.01343 null
2024-12-02 Enhancing Perception Capabilities of Multimodal LLMs with Training-free Fusion Zhuokun Chen et.al. 2412.01289 null
2024-12-02 Ponder & Press: Advancing Visual GUI Agent towards General Computer Control Yiqin Wang et.al. 2412.01268 null
2024-12-02 T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Shukang Yin et.al. 2411.19951 link
2024-11-29 VLSBench: Unveiling Visual Leakage in Multimodal Safety Xuhao Hu et.al. 2411.19939 link
2024-11-29 On Domain-Specific Post-Training for Multimodal Large Language Models Daixuan Cheng et.al. 2411.19930 null
2024-11-29 Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings Qiong Wu et.al. 2411.19628 link
2024-11-28 Libra: Leveraging Temporal Images for Biomedical Radiology Analysis Xi Zhang et.al. 2411.19378 link
2024-11-28 SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation Yuhan Pei et.al. 2411.19182 null
2024-11-28 Detailed Object Description with Controllable Dimensions Xinran Wang et.al. 2411.19106 link
2024-11-28 I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting Nicola Fanelli et.al. 2411.19050 link
2024-11-28 DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users Wataru Kawabe et.al. 2411.18908 null
2024-11-27 Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment Soumya Suvra Ghosal et.al. 2411.18688 null
2024-11-27 Cross-modal Information Flow in Multimodal Large Language Models Zhi Zhang et.al. 2411.18620 link
2024-11-27 GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Pengfei Zhou et.al. 2411.18499 null
2024-11-27 ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Qing Jiang et.al. 2411.18363 link
2024-11-27 Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models Jingming Liu et.al. 2411.18142 null
2024-11-26 NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? Jiaxuan Li et.al. 2411.17794 null
2024-11-26 Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration Yuhang Han et.al. 2411.17686 null
2024-11-26 What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics Jordan J. Bird et.al. 2411.17593 null
2024-11-26 Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey Jiayi Kuang et.al. 2411.17558 null
2024-11-26 InsightEdit: Towards Better Instruction Following for Image Editing Yingjing Xu et.al. 2411.17323 null
2024-11-26 in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice Vedrana Krivokuca Hahn et.al. 2411.17305 null
2024-11-26 A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs Lehan He et.al. 2411.17265 null
2024-11-26 HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator Fan Yang et.al. 2411.17261 null
2024-11-26 Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment Zheng Chen et.al. 2411.17237 link
2024-11-26 DOGE: Towards Versatile Visual Document Grounding and Referring Yinan Zhou et.al. 2411.17125 null
2024-11-26 Multimodal Alignment and Fusion: A Survey Songtao Li et.al. 2411.17040 null
2024-11-25 TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation Linqing Zhong et.al. 2411.16425 null
2024-11-25 Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models Hao Yi et.al. 2411.16201 null
2024-11-25 Interpreting Object-level Foundation Models via Visual Precision Search Ruoyu Chen et.al. 2411.16198 link
2024-11-25 ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration Haozhan Shen et.al. 2411.16044 link
2024-11-23 Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark Rong-Cheng Tu et.al. 2411.15488 link
2024-11-23 Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy Te Yang et.al. 2411.15453 null
2024-11-22 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Chaoyou Fu et.al. 2411.15296 link
2024-11-22 VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Daeun Lee et.al. 2411.15115 null
2024-11-22 mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA Tao Zhang et.al. 2411.15041 null
2024-11-22 De-biased Multimodal Electrocardiogram Analysis Haitao Li et.al. 2411.14795 null
2024-11-22 Evaluating and Advancing Multimodal Large Language Models in Ability Lens Feng Chen et.al. 2411.14725 null
2024-11-22 FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data Binqian Xu et.al. 2411.14717 link
2024-11-22 Any-to-3D Generation via Hybrid Diffusion Supervision Yijun Fan et.al. 2411.14715 null
2024-11-21 LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval Weiheng Lu et.al. 2411.14505 link
2024-11-21 Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Yuhao Dong et.al. 2411.14432 link
2024-11-21 Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding Yiming Zhang et.al. 2411.14401 null
2024-11-21 Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance Haozhe Zhao et.al. 2411.14279 null
2024-11-21 Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning Ziqi Wang et.al. 2411.13949 null
2024-11-21 Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts Honglin Li et.al. 2411.13909 null
2024-11-20 Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs Rui Cao et.al. 2411.13697 link
2024-11-20 AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations Gaurav Verma et.al. 2411.13451 null
2024-11-20 DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving Xianda Guo et.al. 2411.13112 link
2024-11-20 Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving Hao Zhou et.al. 2411.13076 null
2024-11-19 Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models Zhen Zeng et.al. 2411.12790 null
2024-11-19 Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting Haoyu Zhao et.al. 2411.12789 null
2024-11-19 Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning Pengkun Jiao et.al. 2411.12787 null
2024-11-19 Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model Yiming Shi et.al. 2411.12783 null
2024-11-18 Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning Xudong Yan et.al. 2411.12584 link
2024-11-19 CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go et.al. 2411.12287 null
2024-11-18 AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning Kun Xiang et.al. 2411.11930 link
2024-11-18 Dissecting Misalignment of Multimodal Large Language Models via Influence Function Lijie Hu et.al. 2411.11667 null
2024-11-18 MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models Harshita Sharma et.al. 2411.11362 null
2024-11-18 CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset Zhiming Wang et.al. 2411.11360 link
2024-11-18 MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis Yingjie Zhou et.al. 2411.11235 null
2024-11-19 Multilingual Large Language Models: A Systematic Survey Shaolin Zhu et.al. 2411.11072 link
2024-11-19 VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? Yunlong Tang et.al. 2411.10979 null
2024-11-17 Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering Zeping Yu et.al. 2411.10950 link
2024-11-17 Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning Wenke Huang et.al. 2411.10928 null
2024-11-16 BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization Md. Nazmus Sadat Samin et.al. 2411.10879 link
2024-11-16 Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Jinqiang Long et.al. 2411.10669 link
2024-11-15 Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Weiyun Wang et.al. 2411.10442 null
2024-11-15 Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization Yuhan Fu et.al. 2411.10436 null
2024-11-15 Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting Ziqi Xie et.al. 2411.10309 link
2024-11-15 Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning Jingru Yang et.al. 2411.10252 null
2024-11-15 CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation Xiaofei Zhu et.al. 2411.10060 null
2024-11-15 VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos Weihao Zhong et.al. 2411.10032 null
2024-11-15 Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs Xiaofeng Zhang et.al. 2411.09968 null
2024-11-14 MagicQuill: An Intelligent Interactive Image Editing System Zichen Liu et.al. 2411.09703 link
2024-11-14 Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models Wei Wang et.al. 2411.09691 null
2024-11-14 Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models Chutian Meng et.al. 2411.09449 null
2024-11-14 Spider: Any-to-Many Multimodal LLM Jinxiang Lai et.al. 2411.09439 link
2024-11-14 LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation Zhenshi Li et.al. 2411.09301 link
2024-11-13 Multimodal Instruction Tuning with Hybrid State Space Models Jianing Zhou et.al. 2411.08840 null
2024-11-13 Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks? Quan Zhang et.al. 2411.08466 null
2024-11-13 Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning Chao Huang et.al. 2411.08414 null
2024-11-12 SimBase: A Simple Baseline for Temporal Video Grounding Peijun Bao et.al. 2411.07945 null
2024-11-12 Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding Zirui Shao et.al. 2411.07722 null
2024-11-12 Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models Tiejin Chen et.al. 2411.07559 null
2024-11-11 Multimodal Fusion Balancing Through Game-Theoretic Regularization Konstantinos Kontras et.al. 2411.07335 null
2024-11-11 CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models Junho Kim et.al. 2411.06869 null
2024-11-11 Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models Jungseok Hong et.al. 2411.06752 null
2024-11-10 KMM: Key Frame Mask Mamba for Extended Motion Generation Zeyu Zhang et.al. 2411.06481 link
2024-11-09 A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks Chia Xin Liang et.al. 2411.06284 null
2024-11-09 An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models Fatemeh Shiri et.al. 2411.06048 link
2024-11-08 Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation Dong Shu et.al. 2411.05316 link
2024-11-08 Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding Jaeyoo Park et.al. 2411.05254 null
2024-11-07 On Erroneous Agreements of CLIP Image Embeddings Siting Li et.al. 2411.05195 null
2024-11-07 Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models Pete Janowczyk et.al. 2411.05056 null
2024-11-07 CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM Jingwei Xu et.al. 2411.04954 null
2024-11-07 GUI Agents with Foundation Models: A Comprehensive Survey Shuai Wang et.al. 2411.04890 null
2024-11-07 Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs Chengxin Hu et.al. 2411.04708 null
2024-11-06 Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education Anand Syamkumar et.al. 2411.04308 null
2024-11-06 Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection Nana Lin et.al. 2411.04158 null
2024-11-06 Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Dingjie Song et.al. 2411.03823 link
2024-11-06 StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding Junming Lin et.al. 2411.03628 link
2024-11-05 MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning Ziliang Gan et.al. 2411.03314 null
2024-11-05 Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? Jingyu Xiao et.al. 2411.03292 link
2024-11-06 Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent Yangning Li et.al. 2411.02937 link
2024-11-05 Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning Mingcheng Li et.al. 2411.02793 null
2024-11-05 Multimodal Commonsense Knowledge Distillation for Visual Question Answering Shuo Yang et.al. 2411.02722 null
2024-11-05 Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios Yunkai Dang et.al. 2411.02708 null
2024-11-04 MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs Sheng-Chieh Lin et.al. 2411.02571 null
2024-11-04 DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Yang Yue et.al. 2411.02359 link
2024-11-04 KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension Jie Yang et.al. 2411.01846 null
2024-11-04 ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model Yiming Sun et.al. 2411.01756 null
2024-11-03 UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models Sejoon Oh et.al. 2411.01703 null
2024-11-03 Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation Seongsu Ha et.al. 2411.01494 null
2024-11-02 Can Multimodal Large Language Model Think Analogically? Diandian Guo et.al. 2411.01307 null
2024-11-02 Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems Mikołaj Małkiński et.al. 2411.01173 null
2024-11-01 Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks Laura Wenderoth et.al. 2411.00725 null
2024-11-01 Unified Generative and Discriminative Training for Multi-modal Large Language Models Wei Chow et.al. 2411.00304 null
2024-10-31 JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment Joao Sousa et.al. 2410.23988 null
2024-10-31 Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages Séamus Lankford et.al. 2410.23890 null
2024-10-31 Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding Jinlong He et.al. 2410.23822 null
2024-10-30 PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures Tianxiang Wu et.al. 2410.23089 null
2024-10-29 Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring Matthew McKinney et.al. 2410.22558 null
2024-10-29 Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench Zheyuan Liu et.al. 2410.22108 link
2024-10-28 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior Hanyu Wang et.al. 2410.21264 null
2024-10-28 Face-MLLM: A Large Face Perception Model Haomiao Sun et.al. 2410.20717 null
2024-10-27 Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys Lu Wang et.al. 2410.20402 null
2024-10-26 LLMs Can Evolve Continually on Modality for X-Modal Reasoning Jiazuo Yu et.al. 2410.20178 link
2024-10-25 Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements Silvia Terragni et.al. 2410.19974 null
2024-10-25 Improving Multimodal Large Language Models Using Continual Learning Shikhar Srivastava et.al. 2410.19925 null
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702 null
2024-10-28 BIFRÖST: 3D-Aware Image compositing with Language Instructions Lingxiao Li et.al. 2410.19079 link
2024-10-24 Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Zhangheng Li et.al. 2410.18967 null
2024-10-24 SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models Zonghao Ying et.al. 2410.18927 null
2024-10-24 Distill Visual Chart Reasoning Ability from LLMs to MLLMs Wei He et.al. 2410.18798 link
2024-10-24 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Yuang Ai et.al. 2410.18666 link
2024-10-25 Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks Lehan Wang et.al. 2410.18387 null
2024-10-23 TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts Yuxuan Xie et.al. 2410.18071 null
2024-10-23 CLEAR: Character Unlearning in Textual and Visual Modalities Alexey Dontsov et.al. 2410.18057 null
2024-10-23 Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation Wenfang Yao et.al. 2410.17918 link
2024-10-23 ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning Zhiwei Hao et.al. 2410.17779 link
2024-10-23 YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions Xiguang Li et.al. 2410.17734 null
2024-10-23 Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact Junhua Liu et.al. 2410.17532 null
2024-10-22 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Xiaoqian Shen et.al. 2410.17434 link
2024-10-22 Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models Zhijie Tan et.al. 2410.16983 null
2024-10-22 IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing Kang Chen et.al. 2410.16977 null
2024-10-22 Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance Zhangwei Gao et.al. 2410.16261 link
2024-10-21 LLaVA-KD: A Framework of Distilling Multimodal Large Language Models Yuxuan Cai et.al. 2410.16236 link
2024-10-21 Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining Han Huang et.al. 2410.16166 link
2024-10-21 Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Xiang Yue et.al. 2410.16153 null
2024-10-21 Mitigating Object Hallucination via Concentric Causal Attention Yun Xing et.al. 2410.15926 link
2024-10-21 AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection Xiaoman Xu et.al. 2410.15591 link
2024-10-20 Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation Jiayu Xiong et.al. 2410.15475 null
2024-10-20 Modality-Fair Preference Optimization for Trustworthy MLLM Alignment Songtao Jiang et.al. 2410.15334 null
2024-10-19 SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation Jingxuan Chen et.al. 2410.15164 link
2024-10-19 LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound Xuechen Guo et.al. 2410.15074 null
2024-10-18 MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps Xiongtao Zhou et.al. 2410.14668 link
2024-10-18 MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems Zifeng Zhu et.al. 2410.14179 link
2024-10-18 RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training Muhe Ding et.al. 2410.14154 null
2024-10-17 PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Rongyao Fang et.al. 2410.13861 link
2024-10-17 $\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models Yaxin Luo et.al. 2410.13859 null
2024-10-17 Can MLLMs Understand the Deep Implication Behind Chinese Images? Chenhao Zhang et.al. 2410.13854 link
2024-10-18 Harnessing Webpage UIs for Text-Rich Visual Understanding Junpeng Liu et.al. 2410.13824 null
2024-10-17 MobA: A Two-Level Agent System for Efficient Mobile Task Automation Zichen Zhu et.al. 2410.13757 link
2024-10-17 Exploring the Design Space of Visual Context Representation in Video MLLMs Yifan Du et.al. 2410.13694 link
2024-10-17 Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Haoran Hao et.al. 2410.13360 link
2024-10-16 MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs Yunqiu Xu et.al. 2410.12332 null
2024-10-16 Understanding the Role of LLMs in Multimodal Evaluation Benchmarks Botian Jiang et.al. 2410.12329 link
2024-10-16 Multimodal Fusion with Relational Learning for Molecular Property Prediction Zhengyang Zhou et.al. 2410.12128 null
2024-10-15 MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding Yue Cao et.al. 2410.11829 link
2024-10-15 MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation Chenxi Wang et.al. 2410.11779 link
2024-10-15 SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding Ying Chen et.al. 2410.11761 null
2024-10-15 Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions Yuhan Fu et.al. 2410.11701 null
2024-10-15 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Sijie Cheng et.al. 2410.11623 null
2024-10-15 MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark Bin Shan et.al. 2410.11538 link
2024-10-15 Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs Sihang Zhao et.al. 2410.11437 link
2024-10-15 Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models Zhongye Liu et.al. 2410.11242 link
2024-10-15 MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation Xianping Ma et.al. 2410.11160 link
2024-10-14 Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes Tim Broedermann et.al. 2410.10791 link
2024-10-14 MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages Shubhi Bansal et.al. 2410.10407 link
2024-10-14 Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation Shun Qian et.al. 2410.10319 null
2024-10-14 ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization Jiawei Li et.al. 2410.10238 null
2024-10-14 Tracing Human Stress from Physiological Signals using UWB Radar Jia Xu et.al. 2410.10155 null
2024-10-15 LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models Han Qiu et.al. 2410.09962 link
2024-10-13 Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records Shuai Jiang et.al. 2410.09880 null
2024-10-13 Text4Seg: Reimagining Image Segmentation as Text Generation Mengcheng Lan et.al. 2410.09855 link
2024-10-12 Skipping Computations in Multimodal LLMs Mustafa Shukor et.al. 2410.09454 link
2024-10-12 MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection Xi Jiang et.al. 2410.09453 link
2024-10-11 Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion Shiao Wang et.al. 2410.08879 null
2024-10-11 Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking Wei Zhang et.al. 2410.08616 null
2024-10-11 Baichuan-Omni Technical Report Yadong Li et.al. 2410.08565 link
2024-10-11 SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models Haotian Xia et.al. 2410.08474 link
2024-10-10 Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo et.al. 2410.08202 null
2024-10-10 Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models Qingni Wang et.al. 2410.08174 null
2024-10-10 Agent S: An Open Agentic Framework that Uses Computers Like a Human Saaket Agashe et.al. 2410.08164 link
2024-10-10 Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs Xiaoyuan Liu et.al. 2410.08145 link
2024-10-09 Retrieval Replace Reduction: An effective visual token reduction method via semantic match Yingen Liu et.al. 2410.07278 null
2024-10-09 Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis Bohan Zeng et.al. 2410.07155 link
2024-10-09 Personalized Visual Instruction Tuning Renjie Pi et.al. 2410.07113 link
2024-10-10 Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology Xiangyu Wang et.al. 2410.07087 null
2024-10-09 HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding Keliang Li et.al. 2410.06777 null
2024-10-09 To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models Junyan Lin et.al. 2410.06765 link
2024-10-09 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet Haoran Zhang et.al. 2410.06555 link
2024-10-09 Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection Aravinda Reddy PN et.al. 2410.06543 null
2024-10-08 Multimodal Situational Safety Kaiwen Zhou et.al. 2410.06172 null
2024-10-08 Quadratic Is Not What You Need For Multimodal Large Language Models Phu Pham et.al. 2410.06169 link
2024-10-08 $\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection Yize Chen et.al. 2410.06126 null
2024-10-07 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents Boyu Gou et.al. 2410.05243 link
2024-10-07 Organizing Unstructured Image Collections using Natural Language Mingxuan Liu et.al. 2410.05217 null
2024-10-07 Multimodal Fusion Strategies for Mapping Biophysical Landscape Features Lucia Gordon et.al. 2410.04833 link
2024-10-07 MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models Kaichen Huang et.al. 2410.04819 link
2024-10-07 **Mitigating

About

Automatically updates arXiv papers daily using GitHub Actions (updates every 8 hours)
