Updated on 2025.08.14

[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url]

Website

You can learn directly from this page

Table of Contents

Tracking
HDR
Low-Level
Image Matching
MultiModal
Prompt

Tracking

Publish Date	Title	Authors	PDF	Code
2025-07-23	Giant Damping-like Torque Efficiency via Synergistic Spin Hall and enhanced Orbital Hall Effects	Subhakanta Das et.al.	2507.17372	null
2025-07-21	Is Tracking really more challenging in First Person Egocentric Vision?	Matteo Dunnhofer et.al.	2507.16015	null
2025-07-18	AeroThrow: An Autonomous Aerial Throwing System for Precise Payload Delivery	Ziliang Li et.al.	2507.13903	null
2025-07-14	Vision-Based Anti Unmanned Aerial Technology: Opportunities and Challenges	Guanghai Ding et.al.	2507.10006	null
2025-07-12	MVPinn: Integrating Milne-Eddington Inversion with Physics-Informed Neural Networks for GST/NIRIS Observations	Qin Li et.al.	2507.09430	null
2025-07-08	Can We Predict Your Next Move Without Breaking Your Privacy?	Arpita Soni et.al.	2507.08843	null
2025-07-11	SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2	Alen Adamyan et.al.	2507.08548	null
2025-07-10	Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking	Qiangqiang Wu et.al.	2507.07483	null
2025-07-02	TrackingMiM: Efficient Mamba-in-Mamba Serialization for Real-time UAV Object Tracking	Bingxi Liu et.al.	2507.01535	null
2025-07-01	UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions	Siyuan Yao et.al.	2507.00648	null
2025-06-30	Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking	Shiao Wang et.al.	2506.23783	null
2025-07-22	R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning	Biao Wang et.al.	2506.21980	null
2025-06-23	Lightweight RGB-T Tracking with Mobile Vision Transformers	Mahdi Falaki et.al.	2506.19154	null
2025-06-18	SOT Enabled 3D Magnetic Field Sensor with Low Offset and High Sensitivity	Sebastian Zeilinger et.al.	2506.15320	null
2025-06-17	Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios	Aswin Shanmugam Subramanian et.al.	2506.14204	null
2025-06-15	Learning Unpaired Image Dehazing with Physics-based Rehazy Generation	Haoyou Deng et.al.	2506.12824	null
2025-06-15	SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition	Yuta Hirano et.al.	2506.12672	null
2025-06-12	Joint ASR and Speaker Role Tagging with Serialized Output Training	Anfeng Xu et.al.	2506.10349	null
2025-06-09	Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition	Asahi Sakuma et.al.	2506.07515	null
2025-06-06	Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models	Yuke Lin et.al.	2506.05796	null
2025-06-03	MVTD: A Benchmark Dataset for Maritime Visual Object Tracking	Ahsan Baidar Bakht et.al.	2506.02866	null
2025-05-28	Nanoscale quantum imaging of field-free deterministic switching of a chiral antiferromagnet	Jingcheng Zhou et.al.	2505.22856	null
2025-05-27	Fully Spiking Neural Networks for Unified Frame-Event Object Tracking	Jingjun Yang et.al.	2505.20834	null
2025-05-28	Progressive Scaling Visual Object Tracking	Jack Hong et.al.	2505.19990	null
2025-05-26	Systems of Twinned Systems: A Systematic Literature Review	Feyi Adesanya et.al.	2505.19916	link
2025-05-26	Comparison of Polar Magnetic Fields Derived from MILOS and MERLIN Inversions with Hinode/SOT-SP Data	Masahito Kubo et.al.	2505.19468	null
2025-05-23	Adapting SAM 2 for Visual Object Tracking: 1st Place Solution for MMVPR Challenge Multi-Modal Tracking	Cheng-Yen Yang et.al.	2505.18111	null
2025-05-19	Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach	Shiao Wang et.al.	2505.12903	link
2025-05-30	Effect of crystallinity on spin-orbit torque in 5 $\textit{d}$ iridium oxide IrO$_{2}$	Tetsuro Morimoto et.al.	2505.10907	null
2025-05-14	Recent progress on electron- and magnon-mediated torques	Jia-Min Lai et.al.	2505.09257	null
2025-05-14	Enhanced Spin Pumping and Magnetization dynamics in Ni$_{80}$Fe$_{20}$/MoS$_2$ stack via interface modification	Mahammad Tahir et.al.	2505.09248	null
2025-05-11	Nonlinear Model Predictive Control for Leaderless UAV Formation Flying with Collision Avoidance under Directed Graphs	Yiming Wang et.al.	2505.06895	null
2025-05-11	Streaming Sliced Optimal Transport	Khai Nguyen et.al.	2505.06835	link
2025-05-10	Nonlinearity Modulation of Auto-oscillations in Three-terminal Magnetic Tunnel Junctions	Zixi Wang et.al.	2505.06547	null
2025-05-06	Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation	Gabriele Rosi et.al.	2505.06280	link
2025-05-09	CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking	Weihong Li et.al.	2505.05936	link
2025-05-08	A Simple Detector with Frame Dynamics is a Strong Tracker	Chenxu Peng et.al.	2505.04917	link
2025-05-06	Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking	Shenglan Li et.al.	2505.03507	link
2025-05-02	Current-induced Dynamics of Bloch Domain-wall Bimerons	Jiwen Chen et.al.	2505.00959	null
2025-05-01	A High-resolution, Inversion-Based Synoptic Study of Solar Granulation	James Crowley et.al.	2505.00826	null
2025-05-01	DARTer: Dynamic Adaptive Representation Tracker for Nighttime UAV Tracking	Xuzhao Li et.al.	2505.00752	null
2025-04-24	RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network	Boyue Xu et.al.	2504.17595	null
2025-04-22	SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking	Yunfeng Li et.al.	2504.15609	link
2025-04-19	Adversarial Attack for RGB-Event based Visual Object Tracking	Qiang Chen et.al.	2504.14423	link
2025-04-28	HyDra: SOT-CAM Based Vector Symbolic Macro for Hyperdimensional Computing	Md Mizanur Rahaman Nayan et.al.	2504.14020	null
2025-04-18	FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking	Ying Wang et.al.	2504.13604	link
2025-04-17	TAXI: Traveling Salesman Problem Accelerator with X-bar-based Ising Macros Powered by SOT-MRAMs and Hierarchical Clustering	Sangmin Yoo et.al.	2504.13294	null
2025-04-16	Efficient spin-orbit torque driven magnetization switching of GdFe using phosphorus-implanted platinum layers	Kazuki Shintaku et.al.	2504.11796	null
2025-04-15	Chiral Domain Walls Induced by Radially Magnetized Nanotube Geometry	Nobuyuki Umetsu et.al.	2504.11005	null
2025-04-16	Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution	Chenghao Li et.al.	2504.09566	link
2025-04-13	Sub-nanosecond in-plane magnetization switching induced by field-like spin-orbit torques from ferromagnets	Hanying Zhang et.al.	2504.09431	null
2025-04-12	Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking	You Wu et.al.	2504.09228	link
2025-04-11	Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions	Yingqian Xu et.al.	2504.08257	null
2025-04-08	Magnetic Memory Driven by Orbital Current	Jingkai Xu et.al.	2504.05780	null
2025-04-07	Dimensionality Enhanced Out-of-Plane Spin Currents in NbIrTe$_4$ for Efficient Field-Free Switching of Perpendicular Magnetization	Wei Yang et.al.	2504.05280	null
2025-04-02	Shape Anisotropy Enabled Field Free Switching of Perpendicular Nanomagnets	Akanksha Chouhan et.al.	2504.01634	null
2025-03-31	Symmetry Enhanced Unconventional Spin Current Anisotropy in a Collinear Antiferromagnet	Pankhuri Gupta et.al.	2503.20545	null
2025-03-26	Intrinsic back-switching phenomenon in SOT-MRAM devices	Kuldeep Ray et.al.	2503.19840	null
2025-03-22	MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking	Haolin Qin et.al.	2503.17699	link
2025-04-07	Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID	Yu-Hsi Chen et.al.	2503.17237	link
2025-03-21	Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks	Haijin Zeng et.al.	2503.16930	null
2025-03-21	Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking	Meng Zhou et.al.	2503.16768	null
2025-03-17	UncTrack: Reliable Visual Object Tracking with Uncertainty-Aware Prototype Memory Network	Siyuan Yao et.al.	2503.12888	link
2025-03-16	Equivalent-Circuit Thermal Model for Batteries with One-Shot Parameter Identification	Myisha A. Chowdhury et.al.	2503.12616	null
2025-03-13	Target-aware Bidirectional Fusion Transformer for Aerial Object Tracking	Xinglong Sun et.al.	2503.09951	null
2025-03-09	Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking	Chaocan Xue et.al.	2503.06625	link
2025-03-09	Dynamic Updates for Language Adaptation in Visual-Language Tracking	Xiaohai Li et.al.	2503.06621	link
2025-03-06	High resolution spectra of the [6297-6303] and [6361-6367] Angström domains (including forbidden OI lines) of the Sun and brightest stars	Jean-Marie Malherbe et.al.	2503.05832	null
2025-03-07	Separating the bulk and interface contribution of spin-orbit torque in ferromagnet-Heavy metal bilayers tuned by variation of resistivity of heavy metal	Abu Bakkar Miah et.al.	2503.05341	null
2025-03-07	Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching	Simon A. Aytes et.al.	2503.05179	link
2025-03-02	Inefficiency of the orbit Hall effect on spin torque in transition metal/ferromagnet bilayers	Yizhuo Song et.al.	2503.00910	null
2025-02-27	MITracker: Multi-View Integration for Visual Object Tracking	Mengjie Xu et.al.	2502.20111	null
2025-03-08	Dynamic Degradation Decomposition Network for All-in-One Image Restoration	Huiqiang Wang et.al.	2502.19068	null
2025-02-25	UASTrack: A Unified Adaptive Selection Framework with Modality-Customization in Single Object Tracking	He Wang et.al.	2502.18220	null
2025-02-24	Symmetry-breaking effects on spin-orbit torque switching in ferromagnetic semiconductors with perpendicular magnetic anisotropy	Apu Kumar Jana et.al.	2502.16788	null
2025-02-17	Effects of antiferromagnetic coupling and pinning on domain wall dynamics in synthetic ferrimagnets	Sougata Mallick et.al.	2502.11621	null
2025-02-13	Modelling spin-orbitronics effects at interfaces and chiral molecules	Poonam Kumari et.al.	2502.09239	null
2025-02-12	Highly efficient field-free switching by orbital Hall torque in a MoS2-based device operating at room temperature	Antonio Bianco et.al.	2502.08483	null
2025-02-08	Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark	Shiao Wang et.al.	2502.05574	link
2025-02-06	Visualizing Field-free Deterministic Magnetic Switching of all-van der Waals Spin-Orbit Torque System Using Spin Ensembles in Hexagonal Boron Nitride	Xi Zhang et.al.	2502.04561	null
2025-01-27	Investigation of Sub-configurations Reveals Stable Spin-Orbit Torque Switching Polarity in Polycrystalline Mn3Sn	Boyu Zhao et.al.	2501.15815	null
2025-01-25	Thermal Stability and Depinning Currents of Domain Wall-Based Artificial Synapses	Guntas Kaur et.al.	2501.15102	null
2025-02-16	Enhancing Unconventional Spin-Orbit Torque Efficiency: Numerical Study on the Influence of Crystallographic Texture and Polycrystalline Effects on Low-Symmetry Materials	Yifei Yang et.al.	2501.14200	null
2025-01-22	Enhanced Field-Free Perpendicular Magnetization Switching via spin splitting torque in Altermagnetic RuO2-based Heterostructures	Badsha Sekh et.al.	2501.12593	null
2025-01-18	Multilayered MXenes for future two-dimensional nonvolatile magnetic memories	P. Kumar et.al.	2501.10678	null
2025-01-13	Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions	Xiantong Zhao et.al.	2501.07133	null
2025-01-11	ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Xuanle Zhao et.al.	2501.06598	link
2025-01-18	BTMTrack: Robust RGB-T Tracking via Dual-template Bridging and Temporal-Modal Candidate Elimination	Zhongxuan Zhang et.al.	2501.03616	null
2025-01-05	DeTrack: In-model Latent Denoising Learning for Visual Object Tracking	Xinyu Zhou et.al.	2501.02467	null
2024-12-31	Alternative harmonic detection approach for quantitative determination of spin and orbital torques	Y. Xu et.al.	2501.00403	null
2024-12-30	An Experimental Study of Passive UAV Tracking with Digital Arrays and Cellular Downlink Signals	Yifei Sun et.al.	2412.20788	null
2024-12-30	Spin-orbit torque in a three-fold-symmetric bilayer and its effect on magnetization dynamics	Wuzhang Fang et.al.	2412.20746	null
2024-12-28	Learning Adaptive and View-Invariant Vision Transformer with Multi-Teacher Knowledge Distillation for Real-Time UAV Tracking	You Wu et.al.	2412.20002	link
2024-12-27	Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues	X. Feng et.al.	2412.19648	link
2024-12-26	Semistrong edge colorings of planar graphs	Yuquan Lin et.al.	2412.19230	null
2024-12-26	SUTrack: Towards Simple and Unified Single Object Tracking	Xin Chen et.al.	2412.19138	link
2024-12-24	Linear Enhancement of Spin-Orbit Torques and Absence of Bulk Rashba-Type Spin Splitting in Perpendicularly Magnetized [Pt/Co/W]n Superlattices	Zhihao Yan et.al.	2412.18481	null
2024-12-24	Field-free current-induced magnetization switching of a room temperature van der Waals magnet for neuromorphic computing	Chenxi Zhou et.al.	2412.18429	null
2024-12-24	All-electric mimicking synaptic plasticity based on the noncollinear antiferromagnetic device	Cuimei Cao et.al.	2412.18418	null
2025-01-01	Unsupervised UAV 3D Trajectories Estimation with Sparse Point Clouds	Hanfang Liang et.al.	2412.12716	link
2024-12-15	Exploring Enhanced Contextual Information for Video-Level Object Tracking	Ben Kang et.al.	2412.11023	link
2024-12-13	Visual Object Tracking across Diverse Data Modalities: A Review	Mengmeng Wang et.al.	2412.09991	null
2024-12-09	Magnetic Switching in Monolayer 2D Diluted Magnetic Semiconductors via Spin-to-Spin Conversion	Siwei Chen et.al.	2412.06650	null
2024-12-09	Energy Efficient Stochastic Signal Manipulation in Superparamagnetic Tunnel Junctions via Voltage-Controlled Exchange Coupling	Qi Jia et.al.	2412.06256	null
2024-12-03	GSOT3D: Towards Generic 3D Single Object Tracking in the Wild	Yifan Jiao et.al.	2412.02129	link
2024-12-01	MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning	You Wu et.al.	2412.00626	link
2024-11-29	Current-driven motion of magnetic domain-wall skyrmions	Haoyang Nie et.al.	2411.19566	null
2024-11-28	Unveiling the anisotropy of linear and nonlinear charge-spin conversion in Weyl semimetal TaIrTe4	Tao Tang et.al.	2411.19062	null
2024-12-04	A Distractor-Aware Memory for Visual Object Tracking with SAM2	Jovana Videnovic et.al.	2411.17576	link
2024-11-24	MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking	Chunhui Zhang et.al.	2411.15761	link
2024-11-23	How Texts Help? A Fine-grained Evaluation to Reveal the Role of Language in Vision-Language Tracking	Xuchen Li et.al.	2411.15600	null
2024-11-23	MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking	Xinqi Liu et.al.	2411.15459	null
2024-11-24	ClickTrack: Towards Real-time Interactive Single Object Tracking	Kuiran Wang et.al.	2411.13183	null
2024-11-30	SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory	Cheng-Yen Yang et.al.	2411.11922	link
2024-11-14	Compression Method for Solar Polarization Spectra Collected from Hinode SOT/SP Observations	Jargalmaa Batmunkh et.al.	2411.09311	null
2024-11-10	Orthogonal Spin-Orbit Torque-Induced Deterministic Switching in NiO	Yixiao Qiao et.al.	2411.06379	null
2024-11-08	Giant spin Hall effect with multi-directional spin components in Ni4W	Yifei Yang et.al.	2411.05682	null
2024-11-04	Single-layer spin-orbit-torque magnetization switching due to spin Berry curvature generated by minute spontaneous atomic displacement in a Weyl oxide	Hiroto Horiuchi et.al.	2411.01806	null
2024-11-04	ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Yiming Sun et.al.	2411.01756	null
2024-11-03	Capping layer dependent anti-correlation between magnetic damping and spin-orbital to charge conversion	Antarjami Sahoo et.al.	2411.01662	null
2024-11-01	Spin orbit torque-driven motion of quasi-Bloch domain wall in perpendicularly magnetized W/CoFeB/MgO structures	Nobuyuki Umetsu et.al.	2411.00516	null
2024-10-31	Origin of line broadening in fading granule: influence of small-scale turbulence	Ryohtaroh T. Ishikawa et.al.	2410.23654	null
2024-10-27	NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking	Yu Liu et.al.	2410.20421	link
2024-10-25	Can Stories Help LLMs Reason? Curating Information Space Through Narrative	Vahid Sadiri Javadi et.al.	2410.19221	null
2024-10-19	The Solution for Single Object Tracking Task of Perception Test Challenge 2024	Zhiqiang Zhong et.al.	2410.16329	null
2024-10-14	A stronger form of Yamamoto's theorem II -- Spectral operators	Soumyashant Nayak et.al.	2410.16318	null
2024-10-03	Leveraging Event Streams with Deep Reinforcement Learning for End-to-End UAV Tracking	Ala Souissi et.al.	2410.14685	null
2024-10-16	DaDiff: Domain-aware Diffusion Model for Nighttime UAV Tracking	Haobo Zuo et.al.	2410.12270	link
2024-10-14	SMART-TRACK: A Novel Kalman Filter-Guided Sensor Fusion For Robust UAV Object Tracking in Dynamic Environments	Khaled Gabr et.al.	2410.10409	link
2024-10-09	DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2410.02492	null
2024-10-01	Energy-efficient picosecond spin-orbit torque magnetization switching in ferro- and ferrimagnetic films	Eva Díaz et.al.	2410.00474	null
2024-09-27	Improving Visual Object Tracking through Visual Prompting	Shih-Fang Chen et.al.	2409.18901	link
2024-09-27	Prompt-Driven Temporal Domain Adaptation for Nighttime UAV Tracking	Changhong Fu et.al.	2409.18533	link
2024-09-26	A 5T-2MTJ STT-assisted Spin Orbit Torque based Ternary Content Addressable Memory for Hardware Accelerators	Siri Narla et.al.	2409.17863	null
2024-09-26	General Compression Framework for Efficient Transformer Object Tracking	Lingyi Hong et.al.	2409.17564	null
2024-09-26	Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking	Pengcheng Shao et.al.	2409.17560	null
2024-09-25	Towards Underwater Camouflaged Object Tracking: An Experimental Evaluation of SAM and SAM 2	Chunhui Zhang et.al.	2409.16902	link
2024-09-25	Conditional Generative Denoiser for Nighttime UAV Tracking	Yucheng Wang et.al.	2409.16834	link
2024-09-25	Progressive Representation Learning for Real-Time UAV Tracking	Changhong Fu et.al.	2409.16652	link
2024-09-25	Enhancing Nighttime UAV Tracking with Light Distribution Suppression	Liangliang Yao et.al.	2409.16631	link
2024-09-24	Pulse Shaping Strategies for Efficient Switching of Magnetic Tunnel Junctions by Spin-Orbit Torque	Marco Hoffmann et.al.	2409.16454	null
2024-09-24	CloudTrack: Scalable UAV Tracking with Cloud Semantics	Yannik Blei et.al.	2409.16111	link
2024-09-20	A survey of sulfur-bearing molecular lines toward the dense cores in eleven massive protoclusters	Mengyao Tang et.al.	2409.13231	null
2024-09-19	Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC	Jiawen Kang et.al.	2409.12388	link
2024-09-11	Topological Spin-Orbit Torque in Ferrimagnetic Weyl Semimetal	Tomonari Meguro et.al.	2409.07106	null
2024-09-09	Effects of Interfacial Oxygen Diffusion on the Magnetic Properties and Thermal Stability of Pd/CoFeB/Pd/Ta Heterostructure	Saravanan Lakshmanan et.al.	2409.05783	null
2024-09-11	Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition	Hao Shi et.al.	2409.00815	null
2024-08-30	Advancing Multi-talker ASR Performance with Large Language Models	Mohan Shi et.al.	2408.17431	null
2024-08-30	Cross Fusion RGB-T Tracking with Bi-directional Adapter	Zhirong Zeng et.al.	2408.16979	null
2024-08-23	Energy-efficient field-free unconventional spin-orbit torque magnetization switching dynamics in van der Waals heterostructures	Lalit Pandey et.al.	2408.13095	null
2024-08-21	Low-Light Object Tracking: A Benchmark	Pengzhi Zhong et.al.	2408.11463	link
2024-08-20	MambaEVT: Event Stream based Visual Object Tracking using State Space Model	Xiao Wang et.al.	2408.10487	link
2024-08-19	Reconfigurable Spin Logics and High-density Multistate Memory in a Single Spin-orbit Torque Device	Raghvendra Posti et.al.	2408.09866	null
2024-08-16	Initialization-Free Multistate Memristor: Synergy of Spin-Orbit Torque and Magnetic Fields	Raghvendra Posti et.al.	2408.08641	null
2024-08-15	MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking	Simiao Lai et.al.	2408.07889	null
2024-08-12	Latent Disentanglement for Low Light Image Enhancement	Zhihao Zheng et.al.	2408.06245	null
2024-08-11	Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends	Jeffry Victor et.al.	2408.05857	null
2024-08-05	VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking	Yuxuan Lu et.al.	2408.02263	null
2024-08-04	3D Single-object Tracking in Point Clouds with High Temporal Variation	Qiao Wu et.al.	2408.02049	null
2024-07-30	Strained topological insulator spin-orbit torque random access memory (STI-SOTRAM) bit cell for energy-efficient Processing in Memory	Md Golam Morshed et.al.	2407.20925	null
2024-07-19	HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation	Zezeng Li et.al.	2407.14419	null
2024-07-17	Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm	Shiyu Liu et.al.	2407.12614	null
2024-07-15	Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss	Mufeng Yao et.al.	2407.10485	link
2024-07-16	Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking	Lorenzo Vaquero et.al.	2407.10151	link
2024-07-12	DroneMOT: Drone-based Multi-Object Tracking Considering Detection Difficulties and Simultaneous Moving of Drones and Objects	Peng Wang et.al.	2407.09051	null
2024-07-11	Manipulating a Tetris-Inspired 3D Video Representation	Mihir Godbole et.al.	2407.08885	null
2024-07-11	Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets	Linh Van Ma et.al.	2407.08872	link
2024-07-11	CommRad: Context-Aware Sensing-Driven Millimeter-Wave Networks	Ish Kumar Jain et.al.	2407.08817	null
2024-07-10	Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors	Lei Cheng et.al.	2407.08049	null
2024-07-10	Large spin-orbit torque in a-plane $α$-Fe$_{2}$O$_{3}$/Pt bilayers	Igor Lyalin et.al.	2407.07731	null
2024-07-10	Spin Splitting in Altermagnetic RuO$_2$ Enables Field-free Spin-Orbit Torque Switching via Dominant Out-of-Plane Spin Polarization	Zhuoyi Li et.al.	2407.07447	null
2024-07-09	Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films	Shuchen Li et.al.	2407.06487	null
2024-07-07	Addressing single object tracking in satellite imagery through prompt-engineered solutions	Athena Psalta et.al.	2407.05518	null
2024-07-07	Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking	You Wu et.al.	2407.05383	null
2024-07-09	P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds	Jiahao Nie et.al.	2407.05238	link
2024-07-05	Median Mishaps between Chirality and Spin-Orbit Torques via Asymmetric Hysteresis	Minhwan Kim et.al.	2407.04624	null
2024-07-04	Serialized Output Training by Learned Dominance	Ying Shi et.al.	2407.03966	null
2024-07-04	TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers	Fatemeh Nourilenjan Nokabadi et.al.	2407.03946	link
2024-07-04	Out-of-Plane Polarization from Spin Reflection Induces Field-Free Spin-Orbit Torque Switching in Structures with Canted NiO Interfacial Moments	Zhe Zhang et.al.	2407.03676	null

(back to top)

HDR

Publish Date	Title	Authors	PDF	Code
2025-07-23	UNICE: Training A Universal Image Contrast Enhancer	Ruodai Cui et.al.	2507.17157	null
2025-07-16	Dark-EvGS: Event Camera as an Eye for Radiance Field in the Dark	Jingqian Wu et.al.	2507.11931	null
2025-07-16	SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring	Kaustav Chanda et.al.	2507.11910	null
2025-07-16	CompressedVQA-HDR: Generalized Full-reference and No-reference Quality Assessment Models for Compressed High Dynamic Range Videos	Wei Sun et.al.	2507.11900	null
2025-07-14	MetaH2: A Snapshot Metasurface HDR Hyperspectral Camera	Yuxuan Liu et.al.	2507.08282	null
2025-07-09	A hybrid dosimetry approach for remote audits in Ir-192 HDR interstitial brachytherapy: Development and pilot implementation	Eleftherios P Pappas et.al.	2507.06958	null
2025-07-09	Capturing Stable HDR Videos Using a Dual-Camera System	Qianyu Zhang et.al.	2507.06593	null
2025-07-07	Clinical test cases for model-based dose calculation algorithm commissioning, QA and benchmarking, for 192Ir HDR brachytherapy of gynecologic cancers	V. Peppa et.al.	2507.05144	null
2025-07-21	Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration	Yuyi Zhang et.al.	2507.05108	null
2025-07-05	Towards Spatially-Varying Gain and Binning	Anqi Yang et.al.	2507.04190	null
2025-07-01	EvRWKV: A RWKV Framework for Effective Event-guided Low-Light Image Enhancement	WenJie Cai et.al.	2507.03184	null
2025-07-02	Enhancing Multi-Exposure High Dynamic Range Imaging with Overlapped Codebook for Improved Representation Learning	Keuntek Lee et.al.	2507.01588	null
2025-07-02	DiffusionLight-Turbo: Accelerated Light Probes for Free via Single-Pass Chrome Ball Inpainting	Worameth Chinchuthakun et.al.	2507.01305	null
2025-06-30	Event-based Tiny Object Detection: A Benchmark Dataset and Baseline	Nuo Chen et.al.	2506.23575	null
2025-07-05	AFUNet: Cross-Iterative Alignment-Fusion Synergy for HDR Reconstruction via Deep Unfolding Paradigm	Xinyue Li et.al.	2506.23537	null
2025-06-29	Event-based Stereo Visual-Inertial Odometry with Voxel Map	Zhaoxing Zhang et.al.	2506.23078	null
2025-06-28	SPICE-HL3: Single-Photon, Inertial, and Stereo Camera dataset for Exploration of High-Latitude Lunar Landscapes	David Rodríguez-Martínez et.al.	2506.22956	null
2025-06-28	ICME 2025 Generalizable HDR and SDR Video Quality Measurement Grand Challenge	Yixu Chen et.al.	2506.22790	null
2025-06-27	Single-shot HDR using conventional image sensor shutter functions and optical randomization	Xiang Dai et.al.	2506.22426	null
2025-06-19	Seven-Probe Fiber Detector for Time-Resolved Source Tracking in HDR-Brachytherapy: Experimental Evaluation	Mathieu Gonod et.al.	2506.16124	null
2025-06-14	Fine-Grained HDR Image Quality Assessment From Noticeably Distorted to Very High Fidelity	Mohsen Jenadeleh et.al.	2506.12505	null
2025-06-13	Automated Treatment Planning for Interstitial HDR Brachytherapy for Locally Advanced Cervical Cancer using Deep Reinforcement Learning	Mohammadamin Moradi et.al.	2506.11957	null
2025-06-11	Automatic Treatment Planning using Reinforcement Learning for High-dose-rate Prostate Brachytherapy	Tonghe Wang et.al.	2506.09805	null
2025-06-11	TRAPs, Generalisations of MZVs, Locality and Resurgence for Quantum Field Theories	Pierre J. Clavier et.al.	2506.09493	null
2025-06-04	Photoreal Scene Reconstruction from an Egocentric Device	Zhaoyang Lv et.al.	2506.04444	link
2025-06-04	GRAVITY+ adaptive optics (GPAO) tests in Europe	Florentin Millour et.al.	2506.03721	null
2025-06-03	IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation	Yuanze Lin et.al.	2506.03150	null
2025-05-29	LCB-CV-UNet: Enhanced Detector for High Dynamic Range Radar Signals	Yanbin Wang et.al.	2505.23454	null
2025-05-29	iHDR: Iterative HDR Imaging with Arbitrary Number of Exposures	Yu Yuan et.al.	2505.22971	null
2025-05-27	HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation	Bowen Chen et.al.	2505.21831	null
2025-05-26	Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting	Yizhou Zhao et.al.	2505.20582	null
2025-05-28	EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction	Ryosei Hara et.al.	2505.19169	null
2025-05-23	Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras	Masataka Kobayashi et.al.	2505.17582	null
2025-05-22	V2V: Scaling Event-Based Vision through Efficient Video-to-Voxel Simulation	Hanyue Lou et.al.	2505.16797	link
2025-05-21	Evaluation of Mobile Environment for Vehicular Visible Light Communication Using Multiple LEDs and Event Cameras	Ryota Soga et.al.	2505.15412	null
2025-05-17	NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results	Sangmin Lee et.al.	2505.12089	null
2025-05-22	Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition	Runduo Han et.al.	2505.12007	link
2025-05-16	Towards Navigation-Grade and Deployable Optomechanical Accelerometery	Chang Ge et.al.	2505.11751	null
2025-05-16	Planar Velocity Estimation for Fast-Moving Mobile Robots Using Event-Based Optical Flow	Liam Boyle et.al.	2505.11116	null
2025-05-14	Efficient Modelling of Lyman-α opacity fluctuations during late EoR	Barun Maity et.al.	2505.09369	null
2025-05-13	A Survey of 3D Reconstruction with Event Cameras: From Event-based Geometry to Neural 3D Rendering	Chuanzhi Xu et.al.	2505.08438	null
2025-05-12	Asynchronous Multi-Object Tracking with an Event Camera	Angus Apps et.al.	2505.08126	link
2025-05-12	Towards a physically realistic computationally efficient DVS pixel model	Rui Graca et.al.	2505.07386	null
2025-05-12	RealRep: Generalized SDR-to-HDR Conversion with Style Disentangled Representation Learning	Gang He et.al.	2505.07322	null
2025-04-30	From Events to Enhancement: A Survey on Event-Based Imaging Technologies	Yunfan Lu et.al.	2505.05488	null
2025-05-08	EDmamba: A Simple yet Effective Event Denoising Method with State Space Model	Ciyu Ruan et.al.	2505.05391	null
2025-05-07	EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events	Shuoyan Wei et.al.	2505.04657	link
2025-05-06	Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow	Roblex Nana Tchakoute et.al.	2505.03398	null
2025-05-02	High Dynamic Range Novel View Synthesis with Single Exposure	Kaixuan Zhang et.al.	2505.01212	link
2025-04-29	A Survey on Event-based Optical Marker Systems	Nafiseh Jabbari Tofighi et.al.	2504.20736	null
2025-05-12	Spike Imaging Velocimetry: Dense Motion Estimation of Fluids Using Spike Cameras	Yunzhong Zhang et.al.	2504.18864	null
2025-04-25	Boxi: Design Decisions in the Context of Algorithmic Performance for Robotics	Jonas Frey et.al.	2504.18500	null
2025-04-25	BiasBench: A reproducible benchmark for tuning the biases of event cameras	Andreas Ziegler et.al.	2504.18235	null
2025-04-25	Post-Transfer Learning Statistical Inference in High-Dimensional Regression	Nguyen Vu Khai Tam et.al.	2504.18212	null
2025-04-24	CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos	Shucheng Gong et.al.	2504.17728	link
2025-04-27	EHGCN: Hierarchical Euclidean-Hyperbolic Fusion via Motion-Aware GCN for Hybrid Event Stream Perception	Haosheng Chen et.al.	2504.16616	null
2025-04-23	SaENeRF: Suppressing Artifacts in Event-based Neural Radiance Fields	Yuanjian Wang et.al.	2504.16389	link
2025-04-20	Approaches to High Dynamic Range Imaging - Application to the ngVLA	T. K. Sridharan et.al.	2504.14449	null
2025-04-17	CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework	Wentao Wu et.al.	2504.12576	link
2025-04-21	Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space	Kaustav Chanda et.al.	2504.12515	null
2025-04-16	Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging	Tristan S. W. Stevens et.al.	2504.12154	null
2025-04-11	High Dynamic Range Modulo Imaging for Robust Object Detection in Autonomous Driving	Kebin Contreras et.al.	2504.11472	null
2025-04-17	GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR	Christophe Bolduc et.al.	2504.10809	null
2025-04-14	Minimal Sensing for Orienting a Solar Panel	Jeremy Klotz et.al.	2504.10765	null
2025-04-13	Low-Light Image Enhancement using Event-Based Illumination Estimation	Lei Sun et.al.	2504.09379	null
2025-04-10	S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion	Yujin Wang et.al.	2504.07667	null
2025-04-08	Orthogonal Matching Pursuit based Reconstruction for Modulo Hysteresis Operators	Matthias Beckmann et.al.	2504.05895	null
2025-04-08	Inter-event Interval Microscopy for Event Cameras	Changqing Su et.al.	2504.04924	null
2025-04-06	eKalibr-Stereo: Continuous-Time Spatiotemporal Calibration for Event-Based Stereo Visual Systems	Shuolong Chen et.al.	2504.04451	link
2025-04-05	Autoregressive High-Order Finite Difference Modulo Imaging: High-Dynamic Range for Computer Vision Applications	Brayan Monroy et.al.	2504.04228	null
2025-04-03	Brightness Perceiving for Recursive Low-Light Image Enhancement	Haodian Wang et.al.	2504.02362	link
2025-04-02	Anomaly Detection for Hybrid Butterfly Subspecies via Probability Filtering	Bo-Kai Ruan et.al.	2504.01671	link
2025-03-31	DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting	Seungjun Lee et.al.	2503.24210	null
2025-03-29	SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry	Peiyu Chen et.al.	2503.22963	link
2025-03-28	Enhancing Celestial Imaging: High Dynamic Range with Neuromorphic Cameras	Satyapreet Singh Yadav et.al.	2503.22814	null
2025-03-26	SpikeDerain: Unveiling Clear Videos from Rainy Sequences Using Color Spike Streams	Hanwen Liang et.al.	2503.20315	null
2025-03-26	A Survey on Event-driven 3D Reconstruction: Development under Different Categories	Chuanzhi Xu et.al.	2503.19753	null
2025-03-25	Maximum Likelihood Estimation Based Complex-Valued Robust Chinese Remainder Theorem and Its Fast Algorithm	Xiaoping Li et.al.	2503.18625	null
2025-03-21	Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras	Shuang Guo et.al.	2503.17262	link
2025-03-20	Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits	Satyapreet Singh Yadav et.al.	2503.15883	null
2025-03-19	Boosting HDR Image Reconstruction via Semantic Knowledge Transfer	Qingsen Yan et.al.	2503.15361	null
2025-03-20	VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention	Mingzhe Zheng et.al.	2503.15138	null
2025-03-18	Weakly Supervised Spatial Implicit Neural Representation Learning for 3D MRI-Ultrasound Deformable Image Registration in HDR Prostate Brachytherapy	Jing Wang et.al.	2503.14395	null
2025-03-17	UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks	Yuanbin Qian et.al.	2503.12905	link
2025-03-17	Stereo Event-based, 6-DOF Pose Tracking for Uncooperative Spacecraft	Zibin Liu et.al.	2503.12732	link
2025-03-16	EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera	Luming Wang et.al.	2503.12419	link
2025-03-14	Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP	Trevor D. Canham et.al.	2503.11883	null
2025-03-13	GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping	Jinfeng Liu et.al.	2503.10143	null
2025-03-10	Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion	Haowen Bai et.al.	2503.07235	null
2025-03-08	Optimization models for needle placement in 3D-printed masks for high dose rate brachytherapy	Nasim Mirzavand Boroujeni et.al.	2503.06000	null
2025-03-16	DeepGrav: Anomalous Gravitational-Wave Detection Through Deep Latent Features	Jianqi Yan et.al.	2503.03799	link
2025-03-05	BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation	Gangwei Xu et.al.	2503.03256	null
2025-03-04	ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement	Xuejian Guo et.al.	2503.02484	link
2025-03-03	S-R2D2: a spherical extension of the R2D2 deep neural network series paradigm for wide-field radio-interferometric imaging	A. Tajja et.al.	2503.01462	null
2025-03-03	Adaptive cold-atom magnetometry mitigating the trade-off between sensitivity and dynamic range	Zhu Ma et.al.	2503.01211	null
2025-03-01	High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm	Zhaoyi Tian et.al.	2503.00410	link
2025-03-01	Adversarial Attacks on Event-Based Pedestrian Detectors: A Physical Approach	Guixu Lin et.al.	2503.00377	null
2025-02-28	EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration	Kuangyi Chen et.al.	2503.00167	link
2025-02-28	SEE: See Everything Every Time -- Adaptive Brightness Adjustment for Broad Light Range Images via Events	Yunfan Lu et.al.	2502.21120	null
2025-02-18	Fast Antibiotic resistance-Based gene editing of mammalian cells with CRISPR-Cas9 (FAB-CRISPR)	Petia Adarska et.al.	2502.12675	null
2025-02-14	Quantifying Phase Magnitudes of Open-Source Focused-Probe 4D-STEM Ptychography Reconstructions	Toma Susi et.al.	2502.09938	link
2025-02-10	Indoor Light and Heat Estimation from a Single Panorama	Guanzhou Ji et.al.	2502.06973	null
2025-02-09	Compressed sensing enabled high-bandwidth and large dynamic range magnetic sensing	Galya Haim et.al.	2502.06070	null
2025-02-09	Energy-Efficient Autonomous Aerial Navigation with Dynamic Vision Sensors: A Physics-Guided Neuromorphic Approach	Sourav Sanyal et.al.	2502.05938	null
2025-02-07	Differentiable Mobile Display Photometric Stereo	Gawoon Ban et.al.	2502.05055	null
2025-02-05	Deep Learning-based Event Data Coding: A Joint Spatiotemporal and Polarity Solution	Abdelrahman Seleem et.al.	2502.03285	null
2025-02-04	Event-aided Semantic Scene Completion	Shangwei Guo et.al.	2502.02334	link
2025-01-23	HP2 Survey V. Ophiuchus: Filament formation in a dispersing cloud complex	João Alves et.al.	2501.13931	null
2025-01-22	DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning	Wenhao Gu et.al.	2501.12898	null
2025-01-20	UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion	Zixuan Chen et.al.	2501.11515	null
2025-01-10	eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events	Shuolong Chen et.al.	2501.05688	link
2025-01-07	AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene	Chaoran Feng et.al.	2501.02807	null
2024-12-26	Learning Monocular Depth from Events via Egomotion Compensation	Haitao Meng et.al.	2412.19067	null
2024-12-25	HAND: Hierarchical Attention Network for Multi-Scale Handwritten Document Recognition and Layout Analysis	Mohammed Hamdan et.al.	2412.18981	null
2024-12-20	High-Dynamic Range Broadband Terahertz Time-Domain Spectrometer Based on Organic Crystal MNA	Samira Mansourzadeh et.al.	2412.15718	null
2024-12-19	Event-assisted 12-stop HDR Imaging of Dynamic Scene	Shi Guo et.al.	2412.14705	null
2025-01-06	LEDiff: Latent Exposure Diffusion for HDR Generation	Chao Wang et.al.	2412.14456	null
2024-12-18	Development of a High-Resolution, High-Dynamic-Range Charge Detector for Ion Beam Monitoring	O. Adriani et.al.	2412.13934	null
2024-12-18	Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode	Xin Su et.al.	2412.13749	link
2024-12-17	Transforming Single Photon Camera Images to Color High Dynamic Range Images	Sumit Sharma et.al.	2412.12942	null
2024-12-17	Efficient Event-based Semantic Segmentation with Spike-driven Lightweight Transformer-based Networks	Xiaxin Zhu et.al.	2412.12843	null
2024-12-17	Compressed Sensing Based Residual Recovery Algorithms and Hardware for Modulo Sampling	Shaik Basheeruddin Shah et.al.	2412.12724	null
2024-12-16	Towards Physically-Based Sky-Modeling	Ian J. Maquignaz et.al.	2412.11883	null
2024-12-16	High dynamic-range quantum sensing of magnons and their dynamics using a superconducting qubit	Sonia Rani et.al.	2412.11859	null
2024-12-16	Predicting the Original Appearance of Damaged Historical Documents	Zhenhua Yang et.al.	2412.11634	link
2024-12-16	Event-based Detectors for Laser Guide Star Tip-Tilt Sensing	Monique Cockram et.al.	2412.11436	null
2024-12-12	Continuous Gaussian Process Pre-Optimization for Asynchronous Event-Inertial Odometry	Zhixiang Wang et.al.	2412.08909	null
2024-12-10	EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering	Toshiya Yura et.al.	2412.07293	null
2024-12-09	Fitting Spherical Gaussians to Dynamic HDRI Sequences	Pascal Clausen et.al.	2412.06511	null
2024-12-09	Event fields: Capturing light fields at high speed, resolution, and dynamic range	Ziyuan Qu et.al.	2412.06191	null
2024-12-07	On an Analytical Inversion Formula for the Modulo Radon Transform	Matthias Beckmann et.al.	2412.05711	null
2024-12-05	DHOST theories as disformal gravity: From black holes to radiative spacetimes	Jibril Ben Achour et.al.	2412.04135	null
2024-12-05	High-power single-cycle THz emission from large-area photoconductive emitters at 400 kHz	Mohsen Khalili et.al.	2412.04004	null
2024-12-05	Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization	Tianyu Chen et.al.	2412.03941	null
2024-12-04	Accelerating HI density predictions during the Epoch of Reionization using a GPR-based emulator on N-body simulations	Gaurav Pundir et.al.	2412.03485	null
2024-12-03	EvRT-DETR: The Surprising Effectiveness of DETR-based Detection for Event Cameras	Dmitrii Torbunov et.al.	2412.02890	link
2024-12-02	Learning Differential Pyramid Representation for Tone Mapping	Qirui Yang et.al.	2412.01463	null
2024-11-28	Event-based Tracking of Any Point with Motion-Robust Correlation Features	Friedhelm Hamann et.al.	2412.00133	link
2024-11-25	CapHDR2IR: Caption-Driven Transfer from Visible Light to Infrared Domain	Jingchao Peng et.al.	2411.16327	null
2024-11-22	High-dynamic-range atomic clocks with dual Heisenberg-limited precision scaling	Jungeng Zhou et.al.	2411.14944	null
2024-11-20	Demonstrating the Suitability of Neuromorphic, Event-Based, Dynamic Vision Sensors for In Process Monitoring of Metallic Additive Manufacturing and Welding	David Mascareñas et.al.	2411.13108	null
2024-11-18	Noise Filtering Benchmark for Neuromorphic Satellites Observations	Sami Arja et.al.	2411.11233	link
2024-11-16	Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion	Kepeng Xu et.al.	2411.10775	null
2024-11-15	CaLES: A GPU-accelerated solver for large-eddy simulation of wall-bounded flows	Maochao Xiao et.al.	2411.09364	link
2024-11-11	Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models	NVIDIA et.al.	2411.07126	null
2024-11-25	Increasing the scalability of graph convolution for FPGA-implemented event-based vision	Piotr Wzorek et.al.	2411.04269	null
2024-11-13	DEIO: Deep Event Inertial Odometry	Weipeng Guan et.al.	2411.03928	link
2024-11-05	Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor	Anish Bhattacharya et.al.	2411.03303	null
2024-11-05	Learning-based Lossless Event Data Compression	Ahmadreza Sezavar et.al.	2411.03010	null
2024-10-30	Automatic programming via large language models with population self-evolution for dynamic job shop scheduling problem	Jin Huang et.al.	2410.22657	null
2024-10-29	EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data	Zhonghua Yi et.al.	2410.21743	link
2024-10-28	NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments	Taiyi Pan et.al.	2410.21615	link
2024-10-27	BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events	Yijin Li et.al.	2410.20451	null
2024-10-26	Unleashing Dynamic Range and Resolution in Unlimited Sensing Framework via Novel Hardware	Yuliang Zhu et.al.	2410.20193	null
2024-10-21	Scene-Segmentation-Based Exposure Compensation for Tone Mapping of High Dynamic Range Scenes	Yuma Kinoshita et.al.	2410.19839	null
2024-10-24	Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions	Antonio D'Orazio et.al.	2410.18622	null
2024-10-23	Frequency-dependent amplitude correction to free-precession scalar magnetometers	M. E. Limes et.al.	2410.18224	null
2024-10-22	SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition	Jiaqi Chen et.al.	2410.16746	link
2024-10-19	A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Translation	Hrishav Bakul Barua et.al.	2410.15068	link
2024-10-17	360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers	Jack Hilliard et.al.	2410.13566	null
2024-10-17	On Quantum Programming Languages	Benoît Valiron et.al.	2410.13337	null
2024-10-16	An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors	Qinghang Zhao et.al.	2410.12423	null
2024-10-10	DifFRelight: Diffusion-Based Facial Performance Relighting	Mingming He et.al.	2410.08188	null
2024-10-18	IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera	Jian Huang et.al.	2410.08107	link
2024-10-09	Fourier-based Action Recognition for Wildlife Behavior Quantification with Event Cameras	Friedhelm Hamann et.al.	2410.06698	null
2024-10-03	Spiking Neural Network as Adaptive Event Stream Slicer	Jiahang Cao et.al.	2410.02249	link
2024-10-03	Capturing complex hand movements and object interactions using machine learning-powered stretchable smart textile gloves	Arvin Tashakori et.al.	2410.02221	link
2024-10-01	Signatures of Black Hole Spin and Plasma Acceleration in Jet Polarimetry	Zachary Gelles et.al.	2410.00954	null
2024-10-04	VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models	Jiapeng Wang et.al.	2410.00741	null
2024-09-26	Photon Inhibition for Energy-Efficient Single-Photon Imaging	Lucas J. Koerner et.al.	2409.18337	null
2024-09-26	Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions	Weng Fei Low et.al.	2409.17988	null
2024-09-26	Unsupervised Learning Based Multi-Scale Exposure Fusion	Chaobing Zheng et.al.	2409.17830	null
2024-09-26	Event-based Stereo Depth Estimation: A Survey	Suman Ghosh et.al.	2409.17680	null
2024-09-26	Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking	Pengcheng Shao et.al.	2409.17560	null
2024-09-25	EventHDR: from Event to High-Speed HDR Videos and Beyond	Yunhao Zou et.al.	2409.17029	null
2024-09-25	Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training	Kun Song et.al.	2409.16767	null
2024-09-24	Sub-Nyquist USF Spectral Estimation: $K$ Frequencies with $6K + 4$ Modulo Samples	Ruiming Guo et.al.	2409.16472	null
2024-09-24	Neuromorphic Drone Detection: an Event-RGB Multimodal Approach	Gabriele Magrini et.al.	2409.16099	link
2024-09-24	Deep chroma compression of tone-mapped images	Xenios Milidonis et.al.	2409.16032	link
2024-09-23	Mixing Data-driven and Geometric Models for Satellite Docking Port State Estimation using an RGB or Event Camera	Cedric Le Gentil et.al.	2409.15581	null
2024-09-23	SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream	Jinze Yu et.al.	2409.15176	link
2024-09-21	Monocular Event-Inertial Odometry with Adaptive decay-based Time Surface and Polarity-aware Tracking	Kai Tang et.al.	2409.13971	null
2024-09-20	Intrinsic Single-Image HDR Reconstruction	Sebastian Dille et.al.	2409.13803	link
2024-09-20	Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors	Zixin Zhang et.al.	2409.13392	null
2024-09-18	EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning	Yukun Tian et.al.	2409.11813	null
2024-09-18	Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network	Jiale Wang et.al.	2409.11677	null
2024-09-16	Programmable multifunctional integrated microwave photonic circuit on thin-film lithium niobate	Chuangchuang Wei et.al.	2409.10227	null
2024-09-15	SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux	Rui Graca et.al.	2409.09648	null
2024-09-13	Integration of high-performance compact interferometric sensors in a suspended interferometer	Alexandra Mitchell et.al.	2409.08843	null
2024-09-13	Adaptive Robust High-Precision Atomic Gravimetry	Jinye Wei et.al.	2409.08550	null
2024-09-07	Neural Augmentation Based Panoramic High Dynamic Range Stitching	Chaobing Zheng et.al.	2409.04679	null
2024-09-05	MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice	Friedhelm Hamann et.al.	2409.03358	link
2024-09-03	Gradient events: improved acquisition of visual information in event cameras	Eero Lehtonen et.al.	2409.01764	null
2024-09-02	SoK: Security of the Image Processing Pipeline in Autonomous Vehicles	Michael Kühr et.al.	2409.01234	link
2024-08-30	Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms	Marcus Märtens et.al.	2408.16971	null
2024-08-29	EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More	Kanghao Chen et.al.	2408.16254	null
2024-08-28	ES-PTAM: Event-based Stereo Parallel Tracking and Mapping	Suman Ghosh et.al.	2408.15605	link
2024-08-27	Towards Real-world Event-guided Low-light Video Enhancement and Deblurring	Taewoo Kim et.al.	2408.14916	link
2024-08-27	Recent Event Camera Innovations: A Survey	Bharatesh Chakravarthi et.al.	2408.13627	link
2024-08-24	Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation	Yuxuan Zhou et.al.	2408.13586	link
2024-08-22	ISETHDR: A Physics-based Synthetic Radiance Dataset for High Dynamic Range Driving Scenes	Zhenyi Liu et.al.	2408.12048	link
2024-08-20	Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm	Xiao Wang et.al.	2408.10488	link
2024-08-20	MambaEVT: Event Stream based Visual Object Tracking using State Space Model	Xiao Wang et.al.	2408.10487	link
2024-08-19	Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms	Xiao Wang et.al.	2408.09764	link
2024-08-19	Phase-Separated Charge Order and Twinning Across Length Scales in CsV$_3$Sb$_5$	Jayden Plumb et.al.	2408.08842	null
2024-08-16	CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving	Shihan Peng et.al.	2408.08500	null
2024-08-13	MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation	JunYong Choi et.al.	2408.06707	null
2024-08-13	HDRGS: High Dynamic Range Gaussian Splatting	Jiahao Wu et.al.	2408.06543	link
2024-08-12	Rethinking Video with a Universal Event-Based Representation	Andrew Freeman et.al.	2408.06248	null
2024-08-10	EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency	Junjie Jiang et.al.	2408.05452	null
2024-08-06	Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera	Zibin Liu et.al.	2408.03225	link
2024-07-31	Exploiting Change Blindness for Video Coding: Perspectives from a Less Promising User Study	Mitra Amiri et.al.	2408.00052	null
2024-07-23	HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images	Shreyas Singh et.al.	2407.16503	link
2024-07-23	SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging	Lingtong Kong et.al.	2407.16308	link
2024-07-24	SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams	Liangyan Jiang et.al.	2407.15708	link
2024-08-04	Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering	Jiahao Cui et.al.	2407.13309	link
2024-07-18	Learned HDR Image Compression for Perceptually Optimal Storage and Display	Peibei Cao et.al.	2407.13179	null
2024-07-17	Nonlinear tomographic reconstruction via nonsmooth optimization	Vasileios Charisopoulos et.al.	2407.12984	null
2024-07-16	VideoClusterNet: Self-Supervised and Adaptive Clustering For Videos	Devesh Walawalkar et.al.	2407.12214	null
2024-07-16	I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM	Gwangtak Bae et.al.	2407.11347	null
2024-07-15	Temporal Event Stereo via Joint Learning with Stereoscopic Flow	Hoonhee Cho et.al.	2407.10831	link
2024-07-15	Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation	Yuhwan Jeong et.al.	2407.10703	link
2024-07-15	Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction	Lin Zhu et.al.	2407.10636	null
2024-07-18	Efficient hybrid technique for generating sub-grid haloes in reionization simulations	Ankur Barsode et.al.	2407.10585	null
2024-07-12	Radiance Fields from Photons	Sacha Jungerman et.al.	2407.09386	null
2024-07-11	Event-based vision on FPGAs -- a survey	Tomasz Kryjak et.al.	2407.08356	null
2024-07-12	Dynamic phase transition into a mixed-CDW state in 1$T$-TaS$_2$ via a thermal quench	A. de la Torre et.al.	2407.07953	null
2024-07-08	PanDORA: Casual HDR Radiance Acquisition for Indoor Scenes	Mohammad Reza Karimi Dastjerdi et.al.	2407.06150	null
2024-07-08	Neuromorphic Imaging with Super-Resolution	Pei Zhang et.al.	2407.05764	null

(back to top)

Low-Level

Publish Date	Title	Authors	PDF	Code
2025-07-23	DFDNet: Dynamic Frequency-Guided De-Flare Network	Minglong Xue et.al.	2507.17489	null
2025-07-23	Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging	Farnaz Khun Jush et.al.	2507.17412	null
2025-07-23	PolarAnything: Diffusion-based Polarimetric Image Synthesis	Kailong Zhang et.al.	2507.17268	null
2025-07-23	UNICE: Training A Universal Image Contrast Enhancer	Ruodai Cui et.al.	2507.17157	null
2025-07-22	A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis	Jinquan Guan et.al.	2507.16360	null
2025-07-21	SAIGFormer: A Spatially-Adaptive Illumination-Guided Network for Low-Light Image Enhancement	Hanting Li et.al.	2507.15520	null
2025-07-20	EBA-AI: Ethics-Guided Bias-Aware AI for Efficient Underwater Image Enhancement and Coral Reef Monitoring	Lyes Saad Saoud et.al.	2507.15036	null
2025-07-20	U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs	Xiaojie Li et.al.	2507.14902	null
2025-07-20	Exploring Scalable Unified Modeling for General Low-Level Vision	Xiangyu Chen et.al.	2507.14801	null
2025-07-18	Global Modeling Matters: A Fast, Lightweight and Effective Baseline for Efficient Image Restoration	Xingyu Jiang et.al.	2507.13663	null
2025-07-17	FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval	Jeong-Woo Park et.al.	2507.12823	null
2025-07-17	MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval	Jeong-Woo Park et.al.	2507.12819	null
2025-07-16	QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval	Jaehyun Kwak et.al.	2507.12416	null
2025-07-16	Wavelet-based Decoupling Framework for low-light Stereo Image Enhancement	Shuangli Du et.al.	2507.12188	null
2025-07-16	Learning Pixel-adaptive Multi-layer Perceptrons for Real-time Image Enhancement	Junyu Lou et.al.	2507.12135	null
2025-07-16	Unsupervised Part Discovery via Descriptor-Based Masked Image Restoration with Optimized Constraints	Jiahao Xia et.al.	2507.11985	null
2025-07-16	A Spatial-Physics Informed Model for 3D Spiral Sample Scanned by SQUID Microscopy	J. Senthilnath et.al.	2507.11853	null
2025-07-14	CWNet: Causal Wavelet Network for Low-Light Image Enhancement	Tongshun Zhang et.al.	2507.10689	null
2025-07-14	GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space	David G. Shatwell et.al.	2507.10473	null
2025-07-14	RefSTAR: Blind Facial Image Restoration with Reference Selection, Transfer, and Reconstruction	Zhicun Yin et.al.	2507.10470	null
2025-07-14	Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources	Daniele Rege Cambrin et.al.	2507.10403	null
2025-07-14	On a class of forward-backward reaction-diffusion systems with local and nonlocal coupling for image restoration	Yihui Tong et.al.	2507.10393	null
2025-07-13	A New Wireless Image Transmission System Using Code Index Modulation and Image Enhancement for High-Rate Next Generation Networks	Burak Ahmet Ozden et.al.	2507.09713	null
2025-07-11	RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features	Inye Na et.al.	2507.08546	null
2025-07-11	Deep Hashing with Semantic Hash Centers for Image Retrieval	Li Chen et.al.	2507.08404	null
2025-07-11	Single-Step Latent Diffusion for Underwater Image Restoration	Jiayi Wu et.al.	2507.07878	null
2025-07-10	IRAF-SLAM: An Illumination-Robust and Adaptive Feature-Culling Front-End for Visual SLAM in Challenging Environments	Thanh Nguyen Canh et.al.	2507.07752	null
2025-07-10	Degradation-Agnostic Statistical Facial Feature Transformation for Blind Face Restoration in Adverse Weather Conditions	Chang-Hwan Son et.al.	2507.07464	null
2025-07-08	FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval	François Gardères et.al.	2507.07135	null
2025-07-09	HVI-CIDNet+: Beyond Extreme Darkness for Low-Light Image Enhancement	Qingsen Yan et.al.	2507.06814	null
2025-07-09	Residual Prior-driven Frequency-aware Network for Image Fusion	Guan Zheng et.al.	2507.06735	null
2025-07-09	Enhancing Diffusion Model Stability for Image Restoration via Gradient Management	Hongjie Wu et.al.	2507.06656	null
2025-07-09	MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval	Naoya Sogi et.al.	2507.06654	null
2025-07-09	Capturing Stable HDR Videos Using a Dual-Camera System	Qianyu Zhang et.al.	2507.06593	null
2025-07-08	Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval	Haiwen Li et.al.	2507.05970	null
2025-07-08	OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval	Zhiwei Chen et.al.	2507.05631	null
2025-07-08	Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration	Yuyang Hu et.al.	2507.05604	null
2025-07-07	Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception	Moseli Mots'oehli et.al.	2507.05536	null
2025-07-07	Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model	Mengyao Xu et.al.	2507.05513	null
2025-07-07	An analysis of vision-language models for fabric retrieval	Francesco Giuliari et.al.	2507.04735	null
2025-07-06	Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions	Xiao Zhang et.al.	2507.04377	null
2025-07-06	Towards Lightest Low-Light Image Enhancement Architecture for Mobile Devices	Guangrui Bai et.al.	2507.04277	null
2025-07-06	Quick Bypass Mechanism of Zero-Shot Diffusion-Based Image Restoration	Yu-Shan Tai et.al.	2507.04207	null
2025-07-05	EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems	Hyunwoo Cho et.al.	2507.03937	null
2025-07-03	IGDNet: Zero-Shot Robust Underexposed Image Enhancement via Illumination-Guided and Denoising	Hailong Yan et.al.	2507.02445	null
2025-07-03	MAC-Lookup: Multi-Axis Conditional Lookup Model for Underwater Image Enhancement	Fanghai Yi et.al.	2507.02270	null
2025-07-03	SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement	Zeyu Lei et.al.	2507.02252	null
2025-07-02	MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices	Hailong Yan et.al.	2507.01838	null
2025-07-02	DocShaDiffusion: Diffusion Model in Latent Space for Document Image Shadow Removal	Wenjie Liu et.al.	2507.01422	null
2025-07-01	UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection	Wei Li et.al.	2507.00849	null
2025-07-04	LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling	Huaqiu Li et.al.	2507.00790	null
2025-07-01	Laplace-Mamba: Laplace Frequency Prior-Guided Mamba-CNN Fusion Network for Image Dehazing	Yongzhen Wang et.al.	2507.00501	null
2025-06-30	Oneta: Multi-Style Image Enhancement Using Eigentransformation Functions	Jiwon Kim et.al.	2506.23547	null
2025-06-29	Layer Decomposition and Morphological Reconstruction for Task-Oriented Infrared Image Enhancement	Siyuan Chai et.al.	2506.23353	null
2025-06-29	Double-Diffusion: Diffusion Conditioned Diffusion Probabilistic Model For Air Quality Prediction	Hanlin Dong et.al.	2506.23053	null
2025-06-28	Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data	Ghufran A. Omran et.al.	2506.22939	null
2025-06-28	Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval	Li-Cheng Shen et.al.	2506.22864	null
2025-06-28	UniFuse: A Unified All-in-One Framework for Multi-Modal Medical Image Fusion Under Diverse Degradations and Misalignments	Dayong Su et.al.	2506.22736	null
2025-06-27	EAMamba: Efficient All-Around Vision State Space Model for Image Restoration	Yu-Cheng Lin et.al.	2506.22246	null
2025-06-27	ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning	Ming Zhao et.al.	2506.22216	null
2025-06-26	Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration	Xin Lu et.al.	2506.21722	null
2025-06-26	Wild refitting for black box prediction	Martin J. Wainwright et.al.	2506.21460	null
2025-06-26	Learning to See in the Extremely Dark	Hai Jiang et.al.	2506.21132	null
2025-06-25	On the Burstiness of Faces in Set	Jiong Wang et.al.	2506.20312	null
2025-06-25	TDiR: Transformer based Diffusion for Image Restoration Tasks	Abbas Anwar et.al.	2506.20302	null
2025-06-24	A Comparative Study of NAFNet Baselines for Image Restoration	Vladislav Esaulov et.al.	2506.19845	null
2025-06-24	NAADA: A Noise-Aware Attention Denoising Autoencoder for Dental Panoramic Radiographs	Khuram Naveed et.al.	2506.19387	null
2025-06-24	jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval	Michael Günther et.al.	2506.18902	null
2025-06-23	Enhancing Image Restoration Transformer via Adaptive Translation Equivariance	JiaKui Hu et.al.	2506.18520	null
2025-06-23	BSMamba: Brightness and Semantic Modeling for Long-Range Interaction in Low-Light Image Enhancement	Tongshun Zhang et.al.	2506.18346	null
2025-06-23	A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement	Muhammad Azeem Aslam et.al.	2506.18323	null
2025-06-23	Attention-Based Ensemble Learning for Crop Classification Using Landsat 8-9 Fusion	Zeeshan Ramzan et.al.	2506.18321	null
2025-06-26	Referring Expression Instance Retrieval and A Strong End-to-End Baseline	Xiangzhao Hao et.al.	2506.18246	null
2025-06-22	CmFNet: Cross-modal Fusion Network for Weakly-supervised Segmentation of Medical Images	Dongdong Meng et.al.	2506.18042	null
2025-06-20	Reversing Flow for Image Restoration	Haina Qin et.al.	2506.16961	null
2025-06-20	Visual-Instructed Degradation Diffusion for All-in-One Image Restoration	Wenyang Luo et.al.	2506.16960	link
2025-06-20	Temperature calibration of surface emissivities with an improved thermal image enhancement network	Ning Chu et.al.	2506.16803	null
2025-06-23	RealSR-R1: Reinforcement Learning for Real-World Image Super-Resolution with Vision-Language Chain-of-Thought	Junbo Qiao et.al.	2506.16796	link
2025-06-20	TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration	Xiaoyu Shi et.al.	2506.16784	link
2025-06-20	Infrared and Visible Image Fusion Based on Implicit Neural Representations	Shuchen Sun et.al.	2506.16773	null
2025-06-20	Class Agnostic Instance-level Descriptor for Visual Instance Search	Qi-Ying Sun et.al.	2506.16745	null
2025-06-20	TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion	Mingrui Zhu et.al.	2506.16730	null
2025-06-19	MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval	Chao He et.al.	2506.16353	link
2025-06-19	Fine-grained Image Retrieval via Dual-Vision Adaptation	Xin Jiang et.al.	2506.16273	null
2025-06-18	DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder	Dan He et.al.	2506.15218	link
2025-06-18	ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections	Ziling Huang et.al.	2506.15180	null
2025-06-17	HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search	Qian Xu et.al.	2506.14707	null
2025-06-17	Optimization-Based Image Restoration under Implementation Constraints in Optical Analog Circuits	Taisei Kato et.al.	2506.14624	null
2025-06-17	Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching	Giacomo Meanti et.al.	2506.14605	link
2025-06-17	Exploring Diffusion with Test-Time Training on Efficient Image Restoration	Rongchang Lu et.al.	2506.14541	null
2025-06-17	GrFormer: A Novel Transformer on Grassmann Manifold for Infrared and Visible Image Fusion	Huan Kang et.al.	2506.14384	null
2025-06-18	DREAM: On hallucinations in AI-generated content for nuclear medicine imaging	Menghua Xia et.al.	2506.13995	null
2025-06-16	Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks	Haoqing Li et.al.	2506.13733	null
2025-06-16	Exploiting the Exact Denoising Posterior Score in Training-Free Guidance of Diffusion Models	Gregory Bellchambers et.al.	2506.13614	null
2025-06-16	A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation	Xiaoyang Wei et.al.	2506.13509	null
2025-06-17	Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval	Kshitij Kavimandan et.al.	2506.13496	null
2025-06-16	EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition	Bingxi Liu et.al.	2506.13133	null
2025-06-15	Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution	Hang Xu et.al.	2506.12738	null
2025-06-14	An Iterative PDE Based Illumination Restoration Scheme for Image Enhancement	Dragos-Patru Covei et.al.	2506.12560	null
2025-06-14	UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers	Yuantao Wang et.al.	2506.12324	null
2025-06-11	Towards a general-purpose foundation model for fMRI analysis	Cheng Wang et.al.	2506.11167	null
2025-06-10	Adaptive Object Detection with ESRGAN-Enhanced Resolution & Faster R-CNN	Divya Swetha K et.al.	2506.11122	null
2025-06-12	FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion	Tianpei Zhang et.al.	2506.10366	link
2025-06-11	Improving Personalized Search with Regularized Low-Rank Parameter Updates	Fiona Ryan et.al.	2506.10182	link
2025-06-10	Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment	Tianyu Chen et.al.	2506.10030	link
2025-06-11	Text-Aware Image Restoration with Diffusion Models	Jaewon Min et.al.	2506.09993	null
2025-06-11	Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints	Xiangkai Zhang et.al.	2506.09748	null
2025-06-11	Beyond Calibration: Physically Informed Learning for Raw-to-Raw Mapping	Peter Grönquist et.al.	2506.08650	null
2025-06-09	PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement	Teng Hu et.al.	2506.07848	null
2025-06-09	M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration	Yongzhen Wang et.al.	2506.07814	null
2025-06-09	Design and Evaluation of Deep Learning-Based Dual-Spectrum Image Fusion Methods	Beining Xu et.al.	2506.07779	null
2025-06-08	Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI	Aditya Chakravarty et.al.	2506.07286	null
2025-06-08	A PDE-Based Image Restoration Method: Mathematical Analysis and Implementation	Dragos-Patru Covei et.al.	2506.07132	null
2025-06-07	Zero Shot Composed Image Retrieval	Santhosh Kakarla et.al.	2506.06602	null
2025-06-06	A Deep Learning Approach for Facial Attribute Manipulation and Reconstruction in Surveillance and Reconnaissance	Anees Nashath Shaik et.al.	2506.06578	null
2025-06-06	GenIR: Generative Visual Feedback for Mental Image Retrieval	Diji Yang et.al.	2506.06220	null
2025-06-06	Bidirectional Image-Event Guided Low-Light Image Enhancement	Zhanwen Liu et.al.	2506.06120	null
2025-06-06	NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces	Pierluigi Zama Ramirez et.al.	2506.05815	null
2025-06-05	UniRes: Universal Image Restoration for Complex Degradations	Mo Zhou et.al.	2506.05599	null
2025-06-05	OpenRR-5k: A Large-Scale Benchmark for Reflection Removal in the Wild	Jie Cai et.al.	2506.05482	null
2025-06-05	Degradation-Aware Image Enhancement via Vision-Language Classification	Jie Cai et.al.	2506.05450	null
2025-06-05	SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training	Jianyi Wang et.al.	2506.05301	null
2025-06-05	Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement	Niki Martinel et.al.	2506.04753	null
2025-06-04	A Poisson-Guided Decomposition Network for Extreme Low-Light Image Enhancement	Isha Rao et.al.	2506.04470	null
2025-06-04	WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion	Tianpei Zhang et.al.	2506.03555	null
2025-06-03	NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results	Xiaohong Liu et.al.	2506.02875	null
2025-06-03	ControlMambaIR: Conditional Controls with State-Space Model for Image Restoration	Cheng Yang et.al.	2506.02633	null
2025-06-02	Entity Image and Mixed-Modal Image Retrieval Datasets	Cristian-Ioan Blaga et.al.	2506.02291	null
2025-06-04	NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution	Marcos V. Conde et.al.	2506.02197	null
2025-06-02	RAW Image Reconstruction from RGB on Smartphones. NTIRE 2025 Challenge Report	Marcos V. Conde et.al.	2506.01947	null
2025-06-02	NTIRE 2025 the 2nd Restore Any Image Model (RAIM) in the Wild Challenge	Jie Liang et.al.	2506.01394	null
2025-06-01	Quantization-based Bounds on the Wasserstein Metric	Jonathan Bobrutsky et.al.	2506.00976	null
2025-05-31	Image Restoration Learning via Noisy Supervision in the Fourier Domain	Haosen Liu et.al.	2506.00564	null
2025-05-30	RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement	Raman Jha et.al.	2505.24705	link
2025-05-30	Model-Guided Network with Cluster-Based Operators for Spatio-Spectral Super-Resolution	Ivan Pereira-Sánchez et.al.	2505.24605	link
2025-05-30	SORCE: Small Object Retrieval in Complex Environments	Chunxu Liu et.al.	2505.24441	link
2025-05-30	IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models	Hanting Wang et.al.	2505.24406	link
2025-05-30	Boosting All-in-One Image Restoration via Self-Improved Privilege Learning	Gang Wu et.al.	2505.24207	link
2025-05-29	Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch	Aneeshan Sain et.al.	2505.23763	null
2025-05-29	Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging	Ping Wang et.al.	2505.23180	link
2025-05-29	CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing	Yuka Ogino et.al.	2505.23102	null
2025-05-29	URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration	Rui Xu et.al.	2505.23068	link
2025-05-29	Vision-Based Assistive Technologies for People with Cerebral Visual Impairment: A Review and Focus Study	Bhanuka Gamage et.al.	2505.22983	null
2025-05-29	EquiReg: Equivariance Regularized Diffusion for Inverse Problems	Bahareh Tolooshams et.al.	2505.22973	null
2025-05-28	From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration	Junyu Fan et.al.	2505.22284	null
2025-05-28	UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images	Junhuan Liu et.al.	2505.22098	null
2025-05-28	Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule	San Jiang et.al.	2505.22089	null
2025-05-28	GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement	Zhihong Tang et.al.	2505.22021	null
2025-05-28	Reference-Guided Identity Preserving Face Restoration	Mo Zhou et.al.	2505.21905	null
2025-05-28	Broadening Our View: Assistive Technology for Cerebral Visual Impairment	Bhanuka Gamage et.al.	2505.21875	null
2025-05-27	QuARI: Query Adaptive Retrieval Improvement	Eric Xing et.al.	2505.21647	null
2025-05-27	BaryIR: Learning Multi-Source Unified Representation in Continuous Barycenter Space for Generalizable All-in-One Image Restoration	Xiaole Tang et.al.	2505.21637	null
2025-05-27	Causality-Driven Infrared and Visible Image Fusion	Linli Ma et.al.	2505.20830	null
2025-05-27	ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval	Eric Xing et.al.	2505.20764	link
2025-05-28	See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction	Yuan Wu et.al.	2505.20641	link
2025-05-28	PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy	Shuhao Guan et.al.	2505.20429	null
2025-05-26	Visualized Text-to-Image Retrieval	Di Wu et.al.	2505.20291	link
2025-05-26	Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval	Rong-Cheng Tu et.al.	2505.19952	null
2025-05-26	Can Visual Encoder Learn to See Arrows?	Naoyuki Terashita et.al.	2505.19944	null
2025-05-26	Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement	Afrah Shaahid et.al.	2505.19895	null
2025-05-26	A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking	Zixiang Zhao et.al.	2505.19858	null
2025-05-26	A Regularization-Guided Equivariant Approach for Image Restoration	Yulu Bai et.al.	2505.19799	link
2025-05-26	MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval	Rong-Cheng Tu et.al.	2505.19707	null
2025-05-25	Improving Novel view synthesis of 360° Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images	Guangan Chen et.al.	2505.19264	link
2025-05-25	Benchmarking Laparoscopic Surgical Image Restoration and Beyond	Jialun Pei et.al.	2505.19161	link
2025-05-25	Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition	Xiaoyang Liu et.al.	2505.19120	link
2025-05-24	Manifold-aware Representation Learning for Degradation-agnostic Image Restoration	Bin Ren et.al.	2505.18679	null
2025-05-23	RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2505.18047	null
2025-05-23	DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval	Yuxin Yang et.al.	2505.17796	null
2025-05-23	MODEM: A Morton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery	Hainuo Wang et.al.	2505.17581	link
2025-05-23	Dual Ascent Diffusion for Inverse Problems	Minseo Kim et.al.	2505.17353	null
2025-05-22	Forward-only Diffusion Probabilistic Models	Ziwei Luo et.al.	2505.16733	link
2025-05-22	Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration	Yuetong Liu et.al.	2505.16479	null
2025-05-22	NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment	Shuhao Han et.al.	2505.16314	null
2025-05-22	Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey	Liyan Wang et.al.	2505.16161	link
2025-05-22	Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention	Yuang Ai et.al.	2505.16157	null
2025-05-21	Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval	Siting Li et.al.	2505.15877	null
2025-05-21	SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval	Nikolaos Chaidos et.al.	2505.15867	link
2025-05-22	Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives	Yisi Luo et.al.	2505.15222	link
2025-05-20	UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache	Pu Wang et.al.	2505.14010	null
2025-05-20	Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models	Kiarash Naghavi Khanghah et.al.	2505.13828	null
2025-05-19	Adaptive Image Restoration for Video Surveillance: A Real-Time Approach	Muhammad Awais Amin et.al.	2505.13130	null
2025-05-19	LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration	Di You et.al.	2505.12935	null
2025-05-19	Towards a Universal Image Degradation Model via Content-Degradation Disentanglement	Wenbo Yang et.al.	2505.12860	null
2025-05-19	Degradation-Aware Feature Perturbation for All-in-One Image Restoration	Xiangpeng Tian et.al.	2505.12630	link
2025-05-18	Trustworthy Image Super-Resolution via Generative Pseudoinverse	Andreas Floros et.al.	2505.12375	link
2025-05-18	SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis	Haozhe Xiang et.al.	2505.12251	null
2025-05-17	Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model	Jian Zhu et.al.	2505.11800	link
2025-05-16	Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization	Aaron Wilhelm et.al.	2505.11620	null
2025-05-16	Diff-Unfolding: A Model-Based Score Learning Framework for Inverse Problems	Yuanhao Wang et.al.	2505.11393	null
2025-05-16	Entropy-Driven Genetic Optimization for Deep-Feature-Guided Low-Light Image Enhancement	Nirjhor Datta et.al.	2505.11246	link
2025-05-16	Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing	Mathis Jürgen Adler et.al.	2505.11121	null
2025-05-15	torchmfbd: a flexible multi-object multi-frame blind deconvolution code	A. Asensio Ramos et.al.	2505.10639	link
2025-05-19	Super-Resolution Generative Adversarial Networks based Video Enhancement	Kağan ÇETİN et.al.	2505.10589	null
2025-05-14	PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement	Tong Li et.al.	2505.09196	null
2025-05-13	Behind the Noise: Conformal Quantile Regression Reveals Emergent Representations	Petrus H. Zwart et.al.	2505.08176	null
2025-05-12	Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation	Dragos-Patru Covei et.al.	2505.07699	null
2025-05-12	Generalizable Pancreas Segmentation via a Dual Self-Supervised Learning Framework	Jun Li et.al.	2505.07165	null
2025-05-11	Bi-directional Self-Registration for Misaligned Infrared-Visible Image Fusion	Timing Li et.al.	2505.06920	null
2025-05-10	UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration	Chunming He et.al.	2505.06683	null
2025-05-10	MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning	Zixian Zhao et.al.	2505.06665	null
2025-05-09	A review of advancements in low-light image enhancement using deep learning	Fangxue Liu et.al.	2505.05759	null
2025-05-08	Semantic Style Transfer for Enhancing Animal Facial Landmark Detection	Anadil Hussein et.al.	2505.05640	null
2025-05-08	A Preliminary Study for GPT-4o on Image Restoration	Hao Yang et.al.	2505.05621	link
2025-05-07	Image Restoration via Multi-domain Learning	Xingyu Jiang et.al.	2505.05504	link
2025-05-08	SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation	Yonwoo Choi et.al.	2505.05475	link
2025-05-08	EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution	Haizhen Xie et.al.	2505.05209	null
2025-05-08	ADNP-15: An Open-Source Histopathological Dataset for Neuritic Plaque Segmentation in Human Brain Whole Slide Images with Frequency Domain Image Enhancement for Stain Normalization	Chenxi Zhao et.al.	2505.05041	null
2025-05-07	DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once	Qi Zhou et.al.	2505.04526	link
2025-05-08	HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation	Teng Hu et.al.	2505.04512	null
2025-05-07	TS-Diff: Two-Stage Diffusion Model for Low-Light RAW Image Enhancement	Yi Li et.al.	2505.04281	link
2025-05-07	Regional chemical potential analysis for material surfaces	Masahiro Fukuda et.al.	2505.04053	null
2025-05-04	OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery	Chongsheng Zhang et.al.	2505.03836	link
2025-05-06	DDaTR: Dynamic Difference-aware Temporal Residual Network for Longitudinal Radiology Report Generation	Shanshan Song et.al.	2505.03401	link
2025-05-06	Seeing the Abstract: Translating the Abstract Language for Vision Language Models	Davide Talon et.al.	2505.03242	link
2025-05-05	MSFNet-CPD: Multi-Scale Cross-Modal Fusion Network for Crop Pest Detection	Jiaqi Zhang et.al.	2505.02441	link
2025-05-05	Quaternion Multi-focus Color Image Fusion	Weihua Yang et.al.	2505.02365	null
2025-05-05	Quaternion Infrared Visible Image Fusion	Weihua Yang et.al.	2505.02364	null
2025-05-04	HiLLIE: Human-in-the-Loop Training for Low-Light Image Enhancement	Xiaorui Zhao et.al.	2505.02134	null
2025-05-03	ImageR: Enhancing Bug Report Clarity by Screenshots	Xuchen Tan et.al.	2505.01925	null
2025-05-03	Multi-Scale Target-Aware Representation Learning for Fundus Image Enhancement	Haofan Wu et.al.	2505.01831	null
2025-05-02	Deblurring fission fragment mass distributions	Pierre Nzabahimana et.al.	2505.01294	null
2025-05-02	RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement	Kui Jiang et.al.	2505.01224	link
2025-05-01	GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution	Aditya Arora et.al.	2505.00687	null
2025-04-30	DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration	Hebaixu Wang et.al.	2504.21487	link
2025-04-30	VR-FuseNet: A Fusion of Heterogeneous Fundus Data and Explainable Deep Network for Diabetic Retinopathy Classification	Shamim Rahim Refat et.al.	2504.21464	null
2025-04-29	Spatial-enhanced Reflective Coded Aperture Snapshot Spectral Imaging	Jiayu Di et.al.	2504.20516	null
2025-04-29	TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots	Qinhua Xie et.al.	2504.20362	null
2025-04-27	FusionNet: Multi-model Linear Fusion Framework for Low-light Image Enhancement	Kangbiao Shi et.al.	2504.19295	null
2025-04-27	Marine Snow Removal Using Internally Generated Pseudo Ground Truth	Alexandra Malyugina et.al.	2504.19289	null
2025-04-27	Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting	Xiaofeng Jin et.al.	2504.19261	null
2025-04-27	Adaptive Dual-domain Learning for Underwater Image Enhancement	Lingtao Peng et.al.	2504.19198	link
2025-04-27	DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning	Jialang Lu et.al.	2504.19127	null
2025-04-25	From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval	Yabing Wang et.al.	2504.17990	null
2025-04-24	Dual Prompting Image Restoration with Diffusion Transformers	Dehong Kong et.al.	2504.17825	null
2025-04-24	DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model	Zhanwen Liu et.al.	2504.17732	null
2025-04-24	Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems	Jaegang Jo et.al.	2504.17368	null
2025-04-24	I-INR: Iterative Implicit Neural Representations	Ali Haider et.al.	2504.17364	null
2025-04-23	Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval	Xin Jiang et.al.	2504.16691	null
2025-04-23	RouteWinFormer: A Route-Window Transformer for Middle-range Attention in Image Restoration	Qifan Li et.al.	2504.16637	null
2025-04-23	Cross Paradigm Representation and Alignment Transformer for Image Deraining	Shun Zou et.al.	2504.16455	null
2025-04-22	Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs	Merve Cerit et.al.	2504.16323	link
2025-04-22	AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization	Jinda Lu et.al.	2504.15619	null
2025-04-22	SonarT165: A Large-scale Benchmark and STFTrack Framework for Acoustic Object Tracking	Yunfeng Li et.al.	2504.15609	link
2025-04-22	InstaRevive: One-Step Image Enhancement via Dynamic Score Matching	Yixuan Zhu et.al.	2504.15513	null
2025-04-21	Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration	Junyuan Deng et.al.	2504.15159	null
2025-04-21	Structure-guided Diffusion Transformer for Low-Light Image Enhancement	Xiangchen Yin et.al.	2504.15054	null
2025-04-21	Distribution-aware Dataset Distillation for Efficient Image Restoration	Zhuoran Zheng et.al.	2504.14826	null
2025-04-19	A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling	Kyle Buettner et.al.	2504.14359	null
2025-04-19	Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation	Bin Ren et.al.	2504.14249	null
2025-04-18	Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design	Wei Dong et.al.	2504.14075	link
2025-04-18	Zebrafish Counting Using Event Stream Data	Qianghua Chen et.al.	2504.13692	null
2025-04-21	Circular Image Deturbulence using Quasi-conformal Geometry	Chu Chen et.al.	2504.13432	null
2025-04-17	SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Haoxuan Li et.al.	2504.13172	null
2025-04-17	Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal	Inzamamul Alam et.al.	2504.12809	link
2025-04-17	AdaQual-Diff: Diffusion-Based Image Restoration via Adaptive Quality Prompting	Xin Su et.al.	2504.12605	null
2025-04-16	Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling	Zhihua Wang et.al.	2504.12204	link
2025-04-16	Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging	Tristan S. W. Stevens et.al.	2504.12154	null
2025-04-16	Generalized Visual Relation Detection with Diffusion Models	Kaifeng Gao et.al.	2504.12100	null
2025-04-16	R-Meshfusion: Reinforcement Learning Powered Sparse-View Mesh Reconstruction with Diffusion Priors	Haoyang Wang et.al.	2504.11946	null
2025-04-16	Learning Physics-Informed Color-Aware Transforms for Low-Light Image Enhancement	Xingxing Yang et.al.	2504.11896	null
2025-04-16	HyperKING: Quantum-Classical Generative Adversarial Networks for Hyperspectral Image Restoration	Chia-Hsiang Lin et.al.	2504.11782	null
2025-04-15	Efficient Medical Image Restoration via Reliability Guided Learning in Frequency Domain	Pengcheng Zheng et.al.	2504.11286	null
2025-04-15	Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach	Xiaoxiao Ma et.al.	2504.11262	null
2025-04-15	Visual Re-Ranking with Non-Visual Side Information	Gustav Hanning et.al.	2504.11134	link
2025-04-15	UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques	Pedro Diaz-Garcia et.al.	2504.11063	null
2025-04-15	TMCIR: Token Merge Benefits Composed Image Retrieval	Chaoyang Wang et.al.	2504.10995	null
2025-04-15	AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent	Pu Wang et.al.	2504.10978	null
2025-04-15	An Efficient and Mixed Heterogeneous Model for Image Restoration	Yubin Gu et.al.	2504.10967	link
2025-04-15	DAAF: Degradation-Aware Adaptive Fusion Framework for Robust Infrared and Visible Images Fusion	Tianpei Zhang et.al.	2504.10871	null
2025-04-14	PG-DPIR: An efficient plug-and-play method for high-count Poisson-Gaussian inverse problems	Maud Biquard et.al.	2504.10375	null
2025-04-14	Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis	Kaiwen Zheng et.al.	2504.10351	null
2025-04-14	VibrantLeaves: A principled parametric image generator for training deep restoration models	Raphael Achddou et.al.	2504.10201	link
2025-04-14	Learning to Harmonize Cross-vendor X-ray Images by Non-linear Image Dynamics Correction	Yucheng Lu et.al.	2504.10080	null
2025-04-14	Progressive Transfer Learning for Multi-Pass Fundus Image Restoration	Uyen Phan et.al.	2504.10025	null
2025-04-14	Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration	Gang Wu et.al.	2504.09973	link
2025-04-14	Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition	Changwei Wang et.al.	2504.09881	link
2025-04-13	Computationally iterative methods for salt-and-pepper denoising	Jianwei Ke et.al.	2504.09408	null
2025-04-13	Low-Light Image Enhancement using Event-Based Illumination Estimation	Lei Sun et.al.	2504.09379	null
2025-04-12	Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers	Jiawei Wu et.al.	2504.09377	link
2025-04-11	Hypergraph Vision Transformers: Images are More than Nodes, More than Edges	Joshua Fixelle et.al.	2504.08710	null
2025-04-11	ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration	Yongsheng Yu et.al.	2504.08591	null
2025-04-11	FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations	Cheng-Yu Hsieh et.al.	2504.08368	null
2025-04-11	DreamFuse: Adaptive Image Fusion with Diffusion Transformer	Junjia Huang et.al.	2504.08291	null
2025-04-11	VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions	Ziyan Liu et.al.	2504.08219	null
2025-04-10	Nonlocal Retinex-Based Variational Model and its Deep Unfolding Twin for Low-Light Image Enhancement	Daniel Torres et.al.	2504.07810	null
2025-04-10	Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval	Zehong Ma et.al.	2504.07718	null
2025-04-10	Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying	Shichen Li et.al.	2504.07465	null
2025-04-10	Synthetic CT Generation from Time-of-Flight Non-Attenuation-Corrected PET for Whole-Body PET Attenuation Correction	Weijie Chen et.al.	2504.07450	null
2025-04-09	Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model	Yingjie Zhou et.al.	2504.07148	null
2025-04-09	Distilling Textual Priors from LLM to Efficient Image Fusion	Ran Zhang et.al.	2504.07029	link
2025-04-09	Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception	Ruotian Peng et.al.	2504.06666	null
2025-04-09	Rethinking LayerNorm in Image Restoration Transformers	MinKyu Lee et.al.	2504.06629	null
2025-04-08	AstroClearNet: Deep image prior for multi-frame astronomical image restoration	Yashil Sukurdeep et.al.	2504.06463	null
2025-04-09	Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions	Hao Zhang et.al.	2504.05795	null
2025-04-07	Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion	Xingyu Hu et.al.	2504.05164	null
2025-04-07	DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration	Jiamei Xiong et.al.	2504.05135	null
2025-04-08	Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision	Yuandong Pu et.al.	2504.04903	null
2025-04-07	Content-Aware Transformer for All-in-one Image Restoration	Gang Wu et.al.	2504.04869	link
2025-04-07	Inland Waterway Object Detection in Multi-environment: Dataset and Approach	Shanshan Wang et.al.	2504.04835	null
2025-04-06	NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval	Peng Gao et.al.	2504.04339	null
2025-04-05	JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration	Yunlong Lin et.al.	2504.04158	null
2025-04-04	Multimodal Diffusion Bridge with Attention-Based SAR Fusion for Satellite Image Cloud Removal	Yuyang Hu et.al.	2504.03607	null
2025-04-04	REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval	Shabnam Choudhury et.al.	2504.03169	null
2025-04-04	Finding the Reflection Point: Unpadding Images to Remove Data Augmentation Artifacts in Large Open Source Image Datasets for Machine Learning	Lucas Choi et.al.	2504.03168	null
2025-04-03	RoSMM: A Robust and Secure Multi-Modal Watermarking Framework for Diffusion Models	ZhongLi Fang et.al.	2504.02640	null
2025-04-03	Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement	Hesong Li et.al.	2504.02555	link
2025-04-03	HPGN: Hybrid Priors-Guided Network for Compressed Low-Light Image Enhancement	Hantang Li et.al.	2504.02373	null
2025-04-03	Brightness Perceiving for Recursive Low-Light Image Enhancement	Haodian Wang et.al.	2504.02362	link
2025-04-03	SemiISP/SemiIE: Semi-Supervised Image Signal Processor and Image Enhancement Leveraging One-to-Many Mapping sRGB-to-RAW	Masakazu Yoshimura et.al.	2504.02345	null
2025-04-02	Bridge the Gap between SNN and ANN for Image Restoration	Xin Su et.al.	2504.01755	null
2025-04-02	Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval	Yuji Nozawa et.al.	2504.01348	null
2025-04-01	IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval	Bangwei Liu et.al.	2504.00954	null
2025-04-01	Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data	Yiqun Duan et.al.	2504.00812	null
2025-04-01	Deconver: A Deconvolutional Network for Medical Image Segmentation	Pooya Ashtari et.al.	2504.00302	link
2025-03-31	InstructRestore: Region-Customized Image Restoration with Human Instructions	Shuaizheng Liu et.al.	2503.24357	link
2025-03-31	CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization	Yingrui Ji et.al.	2503.24182	null
2025-03-31	3D Dental Model Segmentation with Geometrical Boundary Preserving	Shufan Xi et.al.	2503.23702	link
2025-03-30	Multiview Image-Based Localization	Cameron Fiore et.al.	2503.23577	null
2025-03-30	ControlFusion: A Controllable Image Fusion Framework with Language-Vision Degradation Prompts	Linfeng Tang et.al.	2503.23356	null
2025-03-30	DSPFusion: Image Fusion via Degradation and Semantic Dual-Prior Guidance	Linfeng Tang et.al.	2503.23355	null
2025-03-29	A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery	Pengyu Chen et.al.	2503.23200	null
2025-03-29	indiSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy	Ashesh Ashesh et.al.	2503.22983	null
2025-03-28	RELD: Regularization by Latent Diffusion Models for Image Restoration	Pasquale Cascarano et.al.	2503.22563	null
2025-03-27	Q-MambaIR: Accurate Quantized Mamba for Efficient Image Restoration	Yujie Chen et.al.	2503.21970	null
2025-03-27	LOCORE: Image Re-ranking with Long-Context Sequence Modeling	Zilin Xiao et.al.	2503.21772	link
2025-03-27	Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck	Adrian Bulat et.al.	2503.21757	null
2025-03-27	Invert2Restore: Zero-Shot Degradation-Blind Image Restoration	Hamadi Chihaoui et.al.	2503.21486	null
2025-03-27	Diffusion Image Prior	Hamadi Chihaoui et.al.	2503.21410	null
2025-03-27	FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval	Zixu Li et.al.	2503.21309	link
2025-03-27	Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing	Shuai Li et.al.	2503.21236	null
2025-03-26	Underwater Image Enhancement by Convolutional Spiking Neural Networks	Vidya Sudevan et.al.	2503.20485	link
2025-03-26	Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration	Shihao Zhou et.al.	2503.20174	null
2025-03-25	CoLLM: A Large Language Model for Composed Image Retrieval	Chuong Huynh et.al.	2503.19910	link
2025-03-25	LENVIZ: A High-Resolution Low-Exposure Night Vision Benchmark Dataset	Manjushree Aithal et.al.	2503.19804	null
2025-03-25	Scene-agnostic Pose Regression for Visual Localization	Junwei Zheng et.al.	2503.19543	null
2025-03-25	From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting	Zhiwei Huang et.al.	2503.19358	null
2025-03-25	Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval	Haoqiang Lin et.al.	2503.19296	link
2025-03-24	LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment	Haoran Wang et.al.	2503.18640	null
2025-03-24	OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning	Hui Li et.al.	2503.18635	null
2025-03-24	Dig2DIG: Dig into Diffusion Information Gains for Image Fusion	Bing Cao et.al.	2503.18627	null
2025-03-24	Exploring State Space Model in Wavelet Domain: An Infrared and Visible Image Fusion Network via Wavelet Transform and State Space Model	Tianpei Zhang et.al.	2503.18378	null
2025-03-23	LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space	Zhangyu Wang et.al.	2503.18142	null
2025-03-23	Deep Learning Assisted Denoising of Experimental Micrographs	Owais Ahmad et.al.	2503.17945	null
2025-03-23	Cross-Domain Underwater Image Enhancement Guided by No-Reference Image Quality Assessment: A Transfer Learning Approach	Zhi Zhang et.al.	2503.17937	null
2025-03-23	Cat-AIR: Content and Task-Aware All-in-One Image Restoration	Jiachen Jiang et.al.	2503.17915	null
2025-03-23	What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images	Dongheng Lin et.al.	2503.17899	null
2025-03-22	good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval	Pranavi Kolouju et.al.	2503.17871	null
2025-03-21	Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2503.17109	link
2025-03-21	Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks	Haijin Zeng et.al.	2503.16930	null
2025-03-20	Efficient Bayesian Computation Using Plug-and-Play Priors for Poisson Inverse Problems	Teresa Klatzer et.al.	2503.16222	null
2025-03-20	3-D Image-to-Image Fusion in Lightsheet Microscopy by Two-Step Adversarial Network: Contribution to the FuseMyCells Challenge	Marek Wodzinski et.al.	2503.16075	null
2025-03-20	PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval	Qiang Zou et.al.	2503.16064	link
2025-03-20	Automating 3D Dataset Generation with Neural Radiance Fields	P. Schulz et.al.	2503.15997	link
2025-03-20	DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration	Suraj Singh et.al.	2503.15984	null
2025-03-21	UniCoRN: Latent Diffusion-based Unified Controllable Image Restoration Network across Multiple Degradations	Debabrata Mandal et.al.	2503.15868	null
2025-03-19	Image Restoration Models with Optimal Transport and Total Variation Regularization	Weijia Huang et.al.	2503.14947	null
2025-03-19	MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance	Zihan Cao et.al.	2503.14944	null
2025-03-19	Degradation Alchemy: Self-Supervised Unknown-to-Known Transformation for Blind Hyperspectral Image Fusion	He Huang et.al.	2503.14892	null
2025-03-18	Revisiting Image Fusion for Multi-Illuminant White-Balance Correction	David Serrano-Lozano et.al.	2503.14774	null
2025-03-18	SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model	Yucheng Mao et.al.	2503.14463	null
2025-03-18	AI-Driven Diabetic Retinopathy Diagnosis Enhancement through Image Processing and Salp Swarm Algorithm-Optimized Ensemble Network	Saif Ur Rehman Khan et.al.	2503.14209	null
2025-03-18	Towards properties of adversarial image perturbations	Egor Kuznetsov et.al.	2503.14111	null
2025-03-18	Intra and Inter Parser-Prompted Transformers for Effective Image Restoration	Cong Wang et.al.	2503.14037	link
2025-03-17	Scale Efficient Training for Large Datasets	Qing Zhou et.al.	2503.13385	link
2025-03-17	From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective	Chen Zhao et.al.	2503.13165	null
2025-03-17	All You Need to Know About Training Image Retrieval Models	Gabriele Berton et.al.	2503.13045	link
2025-03-17	Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion	Yidi Liu et.al.	2503.12764	null
2025-03-16	DPF-Net: Physical Imaging Model Embedded Data-Driven Underwater Image Enhancement	Han Mei et.al.	2503.12470	link
2025-03-16	Pathology Image Restoration via Mixture of Prompts	Jiangdong Cai et.al.	2503.12399	link
2025-03-14	Advancements in Real-Time Oncology Diagnosis: Harnessing AI and Image Fusion Techniques	Leila Bagheriye et.al.	2503.11332	null
2025-03-14	Breaking Shallow Limits: Task-Driven Pixel Fusion for Gap-free RGBT Tracking	Andong Lu et.al.	2503.11247	null
2025-03-14	Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption	Du Chen et.al.	2503.11221	null
2025-03-14	InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences	Hongkai Zheng et.al.	2503.11043	null
2025-03-13	ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning	Pengfei Luo et.al.	2503.10166	link
2025-03-13	Hybrid Agents for Image Restoration	Bingchen Li et.al.	2503.10120	null
2025-03-13	Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion	Xingxin Xu et.al.	2503.10109	null
2025-03-12	FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification	Shoaib Meraj Sami et.al.	2503.09873	null
2025-03-12	Multi-Agent Image Restoration	Xu Jiang et.al.	2503.09403	null
2025-03-12	Revisiting Medical Image Retrieval via Knowledge Consolidation	Yang Nan et.al.	2503.09370	null
2025-03-12	MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration	Zhehui Wu et.al.	2503.09131	link
2025-03-12	Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal	Rongxin Liao et.al.	2503.09013	link
2025-03-11	QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution	Siddhant Dutta et.al.	2503.08759	null
2025-03-11	Language-Depth Navigated Thermal and Visible Image Fusion	Jinchang Zhang et.al.	2503.08676	null
2025-03-11	PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net	Jun Yin et.al.	2503.08276	null
2025-03-11	TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement	Miao Zhang et.al.	2503.08168	null
2025-03-11	Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features	Hanbyul Lee et.al.	2503.08148	null
2025-03-11	Deep Perceptual Enhancement for Medical Image Analysis	S M A Sharif et.al.	2503.08027	link
2025-03-10	GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts	Minwen Liao et.al.	2503.07417	null
2025-03-10	Retinex-MEF: Retinex-based Glare Effects Aware Unsupervised Multi-Exposure Image Fusion	Haowen Bai et.al.	2503.07235	null
2025-03-11	Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios	Chenglu Pan et.al.	2503.07232	null
2025-03-10	Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization	Michael Green et.al.	2503.07038	null
2025-03-10	Zero-Shot Hashing Based on Reconstruction With Part Alignment	Yan Jiang et.al.	2503.07037	null
2025-03-10	Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion	Haolong Ma et.al.	2503.07033	null
2025-03-10	MERLION: Marine ExploRation with Language guIded Online iNformative Visual Sampling and Enhancement	Shrutika Vishal Thengane et.al.	2503.06953	link
2025-03-09	RoboDesign1M: A Large-scale Dataset for Robot Design Understanding	Tri Le et.al.	2503.06796	null
2025-03-09	StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition	Yanqing Shen et.al.	2503.06601	link
2025-03-07	Data-Efficient Generalization for Zero-shot Composed Image Retrieval	Zining Chen et.al.	2503.05204	null
2025-03-06	RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining	Tengfei Zhang et.al.	2503.04653	null
2025-03-06	Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior	Haitao Wu et.al.	2503.04207	link
2025-03-05	An Adaptive Underwater Image Enhancement Framework via Multi-Domain Fusion and Color Compensation	Yuezhe Tian et.al.	2503.03640	null
2025-03-05	Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks	Samuel Repka et.al.	2503.03507	null
2025-03-05	Two-Stream Thermal Imaging Fusion for Enhanced Time of Birth Detection in Neonatal Care	Jorge García-Torres et.al.	2503.03244	null
2025-03-03	Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications	Yuchen Xiang et.al.	2503.02908	null
2025-03-04	ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement	Xuejian Guo et.al.	2503.02484	link
2025-03-04	Semantic Prior Distillation with Vision Foundation Model for Enhanced Rapid Bone Scintigraphy Image Restoration	Pengchen Liang et.al.	2503.02321	null
2025-03-03	MRI super-resolution reconstruction using efficient diffusion probabilistic model with residual shifting	Mojtaba Safari et.al.	2503.01576	link
2025-03-03	Wavelet-Enhanced Desnowing: A Novel Single Image Restoration Approach for Traffic Surveillance under Adverse Weather Conditions	Zihan Shen et.al.	2503.01339	null
2025-03-03	Composed Multi-modal Retrieval: A Survey of Approaches and Applications	Kun Zhang et.al.	2503.01334	link
2025-03-03	Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual	Chong Wang et.al.	2503.01288	link
2025-03-03	Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond	Guanyao Wu et.al.	2503.01210	null
2025-03-02	Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion	Daiki Nishiyama et.al.	2503.00925	null
2025-03-01	Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement	Aupendu Kar et.al.	2503.00642	link
2025-03-01	Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning	Songlin Dong et.al.	2503.00515	null
2025-02-28	SEE: See Everything Every Time -- Adaptive Brightness Adjustment for Broad Light Range Images via Events	Yunfan Lu et.al.	2502.21120	null
2025-02-28	CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval	Zelong Sun et.al.	2502.20826	null
2025-02-28	Diffusion Restoration Adapter for Real-World Image Restoration	Hanbang Liang et.al.	2502.20679	null
2025-02-28	HVI: A New Color Space for Low-light Image Enhancement	Qingsen Yan et.al.	2502.20272	link
2025-02-27	Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps	Tianxiao Gao et.al.	2502.20054	null
2025-02-27	Striving for Faster and Better: A One-Layer Architecture with Auto Re-parameterization for Low-Light Image Enhancement	Nan An et.al.	2502.19867	null
2025-02-27	One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion	Chunyang Cheng et.al.	2502.19854	link
2025-02-26	ILACS-LGOT: A Multi-Layer Contrast Enhancement Approach for Palm-Vein Images	Kaveen Perera et.al.	2502.19456	null
2025-02-27	On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Ruben T. Lucassen et.al.	2502.19285	null
2025-02-26	Self-supervised conformal prediction for uncertainty quantification in Poisson imaging problems	Bernardin Tamo Amougou et.al.	2502.19194	null
2025-02-26	Multi-level Attention-guided Graph Neural Network for Image Restoration	Jiatao Jiang et.al.	2502.19181	null
2025-02-27	RetinaRegen: A Hybrid Model for Readability and Detail Restoration in Fundus Images	Yuhan Tang et.al.	2502.19153	null
2025-02-26	Dynamic Degradation Decomposition Network for All-in-One Image Restoration	Huiqiang Wang et.al.	2502.19068	null
2025-02-25	Spatial Analysis of Neuromuscular Junctions Activation in Three-Dimensional Histology-based Muscle Reconstructions	Alessandro Ascani Orsini et.al.	2502.18646	link
2025-02-24	Splitting Regularized Wasserstein Proximal Algorithms for Nonsmooth Sampling Problems	Fuqun Han et.al.	2502.16773	link
2025-02-23	Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries	Yin Wu et.al.	2502.16636	link
2025-02-21	Improved Partial Differential Equation and Fast Approximation Algorithm for Hazy/Underwater/Dust Storm Image Enhancement	Uche A. Nnolim et.al.	2502.15986	null
2025-02-21	ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval	Guanqi Zhan et.al.	2502.15682	null
2025-02-21	LUMINA-Net: Low-light Upgrade through Multi-stage Illumination and Noise Adaptation Network for Image Enhancement	Namrah Siddiqua et.al.	2502.15186	null
2025-02-21	Optimized Pap Smear Image Enhancement: Hybrid PMD Filter-CLAHE Using Spider Monkey Optimization	Ach Khozaimi et.al.	2502.15156	null
2025-02-20	Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications	Maha Ezzelarab et.al.	2502.14995	null
2025-02-20	CrossFuse: Learning Infrared and Visible Image Fusion by Cross-Sensor Top-K Vision Alignment and Beyond	Yukai Shi et.al.	2502.14493	null
2025-02-20	EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement	Wenhui Zhu et.al.	2502.14260	null
2025-02-19	RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior	Ching-Hua Lee et.al.	2502.13574	null
2025-02-18	Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization	Shuo Xing et.al.	2502.13146	link
2025-02-18	Local Flaw Detection with Adaptive Pyramid Image Fusion Across Spatial Sampling Resolution for SWRs	Siyu You et.al.	2502.12512	null
2025-02-17	Descriminative-Generative Custom Tokens for Vision-Language Models	Pramuditha Perera et.al.	2502.12095	null
2025-02-17	ILIAS: Instance-Level Image retrieval At Scale	Giorgos Kordopatis-Zilos et.al.	2502.11748	null
2025-02-17	Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics	Francesco Croce et.al.	2502.11725	link
2025-02-17	Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization	Yuanze Xu et.al.	2502.11408	null
2025-02-12	E2LVLM: Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection	Junjie Wu et.al.	2502.10455	null
2025-02-19	Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal	Jinpei Guo et.al.	2502.09873	link
2025-02-13	Source function from two-particle correlation function through entropy-regularized Richardson-Lucy deblurring	C. K. Tam et.al.	2502.09478	null
2025-02-13	ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation	Rotem Shalev-Arkushin et.al.	2502.09411	null
2025-02-12	Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions	Prajwal Gatti et.al.	2502.08438	null
2025-02-13	MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers	Ao Li et.al.	2502.07856	null
2025-02-11	Captured by Captions: On Memorization and its Mitigation in CLIP Models	Wenhao Wang et.al.	2502.07830	null
2025-02-11	Multi-Task-oriented Nighttime Haze Imaging Enhancer for Vision-driven Measurement Systems	Ai Chen et.al.	2502.07351	link
2025-02-11	Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos	Haowen Gao et.al.	2502.07327	null
2025-02-11	PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval	Osman Tursun et.al.	2502.07215	null
2025-02-10	AstroLoc: Robust Space to Ground Image Localizer	Gabriele Berton et.al.	2502.07003	null
2025-02-10	UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis	Zemin Yang et.al.	2502.06324	null
2025-02-09	A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement	Muhammad Turab et.al.	2502.05995	null
2025-02-09	Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education	Yanhao Jia et.al.	2502.05863	null
2025-02-11	UniDB: A Unified Diffusion Bridge Framework via Stochastic Optimal Control	Kaizhen Zhu et.al.	2502.05749	link
2025-02-07	Self-supervised Conformal Prediction for Uncertainty Quantification in Imaging Problems	Jasper M. Everink et.al.	2502.05127	null
2025-02-07	Performance Evaluation of Image Enhancement Techniques on Transfer Learning for Touchless Fingerprint Recognition	S Sreehari et.al.	2502.04680	null
2025-02-07	HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion	Mengting Ma et.al.	2502.04623	null
2025-02-06	Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion	Marco Mistretta et.al.	2502.04263	link
2025-02-05	All-in-One Image Compression and Restoration	Huimin Zeng et.al.	2502.03649	link
2025-02-05	Efficient Image Restoration via Latent Consistency Flow Matching	Elad Cohen et.al.	2502.03500	null
2025-02-05	Human-Aligned Image Models Improve Visual Decoding from the Brain	Nona Rajabi et.al.	2502.03081	null
2025-02-04	Blind Visible Watermark Removal with Morphological Dilation	Preston K. Robinette et.al.	2502.02676	null
2025-02-04	MATCNN: Infrared and Visible Image Fusion Method Based on Multi-scale CNN with Attention Transformer	Jingjing Liu et.al.	2502.01959	link
2025-02-03	Deep Unfolding Multi-modal Image Fusion Network via Attribution Analysis	Haowen Bai et.al.	2502.01467	null
2025-02-03	Human Body Restoration with One-Step Diffusion Model and A New Benchmark	Jue Gong et.al.	2502.01411	null
2025-02-03	ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies	Costin F. Ciusdel et.al.	2502.01335	null
2025-02-04	Compressed Image Generation with Denoising Diffusion Codebook Models	Guy Ohayon et.al.	2502.01189	null
2025-02-01	A framework for river connectivity classification using temporal image processing and attention based neural networks	Timothy James Becker et.al.	2502.00474	null
2025-02-01	Shape from Semantics: 3D Shape Generation from Multi-View Semantics	Liangchen Li et.al.	2502.00360	null
2025-01-31	Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer	Surochita Pal et.al.	2502.00078	null
2025-01-30	Integrating Spatial and Frequency Information for Under-Display Camera Image Restoration	Kyusu Ahn et.al.	2501.18517	null
2025-01-31	MatIR: A Hybrid Mamba-Transformer Image Restoration Model	Juan Wen et.al.	2501.18401	link
2025-01-30	Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers	Malte Tölle et.al.	2501.18237	null
2025-01-29	Segmentation-Aware Generative Reinforcement Network (GRN) for Tissue Layer Segmentation in 3-D Ultrasound Images for Chronic Low-back Pain (cLBP) Assessment	Zixue Zeng et.al.	2501.17690	link
2025-01-28	Text-to-Image Generation for Vocabulary Learning Using the Keyword Method	Nuwan T. Attygalle et.al.	2501.17099	null
2025-01-27	Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration	Long Peng et.al.	2501.16583	null
2025-01-27	UDBE: Unsupervised Diffusion-based Brightness Enhancement in Underwater Images	Tatiana Taís Schein et.al.	2501.16211	link
2025-01-27	Freestyle Sketch-in-the-Loop Image Segmentation	Subhadeep Koley et.al.	2501.16022	null
2025-01-27	CausalSR: Structural Causal Model-Driven Super-Resolution with Counterfactual Inference	Zhengyang Lu et.al.	2501.15852	link
2025-01-26	Universal Image Restoration Pre-training via Degradation Classification	JiaKui Hu et.al.	2501.15510	link
2025-01-26	Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations	Zijun Long et.al.	2501.15379	null
2025-01-24	Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders	Zaheer Ahmad et.al.	2501.14709	null
2025-01-24	Bayesian Neural Networks for One-to-Many Mapping in Image Enhancement	Guoxi Huang et.al.	2501.14265	link
2025-01-24	CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image	Xiaojun Tang et.al.	2501.14264	null
2025-01-23	Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models	Jakob Krogh Petersen et.al.	2501.14051	link
2025-01-23	INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration	Di You et.al.	2501.14014	null
2025-01-23	Binary Diffusion Probabilistic Model	Vitaliy Kinakh et.al.	2501.13915	null
2025-01-23	Where Do You Go? Pedestrian Trajectory Prediction using Scene Features	Mohammad Ali Rezaei et.al.	2501.13848	null
2025-01-22	UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior	I-Hsiang Chen et.al.	2501.13134	null
2025-01-22	Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects	Louis Aberdeen et.al.	2501.13009	null
2025-01-22	UniUIR: Considering Underwater Image Restoration as An All-in-One Learner	Xu Zhang et.al.	2501.12981	null
2025-01-22	FDG-Diff: Frequency-Domain-Guided Diffusion Framework for Compressed Hazy Image Restoration	Ruicheng Zhang et.al.	2501.12832	link
2025-01-21	Quality Enhancement of Radiographic X-ray Images by Interpretable Mapping	Hongxu Yang et.al.	2501.12245	null
2025-01-21	DLEN: Dual Branch of Transformer for Low-Light Image Enhancement in Dual Domains	Junyu Xia et.al.	2501.12235	null
2025-01-21	Proxies for Distortion and Consistency with Applications for Real-World Image Restoration	Sean Man et.al.	2501.12102	null
2025-01-20	SILO: Solving Inverse Problems with Latent Operators	Ron Raphaeli et.al.	2501.11746	null
2025-01-19	Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection	Zhipeng Yu et.al.	2501.11063	link
2025-01-19	Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation	Zhengwen Shen et.al.	2501.10958	null
2025-01-18	Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption	Jinyuan Liu et.al.	2501.10761	link
2025-01-18	A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval	Weihang Zhang et.al.	2501.10638	null
2025-01-17	DiffStereo: High-Frequency Aware Diffusion Model for Stereo Image Restoration	Huiyun Cao et.al.	2501.10325	null
2025-01-16	FLOL: Fast Baselines for Real-World Low-Light Enhancement	Juan C. Benito et.al.	2501.09718	link
2025-01-16	Soft Knowledge Distillation with Multi-Dimensional Cross-Net Attention for Image Restoration Models Compression	Yongheng Zhang et.al.	2501.09321	null
2025-01-16	Knowledge Distillation for Image Restoration: Simultaneous Learning from Degraded and Clean Images	Yongheng Zhang et.al.	2501.09268	null
2025-01-15	Vision Foundation Models for Computed Tomography	Suraj Pai et.al.	2501.09001	link
2025-01-12	SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval	Bhavin Jawade et.al.	2501.08347	null
2025-01-14	AI Driven Water Segmentation with deep learning models for Enhanced Flood Monitoring	Sanjida Afrin Mou et.al.	2501.08266	link
2025-01-13	Depth and Image Fusion for Road Obstacle Detection Using Stereo Camera	Oleg Perezyabov et.al.	2501.07245	null
2025-01-12	Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation	Zhenyang Feng et.al.	2501.06749	null
2025-01-11	Natural Language Supervision for Low-light Image Enhancement	Jiahui Tang et.al.	2501.06546	null
2025-01-10	Underwater Image Enhancement using Generative Adversarial Networks: A Survey	Kancharagunta Kishan Babu et.al.	2501.06273	null
2025-01-09	HipyrNet: Hypernet-Guided Feature Pyramid network for mixed-exposure correction	Shaurya Singh Rathore et.al.	2501.05195	null
2025-01-09	ResPanDiff: Diffusion Model with Disentangled Modulations for Image Fusion	Shiqi Cao et.al.	2501.05091	null
2025-01-09	IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation	Qi Chen et.al.	2501.04995	link
2025-01-08	Color Correction Meets Cross-Spectral Refinement: A Distribution-Aware Diffusion for Underwater Image Restoration	Laibin Chang et.al.	2501.04740	null
2025-01-14	HyFusion: Enhanced Reception Field Transformer for Hyperspectral Image Fusion	Chia-Ming Lee et.al.	2501.04665	null
2025-01-08	FrontierNet: Learning Visual Cues to Explore	Boyang Sun et.al.	2501.04597	link
2025-01-08	MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration	Zhi Jin et.al.	2501.04486	link
2025-01-08	Recognition-Oriented Low-Light Image Enhancement based on Global and Pixelwise Optimization	Seitaro Ono et.al.	2501.04210	null
2025-01-07	Fixed Points of Deep Neural Networks: Emergence, Stability, and Applications	L. Berlyand et.al.	2501.04182	null
2025-01-07	Convergent Primal-Dual Plug-and-Play Image Restoration: A General Algorithm and Applications	Yodai Suzuki et.al.	2501.03780	link
2025-01-06	ImageMM: Joint multi-frame image restoration and super-resolution	Yashil Sukurdeep et.al.	2501.03002	null
2025-01-06	Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI	Xujin Li et.al.	2501.02841	null
2025-01-06	Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis	Xiaojiao Guo et.al.	2501.02701	link
2025-01-03	iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings	Shuhei Tomoshige et.al.	2501.01642	null
2025-01-02	Domain-invariant feature learning in brain MR imaging for content-based image retrieval	Shuya Tobari et.al.	2501.01326	null
2025-01-03	Conditional Consistency Guided Image Translation and Enhancement	Amil Bhagat et.al.	2501.01223	link
2025-01-02	Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion	Dong Zhang et.al.	2501.01114	null
2024-12-30	Text-to-Image GAN with Pretrained Representations	Xiaozhou You et.al.	2501.00116	null
2024-12-30	Varformer: Adapting VAR's Generative Prior for Image Restoration	Siyang Wang et.al.	2412.21063	link
2024-12-30	Low-Light Image Enhancement via Generative Perceptual Priors	Han Zhou et.al.	2412.20916	link
2024-12-29	Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)	Tomer Garber et.al.	2412.20596	link
2024-12-28	Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems	Wen-Dong Jiang et.al.	2412.20201	null
2024-12-28	UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity	Jingbo Lin et.al.	2412.20157	link
2024-12-28	MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration	Boyun Li et.al.	2412.20066	link
2024-12-28	An Ordinary Differential Equation Sampler with Stochastic Start for Diffusion Bridge Models	Yuang Wang et.al.	2412.19992	null
2024-12-27	Generative Adversarial Network on Motion-Blur Image Restoration	Zhengdong Li et.al.	2412.19479	null
2024-12-25	FOR: Finetuning for Object Level Open Vocabulary Image Retrieval	Hila Levi et.al.	2412.18806	null
2024-12-24	Underwater Image Restoration via Polymorphic Large Kernel CNNs	Xiaojiao Guo et.al.	2412.18459	link
2024-12-24	UNet--: Memory-Efficient and Feature-Enhanced Network Architecture based on U-Net with Reduced Skip-Connections	Lingxiao Yin et.al.	2412.18276	null
2024-12-24	SDM-Car: A Dataset for Small and Dim Moving Vehicles Detection in Satellite Videos	Zhen Zhang et.al.	2412.18214	link
2024-12-24	ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval	Le Dong et.al.	2412.18136	link
2024-12-22	Where am I? Cross-View Geo-localization with Natural Language Descriptions	Junyan Ye et.al.	2412.17007	null
2024-12-21	Optoelectronic generative adversarial networks	Jumin Qiu et.al.	2412.16672	link
2024-12-21	Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising	Yuchen Wang et.al.	2412.16645	null
2024-12-24	Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling	Daichi Yashima et.al.	2412.16576	link
2024-12-21	Rethinking Model Redundancy for Low-light Image Enhancement	Tong Li et.al.	2412.16459	null
2024-12-20	SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild	Jannik Elsäßer et.al.	2412.16147	null
2024-12-20	NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images	Yue Guo et.al.	2412.15890	null
2024-12-20	Multi-dimensional Visual Prompt Enhanced Image Restoration via Mamba-Transformer Aggregation	Aiwen Jiang et.al.	2412.15845	link
2024-12-20	A New Method to Capturing Compositional Knowledge in Linguistic Space	Jiahe Wan et.al.	2412.15632	null
2024-12-20	Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation	Samantha J Alloo et.al.	2412.15513	null
2024-12-19	Learning Visual Composition through Improved Semantic Guidance	Austin Stone et.al.	2412.15396	null
2024-12-19	Unified Image Restoration and Enhancement: Degradation Calibrated Cycle Reconstruction Diffusion Model	Minglong Xue et.al.	2412.14630	link
2024-12-19	MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval	Junjie Zhou et.al.	2412.14475	null
2024-12-18	Personalized Generative Low-light Image Denoising and Enhancement	Xijun Wang et.al.	2412.14327	null
2024-12-18	Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing	Le-Anh Tran et.al.	2412.14220	link
2024-12-18	Adversarial Hubness in Multi-Modal Retrieval	Tingwei Zhang et.al.	2412.14113	link
2024-12-18	Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval	Giacomo Pacini et.al.	2412.13834	null
2024-12-18	Fed-AugMix: Balancing Privacy and Utility via Data Augmentation	Haoyang Li et.al.	2412.13818	null
2024-12-18	Multi-Exposure Image Fusion via Distilled 3D LUT Grid with Editable Mode	Xin Su et.al.	2412.13749	link
2024-12-18	VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement	Chen Zhao et.al.	2412.13655	link
2024-12-18	DarkIR: Robust Low-Light Image Restoration	Daniel Feijoo et.al.	2412.13443	link
2024-12-18	Zero-Shot Low Light Image Enhancement with Diffusion Prior	Joshua Cho et.al.	2412.13401	link
2024-12-17	Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration	Xinlong Cheng et.al.	2412.12550	null
2024-12-17	Three Things to Know about Deep Metric Learning	Yash Patel et.al.	2412.12432	null
2024-12-16	Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)	Ki-Hwan Oh et.al.	2412.12238	link
2024-12-16	Ultra-High-Definition Dynamic Multi-Exposure Image Fusion via Infinite Pixel Learning	Xingchi Chen et.al.	2412.11685	null
2024-12-16	CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution	Bingwen Hu et.al.	2412.11609	null
2024-12-15	Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval	Zelong Sun et.al.	2412.11087	null
2024-12-15	Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2412.11077	link
2024-12-15	Towards Context-aware Convolutional Network for Image Restoration	Fangwei Hao et.al.	2412.11008	null
2024-12-14	Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification	Yucong Meng et.al.	2412.10776	null
2024-12-16	Matrix Completion via Residual Spectral Matching	Ziyuan Chen et.al.	2412.10005	null
2024-12-13	$\textrm{A}^{\textrm{2}}$RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion	Jiawei Li et.al.	2412.09954	link
2024-12-12	OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs	Yuanzhi Zhu et.al.	2412.09465	link
2024-12-13	Are Conditional Latent Diffusion Models Effective for Image Restoration?	Yunchen Yuan et.al.	2412.09324	null
2024-12-13	MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition	Qiwen Gu et.al.	2412.09199	null
2024-12-12	ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring	Zhongbao Yang et.al.	2412.09193	null
2024-12-12	Dynamic Contrastive Knowledge Distillation for Efficient Image Restoration	Yunshuai Zhou et.al.	2412.08939	link
2024-12-12	A Flexible Plug-and-Play Module for Generating Variable-Length	Liyang He et.al.	2412.08922	link
2024-12-11	Image Retrieval Methods in the Dissimilarity Space	Madhu Kiran et.al.	2412.08618	null
2024-12-11	Convergence Analysis of a Proximal Stochastic Denoising Regularization Algorithm	Marien Renaud et.al.	2412.08262	null
2024-12-11	Visible and Infrared Image Fusion Using Encoder-Decoder Network	Ferhat Can Ataman et.al.	2412.08073	link
2024-12-11	BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion	Huafeng Li et.al.	2412.08050	link
2024-12-10	Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance	Wanwen Chen et.al.	2412.07741	null
2024-12-10	Leveraging Content and Context Cues for Low-Light Image Enhancement	Igor Morawski et.al.	2412.07693	link
2024-12-10	Analytical-Heuristic Modeling and Optimization for Low-Light Image Enhancement	Axel Martinez et.al.	2412.07659	null
2024-12-10	Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE)	Tu Vo et.al.	2412.07527	null
2024-12-10	Modeling Dual-Exposure Quad-Bayer Patterns for Joint Denoising and Deblurring	Yuzhi Zhao et.al.	2412.07256	link
2024-12-10	EchoIR: Advancing Image Restoration with Echo Upsampling and Bi-Level Optimization	Yuhan He et.al.	2412.07225	null
2024-12-10	A Progressive Image Restoration Network for High-order Degradation Imaging in Remote Sensing	Yujie Feng et.al.	2412.07195	null
2024-12-09	InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention	Howard Zhang et.al.	2412.06753	null
2024-12-09	EchoSim4D: A Proof-of-Concept Gamified XR Echocardiography Training Simulator for Neonates using 4D Ultrasound Volume	Deepthy Rose Jose et.al.	2412.06271	null
2024-12-08	A Review on Multisensor Data Fusion for Wearable Health Monitoring	Arlene John et.al.	2412.05895	null
2024-12-07	Compositional Image Retrieval via Instruction-Aware Contrastive Learning	Wenliang Zhong et.al.	2412.05756	link
2024-12-07	Enhancing Sample Generation of Diffusion Models using Noise Level Correction	Abulikemu Abuduweili et.al.	2412.05488	null
2024-12-06	Equivariant Denoisers for Image Restoration	Marien Renaud et.al.	2412.05343	null
2024-12-06	ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration	Chi-Wei Hsiao et.al.	2412.05043	null
2024-12-06	DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection	Yishuo Chen et.al.	2412.04931	link
2024-12-06	DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification	Ying Jin et.al.	2412.04828	null
2024-12-06	Modality Decoupling is All You Need: A Simple Solution for Unsupervised Hyperspectral Image Fusion	Songcheng Du et.al.	2412.04802	link
2024-12-05	Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise	Brayan Monroy et.al.	2412.04648	link
2024-12-05	MetaFormer: High-fidelity Metalens Imaging via Aberration Correcting Transformers	Byeonghyeon Lee et.al.	2412.04591	null
2024-12-05	Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image	Shuang Xu et.al.	2412.04201	null
2024-12-05	Deep priors for satellite image restoration with accurate uncertainties	Biquard Maud et.al.	2412.04130	null
2024-12-05	Blind Underwater Image Restoration using Co-Operational Regressor Networks	Ozer Can Devecioglu et.al.	2412.03995	null
2024-12-05	LL-ICM: Image Compression for Low-level Machine Vision via Large Vision-Language Model	Yuan Xue et.al.	2412.03841	null
2024-12-05	Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration	Yuzhen Du et.al.	2412.03814	null
2024-12-04	Composed Image Retrieval for Training-Free Domain Conversion	Nikos Efthymiadis et.al.	2412.03297	link
2024-12-04	Task-driven Image Fusion with Learnable Fusion Loss	Haowen Bai et.al.	2412.03240	null
2024-12-04	Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution	Jiahua Xiao et.al.	2412.02960	null
2024-12-03	Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval	Leah Bar et.al.	2412.02310	link
2024-12-03	Relaxed and Inertial Nonlinear Forward-Backward with Momentum	Fernando Roldán et.al.	2412.02045	link
2024-12-02	Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features	MD Shaikh Rahman et.al.	2412.01555	null
2024-12-02	Phaseformer: Phase-based Attention Mechanism for Underwater Image Restoration and Beyond	MD Raqib Khan et.al.	2412.01456	link
2024-12-02	FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration	Hao Li et.al.	2412.01427	null
2024-12-02	Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models	Yi Liao et.al.	2412.01202	null
2024-12-01	Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration	Haoze Sun et.al.	2412.00878	null
2024-12-01	DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement	Tongshun Zhang et.al.	2412.00683	link
2024-12-01	MambaNUT: Nighttime UAV Tracking via Mamba and Adaptive Curriculum Learning	You Wu et.al.	2412.00626	link
2024-11-30	Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion	Michail Dontas et.al.	2412.00557	null
2024-11-29	Self-Supervised Denoiser Framework	Emilien Valat et.al.	2411.19593	null
2024-11-27	Optimizing Image Retrieval with an Extended b-Metric Space	Abdelkader Belhenniche et.al.	2411.18800	null
2024-11-27	Hierarchical Information Flow for Generalized Efficient Image Restoration	Yawei Li et.al.	2411.18588	null
2024-11-27	Complexity Experts are Task-Discriminative Learners for Any Image Restoration	Eduard Zamfir et.al.	2411.18466	null
2024-11-27	Adaptive Blind All-in-One Image Restoration	David Serrano-Lozano et.al.	2411.18412	link
2024-11-29	HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning	Zengxi Zhang et.al.	2411.18296	link
2024-11-27	TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution	Linwei Dong et.al.	2411.18263	link
2024-12-02	Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision	Jinnyeong Kim et.al.	2411.18025	null
2024-11-26	Low-rank Adaptation-based All-Weather Removal for Autonomous Navigation	Sudarshan Rajagopalan et.al.	2411.17814	null
2024-11-26	GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration	Sudarshan Rajagopalan et.al.	2411.17687	null
2024-11-26	Learning Visual Hierarchies with Hyperbolic Embeddings	Ziwei Wang et.al.	2411.17490	null
2024-11-26	Puzzle Similarity: A Perceptually-guided No-Reference Metric for Artifact Detection in 3D Scene Reconstructions	Nicolai Hermann et.al.	2411.17489	null
2024-11-26	MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers	Ruoxi Zhu et.al.	2411.17226	link
2024-11-25	Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding	Yubin Gu et.al.	2411.16217	null
2024-11-25	U2NeRF: Unsupervised Underwater Image Restoration and Neural Radiance Fields	Vinayak Gupta et.al.	2411.16172	null
2024-11-25	Image Generation Diversity Issues and How to Tame Them	Mischa Dombrowski et.al.	2411.16171	link
2024-11-24	PromptHSI: Universal Hyperspectral Image Restoration Framework for Composite Degradation	Chia-Ming Lee et.al.	2411.15922	link
2024-11-24	MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking	Chunhui Zhang et.al.	2411.15761	link
2024-11-24	LTCF-Net: A Transformer-Enhanced Dual-Channel Fourier Framework for Low-Light Image Restoration	Gaojing Zhang et.al.	2411.15740	null
2024-11-22	Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration	Darshan Thaker et.al.	2411.15295	null
2024-11-22	MambaIRv2: Attentive State Space Restoration	Hang Guo et.al.	2411.15269	link
2024-11-22	Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval	Zengbao Sun et.al.	2411.14704	link
2024-11-21	Unveiling the Hidden: A Comprehensive Evaluation of Underwater Image Enhancement and Its Impact on Object Detection	Ali Awad et.al.	2411.14626	link
2024-11-21	Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion	Jinhong He et.al.	2411.13961	link
2024-11-20	Analysis and Synthesis Denoisers for Forward-Backward Plug-and-Play Algorithms	Matthieu Kowalski et.al.	2411.13276	null
2024-11-20	Globally Correlation-Aware Hard Negative Generation	Wenjie Peng et.al.	2411.13145	link
2024-11-19	Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution	Yang Zou et.al.	2411.12530	link
2024-11-19	Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models	Jun Xiao et.al.	2411.12450	null
2024-11-19	Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images	Zheng Gong et.al.	2411.12278	null
2024-11-16	GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding	Yue Zhou et.al.	2411.11904	link
2024-11-18	Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion	Meng Zhou et.al.	2411.11799	link
2024-11-18	Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment	Zhendong Liu et.al.	2411.11543	null
2024-11-17	Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method	Yan Zheng et.al.	2411.11135	null
2024-11-19	TSFormer: A Robust Framework for Efficient UHD Image Restoration	Xin Su et.al.	2411.10951	null
2024-11-16	AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations	Jiawei Mao et.al.	2411.10708	null
2024-11-16	Underwater Image Enhancement with Cascaded Contrastive Learning	Yi Liu et.al.	2411.10682	link
2024-11-16	SPDFusion: An Infrared and Visible Image Fusion Network Based on a Non-Euclidean Representation of Riemannian Manifolds	Huan Kang et.al.	2411.10679	null
2024-11-15	Probabilistic Prior Driven Attention Mechanism Based on Diffusion Model for Imaging Through Atmospheric Turbulence	Guodong Sun et.al.	2411.10321	null
2024-11-15	Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting	Ziqi Xie et.al.	2411.10309	link
2024-11-15	Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion	Dan He et.al.	2411.10036	null
2024-11-14	Instruction-Driven Fusion of Infrared-Visible Images: Tailoring for Diverse Downstream Tasks	Zengyi Yang et.al.	2411.09387	null
2024-11-13	Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval	Saul Santos et.al.	2411.08590	link
2024-11-13	Saliency Map-based Image Retrieval using Invariant Krawtchouk Moments	Ashkan Nejad et.al.	2411.08567	link
2024-11-12	CT-Mamba: A Hybrid Convolutional State Space Model for Low-Dose CT Denoising	Linxuan Li et.al.	2411.07930	link
2024-11-12	Joint multi-dimensional dynamic attention and transformer for general image restoration	Huan Zhang et.al.	2411.07893	link
2024-11-12	All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model	Yuanbo Wen et.al.	2411.07445	null
2024-11-11	Multi-scale Frequency Enhancement Network for Blind Image Deblurring	Yawen Xiang et.al.	2411.06893	null
2024-11-10	Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration	Chen Wu et.al.	2411.06456	null
2024-11-08	A Modular Conditional Diffusion Framework for Image Reconstruction	Magauiya Zhussip et.al.	2411.05993	null
2024-11-05	From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing	Xintian Sun et.al.	2411.05826	null
2024-11-07	Dynamic Brightness Adaptation for Robust Multi-modal Image Fusion	Yiming Sun et.al.	2411.04697	link
2024-11-07	l0-Regularized Sparse Coding-based Interpretable Network for Multi-Modal Image Fusion	Gargi Panda et.al.	2411.04519	null
2024-11-05	Test-Time Dynamic Image Fusion	Bing Cao et.al.	2411.02840	link
2024-11-05	ERUP-YOLO: Enhancing Object Detection Robustness for Adverse Weather Condition by Unified Image-Adaptive Processing	Yuka Ogino et.al.	2411.02799	null
2024-11-04	TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives	Maitreya Patel et.al.	2411.02545	null
2024-11-11	INQUIRE: A Natural World Text-to-Image Retrieval Benchmark	Edward Vendrow et.al.	2411.02537	link
2024-11-04	Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models	Sharat Agarwal et.al.	2411.01925	null
2024-11-03	Degradation-Aware Residual-Conditioned Optimal Transport for Unified Image Restoration	Xiaole Tang et.al.	2411.01656	link
2024-11-03	Conditional Controllable Image Fusion	Bing Cao et.al.	2411.01573	link
2024-11-03	Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification	MD Shaikh Rahman et.al.	2411.01473	null
2024-11-03	TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement	Xuanzhao Dong et.al.	2411.01403	link
2024-11-02	Medical X-Ray Image Enhancement Using Global Contrast-Limited Adaptive Histogram Equalization	Sohrab Namazi Nia et.al.	2411.01373	null
2024-11-01	Identifying Implicit Social Biases in Vision-Language Models	Kimia Hamidieh et.al.	2411.00997	null
2024-10-31	Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes	Shaohua Liu et.al.	2411.00239	null
2024-10-31	Chasing Better Deep Image Priors between Over- and Under-parameterization	Qiming Wu et.al.	2410.24187	link
2024-10-31	Nearest Neighbor Normalization Improves Multimodal Retrieval	Neil Chowdhury et.al.	2410.24114	link
2024-10-31	Image Synthesis with Class-Aware Semantic Diffusion Models for Surgical Scene Segmentation	Yihang Zhou et.al.	2410.23962	null
2024-10-31	Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model	Hao Zhang et.al.	2410.23905	link
2024-10-31	MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval	Haiwen Li et.al.	2410.23736	null
2024-10-31	Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data	Yucun Hou et.al.	2410.23628	null
2024-10-31	MS-Glance: Non-semantic context vectors and the applications in supervising image reconstruction	Ziqi Gao et.al.	2410.23577	link
2024-10-30	Decoupling Semantic Similarity from Spatial Alignment for Neural Networks	Tassilo Wald et.al.	2410.23107	link
2024-10-30	EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models	Shangquan Sun et.al.	2410.22959	link
2024-10-30	SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion	Kun Hu et.al.	2410.22837	link
2024-10-30	Analyzing Noise Models and Advanced Filtering Algorithms for Image Enhancement	Sahil Ali Akbar et.al.	2410.21946	link
2024-10-29	Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications	Monica Riedler et.al.	2410.21943	link
2024-10-28	Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework	Vladimir Arkhipkin et.al.	2410.21061	link
2024-10-27	Wavelet-based Mamba with Fourier Adjustment for Low-light Image Enhancement	Junhao Tan et.al.	2410.20314	link
2024-10-27	Deep Learning, Machine Learning -- Digital Signal and Image Processing: From Theory to Application	Weiche Hsieh et.al.	2410.20304	null
2024-10-24	HUE Dataset: High-Resolution Event and Frame Sequences for Low-Light Vision	Burak Ercan et.al.	2410.19164	null
2024-10-24	ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval	Zijia Zhao et.al.	2410.18715	link
2024-10-29	DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation	Yuang Ai et.al.	2410.18666	link
2024-10-23	DREB-Net: Dual-stream Restoration Embedding Blur-feature Fusion Network for High-mobility UAV Object Detection	Qingpeng Li et.al.	2410.17822	link
2024-10-23	An Intelligent Agentic System for Complex Image Restoration Problems	Kaiwen Zhu et.al.	2410.17809	link
2024-10-23	A variational approach to nonlocal image restoration flows	Harsh Prasad et.al.	2410.17649	null
2024-10-23	Diffusion Priors for Variational Likelihood Estimation and Image Denoising	Jun Cheng et.al.	2410.17521	link
2024-10-22	Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2410.17393	null
2024-10-20	LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration	Yuang Ai et.al.	2410.15385	link
2024-10-20	GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning	Haiwen Diao et.al.	2410.15266	link
2024-10-19	A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends	Junjun Jiang et.al.	2410.15067	link
2024-10-19	Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway's Digitised Book Collection	Marie Roald et.al.	2410.14969	link
2024-10-16	Development of Image Collection Method Using YOLO and Siamese Network	Chan Young Shin et.al.	2410.12561	null
2024-10-16	Towards Flexible and Efficient Diffusion Low Light Enhancer	Guanzhou Lan et.al.	2410.12346	null
2024-10-16	Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond	Pengwei Liang et.al.	2410.12274	null
2024-10-15	Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos	Zhouxia Wang et.al.	2410.11828	null
2024-10-15	LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images	Yuzhou Cheng et.al.	2410.11505	null
2024-10-13	Fusion Based Hand Geometry Recognition Using Dempster-Shafer Theory	Asish Bera et.al.	2410.09842	null
2024-10-13	LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Md Tanvir Islam et.al.	2410.09831	link
2024-10-14	LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection	Mingjia Li et.al.	2410.08810	link
2024-10-11	Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers	Jin Cao et.al.	2410.08688	link
2024-10-16	Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP	Eunji Kim et.al.	2410.08469	null
2024-10-11	A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification	Eugene P. W. Ang et.al.	2410.08456	null
2024-10-10	TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration	Hsing-Hua Wang et.al.	2410.08177	link
2024-10-10	A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks	Hoin Jung et.al.	2410.07593	link
2024-10-09	Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval	Mohammad Omama et.al.	2410.07022	null
2024-10-09	Rethinking the Evaluation of Visible and Infrared Image Fusion	Dayan Guan et.al.	2410.06811	link
2024-10-09	InstantIR: Blind Image Restoration with Instant Generative Reference	Jen-Yuan Huang et.al.	2410.06551	null
2024-10-09	MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging	Noel C. F. Codella et.al.	2410.06542	null
2024-10-08	Temporal Image Caption Retrieval Competition -- Description and Results	Jakub Pokrywka et.al.	2410.06314	null
2024-10-08	GSLoc: Visual Localization with 3D Gaussian Splatting	Kazii Botashev et.al.	2410.06165	null
2024-10-08	Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning	Ayush Singh et.al.	2410.05928	null
2024-10-08	ReFIR: Grounding Large Restoration Models with Retrieval Augmentation	Hang Guo et.al.	2410.05601	link
2024-10-09	LoTLIP: Improving Language-Image Pre-training for Long Text Understanding	Wei Wu et.al.	2410.05249	null
2024-10-07	Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration	Zhiyu Zhu et.al.	2410.04811	link
2024-10-06	Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli	Valentyn Piskovskyi et.al.	2410.04497	null
2024-10-06	SITCOM: Step-wise Triple-Consistent Diffusion Sampling for Inverse Problems	Ismail Alkhouri et.al.	2410.04479	link
2024-10-05	Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model	Keda Tao et.al.	2410.04161	null
2024-10-04	Diffusion State-Guided Projected Gradient for Inverse Problems	Rayhan Zirvi et.al.	2410.03463	link
2024-10-03	PnP-Flow: Plug-and-Play Image Restoration with Flow Matching	Ségolène Martin et.al.	2410.02423	link
2024-10-03	Can Capacitive Touch Images Enhance Mobile Keyboard Decoding?	Piyawat Lertvittayakumjorn et.al.	2410.02264	link
2024-10-02	Posterior sampling via Langevin dynamics based on generative priors	Vishal Purohit et.al.	2410.02078	null
2024-10-03	EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections	Francesc Net et.al.	2410.01536	link
2024-10-04	CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment	Safouane El Ghazouali et.al.	2410.01411	link
2024-10-01	Three-Operator Splitting Method with Two-Step Inertial Extrapolation	Olaniyi S. Iyiola et.al.	2410.01099	null
2024-10-01	GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer	Youngho Yoon et.al.	2410.00672	link
2024-10-01	Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration	Guy Ohayon et.al.	2410.00418	link
2024-10-01	GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction	Zaid Ilyas et.al.	2410.00380	null
2024-09-30	Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation	Aleyna Kütük et.al.	2410.00266	null
2024-09-30	A Survey on Diffusion Models for Inverse Problems	Giannis Daras et.al.	2410.00083	null
2024-09-30	UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation	Cheng Zhang et.al.	2409.20197	link
2024-09-29	Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation	Xiaofeng Cong et.al.	2409.19685	link
2024-09-28	Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration	Chu-Jie Qin et.al.	2409.19403	link
2024-09-28	VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition	Ahmad Khaliq et.al.	2409.19293	link
2024-09-28	PDCFNet: Enhancing Underwater Images through Pixel Difference Convolution	Song Zhang et.al.	2409.19269	link
2024-09-28	Extending Depth of Field for Varifocal Multiview Images	Zhilong Li et.al.	2409.19220	null
2024-09-27	MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion	Bardienus Duisterhof et.al.	2409.19152	null
2024-09-27	Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors	Yunlong Lin et.al.	2409.18899	null
2024-09-26	Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval	Mankeerat Sidhu et.al.	2409.18733	null
2024-09-27	Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification	Salma Hassan et.al.	2409.18715	null
2024-09-27	Underwater Image Enhancement with Physical-based Denoising Diffusion Implicit Models	Nguyen Gia Bach et.al.	2409.18476	link
2024-09-27	SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement	Yunkui Pang et.al.	2409.18355	link
2024-09-26	Toward Efficient Deep Blind RAW Image Restoration	Marcos V. Conde et.al.	2409.18204	link
2024-09-26	Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs	Qinpeng Cui et.al.	2409.17778	link
2024-09-25	Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement	Yihao Zhou et.al.	2409.16661	null
2024-09-25	Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement	Guanlin Li et.al.	2409.16604	link
2024-09-24	Proactive Schemes: A Survey of Adversarial Attacks for Social Good	Vishal Asnani et.al.	2409.16491	null
2024-09-24	Liger at W.M. Keck Observatory: imager structural analysis, fabrication, and characterization plan	James Wiley et.al.	2409.16263	null
2024-09-23	PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions	Weifeng Lin et.al.	2409.15278	link
2024-09-23	FusionRF: High-Fidelity Satellite Neural Radiance Fields from Multispectral and Panchromatic Acquisitions	Michael Sprintson et.al.	2409.15132	null
2024-09-22	Low-Light Enhancement Effect on Classification and Detection: An Empirical Study	Xu Wu et.al.	2409.14461	null
2024-09-22	Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement	Cameron Khanpour et.al.	2409.14334	null
2024-09-20	Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval	Morris Florek et.al.	2409.13513	link
2024-09-19	Deep Learning-Based Detection of Referable Diabetic Retinopathy and Macular Edema Using Ultra-Widefield Fundus Imaging	Philippe Zhang et.al.	2409.12854	null
2024-09-19	Fundus image enhancement through direct diffusion bridges	Sehui Kim et.al.	2409.12377	link
2024-09-18	Denoising diffusion models for high-resolution microscopy image restoration	Pamela Osuna-Vargas et.al.	2409.12078	null
2024-09-18	DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion	Jian Xu et.al.	2409.11642	link
2024-09-17	Ultrasound Image Enhancement with the Variance of Diffusion Models	Yuxin Zhang et.al.	2409.11380	link
2024-09-17	Improving the Efficiency of Visually Augmented Language Models	Paula Ontalvilla et.al.	2409.11148	link
2024-09-17	CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement	Xuanzhao Dong et.al.	2409.10966	link
2024-09-16	Taming Diffusion Models for Image Restoration: A Review	Ziwei Luo et.al.	2409.10353	null
2024-09-17	Fuse4Seg: Image-Level Fusion Based Multi-Modality Medical Image Segmentation	Yuchen Guo et.al.	2409.10328	null
2024-09-16	Garment Attribute Manipulation with Multi-level Attention	Vittorio Casula et.al.	2409.10206	null
2024-09-16	DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion	Yuchen Guo et.al.	2409.10080	null
2024-09-15	Underwater Image Enhancement via Dehazing and Color Restoration	Chengqin Wu et.al.	2409.09779	null
2024-09-15	Unsupervised Hyperspectral and Multispectral Image Blind Fusion Based on Deep Tucker Decomposition Network with Spatial-Spectral Manifold Learning	He Wang et.al.	2409.09670	link
2024-09-14	Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval	Amirreza Mahbod et.al.	2409.09430	link
2024-09-14	Infrared and Visible Image Fusion with Hierarchical Human Perception	Guang Yang et.al.	2409.09291	null
2024-09-12	Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement	Vamsi Krishna Vasa et.al.	2409.07862	null
2024-09-12	Quaternion Nuclear Norm minus Frobenius Norm Minimization for color image reconstruction	Yu Guo et.al.	2409.07797	null
2024-09-11	FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process	Yang Luo et.al.	2409.07451	null
2024-09-11	Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement	Xianmin Chen et.al.	2409.07040	link
2024-09-11	PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening	RuoCheng Wu et.al.	2409.06980	null
2024-09-10	Modeling Image Tone Dichotomy with the Power Function	Axel Martinez et.al.	2409.06764	null
2024-09-10	Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer	Li Ke et.al.	2409.06590	null
2024-09-10	Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models	Siyu Zhai et.al.	2409.06420	null
2024-09-10	A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions	Zhicong Wu et.al.	2409.06381	null
2024-09-10	Multi-Weather Image Restoration via Histogram-Based Transformer Feature Enhancement	Yang Wen et.al.	2409.06334	null
2024-09-10	AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration	Hongyi Cai et.al.	2409.06206	null
2024-09-09	Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding	Bram Willemsen et.al.	2409.05721	link
2024-09-09	Open-World Dynamic Prompt and Continual Visual Representation Learning	Youngeun Kim et.al.	2409.05312	null
2024-09-09	Rethinking the Atmospheric Scattering-driven Attention via Channel and Gamma Correction Priors for Low-Light Image Enhancement	Shyang-En Weng et.al.	2409.05274	link
2024-09-07	Training-free ZS-CIR via Weighted Modality Fusion and Similarity	Ren-Di Wu et.al.	2409.04918	link
2024-09-07	Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines	Sai Yang et.al.	2409.04812	link
2024-09-06	Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models	Saghir Alfasly et.al.	2409.04631	null
2024-09-06	Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior	Charlesquin Kemajou Mbakam et.al.	2409.04384	null
2024-09-06	RCNet: Deep Recurrent Collaborative Network for Multi-View Low-Light Image Enhancement	Hao Luo et.al.	2409.04363	link
2024-09-06	Secure Traffic Sign Recognition: An Attention-Enabled Universal Image Inpainting Mechanism against Light Patch Attacks	Hangcheng Cao et.al.	2409.04133	null
2024-09-05	Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration	Pei Wang et.al.	2409.03455	null
2024-09-05	KAN See In the Dark	Aoxiang Ning et.al.	2409.03404	link
2024-09-05	Multiple weather images restoration using the task transformer and adaptive mixup strategy	Yang Wen et.al.	2409.03249	null
2024-09-05	Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion	Chenguang Zhu et.al.	2409.03223	null
2024-09-05	Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem	Qiwen Zhu et.al.	2409.03179	link
2024-09-04	Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications	Abby Stylianou et.al.	2409.03012	null
2024-09-04	Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening	Ivan Pereira-Sánchez et.al.	2409.02675	link
2024-09-04	NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval	Sepanta Zeighami et.al.	2409.02343	link
2024-09-03	Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models	Jiaqi Xu et.al.	2409.02101	link
2024-09-03	F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring	Subhajit Paul et.al.	2409.02056	null
2024-09-03	AllWeatherNet:Unified Image enhancement for autonomous driving under adverse weather and lowlight-conditions	Chenghao Qian et.al.	2409.02045	link
2024-09-03	Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment	Konstantin Schall et.al.	2409.01936	link
2024-09-03	Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion	Ke Cao et.al.	2409.01728	null
2024-09-03	Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement	Kun Zhou et.al.	2409.01641	link
2024-09-03	GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting	Zixuan Guo et.al.	2409.01581	null
2024-09-02	A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches	Kim Jinwoo et.al.	2409.01219	null
2024-08-30	Enhancing Underwater Imaging with 4-D Light Fields: Dataset and Method	Yuji Lin et.al.	2408.17339	link
2024-09-02	RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance	Avideep Mukherjee et.al.	2408.17095	null
2024-08-30	Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL	Haiyang Zhao et.al.	2408.17060	null
2024-08-29	GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content	Lebin Zhou et.al.	2408.16866	null
2024-09-02	A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising	Shuaiyu Yuan et.al.	2408.16481	null
2024-08-29	Enhanced Control for Diffusion Bridge in Image Restoration	Conghan Yue et.al.	2408.16303	link
2024-08-29	Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models	Kengo Nakata et.al.	2408.16296	null
2024-08-29	LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement	Ye Yu et.al.	2408.16235	link
2024-08-28	Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration	Xu Zhang et.al.	2408.15994	null
2024-08-28	MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion	Yanglin Deng et.al.	2408.15641	link
2024-08-28	Temporal Attention for Cross-View Sequential Image Localization	Dong Yuan et.al.	2408.15569	link
2024-08-27	A Preliminary Exploration Towards General Image Restoration	Xiangtao Kong et.al.	2408.15143	null
2024-08-27	Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild	Tianqi Wei et.al.	2408.14723	null
2024-08-26	FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation	Daixun Li et.al.	2408.13980	null
2024-08-25	LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task	Ali Asgarov et.al.	2408.13909	link
2024-08-23	O-Mamba: O-shape State-Space Model for Underwater Image Enhancement	Chenyu Dong et.al.	2408.12816	link
2024-08-22	CODE: Confident Ordinary Differential Editing	Bastien van Delft et.al.	2408.12418	link
2024-08-22	Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement	Lingyu Zhu et.al.	2408.12316	link
2024-08-21	Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations	Lintong Zhang et.al.	2408.11966	null
2024-08-21	OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal	Qiao Mo et.al.	2408.11480	link
2024-08-21	UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation	Xiangyu Zhao et.al.	2408.11305	link
2024-08-21	Taming Generative Diffusion for Universal Blind Image Restoration	Siwei Tu et.al.	2408.11287	null
2024-08-20	Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement	Satoshi Kosugi et.al.	2408.11055	link
2024-08-20	SDI-Net: Toward Sufficient Dual-View Interaction for Low-light Stereo Image Enhancement	Linlin Hu et.al.	2408.10934	null
2024-08-20	UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement	Yingtie Lei et.al.	2408.10653	link
2024-08-19	BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval	Zhenyu Lu et.al.	2408.10383	null
2024-08-19	Multi-Scale Representation Learning for Image Restoration with State-Space Model	Yuhong He et.al.	2408.10145	null
2024-08-19	Harnessing Multi-resolution and Multi-scale Attention for Underwater Image Restoration	Alik Pramanick et.al.	2408.09912	link
2024-08-19	Fashion Image-to-Image Translation for Complementary Item Retrieval	Matteo Attimonelli et.al.	2408.09847	link
2024-08-19	ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement	Eashan Adhikarla et.al.	2408.09650	link
2024-08-17	Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration	Xin Lin et.al.	2408.09241	link
2024-08-16	DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer's Case Studies	Mohammad Hossein Najafi et.al.	2408.08489	null
2024-08-15	Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks	Jiawei Wu et.al.	2408.08149	link
2024-08-15	HAIR: Hypernetworks-based All-in-One Image Restoration	Jin Cao et.al.	2408.08091	link
2024-08-15	DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions	Ryosuke Korekata et.al.	2408.07910	null
2024-08-13	Review Learning: Advancing All-in-One Ultra-High-Definition Image Restoration Training Method	Xin Su et.al.	2408.06709	null
2024-08-12	Wavelet based inpainting detection	Barglazan Adrian-Alin et.al.	2408.06429	null
2024-08-12	Latent Disentanglement for Low Light Image Enhancement	Zhihao Zheng et.al.	2408.06245	null
2024-08-10	Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network	Junyan Ye et.al.	2408.05475	link
2024-08-10	Greedy randomized block Kaczmarz method for matrix equation AXB=C and its applications in color image restoration	Wenli Wang et.al.	2408.05444	null
2024-08-08	Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration	Ziran Zhang et.al.	2408.04227	null
2024-08-08	MultiColor: Image Colorization by Learning from Multiple Color Spaces	Xiangcheng Du et.al.	2408.04172	null
2024-08-06	AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval	Pavel Suma et.al.	2408.03282	link
2024-08-05	Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models	Tongtong Feng et.al.	2408.02408	null
2024-08-02	On Validation of Search & Retrieval of Tissue Images in Digital Pathology	H. R. Tizhoosh et.al.	2408.01570	null
2024-08-02	Underwater Object Detection Enhancement via Channel Stabilization	Muhammad Ali et.al.	2408.01293	link
2024-08-02	Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement	Wenbin Zou et.al.	2408.01276	link
2024-08-02	Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration	Donwon Park et.al.	2408.01099	null
2024-08-02	FCDFusion: a Fast, Low Color Deviation Method for Fusing Visible and Infrared Image Pairs	Hesong Li et.al.	2408.01080	null
2024-08-01	A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition	Qi Xiong et.al.	2408.00210	null
2024-07-30	UniProcessor: A Text-induced Unified Low-level Image Processor	Huiyu Duan et.al.	2407.20928	link
2024-07-27	Inverse Problems with Diffusion Models: A MAP Estimation Perspective	Sai bharath chandra Gutha et.al.	2407.20784	link
2024-07-29	ALEN: A Dual-Approach for Uniform and Non-Uniform Low-Light Image Enhancement	Ezequiel Perez-Zarate et.al.	2407.19708	link
2024-07-31	Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint	Song Zhang et.al.	2407.19248	null
2024-07-27	Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration	Xiaoyan Yu et.al.	2407.19139	link
2024-07-26	Dilated Strip Attention Network for Image Restoration	Fangwei Hao et.al.	2407.18613	null
2024-07-25	RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models	Haoyu Chen et.al.	2407.18035	null
2024-07-25	Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography	Kailai Zhou et.al.	2407.17996	link
2024-07-23	S-E Pipeline: A Vision Transformer (ViT) based Resilient Classification Pipeline for Medical Imaging Against Adversarial Attacks	Neha A S et.al.	2407.17587	null
2024-07-24	Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation	Yongqi Li et.al.	2407.17274	null
2024-07-23	CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction	Liang Zhao et.al.	2407.16204	null
2024-07-23	Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems	Sojin Lee et.al.	2407.16125	link
2024-07-20	Deep Learning CT Image Restoration using System Blur and Noise Models	Yijie Yuan et.al.	2407.14983	null
2024-07-23	AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement	Yunlong Lin et.al.	2407.14900	null
2024-07-20	Dual High-Order Total Variation Model for Underwater Image Restoration	Yuemei Li et.al.	2407.14868	link
2024-07-19	Adaptive Frequency Enhancement Network for Single Image Deraining	Fei Yan et.al.	2407.14292	null
2024-07-19	Double-Shot 3D Shape Measurement with a Dual-Branch Network	Mingyang Lei et.al.	2407.14198	null
2024-07-19	TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion	Xin Tian et.al.	2407.14188	link
2024-07-18	Visual Haystacks: Answering Harder Questions About Sets of Images	Tsung-Han Wu et.al.	2407.13766	link
2024-07-18	Any Image Restoration with Efficient Automatic Degradation Adaptation	Bin Ren et.al.	2407.13372	link
2024-07-18	Training-Free Large Model Priors for Multiple-in-One Image Restoration	Xuanhua He et.al.	2407.13181	null
2024-07-18	Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement	Eashan Adhikarla et.al.	2407.13170	null
2024-07-21	HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration	Shuchang Zhang et.al.	2407.13120	link
2024-07-17	Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations	Tomáš Chobola et.al.	2407.12511	link
2024-07-17	GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval	Han Zhou et.al.	2407.12431	link
2024-07-17	Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM	Markus Weißflog et.al.	2407.12408	null
2024-07-17	GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity	Shuo Cao et.al.	2407.12273	null
2024-07-16	Haze-Aware Attention Network for Single-Image Dehazing	Lihan Tong et.al.	2407.11505	null
2024-07-16	EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis	Ruijie Yang et.al.	2407.11401	null
2024-07-15	No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations	Walter Simoncini et.al.	2407.10964	link
2024-07-15	In-Loop Filtering via Trained Look-Up Tables	Zhuoyuan Li et.al.	2407.10926	null
2024-07-15	MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration	Yulin Ren et.al.	2407.10833	null
2024-07-15	DINO Pre-training for Vision-based End-to-end Autonomous Driving	Shubham Juneja et.al.	2407.10803	null
2024-07-15	Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval	Youngsun Lim et.al.	2407.10683	null
2024-07-15	An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments	J. J. Cabrera et.al.	2407.10536	null

(back to top)

Image Matching

Publish Date	Title	Authors	PDF	Code
2025-07-22	A Single-step Accurate Fingerprint Registration Method Based on Local Feature Matching	Yuwei Jia et.al.	2507.16201	null
2025-07-09	Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching	Yafei Zhang et.al.	2507.06744	null
2025-07-05	From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM	Xinyi Wu et.al.	2507.03868	null
2025-07-02	What does really matter in image goal navigation?	Gianluca Monaci et.al.	2507.01667	null
2025-06-30	Efficient and Accurate Image Provenance Analysis: A Scalable Pipeline for Large-scale Images	Jiewei Lai et.al.	2506.23707	null
2025-06-29	Dynamic Contrastive Learning for Hierarchical Retrieval: A Case Study of Distance-Aware Cross-View Geo-Localization	Suofei Zhang et.al.	2506.23077	null
2025-06-27	MatChA: Cross-Algorithm Matching with Feature Augmentation	Paula Carbó Cubero et.al.	2506.22336	null
2025-07-22	Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Shaojie Zhang et.al.	2506.22139	null
2025-06-27	ZeroReg3D: A Zero-shot Registration Pipeline for 3D Consecutive Histopathology Image Reconstruction	Juming Xiong et.al.	2506.21923	null
2025-06-25	Fast entropy-regularized SDP relaxations for permutation synchronization	Michael Lindsey et.al.	2506.20191	null
2025-06-18	ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections	Ziling Huang et.al.	2506.15180	null
2025-06-16	EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition	Bingxi Liu et.al.	2506.13133	null
2025-06-12	RealKeyMorph: Keypoints in Real-world Coordinates for Resolution-agnostic Image Registration	Mina C. Moghadam et.al.	2506.10344	null
2025-06-11	Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints	Xiangkai Zhang et.al.	2506.09748	null
2025-06-11	ScaleLSD: Scalable Deep Line Segment Detection Streamlined	Zeran Ke et.al.	2506.09369	link
2025-05-21	Anti-interrupted sampling repeater jamming via linear canonical Wigner distribution lightweight LFM detection	Jia-Mian Li et.al.	2506.06302	null
2025-06-05	Vanishing arcs for isolated plane curve singularities	Hanwool Bae et.al.	2506.04917	null
2025-06-05	Deep Learning Reforms Image Matching: A Survey and Outlook	Shihua Zhang et.al.	2506.04619	null
2025-06-20	SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping	Mingxu Zhang et.al.	2505.24305	null
2025-06-05	Universal Domain Adaptation for Semantic Segmentation	Seun-An Choe et.al.	2505.22458	null
2025-05-23	To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models	Simone Gaisbauer et.al.	2505.17973	link
2025-05-16	Multi-view dense image matching with similarity learning and geometry priors	Mohamed Ali Chebbi et.al.	2505.11264	null
2025-05-12	Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection	Yuqi Cheng et.al.	2505.07375	link
2025-05-04	OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery	Chongsheng Zhang et.al.	2505.03836	link
2025-05-06	LiftFeat: 3D Geometry-Aware Local Feature Matching	Yepeng Liu et.al.	2505.03422	link
2025-05-04	Focus What Matters: Matchability-Based Reweighting for Local Feature Matching	Dongyue Li et.al.	2505.02161	null
2025-05-15	Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective	Taoyu Su et.al.	2504.19458	link
2025-04-28	Dynamic Arthroscopic Navigation System for Anterior Cruciate Ligament Reconstruction Based on Multi-level Memory Architecture	Shuo Wang et.al.	2504.19398	null
2025-04-23	Road Similarity-Based BEV-Satellite Image Matching for UGV Localization	Zhenping Sun et.al.	2504.16346	null
2025-04-18	Outlier-Robust Multi-Model Fitting on Quantum Annealers	Saurabh Pandey et.al.	2504.13836	null
2025-04-11	Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models	Josef Bengtson et.al.	2504.08348	null
2025-04-10	Image registration of 2D optical thin sections in a 3D porous medium: Application to a Berea sandstone digital rock image	Jaehong Chung et.al.	2504.06604	link
2025-04-22	To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition	Davide Sferrazza et.al.	2504.06116	link
2025-04-10	Learning Affine Correspondences by Integrating Geometric Constraints	Pengju Sun et.al.	2504.04834	link
2025-04-01	Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data	Yiqun Duan et.al.	2504.00812	null
2025-03-31	CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching	Zizhuo Li et.al.	2503.23925	null
2025-03-28	Pairwise Matching of Intermediate Representations for Fine-grained Explainability	Lauren Shrack et.al.	2503.22881	link
2025-03-26	Multimodal Image Matching based on Frequency-domain Information of Local Energy Response	Meng Yang et.al.	2503.20827	null
2025-03-22	Normalized Matching Transformer	Abtin Pourhadi et.al.	2503.17715	link
2025-03-20	Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors	Tian Yi Lim et.al.	2503.16275	null
2025-03-20	MapGlue: Multimodal Remote Sensing Image Matching	Peihao Wu et.al.	2503.16185	link
2025-03-19	PAPI-Reg: Patch-to-Pixel Solution for Efficient Cross-Modal Registration between LiDAR Point Cloud and Camera Image	Yuanchao Yue et.al.	2503.15285	null
2025-04-07	Less Biased Noise Scale Estimation for Threshold-Robust RANSAC	Johan Edstedt et.al.	2503.13433	null
2025-03-17	SatDepth: A Novel Dataset for Satellite Image Matching	Rahul Deshmukh et.al.	2503.12706	link
2025-03-14	Refining Image Edge Detection via Linear Canonical Riesz Transforms	Shuhui Yang et.al.	2503.11148	null
2025-03-13	Speedy MASt3R	Jingxing Li et.al.	2503.10017	null
2025-03-11	Keypoint Detection and Description for Raw Bayer Images	Jiakai Lin et.al.	2503.08673	null
2025-03-06	Learning 3D Medical Image Models From Brain Functional Connectivity Network Supervision For Mental Disorder Diagnosis	Xingcan Hu et.al.	2503.04205	null
2025-03-07	Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration	Qianliang Wu et.al.	2503.04127	null
2025-03-05	JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba	Xiaoyong Lu et.al.	2503.03437	null
2025-02-28	CNSv2: Probabilistic Correspondence Encoded Neural Image Servo	Anzhe Chen et.al.	2503.00132	null
2025-02-27	A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization	Yejun Zhang et.al.	2502.20036	link
2025-02-27	RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges	Thibaut Loiseau et.al.	2502.19955	null
2025-02-26	BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure	Haoxin Cai et.al.	2502.19242	link
2025-02-25	PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching	Han Nie et.al.	2502.18104	link
2025-02-25	Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking	Xin Tong et.al.	2502.17766	null
2025-03-04	Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model	Yaxuan Huang et.al.	2502.16779	null
2025-02-16	FeaKM: Robust Collaborative Perception under Noisy Pose Conditions	Jiuwu Hao et.al.	2502.11003	link
2025-02-24	Enhancing Ground-to-Aerial Image Matching for Visual Misinformation Detection Using Semantic Segmentation	Emanuele Mule et.al.	2502.06288	link
2025-02-04	Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications	William O'Donnell et.al.	2502.02624	null
2025-02-01	MambaGlue: Fast and Robust Local Feature Matching With Mamba	Kihwan Ryoo et.al.	2502.00462	link
2025-01-24	Dense-SfM: Structure from Motion with Dense Consistent Matching	JongMin Lee et.al.	2501.14277	null
2025-01-20	MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching	Yepeng Liu et.al.	2501.11299	null
2025-01-13	MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training	Xingyi He et.al.	2501.07556	null
2025-01-13	Matching Free Depth Recovery from Structured Light	Zhuohang Yu et.al.	2501.07113	null
2025-01-02	Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views	Yulun Wu et.al.	2501.01196	null
2024-12-31	Towards Real-Time 2D Mapping: Harnessing Drones, AI, and Computer Vision for Advanced Insights	Bharath Kumar Agnur et.al.	2412.20210	null
2024-12-27	MINIMA: Modality Invariant Image Matching	Xingyu Jiang et.al.	2412.19412	link
2024-12-24	GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network	Xianfeng Song et.al.	2412.18221	link
2024-12-17	Bringing Multimodality to Amazon Visual Search System	Xinliang Zhu et.al.	2412.13364	null
2024-12-04	Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis	Siyoon Jin et.al.	2412.03150	null
2024-11-20	DT-LSD: Deformable Transformer-based Line Segment Detection	Sebastian Janampa et.al.	2411.13005	link
2024-11-15	Image Matching Filtering and Refinement by Planes and Beyond	Fabio Bellavia et.al.	2411.09484	link
2024-11-11	XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration	Ismail Can Yagmur et.al.	2411.07430	link
2024-11-07	The Impact of Semi-Supervised Learning on Line Segment Detection	Johanna Engman et.al.	2411.04596	link
2024-11-04	Silver medal Solution for Image Matching Challenge 2024	Yian Wang et.al.	2411.01851	null
2024-10-30	Variable Resolution Sampling and Deep Learning Image Recovery for Accelerated Multi-Spectral MRI Near Metal Implants	Azadeh Sharafi et.al.	2410.23329	null
2024-11-05	RelationBooth: Towards Relation-Aware Customized Object Generation	Qingyu Shi et.al.	2410.23280	null
2024-10-31	ETO:Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses	Junjie Ni et.al.	2410.22733	null
2024-10-30	LoFLAT: Local Feature Matching using Focused Linear Attention Transformer	Naijian Cao et.al.	2410.22710	null
2024-10-26	Generative Adversarial Patches for Physical Attacks on Cross-Modal Pedestrian Re-Identification	Yue Su et.al.	2410.20097	null
2024-10-01	A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference	Yuan Li et.al.	2410.11848	null
2024-10-15	LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images	Yuzhou Cheng et.al.	2410.11505	null
2024-10-12	Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence	Felipe Cadar et.al.	2410.09533	link
2024-09-27	Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras	Yipeng Lu et.al.	2409.18673	null
2024-09-25	Game4Loc: A UAV Geo-Localization Benchmark from Game Data	Yuxiang Ji et.al.	2409.16925	link
2024-09-24	Automatic Registration of SHG and H&E Images with Feature-based Initial Alignment and Intensity-based Instance Optimization: Contribution to the COMULIS Challenge	Marek Wodzinski et.al.	2409.15931	null
2024-09-10	Weakly-supervised Camera Localization by Ground-to-satellite Image Registration	Yujiao Shi et.al.	2409.06471	link
2024-09-05	Enabling Practical and Privacy-Preserving Image Processing	Chao Wang et.al.	2409.03568	null
2024-09-20	A General Albedo Recovery Approach for Aerial Photogrammetric Images through Inverse Rendering	Shuang Song et.al.	2409.03032	link
2024-08-29	Super-Resolution works for coastal simulations	Zhi-Song Liu et.al.	2408.16553	null
2024-09-15	Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks	Sierra Bonilla et.al.	2408.16445	link
2024-08-26	Affine steerers for structured keypoint description	Georg Bökman et.al.	2408.14186	link
2024-08-25	TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers	Chuanrui Zhang et.al.	2408.13770	null
2024-09-11	Coarse-to-fine Alignment Makes Better Speech-image Retrieval	Lifeng Zhou et.al.	2408.13119	null
2024-08-19	BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval	Zhenyu Lu et.al.	2408.10383	null
2024-08-14	RSD-DOG : A New Image Descriptor based on Second Order Derivatives	Darshan Venkatrayappa et.al.	2408.07687	null
2024-08-09	One Shot is Enough for Sequential Infrared Small Target Segmentation	Bingbing Dan et.al.	2408.04823	link
2024-08-07	PRISM: PRogressive dependency maxImization for Scale-invariant image Matching	Xudong Cai et.al.	2408.03598	null
2024-08-05	ConDL: Detector-Free Dense Image Matching	Monika Kwiatkowski et.al.	2408.02766	null
2024-08-04	Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image	Xinlin Ren et.al.	2408.02079	link
2024-07-29	Image-text matching for large-scale book collections	Artemis Llabrés et.al.	2407.19812	link
2024-07-26	PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis	Sohyeong Kim et.al.	2407.18695	null
2024-07-22	RADA: Robust and Accurate Feature Learning with Domain Adaptation	Jingtai He et.al.	2407.15791	null
2024-07-17	GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection	Jingwen Yu et.al.	2407.11736	link
2024-07-16	REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching	Han Nie et.al.	2407.11637	link
2024-07-16	A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation	Chengsheng Li et.al.	2407.11287	null
2024-07-14	Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching	Xiaoyong Lu et.al.	2407.07789	null
2024-07-10	Mutual Information calculation on different appearances	Jiecheng Liao et.al.	2407.07410	null
2024-07-15	SfM on-the-fly: Get better 3D from What You Capture	Zongqian Zhan et.al.	2407.03939	null
2024-07-03	IMC 2024 Methods & Solutions Review	Shyam Gupta et.al.	2407.03172	null
2024-06-21	High Resolution Surface Reconstruction of Cultural Heritage Objects Using Shape from Polarization Method	F. S. Mortazavi et.al.	2406.15121	null
2024-06-16	Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models	Yikai Zhang et.al.	2406.10902	link
2024-06-14	Grounding Image Matching in 3D with MASt3R	Vincent Leroy et.al.	2406.09756	link

(back to top)

MutilModal

Publish Date	Title	Authors	PDF	Code
2025-07-23	See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering	Junjie Wang et.al.	2507.17659	null
2025-07-23	Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning	Xinyao Liu et.al.	2507.17539	null
2025-07-23	ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents	Chang Nie et.al.	2507.17462	null
2025-07-23	HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs	Zhaolin Cai et.al.	2507.17394	null
2025-07-23	A Versatile Pathology Co-pilot via Reasoning Enhanced Multimodal Large Language Model	Zhe Xu et.al.	2507.17303	null
2025-07-23	Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation	Zixuan Wang et.al.	2507.17204	null
2025-07-22	Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models	Tz-Ying Wu et.al.	2507.17050	null
2025-07-22	HOComp: Interaction-Aware Human-Object Composition	Dong Liang et.al.	2507.16813	null
2025-07-22	Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation	Yiguo He et.al.	2507.16716	null
2025-07-22	Self-Contradiction as Self-Improvement: Mitigating the Generation-Understanding Gap in MLLMs	Yujin Han et.al.	2507.16663	null
2025-07-22	Pixels to Principles: Probing Intuitive Physics Understanding in Multimodal Language Models	Mohamad Ballout et.al.	2507.16572	null
2025-07-22	Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models	Xiaoyan Wang et.al.	2507.16524	null
2025-07-22	C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning	Xiuwei Chen et.al.	2507.16518	null
2025-07-22	MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing	Shreelekha Revankar et.al.	2507.16228	null
2025-07-21	True Multimodal In-Context Learning Needs Attention to the Visual Context	Shuo Chen et.al.	2507.15807	null
2025-07-21	Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions	Meng Chen et.al.	2507.15692	null
2025-07-21	Extracting Visual Facts from Intermediate Layers for Mitigating Hallucinations in Multimodal Large Language Models	Haoran Zhou et.al.	2507.15652	null
2025-07-21	DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding	Xiaoyi Bao et.al.	2507.15569	null
2025-07-21	FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers	Yanbing Zhang et.al.	2507.15249	null
2025-07-20	Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction	Ce Zhang et.al.	2507.15130	null
2025-07-20	Language Integration in Fine-Tuning Multimodal Large Language Models for Image-Based Regression	Roy H. Jennings et.al.	2507.14997	null
2025-07-20	Open-set Cross Modal Generalization via Multimodal Unified Representation	Hai Huang et.al.	2507.14935	null
2025-07-20	U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs	Xiaojie Li et.al.	2507.14902	null
2025-07-20	LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering	Xinxin Dong et.al.	2507.14784	null
2025-07-18	Moodifier: MLLM-Enhanced Emotion-Driven Image Editing	Jiarong Ye et.al.	2507.14024	null
2025-07-17	"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models	Jing Gu et.al.	2507.13428	null
2025-07-17	Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark	Junsu Kim et.al.	2507.13314	null
2025-07-17	Automating Steering for Safe Multimodal Large Language Models	Lyucheng Wu et.al.	2507.13255	null
2025-07-17	Analysis of Image-and-Text Uncertainty Propagation in Multimodal Large Language Models with Cardiac MR-Based Applications	Yucheng Tang et.al.	2507.12945	null
2025-07-17	AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning	Yiming Ren et.al.	2507.12841	null
2025-07-17	MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval	Jeong-Woo Park et.al.	2507.12819	null
2025-07-17	DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment	Junjie Gao et.al.	2507.12796	null
2025-07-17	City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning	Penglei Sun et.al.	2507.12795	null
2025-07-16	InSight: AI Mobile Screening Tool for Multiple Eye Disease Detection using Multimodal Fusion	Ananya Raghu et.al.	2507.12669	null
2025-07-16	Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models	Gen Luo et.al.	2507.12566	null
2025-07-16	Mitigating Object Hallucinations via Sentence-Level Early Intervention	Shangpin Peng et.al.	2507.12455	null
2025-07-16	Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker	Rachna Saxena et.al.	2507.12378	null
2025-07-16	Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection	Tairan Huang et.al.	2507.11997	null
2025-07-16	Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation	Sahid Hossain Mustakim et.al.	2507.11968	null
2025-07-16	Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs	Mohammad Shahab Sepehri et.al.	2507.11932	null
2025-07-16	MNO : A Multi-modal Neural Operator for Parametric Nonlinear BVPs	Vamshi C. Madala et.al.	2507.11870	null
2025-07-15	Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification	Moises Andrade et.al.	2507.11662	null
2025-07-15	MapIQ: Benchmarking Multimodal Large Language Models for Map Question Answering	Varun Srivastava et.al.	2507.11625	null
2025-07-15	NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models	X. Feng et.al.	2507.11245	null
2025-07-15	How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study	Che Liu et.al.	2507.11200	null
2025-07-15	KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model	Jie Yang et.al.	2507.11102	null
2025-07-15	Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation	Yanbo Wang et.al.	2507.11001	null
2025-07-14	Warehouse Spatial Question Answering with LLM Agent	Hsiang-Wei Huang et.al.	2507.10778	null
2025-07-14	Vision Language Action Models in Robotic Manipulation: A Systematic Review	Muhayy Ud Din et.al.	2507.10672	null
2025-07-14	Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jiangkai Wu et.al.	2507.10510	null
2025-07-16	Text-Visual Semantic Constrained AI-Generated Image Quality Assessment	Qiang Li et.al.	2507.10432	null
2025-07-14	DisCo: Towards Distinct and Coherent Visual Encapsulation in Video MLLMs	Jiahe Zhao et.al.	2507.10302	null
2025-07-14	FaceLLM: A Multimodal Large Language Model for Face Understanding	Hatef Otroshi Shahreza et.al.	2507.10300	null
2025-07-14	Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection	Jinglun Li et.al.	2507.10225	null
2025-07-14	A Training-Free, Task-Agnostic Framework for Enhancing MLLM Performance on High-Resolution Images	Jaeseong Lee et.al.	2507.10202	null
2025-07-14	FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text	Bingchao Wang et.al.	2507.10095	null
2025-07-14	ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism	Zedong Liu et.al.	2507.10069	null
2025-07-14	The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents	Lixu Wang et.al.	2507.10016	null
2025-07-14	Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning	Zijun Chen et.al.	2507.10007	null
2025-07-11	ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way	Rajarshi Roy et.al.	2507.08679	null
2025-07-11	Introspection of Thought Helps AI Agents	Haoran Sun et.al.	2507.08664	null
2025-07-11	DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images	Haoran Sun et.al.	2507.08648	null
2025-07-11	Visual Semantic Description Generation with MLLMs for Image-Text Matching	Junyu Chen et.al.	2507.08590	null
2025-07-11	Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation	Liu He et.al.	2507.08513	null
2025-07-14	Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R	Pablo Robin Guerrero et.al.	2507.08505	null
2025-07-11	Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models	Shijun Yang et.al.	2507.08410	null
2025-07-11	MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion	Jihao Gu et.al.	2507.08344	null
2025-07-11	Improving MLLM's Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency	Yupu Liang et.al.	2507.08309	null
2025-07-11	M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning	Inclusion AI et.al.	2507.08306	null
2025-07-10	PyVision: Agentic Vision with Dynamic Tooling	Shitian Zhao et.al.	2507.07998	null
2025-07-10	OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding	JingLi Lin et.al.	2507.07984	null
2025-07-10	MIRA: A Novel Framework for Fusing Modalities in Medical RAG	Jinhong Wang et.al.	2507.07902	null
2025-07-10	SpatialViz-Bench: Automatically Generated Spatial Visualization Reasoning Tasks for MLLMs	Siting Wang et.al.	2507.07610	null
2025-07-10	Single-to-mix Modality Alignment with Multimodal Large Language Model for Document Image Machine Translation	Yupu Liang et.al.	2507.07572	null
2025-07-11	StarDojo: Benchmarking Open-Ended Behaviors of Agentic Multimodal LLMs in Production-Living Simulations with Stardew Valley	Weihao Tan et.al.	2507.07445	null
2025-07-10	Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning	Jingjing Jiang et.al.	2507.07424	null
2025-07-09	Robust Multimodal Large Language Models Against Modality Conflict	Zongmeng Zhang et.al.	2507.07151	null
2025-07-09	Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor	Vatsal Agarwal et.al.	2507.07106	null
2025-07-09	Learning Deliberately, Acting Intuitively: Unlocking Test-Time Reasoning in Multimodal LLMs	Yahan Yu et.al.	2507.06999	null
2025-07-09	Omni-Video: Democratizing Unified Video Understanding and Generation	Zhiyu Tan et.al.	2507.06119	null
2025-07-08	Enhancing Synthetic CT from CBCT via Multimodal Fusion and End-To-End Registration	Maximilian Tschuchnig et.al.	2507.06067	null
2025-07-08	BlueLM-2.5-3B Technical Report	Baojiao Xiong et.al.	2507.05934	null
2025-07-08	From ID-based to ID-free: Rethinking ID Effectiveness in Multimodal Collaborative Filtering Recommendation	Guohao Li et.al.	2507.05715	null
2025-07-08	MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models	Wei Zhang et.al.	2507.05591	null
2025-07-07	MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents	Ming Gong et.al.	2507.05330	null
2025-07-07	Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing	Chun-Hsiao Yeh et.al.	2507.05259	null
2025-07-07	Spatio-Temporal LLM: Reasoning about Environments and Actions	Haozhen Zheng et.al.	2507.05258	null
2025-07-07	Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning	Yana Wei et.al.	2507.05255	null
2025-07-07	Differential Attention for Multimodal Crisis Event Analysis	Nusrat Munia et.al.	2507.05165	null
2025-07-07	Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport	Qinkai Yu et.al.	2507.04999	null
2025-07-07	ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation	Chenchen Zhang et.al.	2507.04952	null
2025-07-07	ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding	Jianjiang Yang et.al.	2507.04943	null
2025-07-07	HV-MMBench: Benchmarking MLLMs for Human-Centric Video Understanding	Yuxuan Cai et.al.	2507.04909	null
2025-07-07	Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos	Davide Berghi et.al.	2507.04845	null
2025-07-07	From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection	Zexi Jia et.al.	2507.04769	null
2025-07-03	Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation	Jiaer Xia et.al.	2507.02859	null
2025-07-03	Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection	Ziqi Miao et.al.	2507.02844	null
2025-07-03	Multimodal Mathematical Reasoning with Diverse Solving Perspective	Wenhao Shi et.al.	2507.02804	null
2025-07-03	AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models	Ziyin Zhou et.al.	2507.02664	null
2025-07-03	VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning	Siran Chen et.al.	2507.02626	null
2025-07-03	AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding	Weili Xu et.al.	2507.02591	null
2025-07-03	LaCo: Efficient Layer-wise Compression of Visual Tokens for Multimodal Large Language Models	Juntao Liu et.al.	2507.02279	null
2025-07-03	SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement	Zeyu Lei et.al.	2507.02252	null
2025-07-02	Kwai Keye-VL Technical Report	Kwai Keye Team et.al.	2507.01949	null
2025-07-02	Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning	Qingdong He et.al.	2507.01908	null
2025-07-02	TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types	Yuhao Lin et.al.	2507.01857	null
2025-07-02	Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach	Hao Wei et.al.	2507.01728	null
2025-07-02	AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness	Zixin Chen et.al.	2507.01702	null
2025-07-02	SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement	Weijie Yin et.al.	2507.01643	null
2025-07-02	SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism	Beitao Chen et.al.	2507.01513	null
2025-07-02	AVC-DPO: Aligned Video Captioning via Direct Preference Optimization	Jiyang Tang et.al.	2507.01492	null
2025-07-02	AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing	Yinwang Ren et.al.	2507.01376	null
2025-07-02	Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations	Bohao Wang et.al.	2507.01337	null
2025-07-01	Teaching Time Series to See and Speak: Forecasting with Aligned Visual and Textual Perspectives	Sixun Dong et.al.	2506.24124	null
2025-06-30	DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World	Xiangtai Li et.al.	2506.24102	null
2025-06-30	A Survey on Vision-Language-Action Models for Autonomous Driving	Sicong Jiang et.al.	2506.24044	null
2025-07-01	Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs	Yang Dai et.al.	2506.23940	null
2025-06-30	VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation	Peng Huang et.al.	2506.23641	null
2025-06-30	Unified Multimodal Understanding via Byte-Pair Visual Encoding	Wanpeng Zhang et.al.	2506.23639	null
2025-06-30	PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum	Shiqi Zhang et.al.	2506.23607	null
2025-06-30	MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI	Huanjin Yao et.al.	2506.23563	null
2025-06-30	Reinforcement Fine-Tuning Enables MLLMs Learning Novel Tasks Stably	Zhihao Zhang et.al.	2506.23508	null
2025-06-30	Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks	Xian Zhang et.al.	2506.23481	null
2025-06-27	Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs	Shaojie Zhang et.al.	2506.22139	null
2025-06-27	Towards Scalable and Robust White Matter Lesion Localization via Multimodal Deep Learning	Julia Machnio et.al.	2506.22041	null
2025-06-27	R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning	Biao Wang et.al.	2506.21980	null
2025-06-27	Grounding-Aware Token Pruning: Recovering from Drastic Performance Drops in Visual Grounding Caused by Pruning	Tzu-Chun Chien et.al.	2506.21873	null
2025-06-27	DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE	Hang Shao et.al.	2506.21864	null
2025-06-26	FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering	Liangyu Zhong et.al.	2506.21710	null
2025-06-26	APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization	Minjie Hong et.al.	2506.21655	null
2025-06-26	Exploring the Design Space of 3D MLLMs for CT Report Generation	Mohammed Baharoon et.al.	2506.21535	null
2025-06-26	TableMoE: Neuro-Symbolic Routing for Structured Expert Reasoning in Multimodal Table Understanding	Junwen Zhang et.al.	2506.21393	null
2025-06-26	SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning	Melanie Rieff et.al.	2506.21355	null
2025-06-27	FairyGen: Storied Cartoon Video from a Single Child-Drawn Character	Jiayi Zheng et.al.	2506.21272	null
2025-06-26	Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents	Tianyi Men et.al.	2506.21252	null
2025-06-26	Task-Aware KV Compression For Cost-Effective Long Video Understanding	Minghao Qin et.al.	2506.21184	null
2025-06-26	OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography	Caoshuo Li et.al.	2506.21101	null
2025-06-26	V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling	Junwei You et.al.	2506.21041	null
2025-06-26	Evidence-based diagnostic reasoning with multi-agent copilot for human pathology	Chengkuan Chen et.al.	2506.20964	null
2025-06-26	E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs	Van-Hoang Phan et.al.	2506.20944	null
2025-06-25	UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation	Yanzhe Chen et.al.	2506.20214	null
2025-06-25	BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos	Jiahao Lin et.al.	2506.20103	null
2025-06-24	MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection	Zhengxiang Huang et.al.	2506.19884	null
2025-06-24	Multimodal large language models and physics visual tasks: comparative analysis of performance and costs	Giulia Polverini et.al.	2506.19662	null
2025-06-24	Surgery-R1: Advancing Surgical-VQLA with Reasoning Multimodal Large Language Model via Reinforcement Learning	Pengfei Hao et.al.	2506.19469	null
2025-06-24	Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System	Lixuan He et.al.	2506.19433	null
2025-06-24	Memory-Augmented Incomplete Multimodal Survival Prediction via Cross-Slide and Gene-Attentive Hypergraph Learning	Mingcheng Qu et.al.	2506.19324	null
2025-06-24	MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models	Yinan Xia et.al.	2506.19257	null
2025-06-24	Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification	Minghao Qin et.al.	2506.19225	null
2025-06-24	MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports	Sunggu Kyung et.al.	2506.19217	null
2025-06-23	MOSCARD -- Causal Reasoning and De-confounding for Multimodal Opportunistic Screening of Cardiovascular Adverse Events	Jialu Pi et.al.	2506.19174	null
2025-06-23	Universal Video Temporal Grounding with Generative Multi-modal Large Language Models	Zeqian Li et.al.	2506.18883	null
2025-06-23	TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting	Zhongbin Guo et.al.	2506.18862	null
2025-06-23	SIM-Net: A Multimodal Fusion Network Using Inferred 3D Object Shape Point Clouds from RGB Images for 2D Classification	Youcef Sklab et.al.	2506.18683	null
2025-06-24	Object-aware Sound Source Localization via Audio-Visual Scene Understanding	Sung Jin Um et.al.	2506.18557	null
2025-06-23	MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis	Yuting Zhang et.al.	2506.18512	null
2025-06-23	Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey	Xinyao Li et.al.	2506.18504	null
2025-06-23	AViLA: Asynchronous Vision-Language Agent for Streaming Multimodal Data Interaction	Gengyuan Zhang et.al.	2506.18472	null
2025-06-23	What You Think Is What You Get: Bridge User Intent and Transfer Function Design through Multimodal Large Language Models	Yiyao Wang et.al.	2506.18407	null
2025-06-23	RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models	Yeongtak Oh et.al.	2506.18369	null
2025-06-24	Multimodal Fusion SLAM with Fourier Attention	Youjie Zhou et.al.	2506.18204	null
2025-06-20	MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models	Xiaolong Wang et.al.	2506.17046	null
2025-06-20	MM-AttacKG: A Multimodal Approach to Attack Graph Construction with Large Language Models	Yongheng Zhang et.al.	2506.16968	null
2025-06-20	Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs	Haoran Sun et.al.	2506.16962	link
2025-06-20	LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models	Fanfei Li et.al.	2506.16950	null
2025-06-20	Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning	Jiaqi Chen et.al.	2506.16931	null
2025-06-20	With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You	Fabian Gröger et.al.	2506.16895	null
2025-06-20	IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification	Eion Tyacke et.al.	2506.16744	null
2025-06-19	How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?	Giuseppe Lando et.al.	2506.16450	null
2025-06-19	GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning	Yi Chen et.al.	2506.16141	link
2025-06-18	Demystifying the Visual Quality Paradox in Multimodal Large Language Models	Shuo Xing et.al.	2506.15645	null
2025-06-18	Creating User-steerable Projections with Interactive Semantic Mapping	Artur André Oliveira et.al.	2506.15479	null
2025-06-18	Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning	Chunlei Li et.al.	2506.15477	null
2025-06-18	Understanding GUI Agent Localization Biases through Logit Sharpness	Xingjian Tao et.al.	2506.15425	null
2025-06-18	MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering	Xinqi Fan et.al.	2506.15298	null
2025-06-18	From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem	Yanxu Mao et.al.	2506.15170	null
2025-06-17	ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM	Yujun Wang et.al.	2506.14766	null
2025-06-17	Exploring MLLMs Perception of Network Visualization Principles	Jacob Miller et.al.	2506.14611	null
2025-06-17	M2BeamLLM: Multimodal Sensing-empowered mmWave Beam Prediction with Large Language Models	Can Zheng et.al.	2506.14532	null
2025-06-17	LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops	Jiyuan Fu et.al.	2506.14493	null
2025-06-17	GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies	Jingqi Yang et.al.	2506.14477	link
2025-06-17	Dense360: Dense Understanding from Omnidirectional Panoramas	Yikang Zhou et.al.	2506.14471	null
2025-06-17	Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval	Ruofan Hu et.al.	2506.14445	null
2025-06-17	From Black Boxes to Transparent Minds: Evaluating and Enhancing the Theory of Mind in Multimodal Large Language Models	Xinyang Li et.al.	2506.14224	null
2025-06-17	A multi-stage augmented multimodal interaction network for fish feeding intensity quantification	Shulong Zhang et.al.	2506.14170	null
2025-06-17	SceneAware: Scene-Constrained Pedestrian Trajectory Prediction with LLM-Guided Walkability	Juho Bai et.al.	2506.14144	null
2025-06-16	Discrete Diffusion in Large Language and Multimodal Models: A Survey	Runpeng Yu et.al.	2506.13759	link
2025-06-16	TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning	Junru Zhang et.al.	2506.13705	link
2025-06-16	DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models	Yunnong Chen et.al.	2506.13663	null
2025-06-16	Omni-AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented for Efficient Long Video Understanding	Zhucun Xue et.al.	2506.13589	null
2025-06-16	RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis	Pengzuo Wu et.al.	2506.13405	null
2025-06-16	VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation	Bo Pan et.al.	2506.13326	link
2025-06-16	ZINA: Multimodal Fine-grained Hallucination Detection and Editing	Yuiga Wada et.al.	2506.13130	null
2025-06-16	Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning	Haibo Qiu et.al.	2506.13056	null
2025-06-16	CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model	Jiangtong Li et.al.	2506.13055	null
2025-06-15	SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models	Xinyi Zhao et.al.	2506.12992	link
2025-06-16	VGR: Visual Grounded Reasoning	Jiacong Wang et.al.	2506.11991	null
2025-06-13	Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?	Simeon Junker et.al.	2506.11807	null
2025-06-13	Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization	Wenqi Liu et.al.	2506.11712	null
2025-06-13	Dynamic Mixture of Curriculum LoRA Experts for Continual Multimodal Instruction Tuning	Chendi Ge et.al.	2506.11672	null
2025-06-13	VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?	Jiachen Yu et.al.	2506.11571	null
2025-06-13	DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs	Bo-Cheng Chiu et.al.	2506.11558	null
2025-06-13	Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models	Jinming Wen et.al.	2506.11521	null
2025-06-13	Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs	Xiao Xu et.al.	2506.11515	null
2025-06-13	Stop learning it all to mitigate visual hallucination, Focus on the hallucination target	Dokyoon Yoon et.al.	2506.11417	null
2025-06-12	Combining Log Data and Collaborative Dialogue Features to Predict Project Quality in Middle School AI Education	Conrad Borchers et.al.	2506.11326	null
2025-06-12	Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs	Qizhe Zhang et.al.	2506.10967	link
2025-06-12	Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?	Fei Lin et.al.	2506.10912	null
2025-06-12	VideoDeepResearch: Long Video Understanding With Agentic Tool Using	Huaying Yuan et.al.	2506.10821	link
2025-06-13	Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning	Yuhao Zhou et.al.	2506.10521	null
2025-06-12	MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models	Yu Huang et.al.	2506.10465	null
2025-06-12	Towards Robust Multimodal Emotion Recognition under Missing Modalities and Distribution Shifts	Guowei Zhong et.al.	2506.10452	link
2025-06-12	MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment	Shuo Wang et.al.	2506.10430	null
2025-06-12	Can Sound Replace Vision in LLaVA With Token Substitution?	Ali Vosoughi et.al.	2506.10416	null
2025-06-12	Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?	Yingjin Song et.al.	2506.10415	null
2025-06-12	Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series	Ching Chang et.al.	2506.10412	null
2025-06-11	OctoNav: Towards Generalist Embodied Navigation	Chen Gao et.al.	2506.09839	null
2025-06-11	MMME: A Spontaneous Multi-Modal Micro-Expression Dataset Enabling Visual-Physiological Fusion	Chuang Maa et.al.	2506.09834	link
2025-06-11	Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning	Yuting Li et.al.	2506.09736	link
2025-06-11	HSENet: Hybrid Spatial Encoding Network for 3D Medical Vision-Language Understanding	Yanzhao Shi et.al.	2506.09634	null
2025-06-11	AD^2-Bench: A Hierarchical CoT Benchmark for MLLM in Autonomous Driving under Adverse Conditions	Zhaoyang Wei et.al.	2506.09557	null
2025-06-10	BioLangFusion: Multimodal Fusion of DNA, mRNA, and Protein Language Models	Amina Mollaysa et.al.	2506.08936	null
2025-06-10	What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities	Wendong Bu et.al.	2506.08933	null
2025-06-10	Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment	Maximilian Tschuchnig et.al.	2506.08716	null
2025-06-10	From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge	Agnese Taluzzi et.al.	2506.08553	null
2025-06-09	Serendipitous Recommendation with Multimodal LLM	Haoting Wang et.al.	2506.08283	null
2025-06-09	Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain	Subba Reddy Oota et.al.	2506.08277	link
2025-06-09	GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior	Penghao Wu et.al.	2506.08012	null
2025-06-09	Play to Generalize: Learning to Reason Through Game Play	Yunfei Xie et.al.	2506.08011	link
2025-06-09	CyberV: Cybernetics for Test-time Scaling in Video Understanding	Jiahao Meng et.al.	2506.07971	link
2025-06-09	SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence	Ziyang Gong et.al.	2506.07966	link
2025-06-09	WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning	Jie Yang et.al.	2506.07905	link
2025-06-09	PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement	Teng Hu et.al.	2506.07848	null
2025-06-09	HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains	Shijie Wang et.al.	2506.07837	link
2025-06-09	WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code	Zhiyu Lin et.al.	2506.07818	link
2025-06-09	Evaluating Visual Mathematics in Multimodal LLMs: A Multilingual Benchmark Based on the Kangaroo Tests	Arnau Igualde Sáez et.al.	2506.07418	null
2025-06-08	Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification	Tianyi Bai et.al.	2506.07235	null
2025-06-06	DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation	Jingyu Xiao et.al.	2506.06251	link
2025-06-06	VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning	Zikang Wang et.al.	2506.06097	null
2025-06-06	MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?	Zhitao He et.al.	2506.06034	null
2025-06-06	Object Navigation with Structure-Semantic Reasoning-Based Multi-level Map and Multimodal Decision-Making LLM	Chongshang Yan et.al.	2506.05896	null
2025-06-06	Human-AI Alignment of Multimodal Large Language Models with Speech-Language Pathologists in Parent-Child Interactions	Weiyan Shi et.al.	2506.05879	null
2025-06-09	Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling	Yihan Xie et.al.	2506.05831	null
2025-06-06	Pts3D-LLM: Studying the Impact of Token Structure for 3D Scene Understanding With Large Language Models	Hugues Thomas et.al.	2506.05689	null
2025-06-05	MLLM-CL: Continual Learning for Multimodal Large Language Models	Hongbo Zhao et.al.	2506.05453	null
2025-06-05	SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs	Jiahui Wang et.al.	2506.05344	link
2025-06-05	AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs	Lidong Lu et.al.	2506.05328	null
2025-06-05	EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?	Yuqian Yuan et.al.	2506.05287	null
2025-06-05	MokA: Multimodal Low-Rank Adaptation for MLLMs	Yake Wei et.al.	2506.05191	null
2025-06-05	On the Comprehensibility of Multi-structured Financial Documents using LLMs and Pre-processing Tools	Shivani Upadhyay et.al.	2506.05182	link
2025-06-05	The NTNU System at the S&I Challenge 2025 SLA Open Track	Hong-Yun Lin et.al.	2506.05121	null
2025-06-05	FinMultiTime: A Four-Modal Bilingual Dataset for Financial Time-Series Analysis	Wenyan Xu et.al.	2506.05019	link
2025-06-05	TextVidBench: A Benchmark for Long Video Scene Text Understanding	Yangyang Zhong et.al.	2506.04983	null
2025-06-05	APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval	Hong Gao et.al.	2506.04953	null
2025-06-05	From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes	Tianxu Wang et.al.	2506.04897	null
2025-06-04	Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning	Shuang Chen et.al.	2506.04207	null
2025-06-04	MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos	Kejian Zhu et.al.	2506.04141	null
2025-06-04	Multimodal Tabular Reasoning with Privileged Structured Information	Jun-Peng Jiang et.al.	2506.04088	null
2025-06-04	Vision Remember: Alleviating Visual Forgetting in Efficient MLLM with Vision Feature Resample	Ze Feng et.al.	2506.03928	null
2025-06-04	HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models	Zhaolu Kang et.al.	2506.03922	link
2025-06-04	ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning	Feng Han et.al.	2506.03596	link
2025-06-04	Resolving Task Objective Conflicts in Unified Multimodal Understanding and Generation via Task-Aware Mixture-of-Experts	Jiaxing Zhang et.al.	2506.03591	null
2025-06-04	WIFE-Fusion: Wavelet-aware Intra-inter Frequency Enhancement for Multi-model Image Fusion	Tianpei Zhang et.al.	2506.03555	null
2025-06-05	Geometric Visual Fusion Graph Neural Networks for Multi-Person Human-Object Interaction Recognition in Videos	Tanqiu Qiao et.al.	2506.03440	null
2025-06-03	A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation	Zihui Ma et.al.	2506.03360	link
2025-06-03	MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query	Wei Chow et.al.	2506.03144	null
2025-06-03	AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation	Lu Qiu et.al.	2506.03126	null
2025-06-03	Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models	Shizhan Gong et.al.	2506.02557	null
2025-06-03	VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning	Hao Yan et.al.	2506.02537	null
2025-06-03	Minos: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text	Junzhe Zhang et.al.	2506.02494	null
2025-06-02	From Street Views to Urban Science: Discovering Road Safety Factors with Multimodal Large Language Models	Yihong Tang et.al.	2506.02242	null
2025-06-02	MLLMs Need 3D-Aware Representation Supervision for Scene Understanding	Xiaohu Huang et.al.	2506.01946	null
2025-06-02	Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency	Hongyu Li et.al.	2506.01908	link
2025-06-02	MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs	Wayner Barrios et.al.	2506.01850	null
2025-06-02	FaceCoT: A Benchmark Dataset for Face Anti-Spoofing with Chain-of-Thought Reasoning	Honglu Zhang et.al.	2506.01783	null
2025-05-30	Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents	Yaxin Luo et.al.	2505.24878	link
2025-05-30	MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning	Yiqing Liang et.al.	2505.24871	null
2025-05-30	SiLVR: A Simple Language-based Video Reasoning Framework	Ce Zhang et.al.	2505.24869	link
2025-05-30	FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation	Junyu Luo et.al.	2505.24714	link
2025-05-30	Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors	Duo Zheng et.al.	2505.24625	null
2025-05-30	Mixpert: Mitigating Multimodal Learning Conflicts with Efficient Mixture-of-Vision-Experts	Xin He et.al.	2505.24541	null
2025-05-30	Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model	Yuting Zhang et.al.	2505.24476	link
2025-05-30	SORCE: Small Object Retrieval in Complex Environments	Chunxu Liu et.al.	2505.24441	link
2025-05-30	KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval	Fanhang Man et.al.	2505.24342	null
2025-06-02	MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM	Bowen Dong et.al.	2505.24238	null
2025-05-29	Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought	Yunze Man et.al.	2505.23766	null
2025-05-29	MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence	Sihan Yang et.al.	2505.23764	null
2025-05-29	Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence	Diankun Wu et.al.	2505.23747	null
2025-05-29	PixelThink: Towards Efficient Chain-of-Pixel Reasoning	Song Wang et.al.	2505.23727	null
2025-05-29	VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos	Tingyu Song et.al.	2505.23693	link
2025-05-29	Human Empathy as Encoder: AI-Assisted Depression Assessment in Special Education	Boning Zhao et.al.	2505.23631	null
2025-05-29	A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis	Shengyuan Liu et.al.	2505.23601	null
2025-05-29	MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning	Linqiang Guo et.al.	2505.23596	null
2025-05-29	Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles	Zifu Wang et.al.	2505.23590	link
2025-05-29	OmniEarth-Bench: Towards Holistic Evaluation of Earth's Six Spheres and Cross-Spheres Interactions with Multimodal Observational Earth Data	Fengxiang Wang et.al.	2505.23522	null
2025-05-28	Spatial Knowledge Graph-Guided Multimodal Synthesis	Yida Xue et.al.	2505.22633	null
2025-05-28	RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction	Yuchi Wang et.al.	2505.22613	null
2025-05-28	Multi-MLLM Knowledge Distillation for Out-of-Context News Detection	Yimeng Gu et.al.	2505.22517	null
2025-05-28	A Closer Look at Multimodal Representation Collapse	Abhra Chaudhuri et.al.	2505.22483	null
2025-05-28	Fostering Video Reasoning via Next-Event Prediction	Haonan Wang et.al.	2505.22457	null
2025-05-28	Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO	Lai Wei et.al.	2505.22453	link
2025-05-28	Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models	Sizai Hou et.al.	2505.22447	null
2025-05-28	Zooming from Context to Cue: Hierarchical Preference Optimization for Multi-Image MLLMs	Xudong Li et.al.	2505.22396	null
2025-05-28	Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start	Lai Wei et.al.	2505.22334	link
2025-05-28	CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction	Jiali Chen et.al.	2505.22304	null
2025-05-27	UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents	Han Xiao et.al.	2505.21496	link
2025-05-27	Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment	Xiaojun Jia et.al.	2505.21494	link
2025-05-27	Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO	Muzhi Zhu et.al.	2505.21457	null
2025-05-27	AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs	Xuanwen Ding et.al.	2505.21389	link
2025-05-27	Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?	Junhao Cheng et.al.	2505.21374	link
2025-05-27	MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios	Yang Shi et.al.	2505.21333	null
2025-05-27	MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs	Jiakang Yuan et.al.	2505.21327	null
2025-05-27	SOLIDGEO: Measuring Multimodal Spatial Math Reasoning in Solid Geometry	Peijie Wang et.al.	2505.21177	null
2025-05-27	IKMo: Image-Keyframed Motion Generation with Trajectory-Pose Conditioned Motion Diffusion Model	Yang Zhao et.al.	2505.21146	null
2025-05-27	Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts	Yue Zhang et.al.	2505.21079	null
2025-05-27	MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents	Ziming Wei et.al.	2505.20148	link
2025-05-26	FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities	Jin Wang et.al.	2505.20147	null
2025-05-26	Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	Zheqi Lv et.al.	2505.20053	link
2025-05-26	Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)	Subba Reddy Oota et.al.	2505.20029	link
2025-05-26	ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving	Xueyi Liu et.al.	2505.20024	link
2025-05-26	NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID	Shihao Li et.al.	2505.20001	null
2025-05-27	Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM	Peng Liu et.al.	2505.19901	null
2025-05-26	Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging	Yongxian Wei et.al.	2505.19892	link
2025-05-26	Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought	Chao Huang et.al.	2505.19877	link
2025-05-26	Efficient Multi-modal Long Context Learning for Training-free Adaptation	Zehong Ma et.al.	2505.19812	link
2025-05-23	Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling	Bryan Wong et.al.	2505.17982	null
2025-05-23	T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation	Zi-Ao Ma et.al.	2505.17897	null
2025-05-23	Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities	Ziwei Zhou et.al.	2505.17862	link
2025-05-23	Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM	Donghwan Chi et.al.	2505.17726	null
2025-05-23	HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning	Chuhao Zhou et.al.	2505.17645	null
2025-05-23	RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition	Yuehan Jin et.al.	2505.17501	null
2025-05-23	The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts	Yuchen Zhang et.al.	2505.17476	null
2025-05-23	FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain	Suifeng Zhao et.al.	2505.17471	null
2025-05-23	FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow	Haoyu Sun et.al.	2505.17399	link
2025-05-23	Chart-to-Experience: Benchmarking Multimodal LLMs for Predicting Experiential Impact of Charts	Seon Gyeom Kim et.al.	2505.17374	null
2025-05-22	GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning	Chengqi Duan et.al.	2505.17022	link
2025-05-22	Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	Chenhao Zhang et.al.	2505.17019	link
2025-05-22	SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward	Kaixuan Fan et.al.	2505.17018	link
2025-05-22	Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models	Runsen Xu et.al.	2505.17015	null
2025-05-22	SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding	Haoning Wu et.al.	2505.17012	link
2025-05-22	LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning	Zebin You et.al.	2505.16933	null
2025-05-22	Backdoor Cleaning without External Guidance in MLLM Fine-tuning	Xuankun Rong et.al.	2505.16916	link
2025-05-22	GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent	Bin Xie et.al.	2505.16827	link
2025-05-22	Locate-then-Merge: Neuron-Level Parameter Fusion for Mitigating Catastrophic Forgetting in Multimodal LLMs	Zeping Yu et.al.	2505.16703	null
2025-05-22	R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO	Huanjin Yao et.al.	2505.16673	link
2025-05-20	UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation	Rui Tian et.al.	2505.14682	null
2025-05-20	Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities	Parthasaarathy Sudarsanam et.al.	2505.14562	null
2025-05-20	ModRWKV: Transformer Multimodality in Linear Time	Jiale Kang et.al.	2505.14505	link
2025-05-20	Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents	Pengzhou Cheng et.al.	2505.14418	null
2025-05-20	ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations	Xuecheng Wu et.al.	2505.14404	null
2025-05-20	TF-Mamba: Text-enhanced Fusion Mamba with Missing Modalities for Robust Multimodal Sentiment Analysis	Xiang Li et.al.	2505.14329	link
2025-05-20	Speculative Decoding Reimagined for Multimodal Large Language Models	Luxi Lin et.al.	2505.14260	link
2025-05-20	UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning	Sule Bai et.al.	2505.14231	null
2025-05-20	Towards Omnidirectional Reasoning with 360-R1: A Dataset, Benchmark, and GRPO-based Method	Xinshen Zhang et.al.	2505.14197	null
2025-05-20	Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering	Wei Zhou et.al.	2505.14131	null
2025-05-19	MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision	Lingxiao Du et.al.	2505.13427	link
2025-05-19	FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning	Zhuozhao Hu et.al.	2505.13419	link
2025-05-19	MR. Judge: Multimodal Reasoner as a Judge	Renjie Pi et.al.	2505.13403	null
2025-05-19	MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers	Kyeongman Park et.al.	2505.13082	null
2025-05-19	Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning	Xiaoyu Yang et.al.	2505.13081	null
2025-05-19	Advancing Sequential Numerical Prediction in Autoregressive Models	Xiang Fei et.al.	2505.13077	link
2025-05-19	FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models	Hengxing Cai et.al.	2505.12835	link
2025-05-19	Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering	Jianfeng Cai et.al.	2505.12826	null
2025-05-19	Correspondence of high-dimensional emotion structures elicited by video clips between humans and Multimodal LLMs	Haruka Asanuma et.al.	2505.12746	null
2025-05-19	Shadow-FT: Tuning Instruct via Base	Taiqiang Wu et.al.	2505.12716	link
2025-05-16	GODBench: A Benchmark for Multimodal Large Language Models in Video Comment Art	Chenkai Zhang et.al.	2505.11436	link
2025-05-16	Visual Planning: Let's Think Only with Images	Yi Xu et.al.	2505.11409	link
2025-05-16	EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	Bohao Xing et.al.	2505.11405	link
2025-05-19	TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs	Pengju Xu et.al.	2505.11275	link
2025-05-16	A Step towards Interpretable Multimodal AI Models with MultiFIX	Mafalda Malafaia et.al.	2505.11262	null
2025-05-16	CompAlign: Improving Compositional Text-to-Image Generation with a Complex Benchmark and Fine-Grained Feedback	Yixin Wan et.al.	2505.11178	null
2025-05-16	Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans	Yansheng Qiu et.al.	2505.11141	null
2025-05-16	WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?	An-Lan Wang et.al.	2505.11015	null
2025-05-16	ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications	Li Qiao et.al.	2505.10946	null
2025-05-16	VISTA: Enhancing Vision-Text Alignment in MLLMs via Cross-Modal Mutual Information Maximization	Mingxiao Li et.al.	2505.10917	null
2025-05-15	Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis	Pengfei Wang et.al.	2505.10541	link
2025-05-15	Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence	Xiang He et.al.	2505.10176	link
2025-05-15	Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering	Yangfu Li et.al.	2505.10118	null
2025-05-15	CartoAgent: a multimodal large language model-powered multi-agent cartographic framework for map style transfer and evaluation	Chenglong Wang et.al.	2505.09936	null
2025-05-15	UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs	Yi Gui et.al.	2505.09904	link
2025-05-14	A Multimodal Multi-Agent Framework for Radiology Report Generation	Ziruo Yi et.al.	2505.09787	null
2025-05-14	FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models	Hongyang Wang et.al.	2505.09415	null
2025-05-14	Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping	Yinuo Wang et.al.	2505.09252	link
2025-05-14	AMSnet 2.0: A Large AMS Database with AI Segmentation for Net Detection	Yichen Shi et.al.	2505.09155	null
2025-05-13	Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction	Adarsh Kumar et.al.	2505.09018	null
2025-05-14	Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology	Yatai Ji et.al.	2505.08765	null
2025-05-12	Visually Interpretable Subtask Reasoning for Visual Question Answering	Yu Cheng et.al.	2505.08084	null
2025-05-12	MilChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Remote Sensing	Aybora Koksal et.al.	2505.07984	null
2025-05-12	Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach	Ruikun Hou et.al.	2505.07902	null
2025-05-12	Multimodal Survival Modeling in the Age of Foundation Models	Steven Song et.al.	2505.07683	link
2025-05-12	Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning	Xiaokun Wang et.al.	2505.07263	null
2025-05-11	DriveSOTIF: Advancing Perception SOTIF Through Multimodal Large Language Models	Shucheng Huang et.al.	2505.07084	link
2025-05-11	ParaView-MCP: An Autonomous Visualization Agent with Direct Tool Use	Shusen Liu et.al.	2505.07064	null
2025-05-11	MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception	Zhengye Zhang et.al.	2505.07007	link
2025-05-11	Visual Evolutionary Optimization on Combinatorial Problems with Multimodal Large Language Models: A Case Study of Influence Maximization	Jie Zhao et.al.	2505.06850	null
2025-05-11	Visual Instruction Tuning with Chain of Region-of-Interest	Yixin Chen et.al.	2505.06840	null
2025-05-09	Is your multimodal large language model a good science tutor?	Ming Liu et.al.	2505.06418	null
2025-05-09	NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines	Chathurangi Shyalika et.al.	2505.06333	link
2025-05-09	MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills	Niladri Shekhar Dutt et.al.	2505.06176	null
2025-05-09	The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review	Jingguo Qu et.al.	2505.06118	null
2025-05-09	ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding	Shuai Wang et.al.	2505.06020	null
2025-05-09	BMMDetect: A Multimodal Deep Learning Framework for Comprehensive Biomedical Misconduct Detection	Yize Zhou et.al.	2505.05763	null
2025-05-08	Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos	Giulio Cesare Mastrocinque Santo et.al.	2505.05681	null
2025-05-08	Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models	Aarti Ghatkesar et.al.	2505.05626	null
2025-05-08	Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding	Han Xiao et.al.	2505.05446	link
2025-05-09	EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation	Biao Yi et.al.	2505.05440	null
2025-05-08	Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	Sooyoung Park et.al.	2505.05343	link
2025-05-08	PADriver: Towards Personalized Autonomous Driving	Genghua Kou et.al.	2505.05240	null
2025-05-08	X-Driver: Explainable Autonomous Driving with Vision-Language Models	Wei Liu et.al.	2505.05098	null
2025-05-08	Learning Item Representations Directly from Multimodal Features for Effective Recommendation	Xin Zhou et.al.	2505.04960	link
2025-05-07	EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	Zhenghao Xing et.al.	2505.04623	link
2025-05-07	On Path to Multimodal Generalist: General-Level and General-Bench	Hao Fei et.al.	2505.04620	null
2025-05-07	M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation	Qianru Zhang et.al.	2505.04445	null
2025-05-06	VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	Zuwei Long et.al.	2505.03739	link
2025-05-06	Multi-Agent System for Comprehensive Soccer Understanding	Jiayuan Rao et.al.	2505.03735	null
2025-05-06	RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration	Huajie Tan et.al.	2505.03673	link
2025-05-06	ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant	Yifan Xiang et.al.	2505.03654	link
2025-05-06	LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs	Xinyuan Zhang et.al.	2505.03460	null
2025-05-06	Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant	Haonan Wang et.al.	2505.03380	null
2025-05-05	R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning	Yi-Fan Zhang et.al.	2505.02835	link
2025-05-06	MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation	Mingcheng Li et.al.	2505.02648	null
2025-05-05	SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning	Jinpeng Chen et.al.	2505.02486	link
2025-05-07	Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction	Inclusion AI et.al.	2505.02471	link
2025-05-05	Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection	Sungheon Jeong et.al.	2505.02393	link
2025-05-04	Retrieval-augmented in-context learning for multimodal large language models in disease classification	Zaifu Zhan et.al.	2505.02087	null
2025-05-06	RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video	Shuhang Xun et.al.	2505.02064	link
2025-05-04	R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation	Meng-Hao Guo et.al.	2505.02018	null
2025-05-04	MLLM-Enhanced Face Forgery Detection: A Vision-Language Fusion Solution	Siran Peng et.al.	2505.02013	null
2025-05-02	VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos	Zongxia Li et.al.	2505.01481	link
2025-05-02	FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors	Chenxi Li et.al.	2505.01322	null
2025-05-02	Multimodal Transformers are Hierarchical Modal-wise Heterogeneous Graphs	Yijie Jin et.al.	2505.01068	null
2025-05-02	Efficient Vocabulary-Free Fine-Grained Visual Recognition in the Age of Multimodal LLMs	Hari Chandana Kuchibhotla et.al.	2505.01064	null
2025-05-01	Multi-Modal Language Models as Text-to-Image Model Evaluators	Jiahui Chen et.al.	2505.00759	null
2025-05-01	InstructAttribute: Fine-grained Object Attributes editing with Instruction	Xingxi Yin et.al.	2505.00751	null
2025-05-01	A Methodological and Structural Review of Parkinsons Disease Detection Across Diverse Data Modalities	Abu Saleh Musa Miah et.al.	2505.00525	null
2025-05-01	Toward Automated Regulatory Decision-Making: Trustworthy Medical Device Risk Classification with Multimodal Transformers and Self-Training	Yu Han et.al.	2505.00422	null
2025-04-30	Audo-Sight: Enabling Ambient Interaction For Blind And Visually Impaired Individuals	Bhanuja Ainary et.al.	2505.00153	null
2025-04-30	GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling	Siqi Li et.al.	2505.00063	null
2025-04-30	COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning	Xindi Wu et.al.	2504.21850	null
2025-04-30	Visual Text Processing: A Comprehensive Review and Unified Evaluation	Yan Shu et.al.	2504.21682	link
2025-04-30	Rethinking Visual Layer Selection in Multimodal LLMs	Haoran Chen et.al.	2504.21447	null
2025-04-30	SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding	Chenkai Zhang et.al.	2504.21435	link
2025-04-30	Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing	Hong Zhang et.al.	2504.21356	link
2025-04-30	UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation	Linshan Wu et.al.	2504.21336	link
2025-04-30	Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models	Guanghao Zhou et.al.	2504.21277	null
2025-04-29	ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification	Ziqing Fan et.al.	2504.20930	link
2025-04-29	AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation	Jeongsoo Choi et.al.	2504.20629	null
2025-04-29	A Summary on GUI Agents with Foundation Models Enhanced by Reinforcement Learning	Jiahao Li et.al.	2504.20464	null
2025-04-29	APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech	Zhicheng Lian et.al.	2504.20447	null
2025-04-29	MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation	Amaan Izhar et.al.	2504.20343	link
2025-04-28	A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals	Zhe Cui et.al.	2504.20178	null
2025-04-28	CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback	Chenhan Jiang et.al.	2504.19860	null
2025-04-28	SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation	Yulong Guo et.al.	2504.19839	null
2025-04-28	DEEMO: De-identity Multimodal Emotion Recognition and Reasoning	Deng Li et.al.	2504.19549	null
2025-04-28	LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning	Peijian Zeng et.al.	2504.19524	null
2025-04-26	Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation	Delun Lai et.al.	2504.19002	null
2025-04-26	Advancing Face-to-Face Emotion Communication: A Multimodal Dataset (AFFEC)	Meisam J. Sekiavandi et.al.	2504.18969	link
2025-04-26	Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge	Junjie Zhou et.al.	2504.18961	link
2025-04-25	Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization	Kesen Zhao et.al.	2504.18397	link
2025-04-25	ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding	Yi-Xing Peng et.al.	2504.18152	null
2025-04-25	DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models	Jianyu Liu et.al.	2504.18053	link
2025-04-27	Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models	Xu Ma et.al.	2504.17789	null
2025-04-24	Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs	Tiancheng Gu et.al.	2504.17432	null
2025-04-25	TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Ling You et.al.	2504.17365	null
2025-04-24	V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations	Zhiyuan Fan et.al.	2504.16727	null
2025-04-24	Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark	Hanlei Zhang et.al.	2504.16427	link
2025-04-23	EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment	Lancheng Gao et.al.	2504.16405	null
2025-04-22	Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs	Merve Cerit et.al.	2504.16323	link
2025-04-21	Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends	Mohammad Abu Tami et.al.	2504.16134	null
2025-04-22	TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving	Daocheng Fu et.al.	2504.15780	null
2025-04-22	FaceInsight: A Multimodal Large Language Model for Face Perception	Jingzhi Li et.al.	2504.15624	null
2025-04-22	AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization	Jinda Lu et.al.	2504.15619	null
2025-04-21	IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	David Ma et.al.	2504.15415	link
2025-04-21	Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs	Chun-Hsiao Yeh et.al.	2504.15280	link
2025-04-21	VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models	Weiye Xu et.al.	2504.15279	null
2025-04-21	A Call for New Recipes to Enhance Spatial Reasoning in MLLMs	Huanyu Zhang et.al.	2504.15037	null
2025-04-21	IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification	Fengyuan Nie et.al.	2504.14833	null
2025-04-20	Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens	Kaihang Pan et.al.	2504.14666	null
2025-04-20	Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension	Lin Li et.al.	2504.14642	null
2025-04-20	Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction	Wenke Xia et.al.	2504.14588	link
2025-04-19	Towards Explainable Fake Image Detection with Multi-Modal Large Language Models	Yikun Ji et.al.	2504.14245	link
2025-04-19	InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners	Yuhang Liu et.al.	2504.14239	link
2025-04-18	Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training	Andrea Amaduzzi et.al.	2504.13995	null
2025-04-18	Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing	Joowon Kim et.al.	2504.13490	null
2025-04-17	SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs	Haoxuan Li et.al.	2504.13172	null
2025-04-17	Hadamard product in deep learning: Introduction, Advances and Challenges	Grigorios G Chrysos et.al.	2504.13112	null
2025-04-17	EventVAD: Training-Free Event-Aware Video Anomaly Detection	Yihua Shao et.al.	2504.13092	null
2025-04-18	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-17	ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images	Sangwook Kim et.al.	2504.13023	null
2025-04-17	EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery	Wei Zhang et.al.	2504.12795	null
2025-04-17	Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration	Yicheng Pan et.al.	2504.12773	link
2025-04-17	SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Qianqian Sun et.al.	2504.12704	null
2025-04-17	GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning	Liangyu Xu et.al.	2504.12597	null
2025-04-16	Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis	Shravan Chaudhari et.al.	2504.12511	null
2025-04-16	Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis	Miaosen Luo et.al.	2504.12151	null
2025-04-16	Instruction-augmented Multimodal Alignment for Image-Text and Element Matching	Xinli Yue et.al.	2504.12018	null
2025-04-16	AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection	Yuhao Chao et.al.	2504.11914	null
2025-04-16	Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation	Julia Kreutzer et.al.	2504.11829	null
2025-04-15	DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis	Efthymios Georgiou et.al.	2504.11082	null
2025-04-15	Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation	Yan Rong et.al.	2504.11002	null
2025-04-14	CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates	Ankit Kumar Shaw et.al.	2504.10738	null
2025-04-14	Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization	Darryl Hannan et.al.	2504.10727	null
2025-04-14	Relation-Rich Visual Document Generator for Visual Information Extraction	Zi-Han Jiang et.al.	2504.10659	link
2025-04-15	InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models	Jinguo Zhu et.al.	2504.10479	link
2025-04-14	Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding	Tao Zhang et.al.	2504.10465	link
2025-04-14	The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer	Weixian Lei et.al.	2504.10462	link
2025-04-14	FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos	Rui Chen et.al.	2504.10358	null
2025-04-14	CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation	Junchen Fu et.al.	2504.10307	link
2025-04-14	PRM-BAS: Enhancing Multimodal Reasoning through PRM-guided Beam Annealing Search	Pengfei Hu et.al.	2504.10222	null
2025-04-14	The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance	Anwesha Mohanty et.al.	2504.10179	null
2025-04-14	COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts	Jiansheng Li et.al.	2504.10158	null
2025-04-14	CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography	I-Sheng Fang et.al.	2504.10090	null
2025-04-15	MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework	Zihan Ling et.al.	2504.10074	null
2025-04-11	Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images	Boyang Deng et.al.	2504.08727	null
2025-04-10	POEM: Precise Object-level Editing via MLLM control	Marco Schouten et.al.	2504.08111	null
2025-04-10	GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Lang Lin et.al.	2504.07962	null
2025-04-10	MM-IFEngine: Towards Multimodal Instruction Following	Shengyuan Ding et.al.	2504.07957	link
2025-04-10	Perception-R1: Pioneering Perception Policy with Reinforcement Learning	En Yu et.al.	2504.07954	link
2025-04-10	MARS: a Multimodal Alignment and Ranking System for Few-Shot Segmentation	Nico Catalano et.al.	2504.07942	null
2025-04-10	VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding	Henghao Zhao et.al.	2504.07519	null
2025-04-10	How Can Objects Help Video-Language Understanding?	Zitian Tang et.al.	2504.07454	null
2025-04-10	Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing	Chenxi Sun et.al.	2504.07424	null
2025-04-10	Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction	Kyoyun Choi et.al.	2504.07415	null
2025-04-09	Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning	Ashutosh Chaubey et.al.	2504.07198	null
2025-04-10	VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning	Xinhao Li et.al.	2504.06958	null
2025-04-09	MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking	Chang Nie et.al.	2504.06863	null
2025-04-09	Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions	Angela Lopez-Cardona et.al.	2504.06843	null
2025-04-09	Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception	Ruotian Peng et.al.	2504.06666	null
2025-04-09	Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program	Minghe Gao et.al.	2504.06606	link
2025-04-08	Mind the Gap: Evaluating Vision Systems in Small Data Applications	Samuel Stevens et.al.	2504.06486	link
2025-04-08	Transfer between Modalities with MetaQueries	Xichen Pan et.al.	2504.06256	null
2025-04-08	V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models	Xiangxi Zheng et.al.	2504.06148	link
2025-04-08	MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models	Pengfei Zhou et.al.	2504.05782	link
2025-04-08	On the Suitability of Reinforcement Fine-Tuning to Visual Tasks	Xiaxu Chen et.al.	2504.05682	null
2025-04-07	URECA: Unique Region Caption Anything	Sangbeom Lim et.al.	2504.05305	null
2025-04-07	LiveVQA: Live Visual Knowledge Seeking	Mingyang Fu et.al.	2504.05288	null
2025-04-07	Explaining Low Perception Model Competency with High-Competency Counterfactuals	Sara Pohland et.al.	2504.05254	null
2025-04-07	Towards Visual Text Grounding of Multimodal Large Language Model	Ming Li et.al.	2504.04974	null
2025-04-07	Video-Bench: Human-Aligned Video Generation Benchmark	Hui Han et.al.	2504.04907	null
2025-04-07	OrderChain: A General Prompting Paradigm to Improve Ordinal Understanding Ability of MLLM	Jinhong Wang et.al.	2504.04801	null
2025-04-07	OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance	Chaoyi Wang et.al.	2504.04781	null
2025-04-07	Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data	Samarth Mishra et.al.	2504.04740	link
2025-04-07	LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts	Yimu Wang et.al.	2504.04653	null
2025-04-06	Advancing Egocentric Video Question Answering with Multimodal Large Language Models	Alkesh Patel et.al.	2504.04550	null
2025-04-04	MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Wulin Xie et.al.	2504.03641	null
2025-04-03	Hummus: A Dataset of Humorous Multimodal Metaphor Use	Xiaoyu Tong et.al.	2504.02983	link
2025-04-03	Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning	Zhihan Zhang et.al.	2504.02906	link
2025-04-03	Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision	Xiaofeng Han et.al.	2504.02477	null
2025-04-03	The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy	Matheus Valentim et.al.	2504.02217	null
2025-04-03	ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement	Runhui Huang et.al.	2504.01934	null
2025-04-02	Spatial-R1: Enhancing MLLMs in Video Spatial Reasoning	Kun Ouyang et.al.	2504.01805	link
2025-04-02	PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization	Aofan Liu et.al.	2504.01444	null
2025-04-02	Slow-Fast Architecture for Video Multi-Modal Large Language Models	Min Shi et.al.	2504.01328	link
2025-04-01	AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction	Junhao Cheng et.al.	2504.01014	link
2025-04-01	IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval	Bangwei Liu et.al.	2504.00954	null
2025-04-02	Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning	Ram Ramrakhya et.al.	2504.00907	null
2025-04-01	Improved Visual-Spatial Reasoning via R1-Zero-Like Training	Zhenyi Liao et.al.	2504.00883	null
2025-04-01	Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights	Yuchen Liu et.al.	2504.00839	null
2025-04-01	QG-VTC: Question-Guided Visual Token Compression in MLLMs for Efficient VQA	Shuai Li et.al.	2504.00654	null
2025-03-31	Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation	Shengqiong Wu et.al.	2503.24379	null
2025-03-31	Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1	Yi Chen et.al.	2503.24376	link
2025-03-31	H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding	Qi Wu et.al.	2503.24008	null
2025-03-31	BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation	Yumeng Fu et.al.	2503.23990	null
2025-03-31	Boosting MLLM Reasoning with Text-Debiased Hint-GRPO	Qihan Huang et.al.	2503.23905	null
2025-04-01	Evaluating small vision-language models as AI assistants for radio astronomical source analysis tasks	S. Riggi et.al.	2503.23859	link
2025-03-31	OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training	Yijie Zheng et.al.	2503.23830	null
2025-03-31	XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?	Fengxiang Wang et.al.	2503.23771	null
2025-03-31	STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?	Yun Li et.al.	2503.23765	null
2025-03-31	AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization	Yiyang Du et.al.	2503.23733	link
2025-03-28	Q-Insight: Understanding Image Quality via Visual Reinforcement Learning	Weiqi Li et.al.	2503.22679	link
2025-03-28	Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users	Antonia Karamolegkou et.al.	2503.22610	null
2025-03-28	NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving	Fuhao Li et.al.	2503.22436	null
2025-03-31	Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs	Ziye Chen et.al.	2503.22241	null
2025-03-28	Learning to Instruct for Visual Instruction Tuning	Zhihan Zhou et.al.	2503.22215	null
2025-03-28	DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos	Yunming Liang et.al.	2503.22208	null
2025-03-28	EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos	Yuxuan Li et.al.	2503.22152	link
2025-03-28	Tokenization of Gaze Data	Tim Rolff et.al.	2503.22145	null
2025-03-28	A Survey on Remote Sensing Foundation Models: From Vision to Multimodality	Ziyue Huang et.al.	2503.22081	link
2025-03-27	Video-R1: Reinforcing Video Reasoning in MLLMs	Kaituo Feng et.al.	2503.21776	link
2025-03-27	3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models	Yuhan Zhang et.al.	2503.21745	null
2025-03-27	UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning	Zhengxi Lu et.al.	2503.21620	link
2025-03-27	FusionSegReID: Advancing Person Re-Identification with Multimodal Retrieval and Precise Segmentation	Jincheng Yan et.al.	2503.21595	null
2025-03-27	FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Xiaoqin Wang et.al.	2503.21457	link
2025-03-27	InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression	Dongchen Lu et.al.	2503.21307	link
2025-03-26	ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction	Yiqiao Jin et.al.	2503.20978	null
2025-03-26	MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams	Yanpeng Sun et.al.	2503.20745	null
2025-03-26	Vision as LoRA	Han Wang et.al.	2503.20680	link
2025-03-26	Beyond Intermediate States: Explaining Visual Redundancy through Language	Dingchen Yang et.al.	2503.20540	link
2025-03-26	Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering	Zehui Liao et.al.	2503.20504	null
2025-03-26	MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning	Yiwei Ma et.al.	2503.20502	null
2025-03-26	From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment	Yucheng Suo et.al.	2503.20472	null
2025-03-26	MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation	Rongyu Zhang et.al.	2503.20384	null
2025-03-26	Dynamic Pyramid Network for Efficient Multimodal Large Language Model	Hao Ai et.al.	2503.20322	null
2025-03-26	Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs	Zitian Wang et.al.	2503.20309	null
2025-03-25	LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?	Kexian Tang et.al.	2503.19990	null
2025-03-25	CoLLM: A Large Language Model for Composed Image Retrieval	Chuong Huynh et.al.	2503.19910	link
2025-03-25	Scaling Vision Pre-Training to 4K Resolution	Baifeng Shi et.al.	2503.19903	null
2025-03-25	Perception-Enhanced Multitask Multimodal Semantic Communication for UAV-Assisted Integrated Sensing and Communication System	Ziji Guo et.al.	2503.19594	null
2025-03-25	DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts	Ling Zhong et.al.	2503.19498	null
2025-03-25	ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning	Jiaqi Liao et.al.	2503.19312	null
2025-03-24	MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks	Wenhao You et.al.	2503.19134	null
2025-03-24	LLaVAction: evaluating and training multi-modal large language models for action recognition	Shaokai Ye et.al.	2503.18712	link
2025-03-25	Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models	Yazhou Zhang et.al.	2503.18681	null
2025-03-24	Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark	Bingchen Miao et.al.	2503.18665	link
2025-03-24	Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding	Xiangrui Liu et.al.	2503.18478	null
2025-03-24	A Simple yet Effective Layout Token in Large Language Models for Document Understanding	Zhaoqing Zhu et.al.	2503.18434	null
2025-03-23	Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering	Zixin Chen et.al.	2503.18172	null
2025-03-23	MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation	Jiaxin Huang et.al.	2503.18135	null
2025-03-23	MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection	Yibo Yan et.al.	2503.18132	null
2025-03-23	Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models	Qiao Liang et.al.	2503.18034	null
2025-03-22	4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding	Wenxuan Zhu et.al.	2503.17827	link
2025-03-21	LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models	Jian Liang et.al.	2503.16843	null
2025-03-21	When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts	Jun Seong Kim et.al.	2503.16826	null
2025-03-20	Distributed LLMs and Multimodal Large Language Models: A Survey on Advances, Challenges, and Future Directions	Hadi Amini et.al.	2503.16585	link
2025-03-20	OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence	Long Yuan et.al.	2503.16326	null
2025-03-20	Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data	Zijian Li et.al.	2503.16260	null
2025-03-20	CLS-RL: Image Classification with Rule-Based Reinforcement Learning	Ming Li et.al.	2503.16188	link
2025-03-20	OThink-MR1: Stimulating multimodal generalized reasoning capabilities through dynamic reinforcement learning	Zhiyuan Liu et.al.	2503.16081	null
2025-03-20	Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Zhihang Liu et.al.	2503.16036	link
2025-03-20	BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models	Zenghui Yuan et.al.	2503.16023	null
2025-03-20	DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering	Haochen Wang et.al.	2503.15887	null
2025-03-20	A Vision Centric Remote Sensing Benchmark	Abduljaleel Adejumo et.al.	2503.15816	null
2025-03-19	LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning	Federico Cocchi et.al.	2503.15621	link
2025-03-19	Visual Position Prompt for MLLM based Visual Grounding	Wei Tang et.al.	2503.15426	link
2025-03-19	Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer	Abhi Kamboj et.al.	2503.15352	null
2025-03-19	LEGION: Learning to Ground and Explain for Synthetic Image Detection	Hengrui Kang et.al.	2503.15264	null
2025-03-20	Benchmarking Large Language Models for Handwritten Text Recognition	Giorgia Crosilla et.al.	2503.15195	null
2025-03-19	UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation	Qihui Zhang et.al.	2503.14941	null
2025-03-19	VisNumBench: Evaluating Number Sense of Multimodal Large Language Models	Tengjin Weng et.al.	2503.14939	null
2025-03-19	FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding	Chongjun Tu et.al.	2503.14935	null
2025-03-19	POSTA: A Go-to Framework for Customized Artistic Poster Generation	Haoyu Chen et.al.	2503.14908	null
2025-03-19	Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations	Shuo Li et.al.	2503.14895	null
2025-03-18	Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives	Sara Sarto et.al.	2503.14604	link
2025-03-18	Aligning Multimodal LLM with Human Preference: A Survey	Tao Yu et.al.	2503.14504	link
2025-03-19	Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM	Xinyu Fang et.al.	2503.14478	link
2025-03-18	VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation	Shoubin Yu et.al.	2503.14350	null
2025-03-19	DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies	Wei Song et.al.	2503.14324	link
2025-03-18	Towards Harmless Multimodal Assistants with Blind Preference Optimization	Yongqi Li et.al.	2503.14189	null
2025-03-18	Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding	Zining Wang et.al.	2503.14140	null
2025-03-18	MP-GUI: Modality Perception with MLLMs for GUI Understanding	Ziwei Wang et.al.	2503.14021	link
2025-03-18	SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability	Jiankang Wang et.al.	2503.13983	null
2025-03-18	Survey of Adversarial Robustness in Multimodal Large Language Models	Chengze Jiang et.al.	2503.13962	null
2025-03-18	Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation	Sayak Nag et.al.	2503.13947	null
2025-03-17	MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	James Burgess et.al.	2503.13399	link
2025-03-17	Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning	Mengyao Lyu et.al.	2503.13383	null
2025-03-17	Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning	Hai-Long Sun et.al.	2503.13360	null
2025-03-17	3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o	Dingning Liu et.al.	2503.13185	null
2025-03-17	MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs	Erik Daxberger et.al.	2503.13111	null
2025-03-17	Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference	Hao Yin et.al.	2503.13108	link
2025-03-17	ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models	Hao Yin et.al.	2503.13107	link
2025-03-17	Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning	Yu-Hong Shen et.al.	2503.13055	null
2025-03-17	Efficient Motion-Aware Video MLLM	Zijia Zhao et.al.	2503.13016	null
2025-03-17	HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model	Haiyang Guo et.al.	2503.12941	null
2025-03-14	VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity	Jing Bi et.al.	2503.11557	null
2025-03-14	A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving	Tin Stribor Sohn et.al.	2503.11400	null
2025-03-14	Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware	Insu Jang et.al.	2503.11367	link
2025-03-14	Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space	Weichen Zhan et.al.	2503.11094	link
2025-03-14	EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks	Yi Zhang et.al.	2503.11089	null
2025-03-14	BannerAgency: Advertising Banner Design with Multimodal LLM Agents	Heng Wang et.al.	2503.11060	null
2025-03-14	RONA: Pragmatically Diverse Image Captioning with Coherence Relations	Aashish Anantha Ramakrishnan et.al.	2503.10997	link
2025-03-13	Learning to Inference Adaptively for Multimodal Large Language Models	Zhuoyan Xu et.al.	2503.10905	null
2025-03-13	PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models	Zilu Guo et.al.	2503.10529	null
2025-03-13	Interactive Multimodal Fusion with Temporal Modeling	Jun Yu et.al.	2503.10523	null
2025-03-13	TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models	Xudong Tan et.al.	2503.10501	link
2025-03-13	4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models	Wanhua Li et.al.	2503.10437	link
2025-03-13	CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance	Yufan Deng et.al.	2503.10391	null
2025-03-13	A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection	Heng Yim Nicole Oo et.al.	2503.10371	null
2025-03-13	IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification	Yuhao Wang et.al.	2503.10324	null
2025-03-13	VisualPRM: An Effective Process Reward Model for Multimodal Reasoning	Weiyun Wang et.al.	2503.10291	null
2025-03-13	LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents	Boyu Chen et.al.	2503.10200	null
2025-03-13	Hybrid Agents for Image Restoration	Bingchen Li et.al.	2503.10120	null
2025-03-13	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	Md Mohaiminul Islam et.al.	2503.09590	link
2025-03-12	Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding	Haoyu Zhang et.al.	2503.09143	null
2025-03-11	Seeing What's Not There: Spurious Correlation in Multimodal LLMs	Parsa Hosseini et.al.	2503.08884	null
2025-03-11	Language-Depth Navigated Thermal and Visible Image Fusion	Jinchang Zhang et.al.	2503.08676	null
2025-03-11	SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories	Muzhi Zhu et.al.	2503.08625	link
2025-03-11	LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization	Xianfeng Wu et.al.	2503.08619	link
2025-03-11	HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding	Shehreen Azad et.al.	2503.08585	null
2025-03-11	RAG-Adapter: A Plug-and-Play RAG-enhanced Framework for Long Video Understanding	Xichen Tan et.al.	2503.08576	null
2025-03-11	FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework	Jianian Zhu et.al.	2503.08461	null
2025-03-11	KAP: MLLM-assisted OCR Text Enhancement for Hybrid Retrieval in Chinese Non-Narrative Documents	Hsin-Ling Hsu et.al.	2503.08452	link
2025-03-11	Embodied Crowd Counting	Runling Long et.al.	2503.08367	null
2025-03-12	Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs	Chongjun Tu et.al.	2503.08342	null
2025-03-11	Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework	Zhuo Zhi et.al.	2503.08308	null
2025-03-10	Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts	Shiu-hong Kao et.al.	2503.07503	null
2025-03-10	LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?	Bangyan Li et.al.	2503.07487	null
2025-03-10	REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding	Yan Tai et.al.	2503.07413	link
2025-03-10	ALLVB: All-in-One Long Video Understanding Benchmark	Xichen Tan et.al.	2503.07298	null
2025-03-10	A Novel Ophthalmic Benchmark for Evaluating Multimodal Large Language Models with Fundus Photographs and OCT Images	Xiaoyi Liang et.al.	2503.07094	null
2025-03-10	Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning	Jiazheng Liu et.al.	2503.07002	null
2025-03-10	Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs	Wenzhuo Xu et.al.	2503.06989	null
2025-03-10	Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition	Xinyu Xi et.al.	2503.06978	null
2025-03-10	ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks	Yan Yang et.al.	2503.06885	null
2025-03-09	SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation	Zisheng Chen et.al.	2503.06764	link
2025-03-11	Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models	Wenxuan Huang et.al.	2503.06749	link
2025-03-07	Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information	Junbo Zhao et.al.	2503.05543	null
2025-03-07	Can Large Language Models Grasp Concepts in Visual Content? A Case Study on YouTube Shorts about Depression	Jiaying "Lizzy" Liu et.al.	2503.05109	null
2025-03-06	FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement	Ian Huang et.al.	2503.04919	null
2025-03-06	Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model	Wenke Huang et.al.	2503.04543	link
2025-03-06	Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition	Bin Chen et.al.	2503.04201	null
2025-03-06	MASTER: Multimodal Segmentation with Text Prompts	Fuyang Liu et.al.	2503.04199	null
2025-03-06	Biological Sequence with Language Model Prompting: A Survey	Jiyue Jiang et.al.	2503.04135	null
2025-03-07	Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts	Xiangnan Chen et.al.	2503.04095	null
2025-03-06	RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models	Wenhui Zhu et.al.	2503.03987	null
2025-03-05	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	Zhao Yang et.al.	2503.03689	link
2025-03-05	BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation	Hiep Truong Cong et.al.	2503.03280	null
2025-03-05	COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence	Wentao Li et.al.	2503.03215	null
2025-03-05	Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings	Sneh Pillai et.al.	2503.03202	null
2025-03-04	Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs	Wei-Yao Wang et.al.	2503.02597	link
2025-03-05	MCiteBench: A Benchmark for Multimodal Citation Text Generation in MLLMs	Caiyu Hu et.al.	2503.02589	link
2025-03-04	A Token-level Text Image Foundation Model for Document Understanding	Tongkun Guan et.al.	2503.02304	null
2025-03-03	Distilled Prompt Learning for Incomplete Multimodal Survival Prediction	Yingxue Xu et.al.	2503.01653	null
2025-03-03	RemiHaven: Integrating "In-Town" and "Out-of-Town" Peers to Provide Personalized Reminiscence Support for Older Drifters	Xuechen Zhang et.al.	2503.01358	null
2025-03-04	UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface	Hao Tang et.al.	2503.01342	link
2025-03-03	Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG	Wenbin Wang et.al.	2503.01222	link
2025-03-03	Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models	Tianjie Ju et.al.	2503.01208	link
2025-03-03	Scientific Reasoning: Assessment of Multimodal Generative LLMs	Florian Dreyer et.al.	2503.01064	null
2025-03-02	LLM-Fusion: A Novel Multimodal Fusion Model for Accelerated Material Discovery	Onur Boyar et.al.	2503.01022	null
2025-02-28	Adaptive Keyframe Sampling for Long Video Understanding	Xi Tang et.al.	2502.21271	null
2025-02-28	RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete	Yuheng Ji et.al.	2502.21257	null
2025-02-28	Fine-Grained Retrieval-Augmented Generation for Visual Question Answering	Zhengxuan Zhang et.al.	2502.20964	null
2025-02-28	HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models	Xiao Wang et.al.	2502.20811	null
2025-03-03	MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts	Peijie Wang et.al.	2502.20808	null
2025-02-28	Towards General Visual-Linguistic Face Forgery Detection (V2)	Ke Sun et.al.	2502.20698	link
2025-02-27	Visual Reasoning at Urban Intersections: FineTuning GPT-4o for Traffic Conflict Detection	Sari Masri et.al.	2502.20573	null
2025-02-27	Protecting multimodal large language models against misleading visualizations	Jonathan Tonglet et.al.	2502.20503	link
2025-02-27	VideoA11y: Method and Dataset for Accessible Video Description	Chaoyu Li et.al.	2502.20480	null
2025-02-27	Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription	Benjamin Gutteridge et.al.	2502.20295	link
2025-02-27	Mixture of Experts for Recognizing Depression from Interview and Reading Tasks	Loukas Ilias et.al.	2502.20213	null
2025-02-27	New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration	Xuzheng Yang et.al.	2502.20104	null
2025-02-27	AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs	Xuyang Wei et.al.	2502.20035	link
2025-02-27	Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up	Lang Huang et.al.	2502.20008	null
2025-02-27	Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents	Zhenyu Liu et.al.	2502.19917	link
2025-02-27	Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy	Zaijing Li et.al.	2502.19902	null
2025-02-27	Towards Multimodal Large-Language Models for Parent-Child Interaction: A Focus on Joint Attention	Weiyan Shi et.al.	2502.19877	null
2025-02-27	One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion	Chunyang Cheng et.al.	2502.19854	link
2025-02-27	Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack	Chenhe Gu et.al.	2502.19672	null
2025-02-26	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models	Danae Sánchez Villegas et.al.	2502.19409	null
2025-02-26	M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance	Qingpei Guo et.al.	2502.18778	null
2025-02-25	OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference	Xiangyu Zhao et.al.	2502.18411	link
2025-02-25	ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis	Li Lei et.al.	2502.18180	null
2025-02-25	VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion	Pei Liu et.al.	2502.18042	null
2025-02-25	MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks	Hyeonjeong Ha et.al.	2502.17832	link
2025-02-25	Can Multimodal LLMs Perform Time Series Anomaly Detection?	Xiongxiao Xu et.al.	2502.17812	link
2025-02-24	MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference	Zhongwei Wan et.al.	2502.17599	link
2025-02-24	PosterSum: A Multimodal Benchmark for Scientific Poster Summarization	Rohit Saxena et.al.	2502.17540	link
2025-02-24	Introducing Visual Perception Token into Multimodal Large Language Model	Runpeng Yu et.al.	2502.17425	link
2025-02-24	MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs	Jiarui Zhang et.al.	2502.17422	link
2025-02-24	HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization	Zhenghao Liu et.al.	2502.17315	link
2025-02-24	Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts	Zhenghao Liu et.al.	2502.17297	link
2025-02-24	Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence	Wenzhe Yin et.al.	2502.17028	null
2025-02-24	Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs	Himanshu Beniwal et.al.	2502.16901	link
2025-02-24	SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding	Liangtao Shi et.al.	2502.16786	link
2025-02-23	AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation	Rui Li et.al.	2502.16680	link
2025-02-23	Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries	Yin Wu et.al.	2502.16636	link
2025-02-23	Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review	Pei Fu et.al.	2502.16586	null
2025-02-21	Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models	Anirudh Sundar et.al.	2502.15639	null
2025-02-21	Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs	Gengyuan Zhang et.al.	2502.15457	null
2025-02-21	Research advances on fish feeding behavior recognition and intensity quantification methods in aquaculture	Shulong Zhang et.al.	2502.15311	null
2025-02-21	M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment	Chuan Cui et.al.	2502.15167	link
2025-02-20	Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation	Yun-Wei Chu et.al.	2502.15040	null
2025-02-20	Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework	Yuming Yang et.al.	2502.14864	link
2025-02-20	Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension	Amir Hossein Yari et.al.	2502.14315	null
2025-02-20	Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach	Yurong Wu et.al.	2502.14285	null
2025-02-21	PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC	Haowei Liu et.al.	2502.14282	null
2025-02-19	ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities	Chanjin Zheng et.al.	2502.13832	link
2025-02-19	From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education	Yi-Fan Zhang et.al.	2502.13789	null
2025-02-18	Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation	Bencheng Liao et.al.	2502.13145	link
2025-02-18	SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models	Xianfu Cheng et.al.	2502.13059	null
2025-02-18	AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks	Yurun Chen et.al.	2502.13053	null
2025-02-18	Towards Text-Image Interleaved Retrieval	Xin Zhang et.al.	2502.12799	link
2025-02-18	Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning	Yunhao Gou et.al.	2502.12635	null
2025-02-18	SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings	Weikai Lu et.al.	2502.12562	link
2025-02-18	MomentSeeker: A Comprehensive Benchmark and A Strong Baseline For Moment Retrieval Within Long Videos	Huaying Yuan et.al.	2502.12558	null
2025-02-18	SAFEERASER: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning	Junkai Chen et.al.	2502.12520	null
2025-02-17	HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation	Ling Yang et.al.	2502.12148	link
2025-02-17	PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection	Jinhe Bi et.al.	2502.12119	null
2025-02-17	Token Communications: A Unified Framework for Cross-modal Context-aware Semantic Communications	Li Qiao et.al.	2502.12096	null
2025-02-17	Unhackable Temporal Rewarding for Scalable Video MLLMs	En Yu et.al.	2502.12081	null
2025-02-17	GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs	Yi Fang et.al.	2502.11925	null
2025-02-17	EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models	Jiamin Su et.al.	2502.11916	link
2025-02-17	MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation	Haochen Xue et.al.	2502.11903	null
2025-02-17	Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Hanbin Wang et.al.	2502.11829	link
2025-02-17	Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning	Yuqi Pang et.al.	2502.11751	link
2025-02-17	Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent	Junda Wu et.al.	2502.11740	null
2025-02-14	MM-RLHF: The Next Step Forward in Multimodal LLM Alignment	Yi-Fan Zhang et.al.	2502.10391	null
2025-02-14	AutoS$^2$earch: Unlocking the Reasoning Potential of Large Models for Web-based Source Search	Zhengqiu Zhu et.al.	2502.09913	null
2025-02-13	EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents	Rui Yang et.al.	2502.09560	null
2025-02-13	A Benchmark for Crime Surveillance Video Analysis with Large Models	Haoran Chen et.al.	2502.09325	null
2025-02-13	From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs	Mingxiao Li et.al.	2502.09093	null
2025-02-12	FixDrive: Automatically Repairing Autonomous Vehicle Driving Behaviour for $0.08 per Violation	Yang Sun et.al.	2502.08260	link
2025-02-12	Learning Human Skill Generators at Key-Step Levels	Yilu Wu et.al.	2502.08234	null
2025-02-13	Universal Adversarial Attack on Aligned Multimodal LLMs	Temurbek Rahmatullaev et.al.	2502.07987	null
2025-02-11	DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities	Chashi Mahiul Islam et.al.	2502.07905	null
2025-02-11	Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models	Jiacong Xu et.al.	2502.07601	null
2025-02-11	MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs	Qifeng Zhou et.al.	2502.07221	null
2025-02-11	Early Risk Prediction of Pediatric Cardiac Arrest from Electronic Health Records via Multimodal Fused Transformer	Jiaying Lu et.al.	2502.07158	null
2025-02-09	AI-Driven HSI: Multimodality, Fusion, Challenges, and the Deep Learning Revolution	David S. Bhatti et.al.	2502.06894	null
2025-02-11	CoS: Chain-of-Shot Prompting for Long Video Understanding	Jian Hu et.al.	2502.06428	null
2025-02-07	Survey on AI-Generated Media Detection: From Non-MLLM to MLLM	Yueying Zou et.al.	2502.05240	null
2025-02-07	Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy	Yunhang Shen et.al.	2502.05177	link
2025-02-07	Multitwine: Multi-Object Compositing with Text and Layout Control	Gemma Canet Tarrés et.al.	2502.05165	null
2025-02-07	Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs	Rohit Saxena et.al.	2502.05092	null
2025-02-07	Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark	Han Zhang et.al.	2502.04976	null
2025-02-07	Cached Multi-Lora Composition for Multi-Concept Image Generation	Xiandong Zou et.al.	2502.04923	link
2025-02-07	MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin	Minrui Chen et.al.	2502.04794	null
2025-02-06	EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models	He Hu et.al.	2502.04424	null
2025-02-05	PerPO: Perceptual Preference Optimization via Discriminative Rewarding	Zining Zhu et.al.	2502.04371	link
2025-02-06	PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?	Mennatullah Siam et.al.	2502.04192	link
2025-02-06	MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation	Qinhan Yu et.al.	2502.04176	link
2025-02-05	Large Language Models Are Universal Recommendation Learners	Junguang Jiang et.al.	2502.03041	null
2025-02-05	Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning	Yibo Yan et.al.	2502.02871	null
2025-02-04	SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency	Qianhao Yuan et.al.	2502.02458	link
2025-02-04	Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment	Yaling Shen et.al.	2502.02438	null
2025-02-06	LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Tzu-Tao Chang et.al.	2502.02406	null
2025-02-04	Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking	Jinyang Wu et.al.	2502.02339	null
2025-02-04	Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration	Younan Zhu et.al.	2502.01969	null
2025-02-04	MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving	Shiju Zhao et.al.	2502.01960	null
2025-02-04	DAMO: Data- and Model-aware Alignment of Multi-modal LLMs	Jinda Lu et.al.	2502.01943	link
2025-02-03	Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models	Hashmat Shadab Malik et.al.	2502.01576	link
2025-02-03	Position: Empowering Time Series Reasoning with Multimodal LLMs	Yaxuan Kong et.al.	2502.01477	null
2025-02-03	Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models	Mingi Jung et.al.	2502.01419	null
2025-01-31	Efficient Reasoning with Hidden Thinking	Xuan Shen et.al.	2501.19201	link
2025-01-31	Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs	Hongliang Li et.al.	2501.19036	null
2025-01-31	Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation	Bin Zhu et.al.	2501.19017	null
2025-01-30	BounTCHA: A CAPTCHA Utilizing Boundary Identification in AI-extended Videos	Lehao Lin et.al.	2501.18565	null
2025-01-29	Generative AI for Vision: A Comprehensive Study of Frameworks and Applications	Fouad Bousetouane et.al.	2501.18033	null
2025-01-29	Topological Signatures of Adversaries in Multimodal Alignments	Minh Vu et.al.	2501.18006	null
2025-01-30	Leveraging Multimodal LLM for Inspirational User Interface Search	Seokhyeon Park et.al.	2501.17799	link
2025-01-29	Learning Free Token Reduction for Multi-Modal LLM	Zihui Zhao et.al.	2501.17391	null
2025-01-31	Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence	Lindy Gan et.al.	2501.16813	null
2025-01-28	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Yun Li et.al.	2501.16786	null
2025-01-28	MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark	Dongyi Yi et.al.	2501.16688	null
2025-01-28	CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs	Jinlan Fu et.al.	2501.16629	link
2025-01-27	AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models	Zheng Lian et.al.	2501.16566	link
2025-01-27	LUCY: Linguistic Understanding and Control Yielding Early Stage of Her	Heting Gao et.al.	2501.16327	link
2025-01-27	FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers	Renshan Zhang et.al.	2501.16297	null
2025-01-27	Brain-Adapter: Enhancing Neurological Disorder Analysis with Adapter-Tuning Multimodal Large Language Models	Jing Zhang et.al.	2501.16282	null
2025-01-27	Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection?	Zhiling Chen et.al.	2501.15795	null
2025-01-27	Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning	Michael Xieyang Liu et.al.	2501.15727	null
2025-01-26	Ocean-OCR: Towards General OCR Application via a Vision-Language Model	Song Chen et.al.	2501.15558	link
2025-01-26	Unveiling the Potential of Multimodal Retrieval Augmented Generation with Planning	Xiaohan Yu et.al.	2501.15470	null
2025-01-26	Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations	Zijun Long et.al.	2501.15379	null
2025-01-26	Baichuan-Omni-1.5 Technical Report	Yadong Li et.al.	2501.15368	link
2025-01-25	Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink	Yining Wang et.al.	2501.15269	null
2025-01-23	Pilot: Building the Federated Multimodal Instruction Tuning Framework	Baochen Xiong et.al.	2501.13985	null
2025-01-23	GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration	Yue Fan et.al.	2501.13896	null
2025-01-23	EventVL: Understand Event Streams via Multimodal Large Language Model	Pengteng Li et.al.	2501.13707	null
2025-01-23	LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models	Yizheng Sun et.al.	2501.13652	null
2025-01-23	ReasVQA: Advancing VideoQA with Imperfect Reasoning Process	Jianxin Liang et.al.	2501.13536	null
2025-01-23	50 Shades of Deceptive Patterns: A Unified Taxonomy, Multimodal Detection, and Security Implications	Zewei Shi et.al.	2501.13351	link
2025-01-24	Multi-aspect Knowledge Distillation with Large Language Model	Taegyeong Lee et.al.	2501.13341	link
2025-01-22	Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning	Bohao Yang et.al.	2501.13042	link
2025-01-22	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Yi Wang et.al.	2501.12386	link
2025-01-21	VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	Xianwei Zhuang et.al.	2501.12327	link
2025-01-21	Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization	Jie Zhao et.al.	2501.11968	null
2025-01-21	EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents	Zhili Cheng et.al.	2501.11858	link
2025-01-20	Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution	Zhiyuan You et.al.	2501.11561	null
2025-01-20	EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery	Guankun Wang et.al.	2501.11347	link
2025-01-20	ITCFN: Incomplete Triple-Modal Co-Attention Fusion Network for Mild Cognitive Impairment Conversion Prediction	Xiangyang Hu et.al.	2501.11276	link
2025-01-20	A Survey of World Models for Autonomous Driving	Tuo Feng et.al.	2501.11260	link
2025-01-19	Rethinking Early-Fusion Strategies for Improved Multimodal Image Segmentation	Zhengwen Shen et.al.	2501.10958	null
2025-01-18	Visual RAG: Expanding MLLM visual knowledge without fine-tuning	Mirco Bonomo et.al.	2501.10834	null
2025-01-17	FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Kartik Narayan et.al.	2501.10360	link
2025-01-16	A Simple Aerial Detection Baseline of Multimodal Language Models	Qingyun Li et.al.	2501.09720	link
2025-01-16	Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis	Qize Yang et.al.	2501.09502	null
2025-01-16	Interpretable Droplet Digital PCR Assay for Trustworthy Molecular Diagnostics	Yuanyuan Wei et.al.	2501.09218	null
2025-01-15	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Ruixiang Jiang et.al.	2501.09012	link
2025-01-15	The Devil is in Temporal Token: High Quality Video Reasoning Segmentation	Sitong Gong et.al.	2501.08549	link
2025-01-14	LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding	Hongyu Li et.al.	2501.08282	link
2025-01-14	Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness	Jiaxing Zhao et.al.	2501.07978	link
2025-01-14	Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models	Yifang Xu et.al.	2501.07972	null
2025-01-14	3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding	Haomiao Xiong et.al.	2501.07819	link
2025-01-13	Imagine while Reasoning in Space: Multimodal Visualization-of-Thought	Chengzu Li et.al.	2501.07542	null
2025-01-13	Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method	Wenping Jin et.al.	2501.07496	link
2025-01-13	Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation	Han Liu et.al.	2501.07110	link
2025-01-13	LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models	Mozhgan Nasr Azadani et.al.	2501.06986	link
2025-01-12	X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Wenqi Zhou et.al.	2501.06835	null
2025-01-12	GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing	Ruizhe Ou et.al.	2501.06828	null
2025-01-12	MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection	Kaiying Yan et.al.	2501.06764	null
2025-01-12	Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints	Ming Dai et.al.	2501.06710	link
2025-01-11	ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation	Xuanle Zhao et.al.	2501.06598	link
2025-01-11	Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs	Shan Zhang et.al.	2501.06430	link
2025-01-10	PEACE: Empowering Geologic Map Holistic Understanding with MLLMs	Yangyu Huang et.al.	2501.06184	null
2025-01-10	Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs	Dabing Cheng et.al.	2501.05884	null
2025-01-10	Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models	You Li et.al.	2501.05767	null
2025-01-10	TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos	Korawat Charoenpitaks et.al.	2501.05733	link
2025-01-09	MECASA: Motor Execution Classification using Additive Self-Attention for Hybrid EEG-fNIRS Data	Gourav Siddhad et.al.	2501.05525	null
2025-01-09	Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark	Yunzhuo Hao et.al.	2501.05444	link
2025-01-09	Compression with Global Guidance: Towards Training-free High-Resolution MLLMs Acceleration	Xuyang Liu et.al.	2501.05179	link
2025-01-09	Optimizing Multitask Industrial Processes with Predictive Action Guidance	Naval Kishore Mehta et.al.	2501.05108	null
2025-01-09	DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving	Xuran Zheng et.al.	2501.05081	null
2025-01-09	Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency	Shiji Zhao et.al.	2501.04931	null
2025-01-08	Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs	Yikang Zhou et.al.	2501.04670	link
2025-01-08	InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection	Yuhang Liu et.al.	2501.04575	link
2025-01-08	Evidence-based multimodal fusion on structured EHRs and free-text notes for ICU outcome prediction	Yucheng Ruan et.al.	2501.04389	link
2025-01-08	Multimodal Graph Constrastive Learning and Prompt for ChartQA	Yue Dai et.al.	2501.04303	null
2025-01-08	H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving	Siran Chen et.al.	2501.04302	null
2025-01-07	RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance	Matin Mortaheb et.al.	2501.03995	null
2025-01-06	Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches	Alhassan Mumuni et.al.	2501.03151	null
2025-01-07	Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild	Wanpeng Hu et.al.	2501.02964	link
2025-01-06	A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation	Toomas Tahves et.al.	2501.02858	null
2025-01-06	Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging?	Hongyi Miao et.al.	2501.02751	null
2025-01-05	FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance	Haicheng Wang et.al.	2501.02430	link
2025-01-04	What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph	Yutao Jiang et.al.	2501.02268	link
2025-01-03	AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs	Sanjoy Chowdhury et.al.	2501.02135	null
2025-01-03	VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction	Chaoyou Fu et.al.	2501.01957	link
2025-01-03	Virgo: A Preliminary Exploration on Reproducing o1-like MLLM	Yifan Du et.al.	2501.01904	link
2025-01-03	Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models	Guosheng Zhang et.al.	2501.01720	null
2025-01-02	Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants	Lixiong Qin et.al.	2501.01243	null
2025-01-02	Towards Interactive Deepfake Analysis	Lixiong Qin et.al.	2501.01164	link
2025-01-02	EliGen: Entity-Level Controlled Image Generation with Regional Attention	Hong Zhang et.al.	2501.01097	link
2025-01-02	Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs	Linhao Huang et.al.	2501.01042	null
2025-01-01	Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations	Yuxuan Zhang et.al.	2501.00778	null
2024-12-31	Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method	Zhenpeng Huang et.al.	2501.00584	null
2024-12-31	VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling	Xinhao Li et.al.	2501.00574	link
2024-12-31	Fine-grained Video-Text Retrieval: A New Benchmark and Method	Yifan Xu et.al.	2501.00513	null
2024-12-31	Exploring the Implicit Semantic Ability of Multimodal Large Language Models: A Pilot Study on Entity Set Expansion	Hebin Wang et.al.	2501.00330	null
2024-12-31	MLLM-as-a-Judge for Image Safety without Human Labeling	Zhenting Wang et.al.	2501.00192	null
2024-12-30	GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models	Shangyu Xing et.al.	2412.21036	null
2024-12-30	Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering	Junxiao Xue et.al.	2412.20927	null
2024-12-28	ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming	Jiedong Zhuang et.al.	2412.20105	null
2024-12-28	On the Compositional Generalization of Multimodal LLMs for Medical Imaging	Zhenyang Cai et.al.	2412.20070	link
2024-12-27	Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework	Jiang Liu et.al.	2412.19684	null
2024-12-27	CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs	Siyu Wang et.al.	2412.19663	null
2024-12-27	MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Jiaqi Fan et.al.	2412.19406	link
2024-12-26	Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment	Ziang Yan et.al.	2412.19326	link
2024-12-26	Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	Roberto Amoroso et.al.	2412.19304	null
2024-12-26	SeaMo: A Multi-Seasonal and Multimodal Remote Sensing Foundation Model	Xuyang Li et.al.	2412.19237	null
2024-12-25	MedHallBench: A New Benchmark for Assessing Hallucination in Medical Large Language Models	Kaiwen Zuo et.al.	2412.18947	null
2024-12-25	RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting	Yilei Jiang et.al.	2412.18826	null
2024-12-24	Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation	Faraz Waseem et.al.	2412.18688	null
2024-12-24	MixMAS: A Framework for Sampling-Based Mixer Architecture Search for Multimodal Fusion and Learning	Abdelmadjid Chergui et.al.	2412.18437	link
2024-12-24	Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles	Zihan Wang et.al.	2412.18416	null
2024-12-24	Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search	Huanjin Yao et.al.	2412.18319	link
2024-12-24	ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation	Mengyang Wu et.al.	2412.18216	link
2024-12-24	Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation	Yucong Luo et.al.	2412.18176	null
2024-12-24	VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection	Zhaohui Jin et.al.	2412.18124	null
2024-12-24	Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach	Jing Bi et.al.	2412.18108	null
2024-12-24	An Ensemble Approach to Short-form Video Quality Assessment Using Multimodal LLM	Wen Wen et.al.	2412.18060	null
2024-12-23	A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification	Ravi Datta Rachuri et.al.	2412.17968	null
2024-12-23	Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy	Priyaranjan Pattnayak et.al.	2412.17759	null
2024-12-23	HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data	Ting Zhou et.al.	2412.17574	link
2024-12-23	Multimodal Preference Data Synthetic Alignment with Reward Model	Robert Wijaya et.al.	2412.17417	link
2024-12-23	MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models	Beibei Yu et.al.	2412.17339	null
2024-12-23	Neural-MCRL: Neural Multimodal Contrastive Representation Learning for EEG-based Visual Decoding	Yueyang Li et.al.	2412.17337	link
2024-12-23	Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective	Kaifang Long et.al.	2412.17297	null
2024-12-22	SubstationAI: Multimodal Large Model-Based Approaches for Analyzing Substation Equipment Faults	Jinzhi Wang et.al.	2412.17077	null
2024-12-22	CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models	Yeyuan Wang et.al.	2412.16869	link
2024-12-22	GME: Improving Universal Multimodal Retrieval by Multimodal LLMs	Xin Zhang et.al.	2412.16855	null
2024-12-21	AlzheimerRAG: Multimodal Retrieval Augmented Generation for PubMed articles	Aritra Kumar Lahiri et.al.	2412.16701	null
2024-12-20	MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection	Andrea Moglia et.al.	2412.15925	link
2024-12-20	Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution	Wentao Tan et.al.	2412.15650	link
2024-12-20	Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM	Yangyang Guo et.al.	2412.15614	null
2024-12-20	QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning	Xinyang Tong et.al.	2412.15576	null
2024-12-20	Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage	Saehyung Lee et.al.	2412.15484	null
2024-12-19	MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs	Yuxuan Wan et.al.	2412.15310	link
2024-12-19	OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving	Shuo Xing et.al.	2412.15208	link
2024-12-19	Progressive Multimodal Reasoning via Active Retrieval	Guanting Dong et.al.	2412.14835	null
2024-12-19	Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models	Zijun Chen et.al.	2412.14660	link
2024-12-18	Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces	Jihan Yang et.al.	2412.14171	link
2024-12-18	InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Cong Wei et.al.	2412.14006	link
2024-12-18	LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer	Yipeng Zhang et.al.	2412.13871	link
2024-12-17	Modality-Inconsistent Continual Learning of Multimodal Large Language Models	Weiguo Pian et.al.	2412.13050	null
2024-12-17	ComprehendEdit: A Comprehensive Dataset and Evaluation Framework for Multimodal Knowledge Editing	Yaohui Ma et.al.	2412.12821	link
2024-12-17	PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model	Yuqing Wang et.al.	2412.12737	link
2024-12-17	ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding	Zhenxing Zhang et.al.	2412.12718	link
2024-12-17	Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation	Andong Chen et.al.	2412.12627	null
2024-12-17	FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning	Seunghee Kim et.al.	2412.12567	null
2024-12-17	Beyond Data Quantity: Key Factors Driving Performance in Multilingual Language Models	Sina Bagheri Nezhad et.al.	2412.12500	link
2024-12-16	Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering	Jinhe Bi et.al.	2412.12359	link
2024-12-16	Instruction-based Image Manipulation by Watching How Things Move	Mingdeng Cao et.al.	2412.12087	null
2024-12-16	CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Guo Chen et.al.	2412.12075	null
2024-12-16	Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning	Yuti Liu et.al.	2412.11952	null
2024-12-16	A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges	Yibo Yan et.al.	2412.11936	null
2024-12-16	PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension	Kun Ouyang et.al.	2412.11906	null
2024-12-16	GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training	Renqiu Xia et.al.	2412.11863	link
2024-12-16	IDEA-Bench: How Far are Generative Models from Professional Designing?	Chen Liang et.al.	2412.11767	link
2024-12-16	From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs alligned with Multi-Modality	Shixin Jiang et.al.	2412.11694	null
2024-12-16	ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models	Xiechi Zhang et.al.	2412.11453	null
2024-12-15	Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal	Yuhao Wang et.al.	2412.11196	null
2024-12-13	Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining	Zhiqi Ge et.al.	2412.10342	null
2024-12-13	BrushEdit: All-In-One Image Inpainting and Editing	Yaowei Li et.al.	2412.10316	null
2024-12-13	Leveraging Multimodal Methods and Spontaneous Speech for Alzheimer's Disease Identification	Yifan Gao et.al.	2412.09928	null
2024-12-12	ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation	Ali Athar et.al.	2412.09754	null
2024-12-12	EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Zhuofan Zong et.al.	2412.09618	null
2024-12-13	Olympus: A Universal Task Router for Computer Vision Tasks	Yuanze Lin et.al.	2412.09612	link
2024-12-12	SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding	Hao Li et.al.	2412.09604	null
2024-12-12	Do Multimodal Large Language Models See Like Humans?	Jiaying Lin et.al.	2412.09603	null
2024-12-12	InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions	Pan Zhang et.al.	2412.09596	link
2024-12-12	OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation	Jitesh Jain et.al.	2412.09585	link
2024-12-12	Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition	Zhisheng Zhong et.al.	2412.09501	link
2024-12-12	Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation	Baisen Wang et.al.	2412.09428	link
2024-12-12	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278	link
2024-12-11	LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information	Ke Wang et.al.	2412.08771	null
2024-12-11	From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons	Andrew Szot et.al.	2412.08442	null
2024-12-11	HyViLM: Enhancing Fine-Grained Recognition with a Hybrid Encoder for Vision-Language Models	Shiding Zhu et.al.	2412.08378	null
2024-12-11	M2SE: A Multistage Multitask Instruction Tuning Strategy for Unified Sentiment and Emotion Analysis	Ao Li et.al.	2412.08049	link
2024-12-10	DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation	Jianzong Wu et.al.	2412.07589	null
2024-12-09	SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations	Zhaorun Chen et.al.	2412.06878	null
2024-12-09	ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance	Chunwei Wang et.al.	2412.06673	null
2024-12-09	3D Spatial Understanding in MLLMs: Disambiguation and Evaluation	Chun-Peng Chang et.al.	2412.06613	null
2024-12-12	World knowledge-enhanced Reasoning Using Instruction-guided Interactor in Autonomous Driving	Mingliang Zhai et.al.	2412.06324	null
2024-12-09	LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations	Mingjie Xu et.al.	2412.06322	link
2024-12-09	Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness	Qifan Yu et.al.	2412.06293	null
2024-12-09	ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models	Bingchen Gong et.al.	2412.06292	null
2024-12-08	GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis	Ashish Goswami et.al.	2412.06089	null
2024-12-08	Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models	Xiao Xu et.al.	2412.05939	null
2024-12-08	Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models	Ma Teng et.al.	2412.05934	link
2024-12-08	[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs	Ao Wang et.al.	2412.05819	link
2024-12-06	Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling	Zhe Chen et.al.	2412.05271	link
2024-12-06	CompCap: Improving Multimodal Large Language Models with Composite Captions	Xiaohui Chen et.al.	2412.05243	null
2024-12-06	MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale	Jarvis Guo et.al.	2412.05237	null
2024-12-06	LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation	Donald Shenaj et.al.	2412.05148	link
2024-12-06	Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models	Zehao Wang et.al.	2412.04939	null
2024-12-06	EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation	Yongxin Wang et.al.	2412.04903	null
2024-12-06	Parametric-ControlNet: Multimodal Control in Foundation Models for Precise Engineering Design Synthesis	Rui Zhou et.al.	2412.04707	null
2024-12-05	Assessing and Learning Alignment of Unimodal Vision and Language Models	Le Zhang et.al.	2412.04616	null
2024-12-05	p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay	Jun Zhang et.al.	2412.04449	link
2024-12-05	EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios	Lu Qiu et.al.	2412.04447	null
2024-12-05	GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration	Kaiyi Huang et.al.	2412.04440	null
2024-12-05	Grounding Descriptions in Images informs Zero-Shot Visual Recognition	Shaunak Halbe et.al.	2412.04429	link
2024-12-05	Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion	Jiuhai Chen et.al.	2412.04424	link
2024-12-05	Liquid: Language Models are Scalable Multi-modal Generators	Junfeng Wu et.al.	2412.04332	link
2024-12-05	FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression	Bo Tong et.al.	2412.04317	link
2024-12-04	VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Chaoyu Li et.al.	2412.03735	null
2024-12-04	DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation	Qingdong He et.al.	2412.03255	null
2024-12-04	Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges	Minghao Shao et.al.	2412.03220	null
2024-12-04	ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning	Zhe Xie et.al.	2412.03104	link
2024-12-03	AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?	Kaixiong Gong et.al.	2412.02611	null
2024-12-03	Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks	Jinjin Cai et.al.	2412.02531	null
2024-12-03	VR Based Emotion Recognition Using Deep Multimodal Fusion With Biosignals Across Multiple Anatomical Domains	Pubudu L. Indrasiri et.al.	2412.02283	null
2024-12-03	Personalized Multimodal Large Language Models: A Survey	Junda Wu et.al.	2412.02142	null
2024-12-03	WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image	Yuci Liang et.al.	2412.02141	null
2024-12-03	Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey	Yunkai Dang et.al.	2412.02104	null
2024-12-02	PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving	Xuewen Luo et.al.	2412.02025	null
2024-12-02	MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models	Xiaomin Li et.al.	2412.01343	null
2024-12-02	Enhancing Perception Capabilities of Multimodal LLMs with Training-free Fusion	Zhuokun Chen et.al.	2412.01289	null
2024-12-02	Ponder & Press: Advancing Visual GUI Agent towards General Computer Control	Yiqin Wang et.al.	2412.01268	null
2024-12-02	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin et.al.	2411.19951	link
2024-11-29	VLSBench: Unveiling Visual Leakage in Multimodal Safety	Xuhao Hu et.al.	2411.19939	link
2024-11-29	On Domain-Specific Post-Training for Multimodal Large Language Models	Daixuan Cheng et.al.	2411.19930	null
2024-11-29	Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings	Qiong Wu et.al.	2411.19628	link
2024-11-28	Libra: Leveraging Temporal Images for Biomedical Radiology Analysis	Xi Zhang et.al.	2411.19378	link
2024-11-28	SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation	Yuhan Pei et.al.	2411.19182	null
2024-11-28	Detailed Object Description with Controllable Dimensions	Xinran Wang et.al.	2411.19106	link
2024-11-28	I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting	Nicola Fanelli et.al.	2411.19050	link
2024-11-28	DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users	Wataru Kawabe et.al.	2411.18908	null
2024-11-27	Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment	Soumya Suvra Ghosal et.al.	2411.18688	null
2024-11-27	Cross-modal Information Flow in Multimodal Large Language Models	Zhi Zhang et.al.	2411.18620	link
2024-11-27	GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation	Pengfei Zhou et.al.	2411.18499	null
2024-11-27	ChatRex: Taming Multimodal LLM for Joint Perception and Understanding	Qing Jiang et.al.	2411.18363	link
2024-11-27	Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models	Jingming Liu et.al.	2411.18142	null
2024-11-26	NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?	Jiaxuan Li et.al.	2411.17794	null
2024-11-26	Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration	Yuhang Han et.al.	2411.17686	null
2024-11-26	What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics	Jordan J. Bird et.al.	2411.17593	null
2024-11-26	Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey	Jiayi Kuang et.al.	2411.17558	null
2024-11-26	InsightEdit: Towards Better Instruction Following for Image Editing	Yingjing Xu et.al.	2411.17323	null
2024-11-26	in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice	Vedrana Krivokuca Hahn et.al.	2411.17305	null
2024-11-26	A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs	Lehan He et.al.	2411.17265	null
2024-11-26	HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator	Fan Yang et.al.	2411.17261	null
2024-11-26	Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment	Zheng Chen et.al.	2411.17237	link
2024-11-26	DOGE: Towards Versatile Visual Document Grounding and Referring	Yinan Zhou et.al.	2411.17125	null
2024-11-26	Multimodal Alignment and Fusion: A Survey	Songtao Li et.al.	2411.17040	null
2024-11-25	TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation	Linqing Zhong et.al.	2411.16425	null
2024-11-25	Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models	Hao Yi et.al.	2411.16201	null
2024-11-25	Interpreting Object-level Foundation Models via Visual Precision Search	Ruoyu Chen et.al.	2411.16198	link
2024-11-25	ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration	Haozhan Shen et.al.	2411.16044	link
2024-11-23	Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark	Rong-Cheng Tu et.al.	2411.15488	link
2024-11-23	Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy	Te Yang et.al.	2411.15453	null
2024-11-22	MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs	Chaoyou Fu et.al.	2411.15296	link
2024-11-22	VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement	Daeun Lee et.al.	2411.15115	null
2024-11-22	mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA	Tao Zhang et.al.	2411.15041	null
2024-11-22	De-biased Multimodal Electrocardiogram Analysis	Haitao Li et.al.	2411.14795	null
2024-11-22	Evaluating and Advancing Multimodal Large Language Models in Ability Lens	Feng Chen et.al.	2411.14725	null
2024-11-22	FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data	Binqian Xu et.al.	2411.14717	link
2024-11-22	Any-to-3D Generation via Hybrid Diffusion Supervision	Yijun Fan et.al.	2411.14715	null
2024-11-21	LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval	Weiheng Lu et.al.	2411.14505	link
2024-11-21	Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models	Yuhao Dong et.al.	2411.14432	link
2024-11-21	Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Yiming Zhang et.al.	2411.14401	null
2024-11-21	Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance	Haozhe Zhao et.al.	2411.14279	null
2024-11-21	Separable Mixture of Low-Rank Adaptation for Continual Visual Instruction Tuning	Ziqi Wang et.al.	2411.13949	null
2024-11-21	Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts	Honglin Li et.al.	2411.13909	null
2024-11-20	Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs	Rui Cao et.al.	2411.13697	link
2024-11-20	AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations	Gaurav Verma et.al.	2411.13451	null
2024-11-20	DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving	Xianda Guo et.al.	2411.13112	link
2024-11-20	Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving	Hao Zhou et.al.	2411.13076	null
2024-11-19	Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models	Zhen Zeng et.al.	2411.12790	null
2024-11-19	Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting	Haoyu Zhao et.al.	2411.12789	null
2024-11-19	Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning	Pengkun Jiao et.al.	2411.12787	null
2024-11-19	Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model	Yiming Shi et.al.	2411.12783	null
2024-11-18	Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning	Xudong Yan et.al.	2411.12584	link
2024-11-19	CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model	Dongyoung Go et.al.	2411.12287	null
2024-11-18	AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning	Kun Xiang et.al.	2411.11930	link
2024-11-18	Dissecting Misalignment of Multimodal Large Language Models via Influence Function	Lijie Hu et.al.	2411.11667	null
2024-11-18	MAIRA-Seg: Enhancing Radiology Report Generation with Segmentation-Aware Multimodal Large Language Models	Harshita Sharma et.al.	2411.11362	null
2024-11-18	CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset	Zhiming Wang et.al.	2411.11360	link
2024-11-18	MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis	Yingjie Zhou et.al.	2411.11235	null
2024-11-19	Multilingual Large Language Models: A Systematic Survey	Shaolin Zhu et.al.	2411.11072	link
2024-11-19	VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?	Yunlong Tang et.al.	2411.10979	null
2024-11-17	Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering	Zeping Yu et.al.	2411.10950	link
2024-11-17	Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning	Wenke Huang et.al.	2411.10928	null
2024-11-16	BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization	Md. Nazmus Sadat Samin et.al.	2411.10879	link
2024-11-16	Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts	Jinqiang Long et.al.	2411.10669	link
2024-11-15	Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization	Weiyun Wang et.al.	2411.10442	null
2024-11-15	Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization	Yuhan Fu et.al.	2411.10436	null
2024-11-15	Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting	Ziqi Xie et.al.	2411.10309	link
2024-11-15	Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning	Jingru Yang et.al.	2411.10252	null
2024-11-15	CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation	Xiaofei Zhu et.al.	2411.10060	null
2024-11-15	VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos	Weihao Zhong et.al.	2411.10032	null
2024-11-15	Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs	Xiaofeng Zhang et.al.	2411.09968	null
2024-11-14	MagicQuill: An Intelligent Interactive Image Editing System	Zichen Liu et.al.	2411.09703	link
2024-11-14	Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models	Wei Wang et.al.	2411.09691	null
2024-11-14	Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models	Chutian Meng et.al.	2411.09449	null
2024-11-14	Spider: Any-to-Many Multimodal LLM	Jinxiang Lai et.al.	2411.09439	link
2024-11-14	LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation	Zhenshi Li et.al.	2411.09301	link
2024-11-13	Multimodal Instruction Tuning with Hybrid State Space Models	Jianing Zhou et.al.	2411.08840	null
2024-11-13	Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks?	Quan Zhang et.al.	2411.08466	null
2024-11-13	Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning	Chao Huang et.al.	2411.08414	null
2024-11-12	SimBase: A Simple Baseline for Temporal Video Grounding	Peijun Bao et.al.	2411.07945	null
2024-11-12	Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding	Zirui Shao et.al.	2411.07722	null
2024-11-12	Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models	Tiejin Chen et.al.	2411.07559	null
2024-11-11	Multimodal Fusion Balancing Through Game-Theoretic Regularization	Konstantinos Kontras et.al.	2411.07335	null
2024-11-11	CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models	Junho Kim et.al.	2411.06869	null
2024-11-11	Learning from Feedback: Semantic Enhancement for Object SLAM Using Foundation Models	Jungseok Hong et.al.	2411.06752	null
2024-11-10	KMM: Key Frame Mask Mamba for Extended Motion Generation	Zeyu Zhang et.al.	2411.06481	link
2024-11-09	A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks	Chia Xin Liang et.al.	2411.06284	null
2024-11-09	An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models	Fatemeh Shiri et.al.	2411.06048	link
2024-11-08	Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation	Dong Shu et.al.	2411.05316	link
2024-11-08	Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding	Jaeyoo Park et.al.	2411.05254	null
2024-11-07	On Erroneous Agreements of CLIP Image Embeddings	Siting Li et.al.	2411.05195	null
2024-11-07	Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models	Pete Janowczyk et.al.	2411.05056	null
2024-11-07	CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM	Jingwei Xu et.al.	2411.04954	null
2024-11-07	GUI Agents with Foundation Models: A Comprehensive Survey	Shuai Wang et.al.	2411.04890	null
2024-11-07	Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs	Chengxin Hu et.al.	2411.04708	null
2024-11-06	Improving Bilingual Capabilities of Language Models to Support Diverse Linguistic Practices in Education	Anand Syamkumar et.al.	2411.04308	null
2024-11-06	Analyzing Multimodal Features of Spontaneous Voice Assistant Commands for Mild Cognitive Impairment Detection	Nana Lin et.al.	2411.04158	null
2024-11-06	Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination	Dingjie Song et.al.	2411.03823	link
2024-11-06	StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding	Junming Lin et.al.	2411.03628	link
2024-11-05	MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning	Ziliang Gan et.al.	2411.03314	null
2024-11-05	Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?	Jingyu Xiao et.al.	2411.03292	link
2024-11-06	Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent	Yangning Li et.al.	2411.02937	link
2024-11-05	Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning	Mingcheng Li et.al.	2411.02793	null
2024-11-05	Multimodal Commonsense Knowledge Distillation for Visual Question Answering	Shuo Yang et.al.	2411.02722	null
2024-11-05	Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios	Yunkai Dang et.al.	2411.02708	null
2024-11-04	MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs	Sheng-Chieh Lin et.al.	2411.02571	null
2024-11-04	DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution	Yang Yue et.al.	2411.02359	link
2024-11-04	KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension	Jie Yang et.al.	2411.01846	null
2024-11-04	ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	Yiming Sun et.al.	2411.01756	null
2024-11-03	UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models	Sejoon Oh et.al.	2411.01703	null
2024-11-03	Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation	Seongsu Ha et.al.	2411.01494	null
2024-11-02	Can Multimodal Large Language Model Think Analogically?	Diandian Guo et.al.	2411.01307	null
2024-11-02	Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems	Mikołaj Małkiński et.al.	2411.01173	null
2024-11-01	Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks	Laura Wenderoth et.al.	2411.00725	null
2024-11-01	Unified Generative and Discriminative Training for Multi-modal Large Language Models	Wei Chow et.al.	2411.00304	null
2024-10-31	JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment	Joao Sousa et.al.	2410.23988	null
2024-10-31	Leveraging LLMs for MT in Crisis Scenarios: a blueprint for low-resource languages	Séamus Lankford et.al.	2410.23890	null
2024-10-31	Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding	Jinlong He et.al.	2410.23822	null
2024-10-30	PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures	Tianxiang Wu et.al.	2410.23089	null
2024-10-29	Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring	Matthew McKinney et.al.	2410.22558	null
2024-10-29	Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench	Zheyuan Liu et.al.	2410.22108	link
2024-10-28	LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior	Hanyu Wang et.al.	2410.21264	null
2024-10-28	Face-MLLM: A Large Face Perception Model	Haomiao Sun et.al.	2410.20717	null
2024-10-27	Deep Learning-Driven Microstructure Characterization and Vickers Hardness Prediction of Mg-Gd Alloys	Lu Wang et.al.	2410.20402	null
2024-10-26	LLMs Can Evolve Continually on Modality for X-Modal Reasoning	Jiazuo Yu et.al.	2410.20178	link
2024-10-25	Evaluating Cost-Accuracy Trade-offs in Multimodal Search Relevance Judgements	Silvia Terragni et.al.	2410.19974	null
2024-10-25	Improving Multimodal Large Language Models Using Continual Learning	Shikhar Srivastava et.al.	2410.19925	null
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702	null
2024-10-28	BIFRÖST: 3D-Aware Image compositing with Language Instructions	Lingxiao Li et.al.	2410.19079	link
2024-10-24	Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms	Zhangheng Li et.al.	2410.18967	null
2024-10-24	SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models	Zonghao Ying et.al.	2410.18927	null
2024-10-24	Distill Visual Chart Reasoning Ability from LLMs to MLLMs	Wei He et.al.	2410.18798	link
2024-10-24	DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation	Yuang Ai et.al.	2410.18666	link
2024-10-25	Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks	Lehan Wang et.al.	2410.18387	null
2024-10-23	TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts	Yuxuan Xie et.al.	2410.18071	null
2024-10-23	CLEAR: Character Unlearning in Textual and Visual Modalities	Alexey Dontsov et.al.	2410.18057	null
2024-10-23	Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation	Wenfang Yao et.al.	2410.17918	link
2024-10-23	ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning	Zhiwei Hao et.al.	2410.17779	link
2024-10-23	YOLO-Vehicle-Pro: A Cloud-Edge Collaborative Framework for Object Detection in Autonomous Driving under Adverse Weather Conditions	Xiguang Li et.al.	2410.17734	null
2024-10-23	Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact	Junhua Liu et.al.	2410.17532	null
2024-10-22	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	Xiaoqian Shen et.al.	2410.17434	link
2024-10-22	Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models	Zhijie Tan et.al.	2410.16983	null
2024-10-22	IPL: Leveraging Multimodal Large Language Models for Intelligent Product Listing	Kang Chen et.al.	2410.16977	null
2024-10-22	Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance	Zhangwei Gao et.al.	2410.16261	link
2024-10-21	LLaVA-KD: A Framework of Distilling Multimodal Large Language Models	Yuxuan Cai et.al.	2410.16236	link
2024-10-21	Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining	Han Huang et.al.	2410.16166	link
2024-10-21	Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages	Xiang Yue et.al.	2410.16153	null
2024-10-21	Mitigating Object Hallucination via Concentric Causal Attention	Yun Xing et.al.	2410.15926	link
2024-10-21	AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection	Xiaoman Xu et.al.	2410.15591	link
2024-10-20	Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation	Jiayu Xiong et.al.	2410.15475	null
2024-10-20	Modality-Fair Preference Optimization for Trustworthy MLLM Alignment	Songtao Jiang et.al.	2410.15334	null
2024-10-19	SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation	Jingxuan Chen et.al.	2410.15164	link
2024-10-19	LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	Xuechen Guo et.al.	2410.15074	null
2024-10-18	MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps	Xiongtao Zhou et.al.	2410.14668	link
2024-10-18	MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems	Zifeng Zhu et.al.	2410.14179	link
2024-10-18	RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training	Muhe Ding et.al.	2410.14154	null
2024-10-17	PUMA: Empowering Unified MLLM with Multi-granular Visual Generation	Rongyao Fang et.al.	2410.13861	link
2024-10-17	$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models	Yaxin Luo et.al.	2410.13859	null
2024-10-17	Can MLLMs Understand the Deep Implication Behind Chinese Images?	Chenhao Zhang et.al.	2410.13854	link
2024-10-18	Harnessing Webpage UIs for Text-Rich Visual Understanding	Junpeng Liu et.al.	2410.13824	null
2024-10-17	MobA: A Two-Level Agent System for Efficient Mobile Task Automation	Zichen Zhu et.al.	2410.13757	link
2024-10-17	Exploring the Design Space of Visual Context Representation in Video MLLMs	Yifan Du et.al.	2410.13694	link
2024-10-17	Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant	Haoran Hao et.al.	2410.13360	link
2024-10-16	MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs	Yunqiu Xu et.al.	2410.12332	null
2024-10-16	Understanding the Role of LLMs in Multimodal Evaluation Benchmarks	Botian Jiang et.al.	2410.12329	link
2024-10-16	Multimodal Fusion with Relational Learning for Molecular Property Prediction	Zhengyang Zhou et.al.	2410.12128	null
2024-10-15	MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding	Yue Cao et.al.	2410.11829	link
2024-10-15	MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation	Chenxi Wang et.al.	2410.11779	link
2024-10-15	SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding	Ying Chen et.al.	2410.11761	null
2024-10-15	Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions	Yuhan Fu et.al.	2410.11701	null
2024-10-15	VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI	Sijie Cheng et.al.	2410.11623	null
2024-10-15	MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark	Bin Shan et.al.	2410.11538	link
2024-10-15	Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs	Sihang Zhao et.al.	2410.11437	link
2024-10-15	Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models	Zhongye Liu et.al.	2410.11242	link
2024-10-15	MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation	Xianping Ma et.al.	2410.11160	link
2024-10-14	Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes	Tim Broedermann et.al.	2410.10791	link
2024-10-14	MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages	Shubhi Bansal et.al.	2410.10407	link
2024-10-14	Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation	Shun Qian et.al.	2410.10319	null
2024-10-14	ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization	Jiawei Li et.al.	2410.10238	null
2024-10-14	Tracing Human Stress from Physiological Signals using UWB Radar	Jia Xu et.al.	2410.10155	null
2024-10-15	LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models	Han Qiu et.al.	2410.09962	link
2024-10-13	Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records	Shuai Jiang et.al.	2410.09880	null
2024-10-13	Text4Seg: Reimagining Image Segmentation as Text Generation	Mengcheng Lan et.al.	2410.09855	link
2024-10-12	Skipping Computations in Multimodal LLMs	Mustafa Shukor et.al.	2410.09454	link
2024-10-12	MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection	Xi Jiang et.al.	2410.09453	link
2024-10-11	Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion	Shiao Wang et.al.	2410.08879	null
2024-10-11	Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking	Wei Zhang et.al.	2410.08616	null
2024-10-11	Baichuan-Omni Technical Report	Yadong Li et.al.	2410.08565	link
2024-10-11	SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models	Haotian Xia et.al.	2410.08474	link
2024-10-10	Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training	Gen Luo et.al.	2410.08202	null
2024-10-10	Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models	Qingni Wang et.al.	2410.08174	null
2024-10-10	Agent S: An Open Agentic Framework that Uses Computers Like a Human	Saaket Agashe et.al.	2410.08164	link
2024-10-10	Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs	Xiaoyuan Liu et.al.	2410.08145	link
2024-10-09	Retrieval Replace Reduction: An effective visual token reduction method via semantic match	Yingen Liu et.al.	2410.07278	null
2024-10-09	Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis	Bohan Zeng et.al.	2410.07155	link
2024-10-09	Personalized Visual Instruction Tuning	Renjie Pi et.al.	2410.07113	link
2024-10-10	Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology	Xiangyu Wang et.al.	2410.07087	null
2024-10-09	HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding	Keliang Li et.al.	2410.06777	null
2024-10-09	To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models	Junyan Lin et.al.	2410.06765	link
2024-10-09	ING-VP: MLLMs cannot Play Easy Vision-based Games Yet	Haoran Zhang et.al.	2410.06555	link
2024-10-09	Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection	Aravinda Reddy PN et.al.	2410.06543	null
2024-10-08	Multimodal Situational Safety	Kaiwen Zhou et.al.	2410.06172	null
2024-10-08	Quadratic Is Not What You Need For Multimodal Large Language Models	Phu Pham et.al.	2410.06169	link
2024-10-08	$\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection	Yize Chen et.al.	2410.06126	null
2024-10-07	Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents	Boyu Gou et.al.	2410.05243	link
2024-10-07	Organizing Unstructured Image Collections using Natural Language	Mingxuan Liu et.al.	2410.05217	null
2024-10-07	Multimodal Fusion Strategies for Mapping Biophysical Landscape Features	Lucia Gordon et.al.	2410.04833	link
2024-10-07	MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models	Kaichen Huang et.al.	2410.04819	link
2024-10-07	**Mitigating
Jasper0068/arxiv-papers-daily