astalavistababe/HumanAIGC-arxiv-daily-suruoxi

 
 

HumanAIGC Research Papers

Updated on 2025.06.26

Table of Contents
  1. Talking Face
  2. Image Animation
  3. Video Generation
  4. TryOn
  5. Visual Edit
  6. Others
  7. Music2Dance and Co-speech
  8. Speech and Interaction

Talking Face

Publish Date Title Authors PDF Code
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-17 SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting Ziqiao Peng et.al. 2506.14742 null
2025-06-17 Compressed Video Super-Resolution based on Hierarchical Encoding Yuxuan Jiang et.al. 2506.14381 null
2025-06-16 Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos Riku Takahashi et.al. 2506.13419 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-13 ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing Babak Naderi et.al. 2506.12269 link
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-03 NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results Xiaohong Liu et.al. 2506.02875 null
2025-06-02 Cocktail-Party Audio-Visual Speech Recognition Thai-Binh Nguyen et.al. 2506.02178 null
2025-06-02 Low-Rank Head Avatar Personalization with Registers Sai Tanmay Reddy Chakkera et.al. 2506.01935 null
2025-06-02 Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation Yuan Gan et.al. 2506.01591 link
2025-06-01 SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers Zhengcong Fei et.al. 2506.00830 null
2025-05-30 TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection Xinqi Xiong et.al. 2505.24866 null
2025-05-29 Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation Jiahao Cui et.al. 2505.23525 link
2025-05-29 Video Editing for Audio-Visual Dubbing Binyamin Manela et.al. 2505.23406 link
2025-05-29 Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation Hao Li et.al. 2505.23290 link
2025-05-29 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation Siyuan Wang et.al. 2505.23120 link
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 Tell me Habibi, is it Real or Fake? Kartik Kuckreja et.al. 2505.22581 null
2025-05-28 Neural Face Skinning for Mesh-agnostic Facial Expression Cloning Sihun Cha et.al. 2505.22416 null
2025-05-28 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing Guanwen Feng et.al. 2505.22141 null
2025-05-28 RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling Long-Khanh Pham et.al. 2505.22024 null
2025-05-27 OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers Ziqiao Peng et.al. 2505.21448 null
2025-05-26 Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting Yizhou Zhao et.al. 2505.20582 null
2025-05-26 DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations Ziqiao Peng et.al. 2505.18096 null
2025-05-22 Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis Radek Daněček et.al. 2504.13386 null
2025-05-14 Test-Time Augmentation for Pose-invariant Face Recognition Jaemin Jung et.al. 2505.09256 null
2025-05-10 VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback Eason Chen et.al. 2505.06676 null
2025-05-10 OT-Talk: Animating 3D Talking Head with Optimal Transportation Xinmu Wang et.al. 2505.01932 null
2025-05-10 MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance Mengting Wei et.al. 2504.21497 link
2025-05-08 OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours Hanie Moghaddasi et.al. 2505.05531 null
2025-05-03 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting Anushka Agarwal et.al. 2505.01928 null
2025-05-02 Model See Model Do: Speech-Driven Facial Animation with Style Control Yifang Pan et.al. 2505.01319 null
2025-05-02 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing Gaoxiang Cong et.al. 2505.01263 null
2025-05-01 KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution Antoni Bigata et.al. 2505.00497 null
2025-04-29 IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos Yuan Li et.al. 2504.19165 null
2025-04-27 Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions Mohammad Mahdi Abootorabi et.al. 2504.19056 link
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-25 Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation Weipeng Tan et.al. 2504.18087 null
2025-04-14 SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models Stathis Galanakis et.al. 2504.10716 null
2025-04-10 ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings Astitva Srivastava et.al. 2504.08022 null
2025-04-08 VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez Bello et.al. 2504.07146 null
2025-04-08 SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity Yihuan Huang et.al. 2504.05803 null
2025-04-08 Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation Zhihua Xu et.al. 2504.05746 null
2025-04-08 Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation Tianshui Chen et.al. 2504.05672 null
2025-04-07 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-06 FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency Shiyan Liu et.al. 2504.04427 null
2025-04-04 A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations Abdul Mannan Mohammed et.al. 2504.03147 null
2025-04-03 OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication Zhongjian Wang et.al. 2504.02433 null
2025-04-03 VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models Kim Sung-Bin et.al. 2504.02386 null
2025-04-02 Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies Soumyya Kanti Datta et.al. 2504.01470 link
2025-04-02 EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters Xuli Shen et.al. 2503.19416 null
2025-04-01 Monocular and Generalizable Gaussian Talking Head Animation Shengjie Gong et.al. 2504.00665 null
2025-03-31 Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics Lee Chae-Yeon et.al. 2503.20308 null
2025-03-30 MoCha: Towards Movie-Grade Talking Character Synthesis Cong Wei et.al. 2503.23307 null
2025-03-29 STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing Zijun Ding et.al. 2503.23039 link
2025-03-28 Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis Shuai Shen et.al. 2503.22605 null
2025-03-28 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance Haijie Yang et.al. 2503.22225 null
2025-03-27 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Jinwei Qi et.al. 2503.21144 null
2025-03-26 Dual Audio-Centric Modality Coupling for Talking Head Generation Ao Fu et.al. 2503.22728 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation Yukang Lin et.al. 2503.19383 null
2025-03-25 HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation Zunnan Xu et.al. 2503.18860 null
2025-03-25 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-24 DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model Kangwei Liu et.al. 2503.19001 null
2025-03-24 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen et.al. 2503.18429 null
2025-03-23 DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation Peng Chen et.al. 2503.18159 link
2025-03-21 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting Jianchuan Chen et.al. 2503.17032 null
2025-03-21 From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech Ji-Hoon Kim et.al. 2503.16956 null
2025-03-20 UniSync: A Unified Framework for Audio-Visual Synchronization Tao Feng et.al. 2503.16357 null
2025-03-20 PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation Baiqin Wang et.al. 2503.14295 null
2025-03-19 DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis Yuming Gu et.al. 2503.15667 link
2025-03-19 KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation Antoni Bigata et.al. 2503.01715 null
2025-03-17 SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization Xulin Fan et.al. 2503.13371 null
2025-03-17 Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait Chaolong Yang et.al. 2503.12963 link
2025-03-16 Versatile Multimodal Controls for Whole-Body Talking Human Animation Zheng Qin et.al. 2503.08714 null
2025-03-14 Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control Hejia Chen et.al. 2503.14517 null
2025-03-14 EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models Yixuan Zhang et.al. 2503.11028 null
2025-03-12 StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation An Yang et.al. 2503.09852 null
2025-03-12 Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos Riku Takahashi et.al. 2503.09787 null
2025-03-09 Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter Yanyu Zhu et.al. 2503.06397 null
2025-03-07 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Hongwei Yi et.al. 2503.05978 null
2025-03-06 FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis Ziqi Ni et.al. 2503.04067 null
2025-03-02 FaceShot: Bring Any Character into Life Junyao Gao et.al. 2503.00740 null
2025-03-01 Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture Xuanchen Li et.al. 2503.00495 null
2025-02-28 Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints Masoumeh Chapariniya et.al. 2502.20803 null
2025-02-28 ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model Xuangeng Chu et.al. 2502.20323 null
2025-02-27 InsTaG: Learning Personalized 3D Talking Head from Few-Second Video Jiahe Li et.al. 2502.20387 link
2025-02-27 High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model Mingtao Guo et.al. 2502.19894 link
2025-02-26 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion mode Lingzhou Mu et.al. 2502.19455 null
2025-02-24 Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation Baptiste Chopin et.al. 2502.17198 null
2025-02-20 NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis Xiaoxing Liu et.al. 2502.14178 null
2025-02-18 AV-Flow: Transforming Text to Audio-Visual Human-like Interactions Aggelina Chatziagapi et.al. 2502.13133 null
2025-02-17 SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion Junxian Ma et.al. 2502.11515 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-13 Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model Fei Shen et.al. 2502.09533 null
2025-02-13 VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output Eason Chen et.al. 2502.04103 null
2025-02-11 Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion Xingpei Ma et.al. 2502.07203 null
2025-02-07 Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Han Zhang et.al. 2502.04976 null
2025-02-02 EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis Junuk Cha et.al. 2502.00654 null
2025-01-24 SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation Yujian Liu et.al. 2501.14646 null
2025-01-21 A Lightweight and Interpretable Deepfakes Detection Framework Muhammad Umar Farooq et.al. 2501.11927 null
2025-01-18 EMO2: End-Effector Guided Audio-Driven Avatar Video Generation Linrui Tian et.al. 2501.10687 null
2025-01-17 TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation Yixiang Zhuang et.al. 2501.09921 null
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2025-01-15 Make-A-Character 2: Animatable 3D Character Generation From a Single Image Lin Liu et.al. 2501.07870 null
2025-01-09 Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding Ji-Ha Park et.al. 2501.14790 null
2025-01-09 Identity-Preserving Video Dubbing Using Motion Warping Runzhen Liu et.al. 2501.04586 null
2025-01-09 MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation Huaize Liu et.al. 2501.01808 null
2025-01-07 Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools Arash Dehghani et.al. 2501.06227 null
2025-01-07 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2025-01-06 RDD4D: 4D Attention-Guided Road Damage Detection And Classification Asma Alkalbani et.al. 2501.02822 link
2025-01-06 Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study Mykola Maslych et.al. 2501.00168 null
2025-01-03 JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing Qili Wang et.al. 2501.01798 link
2024-12-28 DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis Kaijun Deng et.al. 2412.20148 link
2024-12-26 UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control Wenzhang Sun et.al. 2412.19860 null
2024-12-26 Generating Editable Head Avatars with 3D Gaussian GANs Guohao Li et.al. 2412.19149 link
2024-12-23 FaceLift: Single Image to 3D Head with View Generation and GS-LRM Weijie Lyu et.al. 2412.17812 null
2024-12-22 FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation Tianyun Zhong et.al. 2412.16915 null
2024-12-18 Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters Steven Hogue et.al. 2412.14333 link
2024-12-18 GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection Xiaocan Chen et.al. 2412.13656 null
2024-12-18 Learning to Control an Android Robot Head for Facial Animation Marcel Heisler et.al. 2412.13641 null
2024-12-18 Real-time One-Step Diffusion-based Expressive Portrait Videos Generation Hanzhong Guo et.al. 2412.13479 link
2024-12-18 VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization Tao Liu et.al. 2412.09892 null
2024-12-16 Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content Rohit Kundu et.al. 2412.12278 null
2024-12-13 GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression Ziqi Zhou et.al. 2412.09296 link
2024-12-12 LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync Chunyu Li et.al. 2412.09262 link
2024-12-12 EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing Gaoxiang Cong et.al. 2412.08988 null
2024-12-11 PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis Yifan Xie et.al. 2412.08504 null
2024-12-10 PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation Fatemeh Nazarieh et.al. 2412.07754 null
2024-12-10 IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation Sejong Yang et.al. 2412.04000 null
2024-12-05 MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation Longtao Zheng et.al. 2412.04448 null
2024-12-05 Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks Jiahao Cui et.al. 2412.00733 link
2024-12-04 SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model Yan Li et.al. 2412.03430 null
2024-12-02 One Shot, One Talk: Whole-body Talking Avatar from a Single Image Jun Xiang et.al. 2412.01106 null
2024-12-01 Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation Shuling Zhao et.al. 2412.00719 null
2024-11-29 LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis Tianqi Li et.al. 2411.19525 null
2024-11-29 Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Tianqi Li et.al. 2411.19509 link
2024-11-29 V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow Jeongsoo Choi et.al. 2411.19486 link
2024-11-26 Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey Hong-Hanh Nguyen-Le et.al. 2411.17911 null
2024-11-25 Sonic: Shifting Focus to Global Audio Perception in Portrait Animation Xiaozhong Ji et.al. 2411.16331 null
2024-11-25 ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations Xulong Zhang et.al. 2411.13089 null
2024-11-24 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis Haojie Zhang et.al. 2411.16748 null
2024-11-23 EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion Haotian Wang et.al. 2411.16726 null
2024-11-23 ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance Haijie Yang et.al. 2411.15436 null
2024-11-20 Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis Pegah Salehi et.al. 2411.13209 link
2024-11-20 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation Xuyang Cao et.al. 2411.09209 link
2024-11-14 LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space Guanwen Feng et.al. 2411.09268 null
2024-11-06 Large Generative Model-assisted Talking-face Semantic Communication System Feibo Jiang et.al. 2411.03876 null
2024-10-31 Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts Xiang Deng et.al. 2410.23836 null
2024-10-29 Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing Haonan Tong et.al. 2410.22112 null
2024-10-24 Real-time 3D-aware Portrait Video Relighting Ziqi Cai et.al. 2410.18355 link
2024-10-21 Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions Malte Prinzler et.al. 2410.16395 null
2024-10-18 Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization Bin Lin et.al. 2410.14283 null
2024-10-18 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Hanbo Cheng et.al. 2410.13726 link
2024-10-16 MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting Yue Zhang et.al. 2410.10122 link
2024-10-15 Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck Fevziye Irem Eyiokur et.al. 2410.11434 null
2024-10-15 MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes Zhenhui Ye et.al. 2410.06734 null
2024-10-14 Character-aware audio-visual subtitling in context Jaesung Huh et.al. 2410.11068 null
2024-10-14 Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads Federico Nocentini et.al. 2410.11041 null
2024-10-14 TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model Jiazhi Guan et.al. 2410.10696 null
2024-10-14 Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization Shanzhi Yin et.al. 2410.10171 null
2024-10-10 MMHead: Towards Fine-grained Multi-modal 3D Facial Animation Sijing Wu et.al. 2410.07757 null
2024-10-09 FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model Feng Qiu et.al. 2409.13180 null
2024-10-01 LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details Jian Yang et.al. 2410.00990 null
2024-09-29 Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation Jingyi Xu et.al. 2409.19501 null
2024-09-27 Diverse Code Query Learning for Speech-Driven Facial Animation Chunzhi Gu et.al. 2409.19143 null
2024-09-26 Stable Video Portraits Mirela Ostrek et.al. 2409.18083 null
2024-09-25 ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE Sichun Wu et.al. 2409.07966 link
2024-09-24 FastTalker: Jointly Generating Speech and Conversational Gestures from Text Zixin Guo et.al. 2409.16404 null
2024-09-23 FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset Donglin Di et.al. 2410.07151 null
2024-09-23 MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning Yue Han et.al. 2409.15179 null
2024-09-18 JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation Sai Tanmay Reddy Chakkera et.al. 2409.12156 null
2024-09-18 GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations Kartik Teotia et.al. 2409.11951 null
2024-09-17 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy Xuanmeng Sha et.al. 2409.10848 null
2024-09-16 DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis Fa-Ting Hong et.al. 2409.10281 null
2024-09-14 StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads Suzhen Wang et.al. 2409.09292 null
2024-09-11 DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures Steven Hogue et.al. 2409.07649 null
2024-09-11 EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion Jian Zhang et.al. 2409.07255 link
2024-09-09 PersonaTalk: Bring Attention to Your Persona in Visual Dubbing Longhao Zhang et.al. 2409.05379 null
2024-09-09 KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation Hoang-Son Vo-Thanh et.al. 2409.05330 link
2024-09-05 SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing Lingyu Xiong et.al. 2409.03605 null
2024-09-05 SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model Weipeng Tan et.al. 2409.03270 null
2024-09-04 PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation Jun Ling et.al. 2409.02657 null
2024-09-02 KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding Zhihao Xu et.al. 2409.01113 link
2024-08-28 Micro and macro facial expressions by driven animations in realistic Virtual Humans Rubens Halbig Montanha et.al. 2408.16110 null
2024-08-27 MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer Shurong Yang et.al. 2408.14975 null
2024-08-25 TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation Jack Saunders et.al. 2408.13714 null
2024-08-23 G3FA: Geometry-guided GAN for Face Animation Alireza Javanmardi et.al. 2408.13049 null
2024-08-21 AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition Minheng Ni et.al. 2408.11564 null
2024-08-21 EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention Yihong Lin et.al. 2408.11518 null
2024-08-20 DEGAS: Detailed Expressions on Full-Body Gaussian Avatars Zhijing Shao et.al. 2408.10588 link
2024-08-18 FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model Ziyu Yao et.al. 2408.09384 null
2024-08-18 Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation Xukun Zhou et.al. 2408.09357 null
2024-08-18 S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis Dongze Li et.al. 2408.09347 null
2024-08-16 GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer Yihong Lin et.al. 2408.01826 null
2024-08-14 Content and Style Aware Audio-Driven Facial Animation Qingju Liu et.al. 2408.07005 null
2024-08-12 DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation Jisoo Kim et.al. 2408.06010 null
2024-08-10 High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model Weizhi Zhong et.al. 2408.05416 null
2024-08-10 Style-Preserving Lip Sync via Audio-Aware Style Reference Weizhi Zhong et.al. 2408.05412 null
2024-08-09 DeepSpeak Dataset v1.0 Sarah Barrington et.al. 2408.05366 null
2024-08-06 ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer Jiazhi Guan et.al. 2408.03284 null
2024-08-03 Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation Jintao Tan et.al. 2408.01732 null
2024-08-03 JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model Farzaneh Jafari et.al. 2408.01627 null
2024-08-01 UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model Xiangyu Fan et.al. 2408.00762 null
2024-08-01 Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion Manuel Kansy et.al. 2408.00458 null
2024-08-01 EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head Qianyun He et.al. 2408.00297 null
2024-07-31 Deformable 3D Shape Diffusion Model Dengsheng Chen et.al. 2407.21428 null
2024-07-26 LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement Rui Zhang et.al. 2407.18595 null
2024-07-24 A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation Jose Geraldo Fernandes et.al. 2407.17430 null
2024-07-24 The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions Rabab Algadhy et.al. 2407.17253 null
2024-07-22 PAV: Personalized Head Avatar from Unstructured Video Collection Akin Caliskan et.al. 2407.21047 null
2024-07-21 Anchored Diffusion for Video Face Reenactment Idan Kligvasser et.al. 2407.15153 null
2024-07-20 Text-based Talking Video Editing with Cascaded Conditional Diffusion Bo Han et.al. 2407.14841 null
2024-07-17 Universal Facial Encoding of Codec Avatars from VR Headsets Shaojie Bai et.al. 2407.13038 null
2024-07-17 EmoFace: Audio-driven Emotional 3D Face Animation Chang Liu et.al. 2407.12501 link
2024-07-13 Learning Online Scale Transformation for Talking Head Video Generation Fa-Ting Hong et.al. 2407.09965 null
2024-07-12 Real Face Video Animation Platform Xiaokai Chen et.al. 2407.18955 null
2024-07-12 One-Shot Pose-Driving Face Animation Platform He Feng et.al. 2407.08949 null
2024-07-12 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions Zhiyuan Chen et.al. 2407.08136 link
2024-07-08 MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices Jianwen Jiang et.al. 2407.05712 null
2024-07-08 Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN Jiacheng Su et.al. 2407.05577 null
2024-07-04 Compressed Skinning for Facial Blendshapes Ladislav Kavan et.al. 2406.11597 null
2024-07-03 LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control Jianzhu Guo et.al. 2407.03168 link
2024-07-01 Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert Han EunGi et.al. 2407.01034 null
2024-06-26 RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Xiaozhong Ji et.al. 2406.18284 null
2024-06-24 The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents Sinan Sonlu et.al. 2407.10993 null
2024-06-21 EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot Hao Fei et.al. 2406.15177 link
2024-06-20 MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset Kim Sung-Bin et.al. 2406.14272 null
2024-06-19 DF40: Toward Next-Generation Deepfake Detection Zhiyuan Yan et.al. 2406.13495 link
2024-06-19 AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models Ken Chen et.al. 2406.13272 null
2024-06-18 RITA: A Real-time Interactive Talking Avatars Framework Wuxinlin Cheng et.al. 2406.13093 null
2024-06-18 A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing Ming Meng et.al. 2406.10553 null
2024-06-17 NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation Niu Guanchen et.al. 2406.11259 null
2024-06-17 Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement Runyi Yu et.al. 2406.08096 null
2024-06-16 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation Mingwang Xu et.al. 2406.08801 null
2024-06-14 DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details Haitao Cao et.al. 2405.19688 null
2024-06-13 Talking Heads: Understanding Inter-layer Communication in Transformer Language Models Jack Merullo et.al. 2406.09519 null
2024-06-13 DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing Neha Sahipjohn et.al. 2406.08802 null
2024-06-12 Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation Jiadong Liang et.al. 2406.07895 null
2024-06-07 Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation Yue Ma et.al. 2406.01900 null
2024-06-05 Controllable Talking Face Generation by Implicit Facial Keypoints Editing Dong Zhao et.al. 2406.02880 link
2024-05-31 MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses Saif Mahmud et.al. 2405.21004 null
2024-05-31 MegActor: Harness the Power of Raw Video for Vivid Portrait Animation Shurong Yang et.al. 2405.20851 link
2024-05-30 Audio2Rig: Artist-oriented deep learning tool for facial animation Bastien Arcelin et.al. 2405.20412 null
2024-05-28 OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance Shuheng Ge et.al. 2405.14709 null
2024-05-24 InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation Yuchi Wang et.al. 2405.15758 link
2024-05-22 Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling Yibo Wang et.al. 2405.13701 null
2024-05-21 Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Yue Han et.al. 2405.12970 null
2024-05-16 Faces that Speak: Jointly Synthesising Talking Face and Speech from Text Youngjoon Jang et.al. 2405.10272 null
2024-05-14 PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset Yang Hou et.al. 2405.08838 link
2024-05-12 Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation Changpeng Cai et.al. 2405.07257 null
2024-05-10 NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior Gihoon Kim et.al. 2405.05749 null
2024-05-09 SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space Zeren Zhang et.al. 2405.05636 null
2024-05-08 Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention Ruijie Tao et.al. 2404.18501 link
2024-05-07 Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation Dogucan Yaman et.al. 2405.04327 null
2024-05-06 AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding Tao Liu et.al. 2405.03121 link
2024-04-29 EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars Nikita Drobyshev et.al. 2404.19110 null
2024-04-29 GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting Bo Chen et.al. 2404.19040 null
2024-04-29 Embedded Representation Learning Network for Animating Styled Video Portrait Tianyong Wang et.al. 2404.19038 null
2024-04-29 CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation Xiangyu Liang et.al. 2404.18604 null
2024-04-28 GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting Hongyun Yu et.al. 2404.14037 null
2024-04-25 GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting Kyusun Cho et.al. 2404.16012 link
2024-04-23 TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting Jiahe Li et.al. 2404.15264 link
2024-04-19 Learn2Talk: 3D Talking Face Learns from 2D Talking Face Yixiang Zhuang et.al. 2404.12888 null
2024-04-16 VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Sicheng Xu et.al. 2404.10667 null
2024-04-15 FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features Andre Rochow et.al. 2404.09736 null
2024-04-13 THQA: A Perceptual Quality Assessment Database for Talking Heads Yingjie Zhou et.al. 2404.09003 link
2024-04-11 EFHQ: Multi-purpose ExtremePose-Face-HQ dataset Trung Tuan Dao et.al. 2312.17205 null
2024-04-09 Deepfake Generation and Detection: A Benchmark and Survey Gan Pei et.al. 2403.17881 link
2024-04-08 SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation Heyuan Li et.al. 2404.05680 null
2024-04-07 GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets Dongjing Shan et.al. 2404.04924 null
2024-04-07 Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation Renshuai Liu et.al. 2401.01207 null
2024-04-03 MI-NeRF: Learning a Single Face NeRF from Multiple Identities Aggelina Chatziagapi et.al. 2403.19920 null
2024-04-02 EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis Shuai Tan et.al. 2404.01647 null
2024-04-02 Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation Taekyung Ki et.al. 2404.00636 null
2024-04-01 FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio Chao Xu et.al. 2403.01901 link
2024-04-01 Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation Se Jin Park et.al. 2305.19556 null
2024-03-29 Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior Jaehoon Ko et.al. 2403.20153 link
2024-03-28 MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation Seyeon Kim et.al. 2403.19144 link
2024-03-28 GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response Govind Mittal et.al. 2210.06186 link
2024-03-27 X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention You Xie et.al. 2403.15931 null
2024-03-26 Superior and Pragmatic Talking Face Generation with Teacher-Student Framework Chao Liang et.al. 2403.17883 null
2024-03-26 AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation Huawei Wei et.al. 2403.17694 link
2024-03-25 DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment Stella Bounareli et.al. 2403.17217 null
2024-03-25 AnimateMe: 4D Facial Expressions via Diffusion Models Dimitrios Gerogiannis et.al. 2403.17213 null
2024-03-25 Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework Ziyao Huang et.al. 2403.16510 link
2024-03-23 Adaptive Super Resolution For One-Shot Talking-Head Generation Luchuan Song et.al. 2403.15944 link
2024-03-23 Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis Zhenhui Ye et.al. 2401.08503 link
2024-03-22 LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example Soyeon Yoon et.al. 2403.15227 link
2024-03-22 Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing Juan Zhang et.al. 2403.11700 null
2024-03-19 EmoVOCA: Speech-Driven Emotional 3D Talking Heads Federico Nocentini et.al. 2403.12886 link
2024-03-19 ScanTalk: 3D Talking Heads from Unregistered Scans Federico Nocentini et.al. 2403.10942 link
2024-03-15 StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation Dongchan Min et.al. 2208.10922 null
2024-03-14 GAIA: Zero-shot Talking Avatar Generation Tianyu He et.al. 2311.15230 null
2024-03-13 Say Anything with Any Style Shuai Tan et.al. 2403.06363 null
2024-03-12 FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization Shuai Tan et.al. 2403.06375 null
2024-03-12 Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style Shuai Tan et.al. 2403.06365 null
2024-03-11 A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos Weixia Zhang et.al. 2403.06421 link
2024-03-05 Memories are One-to-Many Mapping Alleviators in Talking Face Generation Anni Tang et.al. 2212.05005 null
2024-03-02 G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment Juan Zhang et.al. 2402.18122 null
2024-03-01 DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder Chenpeng Du et.al. 2303.17550 null
2024-02-29 Learning a Generalized Physical Face Model From Data Lingchen Yang et.al. 2402.19477 null
2024-02-28 Context-aware Talking Face Video Generation Meidai Xuanyuan et.al. 2402.18092 null
2024-02-27 EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Linrui Tian et.al. 2402.17485 null
2024-02-27 Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis Zicheng Zhang et.al. 2402.17364 link
2024-02-26 Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields Yifei Li et.al. 2402.16599 null
2024-02-25 AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation Yasheng Sun et.al. 2402.16124 null
2024-02-21 Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters Zechen Bai et.al. 2402.13724 link
2024-02-21 StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing Gaoxiang Cong et.al. 2402.12636 link
2024-02-12 StyleLipSync: Style-based Personalized Lip-sync Video Generation Taekyung Ki et.al. 2305.00521 null
2024-02-08 DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer Zhiyuan Ma et.al. 2402.05712 link
2024-02-05 One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space Stella Bounareli et.al. 2402.03553 null
2024-02-02 EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation Guanwen Feng et.al. 2402.01422 null
2024-01-31 MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis Wenhao Guan et.al. 2312.10687 null
2024-01-30 Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Qingcheng Zhao et.al. 2401.15687 null
2024-01-28 Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes Weifeng Liu et.al. 2401.15668 link
2024-01-27 An Implicit Physical Face Model Driven by Expression and Style Lingchen Yang et.al. 2401.15414 null
2024-01-26 Implicit Neural Representation for Physics-driven Actuated Soft Bodies Lingchen Yang et.al. 2401.14861 null
2024-01-25 SAiD: Speech-driven Blendshape Facial Animation with Diffusion Inkyu Park et.al. 2401.08655 link
2024-01-23 NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis Chongke Bi et.al. 2401.12568 null
2024-01-19 Fast Registration of Photorealistic Avatars for VR Facial Animation Chaitanya Patel et.al. 2401.11002 null
2024-01-18 Exposing Lip-syncing Deepfakes from Mouth Inconsistencies Soumyya Kanti Datta et.al. 2401.10113 link
2024-01-18 Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models Jeongsoo Choi et.al. 2306.16003 null
2024-01-16 EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model Bingyuan Zhang et.al. 2401.08049 null
2024-01-12 DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder Tao Liu et.al. 2311.01811 link
2024-01-11 Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors Jack Saunders et.al. 2401.06126 null
2024-01-11 Jump Cut Smoothing for Talking Heads Xiaojuan Wang et.al. 2401.04718 null
2024-01-08 AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation Liyang Chen et.al. 2310.07236 null
2024-01-07 Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness Sicheng Yang et.al. 2401.03476 null
2024-01-04 Expressive Speech-driven Facial Animation with controllable emotions Yutong Chen et.al. 2301.02008 link
2023-12-23 TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation Xize Cheng et.al. 2312.15197 null
2023-12-21 DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation Chenxu Zhang et.al. 2312.13578 null
2023-12-20 FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability Linze Li et.al. 2312.03775 null
2023-12-19 Learning Dense Correspondence for NeRF-Based Face Reenactment Songlin Yang et.al. 2312.10422 null
2023-12-19 Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing Yushi Lan et.al. 2312.03763 null
2023-12-18 VectorTalker: SVG Talking Face Generation with Progressive Vectorisation Hao Hu et.al. 2312.11568 null
2023-12-18 AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis Dongze Li et.al. 2312.10921 null
2023-12-18 Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation Hui Fu et.al. 2312.10877 null
2023-12-15 DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models Yifeng Ma et.al. 2312.09767 link
2023-12-15 Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars Andre Rochow et.al. 2312.09750 null
2023-12-13 uTalk: Bridging the Gap Between Humans and AI Hussam Azzuni et.al. 2310.02739 null
2023-12-13 MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation Haozhe Wu et.al. 2303.09797 null
2023-12-12 GMTalker: Gaussian Mixture based Emotional talking video Portraits Yibo Xia et.al. 2312.07669 null
2023-12-12 GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance Haiming Zhang et.al. 2312.07385 null
2023-12-11 Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism Georgios Milis et.al. 2312.06613 link
2023-12-11 Study of Non-Verbal Behavior in Conversational Agents Camila Vicari Maccari et.al. 2312.06530 null
2023-12-11 DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers Aaron Mir et.al. 2312.06400 null
2023-12-11 Audio-driven Talking Face Generation by Overcoming Unintended Information Flow Dogucan Yaman et.al. 2307.09368 null
2023-12-10 DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation Fa-Ting Hong et.al. 2305.06225 link
2023-12-09 R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning Zhiling Ye et.al. 2312.05572 null
2023-12-09 FT2TF: First-Person Statement Text-To-Talking Face Generation Xingjian Diao et.al. 2312.05430 null
2023-12-08 SingingHead: A Large-scale 4D Dataset for Singing Head Animation Sijing Wu et.al. 2312.04369 null
2023-12-07 VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior Xusen Sun et.al. 2312.01841 null
2023-12-05 PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features Tianshun Han et.al. 2312.02781 null
2023-12-05 MyPortrait: Morphable Prior-Guided Personalized Portrait Generation Bo Ding et.al. 2312.02703 null
2023-12-02 DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser Peng Chen et.al. 2311.16565 null
2023-12-01 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing Balamurugan Thambiraja et.al. 2312.00870 null
2023-11-30 Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data Yu Deng et.al. 2311.18729 null
2023-11-30 Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation Pramook Khungurn et.al. 2311.17409 null
2023-11-29 SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis Ziqiao Peng et.al. 2311.17590 link
2023-11-28 THInImg: Cross-modal Steganography for Presenting Talking Heads in Images Lin Zhao et.al. 2311.17177 null
2023-11-28 BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis Hao-Bin Duan et.al. 2311.05521 link
2023-11-28 Continuously Controllable Facial Expression Editing in Talking Face Videos Zhiyao Sun et.al. 2209.08289 null
2023-11-20 MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI Lifei Zheng et.al. 2311.14730 null
2023-11-15 CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding Jianzong Wang et.al. 2311.08673 null
2023-11-13 DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation Guinan Su et.al. 2311.04766 null
2023-11-12 ChatAnything: Facetime Chat with LLM-Enhanced Personas Yilin Zhao et.al. 2311.06772 null
2023-11-08 Synthetic Speaking Children -- Why We Need Them and How to Make Them Muhammad Ali Farooq et.al. 2311.06307 null
2023-11-06 RADIO: Reference-Agnostic Dubbing Video Synthesis Dongyeun Lee et.al. 2309.01950 null
2023-11-05 3D-Aware Talking-Head Video Motion Transfer Haomiao Ni et.al. 2311.02549 null
2023-11-03 Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading Songtao Luo et.al. 2310.05058 link
2023-11-02 LaughTalk: Expressive 3D Talking Head Generation with Laughter Kim Sung-Bin et.al. 2311.00994 null
2023-11-02 High-Fidelity and Freely Controllable Talking Head Video Generation Yue Gao et.al. 2304.10168 null
2023-10-31 Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape Wei Zhao et.al. 2310.20240 null
2023-10-29 On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models Marija Ivanovska et.al. 2307.05397 null
2023-10-25 Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control Elif Bozkurt et.al. 2310.17011 null
2023-10-23 The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills Qingxiao Zheng et.al. 2310.15112 null
2023-10-19 Gemino: Practical and Robust Neural Compression for Video Conferencing Vibhaalakshmi Sivaraman et.al. 2209.10507 null
2023-10-17 CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation Zhaojie Chu et.al. 2310.11295 null
2023-10-15 HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation Yaosen Chen et.al. 2310.05720 link
2023-10-12 CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity Abdullah Hayajneh et.al. 2310.07969 link
2023-10-12 Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation Yuan Gan et.al. 2309.04946 link
2023-10-08 GestSync: Determining who is speaking without a talking head Sindhu B Hegde et.al. 2310.05304 link
2023-09-30 DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models Zhiyao Sun et.al. 2310.00434 null
2023-09-28 OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions Jin Liu et.al. 2309.16148 null
2023-09-26 Emotional Speech-Driven Animation with Content-Emotion Disentanglement Radek Daněček et.al. 2306.08990 null
2023-09-20 FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion Stefan Stan et.al. 2309.11306 link
2023-09-20 Context-Aware Talking-Head Video Editing Songlin Yang et.al. 2308.00462 null
2023-09-18 That's What I Said: Fully-Controllable Talking Face Generation Youngjoon Jang et.al. 2304.03275 null
2023-09-15 Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech Junjie Li et.al. 2309.08408 link
2023-09-14 DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Yaoyu Su et.al. 2309.07752 null
2023-09-14 DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks Zipeng Qi et.al. 2309.07509 null
2023-09-14 HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods Yongyuan Li et.al. 2309.07495 link
2023-09-13 PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network Qinghua Liu et.al. 2309.06723 null
2023-09-12 DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention Aaditya Kharel et.al. 2309.06511 null
2023-09-12 Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos Ekta Prashnani et.al. 2305.03713 null
2023-09-11 ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment Yicheng Zhong et.al. 2308.14448 null
2023-09-10 MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment Tina Behrouzi et.al. 2309.05095 null
2023-09-09 Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video Xiuzhe Wu et.al. 2309.04814 link
2023-09-01 Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances Wolfgang Paier et.al. 2306.10006 null
2023-08-30 From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications Shreyank N Gowda et.al. 2308.16041 null
2023-08-30 SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces Ziqiao Peng et.al. 2306.10799 link
2023-08-30 Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models Antoni Bigata Casademunt et.al. 2305.08854 link
2023-08-29 Papeos: Augmenting Research Papers with Talk Videos Tae Soo Kim et.al. 2308.15224 null
2023-08-25 EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation Ziqiao Peng et.al. 2303.11089 link
2023-08-24 ToonTalker: Cross-Domain Face Reenactment Yuan Gong et.al. 2308.12866 null
2023-08-24 Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Jiahe Li et.al. 2307.09323 link
2023-08-23 DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion Se Jin Park et.al. 2310.05934 null
2023-08-21 Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis Tong Sha et.al. 2109.02081 null
2023-08-18 Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization Soumik Mukhopadhyay et.al. 2308.09716 link
2023-08-18 Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation Fa-Ting Hong et.al. 2307.09906 link
2023-08-17 A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation Li Liu et.al. 2308.08849 link
2023-08-16 Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions Yuqi Sun et.al. 2306.10813 null
2023-08-12 Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation Zhichao Wang et.al. 2308.06457 link
2023-08-12 DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation Yichao Yan et.al. 2203.07931 null
2023-08-11 Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space Haoyu Wang et.al. 2308.06076 link
2023-08-11 VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer Liyang Chen et.al. 2308.04830 null
2023-08-10 Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution Hyojoon Park et.al. 2305.03216 null
2023-08-02 Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis Zhenhui Ye et.al. 2306.03504 null
2023-07-29 Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation Michał Stypułkowski et.al. 2301.03396 null
2023-07-26 Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation Federico Nocentini et.al. 2306.01415 link
2023-07-20 HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces Stella Bounareli et.al. 2307.10797 link
2023-07-19 MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions Yunfei Liu et.al. 2307.10008 null
2023-07-19 Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline Zhigang Chang et.al. 2307.09821 null
2023-07-19 OPHAvatars: One-shot Photo-realistic Head Avatars Shaoxu Li et.al. 2307.09153 link
2023-07-18 FACTS: Facial Animation Creation using the Transfer of Styles Jack Saunders et.al. 2307.09480 null
2023-07-09 Predictive Coding For Animation-Based Video Compression Goluck Konuko et.al. 2307.04187 null
2023-07-08 FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction Ganglai Wang et.al. 2307.03990 null
2023-07-05 Interactive Conversational Head Generation Mohan Zhou et.al. 2307.02090 null
2023-07-04 A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation Louis Airale et.al. 2307.03270 link
2023-07-04 Generating Animatable 3D Cartoon Faces from Single Portraits Chuanyu Pan et.al. 2307.01468 null
2023-07-03 RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations Neha Sahipjohn et.al. 2307.01233 null
2023-06-20 Audio-Driven 3D Facial Animation from In-the-Wild Videos Liying Lu et.al. 2306.11541 null
2023-06-13 Parametric Implicit Face Representation for Audio-Driven Facial Reenactment Ricong Huang et.al. 2306.07579 null
2023-06-13 AniFaceDrawing: Anime Portrait Exploration during Your Sketching Zhengyu Huang et.al. 2306.07476 null
2023-06-12 NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection Yu Chen et.al. 2306.06885 null
2023-06-10 StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles Yifeng Ma et.al. 2301.01081 link
2023-06-08 ReliableSwap: Boosting General Face Swapping Via Reliable Supervision Ge Yuan et.al. 2306.05356 link
2023-06-06 Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks Jianrong Wang et.al. 2306.03594 null
2023-06-05 Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions Shaoxu Li et.al. 2306.02903 link
2023-05-31 High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning Chao Xu et.al. 2305.02572 null
2023-05-23 CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation Jingning Xu et.al. 2305.13962 null
2023-05-22 RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars Dongwei Pan et.al. 2305.13353 link
2023-05-19 UniFLG: Unified Facial Landmark Generator from Text or Speech Kentaro Mitsui et.al. 2302.14337 null
2023-05-18 An Android Robot Head as Embodied Conversational Agent Marcel Heisler et.al. 2305.10945 null
2023-05-18 Audio-Visual Person-of-Interest DeepFake Detection Davide Cozzolino et.al. 2204.03083 link
2023-05-17 INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network Shuang Chen et.al. 2305.10589 null
2023-05-17 LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model Kwangho Lee et.al. 2305.10456 null
2023-05-15 Identity-Preserving Talking Face Generation with Landmark and Appearance Priors Weizhi Zhong et.al. 2305.08293 link
2023-05-09 Zero-shot personalized lip-to-speech synthesis with face image based voice control Zheng-Yan Sheng et.al. 2305.14359 null
2023-05-09 StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator Jiazhi Guan et.al. 2305.05445 null
2023-05-09 Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator Chao Xu et.al. 2305.02594 null
2023-05-01 StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video Lizhen Wang et.al. 2305.00942 link
2023-05-01 GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation Zhenhui Ye et.al. 2305.00787 null
2023-04-28 A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation Bo-Kyeong Kim et.al. 2304.00471 null
2023-04-27 Controllable One-Shot Face Video Synthesis With Semantic Aware Prior Kangning Liu et.al. 2304.14471 null
2023-04-25 AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head Rongjie Huang et.al. 2304.12995 link
2023-04-24 VR Facial Animation for Immersive Telepresence Avatars Andre Rochow et.al. 2304.12051 null
2023-04-21 Implicit Neural Head Synthesis via Controllable Local Deformation Fields Chuhan Chen et.al. 2304.11113 null
2023-04-20 DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation Shuai Shen et.al. 2301.03786 link
2023-04-18 Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations Rongliang Wu et.al. 2304.08945 null
2023-04-17 Autoregressive GAN for Semantic Unconditional Head Motion Generation Louis Airale et.al. 2211.00987 link
2023-04-11 One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field Weichuang Li et.al. 2304.05097 null
2023-04-06 Face Animation with an Attribute-Guided Diffusion Model Bohan Zeng et.al. 2304.03199 link
2023-04-06 4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios Wei Chen et.al. 2304.02814 null
2023-04-03 CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior Jinbo Xing et.al. 2301.02379 link
2023-04-01 DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance Longwen Zhang et.al. 2304.03117 null
2023-04-01 TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles Yifeng Ma et.al. 2304.00334 null
2023-03-31 FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions Jin Liu et.al. 2303.17789 null
2023-03-29 Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert Jiadong Wang et.al. 2303.17480 link
2023-03-27 OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis Hongyi Xu et.al. 2303.15539 null
2023-03-27 Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms Stevo Racković et.al. 2302.04843 null
2023-03-27 MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation Bowen Zhang et.al. 2212.08062 link
2023-03-27 A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation Stevo Racković et.al. 2205.04289 null
2023-03-26 OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering Zhiyuan Ma et.al. 2303.14662 link
2023-03-26 Emotionally Enhanced Talking Face Generation Sahil Goyal et.al. 2303.11548 link
2023-03-26 Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation Stevo Racković et.al. 2303.06370 null
2023-03-24 Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement Siddarth Ravichandran et.al. 2209.01320 null
2023-03-23 PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360$^{\circ}$ Sizhe An et.al. 2303.13071 null
2023-03-22 Style Transfer for 2D Talking Head Animation Trong-Thang Pham et.al. 2303.09799 link
2023-03-22 MARLIN: Masked Autoencoder for facial video Representation LearnINg Zhixi Cai et.al. 2211.06627 link
2023-03-14 DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions Geumbyeol Hwang et.al. 2303.07697 link
2023-03-13 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation Wenxuan Zhang et.al. 2211.12194 link
2023-03-09 FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning Kazi Injamamul Haque et.al. 2303.05416 link
2023-03-09 Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation Qi Chen et.al. 2303.05322 link
2023-03-07 DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video Zhimeng Zhang et.al. 2303.03988 link
2023-03-05 Cyber Vaccine for Deepfake Immunity Ching-Chun Chang et.al. 2303.02659 null
2023-03-04 High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors Yunpeng Bai et.al. 2211.15064 null
2023-03-01 DPE: Disentanglement of Pose and Expression for General Video Portrait Editing Youxin Pang et.al. 2301.06281 link
2023-02-27 Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video Minsu Kim et.al. 2303.08670 null
2023-02-27 Memory-augmented Contrastive Learning for Talking Head Generation Jianrong Wang et.al. 2302.13469 link
2023-02-24 Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention Bin Liu et.al. 2302.12532 null
2023-02-16 OPT: One-shot Pose-Controllable Talking Head Generation Jin Liu et.al. 2302.08197 null
2023-02-14 Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space Trevine Oorloff et.al. 2203.14512 link
2023-01-31 GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis Zhenhui Ye et.al. 2301.13430 null
2023-01-23 Data standardization for robust lip sync Chun Wang et.al. 2202.06198 null
2023-01-20 Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes Nicolas Wagner et.al. 2212.14784 null
2023-01-15 Learning Audio-Driven Viseme Dynamics for 3D Face Animation Linchao Bao et.al. 2301.06059 null
2022-12-30 Imitator: Personalized Speech-driven 3D Facial Animation Balamurugan Thambiraja et.al. 2301.00023 null
2022-12-28 All's well that FID's well? Result quality and metric scores in GAN models for lip-synchronization tasks Carina Geldhauser et.al. 2212.13810 null
2022-12-23 Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing William Brannon et.al. 2212.12137 null
2022-12-09 Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers Yasheng Sun et.al. 2212.04970 null
2022-12-07 Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors Zhentao Yu et.al. 2212.04248 null
2022-12-07 SPACE: Speech-driven Portrait Animation with Controllable Expression Siddharth Gururani et.al. 2211.09809 null
2022-11-30 Extracting Semantic Knowledge from GANs with Unsupervised Learning Jianjin Xu et.al. 2211.16710 null
2022-11-27 VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild Kun Cheng et.al. 2211.14758 null
2022-11-26 Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis Duomin Wang et.al. 2211.14506 link
2022-11-22 Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition Jiaxiang Tang et.al. 2211.12368 null
2022-11-10 On the role of Lip Articulation in Visual Speech Perception Zakaria Aldeneh et.al. 2203.10117 null
2022-11-03 SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory Se Jin Park et.al. 2211.00924 null
2022-10-21 Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection Alexandros Haliassos et.al. 2201.07131 link
2022-10-13 Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors Vladimir Iashin et.al. 2210.07055 link
2022-10-13 Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar Aolan Sun et.al. 2210.06877 null
2022-10-07 Compressing Video Calls using Synthetic Talking Heads Madhav Agarwal et.al. 2210.03692 null
2022-10-07 A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis Yichen Han et.al. 2210.03335 null
2022-10-06 Audio-Visual Face Reenactment Madhav Agarwal et.al. 2210.02755 link
2022-10-06 Finding Directions in GAN's Latent Space for Neural Face Reenactment Stella Bounareli et.al. 2202.00046 link
2022-10-04 Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale Aditya Agarwal et.al. 2208.09796 null
2022-09-29 Facial Landmark Predictions with Applications to Metaverse Qiao Han et.al. 2209.14698 link
2022-09-27 StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment Stella Bounareli et.al. 2209.13375 link
2022-09-23 EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model Xinya Ji et.al. 2205.15278 null
2022-09-21 FNeVR: Neural Volume Rendering for Face Animation Bohan Zeng et.al. 2209.10340 link
2022-09-19 AutoLV: Automatic Lecture Video Generator Wenbin Wang et.al. 2209.08795 null
2022-09-09 Talking Head from Speech Audio using a Pre-trained Image Generator Mohammed M. Alghamdi et.al. 2209.04252 null
2022-09-07 Restructurable Activation Networks Kartikeya Bhardwaj et.al. 2208.08562 link
2022-08-29 StableFace: Analyzing and Improving Motion Stability for Talking Face Generation Jun Ling et.al. 2208.13717 null
2022-08-17 Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors Sindhu B Hegde et.al. 2208.08118 link
2022-08-03 Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control Michail Christos Doukas et.al. 2208.02210 null
2022-08-02 Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer Ailin Huang et.al. 2206.12837 link
2022-08-01 A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip Shuang Chen et.al. 2208.01149 link
2022-07-27 A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing Goluck Konuko et.al. 2207.13530 null
2022-07-24 Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis Shuai Shen et.al. 2207.11770 link
2022-07-22 Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos Panagiotis P. Filntisis et.al. 2207.11094 link
2022-07-20 NARRATE: A Normal Assisted Free-View Portrait Stylizer Youjia Wang et.al. 2207.00974 null
2022-07-20 VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection Joanna Hong et.al. 2206.07458 null
2022-07-20 Responsive Listening Head Generation: A Benchmark Dataset and Baseline Mohan Zhou et.al. 2112.13548 null
2022-07-13 FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Yongqi Wang et.al. 2207.03800 link
2022-06-29 Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs Bo-Kyeong Kim et.al. 2206.14658 null
2022-06-09 Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos Alexander Waibel et.al. 2206.04523 null
2022-05-31 Text/Speech-Driven Full-Body Animation Wenlin Zhuang et.al. 2205.15573 null
2022-05-27 Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast Boqing Zhu et.al. 2204.14057 link
2022-05-26 One-Shot Face Reenactment on Megapixels Wonjun Kang et.al. 2205.13368 null
2022-05-24 Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts Debjoy Saha et.al. 2205.12194 link
2022-05-20 MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement Alexander Richard et.al. 2104.08223 link
2022-05-13 Talking Face Generation with Multilingual TTS Hyoung-Kyu Song et.al. 2205.06421 null
2022-05-02 Emotion-Controllable Generalized Talking Face Generation Sanjana Sinha et.al. 2205.01155 null
2022-05-02 A Novel Speech-Driven Lip-Sync Model with CNN and LSTM Xiaohong Li et.al. 2205.00916 null
2022-04-27 Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion Sen Chen et.al. 2204.12756 null
2022-04-25 Fast Facial Landmark Detection and Applications: A Survey Kostiantyn Khabarlak et.al. 2101.10808 null
2022-04-13 Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions Zipeng Ye et.al. 2204.06180 null
2022-04-06 Transformer-S2A: Robust and Efficient Speech-to-Animation Liyang Chen et.al. 2111.09771 null
2022-04-03 Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text Pulkit Tandon et.al. 2106.14014 link
2022-03-30 End to End Lip Synchronization with a Temporal AutoEncoder Yoav Shalev et.al. 2203.16224 link
2022-03-29 Thin-Plate Spline Motion Model for Image Animation Jian Zhao et.al. 2203.14367 link
2022-03-17 StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN Fei Yin et.al. 2203.04036 link
2022-03-17 FaceFormer: Speech-Driven 3D Facial Animation with Transformers Yingruo Fan et.al. 2112.05329 link
2022-03-16 Efficient conditioned face animation using frontally-viewed embedding Maxime Oquab et.al. 2203.08765 null
2022-03-15 Depth-Aware Generative Adversarial Network for Talking Head Video Generation Fa-Ting Hong et.al. 2203.06605 link
2022-03-10 An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection Ganglai Wang et.al. 2203.05178 null
2022-03-08 Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild Ganglai Wang et.al. 2203.03984 null
2022-03-04 Multi-modality Deep Restoration of Extremely Compressed Face Videos Xi Zhang et.al. 2107.05548 null
2022-03-01 FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset Hasam Khalid et.al. 2108.05080 link
2022-02-25 FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment Yuval Nirkin et.al. 2202.12972 null
2022-02-22 Thinking the Fusion Strategy of Multi-reference Face Reenactment Takuya Yashima et.al. 2202.10758 null
2022-01-24 Selective Listening by Synchronizing Speech with Lips Zexu Pan et.al. 2106.07150 link
2022-01-22 Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary Sibo Zhang et.al. 2104.14631 null
2022-01-21 Stitch it in Time: GAN-Based Facial Editing of Real Videos Rotem Tzaban et.al. 2201.08361 link
2022-01-17 Towards Realistic Visual Dubbing with Heterogeneous Sources Tianyi Xie et.al. 2201.06260 null
2022-01-16 Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels Zipeng Ye et.al. 2201.05986 null
2022-01-03 DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering Shunyu Yao et.al. 2201.00791 null
2021-12-20 Parallel and High-Fidelity Text-to-Lip Generation Jinglin Liu et.al. 2107.06831 link
2021-12-19 Initiative Defense against Facial Manipulation Qidong Huang et.al. 2112.10098 link
2021-12-07 Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation Yingruo Fan et.al. 2112.02214 null
2021-12-06 One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning Suzhen Wang et.al. 2112.02749 null
2021-11-29 Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates Shenhan Qian et.al. 2108.08020 link
2021-11-04 FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation Wei Gan et.al. 2111.02751 null
2021-11-02 BiosecurID: a multimodal biometric database Julian Fierrez et.al. 2111.03472 null
2021-10-30 Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis Haozhe Wu et.al. 2111.00203 link
2021-10-26 Emotion recognition in talking-face videos using persistent entropy and neural networks Eduardo Paluzo-Hidalgo et.al. 2110.13571 link
2021-10-26 ViDA-MAN: Visual Dialog with Digital Humans Tong Shen et.al. 2110.13384 null
2021-10-22 Invertible Frowns: Video-to-Video Facial Emotion Translation Ian Magnusson et.al. 2109.08061 null
2021-10-19 Talking Head Generation with Audio and Speech Related Facial Action Units Sen Chen et.al. 2110.09951 null
2021-10-16 Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor Anchit Gupta et.al. 2110.08580 null
2021-10-12 Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment Haichao Zhang et.al. 2110.04708 null
2021-10-07 Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution Yangyang Shi et.al. 2110.05241 null
2021-09-24 Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation Yuanxun Lu et.al. 2109.10595 null
2021-09-20 Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach Stevo Rackovic et.al. 2109.08356 null
2021-09-17 Detection of GAN-synthesized street videos Omran Alamayreh et.al. 2109.04991 null
2021-08-30 Audiovisual Speech Synthesis using Tacotron2 Ahmed Hussen Abdelaziz et.al. 2008.00620 null
2021-08-23 KoDF: A Large-scale Korean DeepFake Detection Dataset Patrick Kwon et.al. 2103.10094 null
2021-08-23 HeadGAN: One-shot Neural Head Synthesis and Editing Michail Christos Doukas et.al. 2012.08261 null
2021-08-19 AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis Yudong Guo et.al. 2103.11078 link
2021-08-18 DeepFake MNIST+: A DeepFake Facial Animation Dataset Jiajun Huang et.al. 2108.07949 link
2021-08-18 FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning Chenxu Zhang et.al. 2108.07938 link
2021-08-12 UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing Meng Cao et.al. 2108.05650 null
2021-08-11 AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang et.al. 2108.04325 null
2021-08-06 SofGAN: A Portrait Image Generator with Dynamic Styling Anpei Chen et.al. 2007.03780 link
2021-07-27 Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations Laurent Benaroya et.al. 2107.12346 null
2021-07-21 Speech Driven Talking Face Generation from a Single Image and an Emotion Condition Sefik Emre Eskimez et.al. 2008.03592 link
2021-07-20 Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion Suzhen Wang et.al. 2107.09293 link
2021-07-10 Speech2Video: Cross-Modal Distillation for Speech to Video Generation Shijing Si et.al. 2107.04806 null
2021-07-07 Egocentric Videoconferencing Mohamed Elgharib et.al. 2107.03109 null
2021-06-08 LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization Avisek Lahiri et.al. 2106.04185 null
2021-05-20 Audio-Driven Emotional Video Portraits Xinya Ji et.al. 2104.07452 null
2021-05-07 Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation Lincheng Li et.al. 2104.07995 link
2021-05-05 A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors Ruobing Zheng et.al. 2002.08700 null
2021-04-29 Learned Spatial Representations for Few-shot Talking-Head Synthesis Moustafa Meshry et.al. 2104.14557 null
2021-04-26 One-shot Face Reenactment Using Appearance Adaptive Normalization Guangming Yao et.al. 2102.03984 null
2021-04-25 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head Qianyun Wang et.al. 2104.12051 null
2021-04-22 Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation Hang Zhou et.al. 2104.11116 link
2021-04-07 Single Source One Shot Reenactment using Weighted motion From Paired Feature Points Soumya Tripathy et.al. 2104.03117 null
2021-04-07 Everything's Talkin': Pareidolia Face Reenactment Linsen Song et.al. 2104.03061 link
2021-04-07 LI-Net: Large-Pose Identity-Preserving Face Reenactment Network Jin Liu et.al. 2104.02850 null
2021-04-02 One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing Ting-Chun Wang et.al. 2011.15126 null
2021-03-20 Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization Komal Chugh et.al. 2005.14405 link
2021-03-19 End-to-End Lip Synchronisation Based on Pattern Classification You Jin Kim et.al. 2005.08606 null
2021-03-05 Real-time RGBD-based Extended Body Pose Estimation Renat Bashirov et.al. 2103.03663 link
2021-03-03 Estimating Uniqueness of I-Vector Representation of Human Voice Erkam Sinan Tandogan et.al. 2008.11985 null
2021-02-25 MakeItTalk: Speaker-Aware Talking-Head Animation Yang Zhou et.al. 2004.12992 null
2021-02-19 One Shot Audio to Animated Video Generation Neeraj Kumar et.al. 2102.09737 null
2021-02-18 AudioVisual Speech Synthesis: A brief literature review Efthymios Georgiou et.al. 2103.03927 null
2020-12-14 Robust One Shot Audio to Video Generation Neeraj Kumar et.al. 2012.07842 null
2020-12-14 Multi Modal Adaptive Normalization for Audio to Video Generation Neeraj Kumar et.al. 2012.07304 null
2020-11-30 Adaptive Compact Attention For Few-shot Video-to-video Translation Risheng Huang et.al. 2011.14695 null
2020-11-21 Stochastic Talking Face Generation Using Latent Distribution Matching Ravindra Yadav et.al. 2011.10727 link
2020-11-21 Iterative Text-based Editing of Talking-heads Using Neural Retargeting Xinwei Yao et.al. 2011.10688 null
2020-11-09 FACEGAN: Facial Attribute Controllable rEenactment GAN Soumya Tripathy et.al. 2011.04439 null
2020-11-06 Large-scale multilingual audio visual dubbing Yi Yang et.al. 2011.03530 null
2020-11-02 Facial Keypoint Sequence Generation from Audio Prateek Manocha et.al. 2011.01114 null
2020-10-25 APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment Jiangning Zhang et.al. 2010.13017 link
2020-10-12 Intuitive Facial Animation Editing Based On A Generative RNN Framework Eloïse Berson et.al. 2010.05655 null
2020-10-05 SMILE: Semantically-guided Multi-attribute Image and Layout Editing Andrés Romero et.al. 2010.02315 link
2020-10-05 Dynamic Facial Asset and Rig Generation from a Single Scan Jiaman Li et.al. 2010.00560 null
2020-09-20 An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management Ruturaj Raval et.al. 2009.09354 null
2020-09-18 Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks Guangming Yao et.al. 2008.07783 null
2020-09-12 DualLip: A System for Joint Lip Reading and Generation Weicong Chen et.al. 2009.05784 null
2020-09-02 Seeing wake words: Audio-visual Keyword Spotting Liliane Momeni et.al. 2009.01225 null
2020-08-29 "It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok Daniel Klug et.al. 2008.13040 null
2020-08-23 A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild K R Prajwal et.al. 2008.10010 link
2020-08-11 Audio- and Gaze-driven Facial Animation of Codec Avatars Alexander Richard et.al. 2008.05023 null
2020-08-04 Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract Tamás Gábor Csapó et.al. 2008.02098 link
2020-08-04 Real-Time Cleaning and Refinement of Facial Animation Signals Eloïse Berson et.al. 2008.01332 null
2020-08-02 Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos Yanhui Guo et.al. 2008.01652 null
2020-07-29 Neural Voice Puppetry: Audio-driven Facial Reenactment Justus Thies et.al. 1912.05566 link
2020-07-20 Deformable Style Transfer Sunnie S. Y. Kim et.al. 2003.11038 link
2020-07-18 A Robust Interactive Facial Animation Editing System Eloïse Berson et.al. 2007.09367 null
2020-07-16 Talking-head Generation with Rhythmic Head Motion Lele Chen et.al. 2007.08547 link
2020-07-08 Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision Abhinav Shukla et.al. 2007.04134 null
2020-06-20 Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams Huirong Huang et.al. 2006.11610 null
2020-05-27 Modality Dropout for Improved Performance-driven Talking Faces Ahmed Hussen Abdelaziz et.al. 2005.13616 null
2020-05-25 Identity-Preserving Realistic Talking Face Generation Sanjana Sinha et.al. 2005.12318 null
2020-05-22 Head2Head: Video-based Neural Head Synthesis Mohammad Rami Koujan et.al. 2005.10954 null
2020-05-16 FReeNet: Multi-Identity Face Reenactment Jiangning Zhang et.al. 1905.11805 null
2020-05-13 FaR-GAN for One-Shot Face Reenactment Hanxiang Hao et.al. 2005.06402 null
2020-05-13 Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning Hao Zhu et.al. 1812.06589 null
2020-05-11 Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok Juan Carlos Medina Serrano et.al. 2004.05478 link
2020-05-07 What comprises a good talking-head video generation?: A Survey and Benchmark Lele Chen et.al. 2005.03201 link
2020-05-04 Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani et.al. 2002.08742 null
2020-04-30 APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals Jiangning Zhang et.al. 2004.14569 null
2020-03-30 ActGAN: Flexible and Efficient One-shot Face Reenactment Ivan Kosarevych et.al. 2003.13840 null
2020-03-29 Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose Xianfang Zeng et.al. 2003.12957 null
2020-03-26 High-Accuracy Facial Depth Models derived from 3D Synthetic Data Faisal Khan et.al. 2003.06211 null
2020-03-05 Talking-Heads Attention Noam Shazeer et.al. 2003.02436 link
2020-03-05 Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose Ran Yi et.al. 2002.10137 link
2020-03-01 Towards Automatic Face-to-Face Translation Prajwal K R et.al. 2003.00418 link
2020-02-19 Speech-driven facial animation using polynomial fusion of features Triantafyllos Kefalas et.al. 1912.05833 null
2020-01-17 ICface: Interpretable and Controllable Face Reenactment Using GANs Soumya Tripathy et.al. 1904.01909 null
2019-12-20 Disentangling Style and Content in Anime Illustrations Sitao Xiang et.al. 1905.10742 null
2019-11-21 FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis Kuangxiao Gu et.al. 1911.09224 null
2019-11-19 MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets Sungjoo Ha et.al. 1911.08139 null
2019-10-28 Few-shot Video-to-Video Synthesis Ting-Chun Wang et.al. 1910.12713 null
2019-10-19 Real-Time Lip Sync for Live 2D Animation Deepali Aneja et.al. 1910.08685 link
2019-10-16 Designing Style Matching Conversational Agents Deepali Aneja et.al. 1910.07514 null
2019-10-15 A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities Deepali Aneja et.al. 1909.08766 link
2019-10-09 EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos Haipeng Zeng et.al. 1907.12918 null
2019-10-02 Animating Face using Disentangled Audio Representations Gaurav Mittal et.al. 1910.00726 null
2019-09-25 Few-Shot Adversarial Learning of Realistic Neural Talking Head Models Egor Zakharov et.al. 1905.08233 null
2019-09-06 Neural Style-Preserving Visual Dubbing Hyeongwoo Kim et.al. 1909.02518 null
2019-08-29 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach Ngoc-Trung Tran et.al. 1908.11039 null
2019-08-20 Prosodic Phrase Alignment for Machine Dubbing Alp Öktem et.al. 1908.07226 link
2019-08-16 FSGAN: Subject Agnostic Face Swapping and Reenactment Yuval Nirkin et.al. 1908.05932 link
2019-08-11 Emotion Dependent Facial Animation from Affective Speech Rizwan Sadiq et.al. 1908.03904 null
2019-08-05 One-shot Face Reenactment Yunxuan Zhang et.al. 1908.03251 link
2019-07-25 Talking Face Generation by Conditional Recurrent Adversarial Network Yang Song et.al. 1804.04786 link
2019-07-24 Data-Driven Physical Face Inversion Yeara Kozlov et.al. 1907.10402 null
2019-07-23 A system for efficient 3D printed stop-motion face animation Rinat Abdrashitov et.al. 1907.10163 null
2019-06-14 Realistic Speech-Driven Facial Animation with GANs Konstantinos Vougioukas et.al. 1906.06337 null
2019-06-04 Text-based Editing of Talking-head Video Ohad Fried et.al. 1906.01524 null
2019-05-27 Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks Guanzhong Tian et.al. 1905.11142 null
2019-05-09 Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss Lele Chen et.al. 1905.03820 link
2019-05-08 Capture, Learning, and Synthesis of 3D Speaking Styles Daniel Cudeiro et.al. 1905.03079 link
2019-04-23 Talking Face Generation by Adversarially Disentangled Audio-Visual Representation Hang Zhou et.al. 1807.07860 null
2019-04-02 FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation Yanfu Yan et.al. 1904.01509 null
2019-03-13 Animating an Autonomous 3D Talking Avatar Dominik Borer et.al. 1903.05448 null
2018-12-22 Deep Audio-Visual Speech Recognition Triantafyllos Afouras et.al. 1809.02108 null
2018-12-20 DeepFakes: a New Threat to Face Recognition? Assessment and Detection Pavel Korshunov et.al. 1812.08685 null
2018-11-22 Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos Ying Tai et.al. 1811.00342 link
2018-11-16 Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters Maartje M. E. Hendrikse et.al. 1812.02088 null
2018-08-28 GANimation: Anatomically-aware Facial Animation from a Single Image Albert Pumarola et.al. 1807.09251 link
2018-08-19 Dynamic Temporal Alignment of Speech to Lips Tavi Halperin et.al. 1808.06250 link
2018-07-29 ReenactGAN: Learning to Reenact Faces via Boundary Transfer Wayne Wu et.al. 1807.11079 link
2018-07-26 Learnable PINs: Cross-Modal Embeddings for Person Identity Arsha Nagrani et.al. 1805.00833 null
2018-07-19 End-to-End Speech-Driven Facial Animation with Temporal GANs Konstantinos Vougioukas et.al. 1805.09313 null
2018-05-29 Deep Video Portraits Hyeongwoo Kim et.al. 1805.11714 null
2018-05-24 VisemeNet: Audio-Driven Animator-Centric Speech Animation Yang Zhou et.al. 1805.09488 null
2018-05-21 Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks Sitao Xiang et.al. 1805.07997 null
2018-04-23 Generating Talking Face Landmarks from Speech Sefik Emre Eskimez et.al. 1803.09803 null
2018-03-28 Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network Hai X. Pham et.al. 1803.07716 null
2018-03-20 Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks Seyed Ali Jalalifar et.al. 1803.07461 null
2017-12-07 End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech Hai X. Pham et.al. 1710.00920 null
2017-12-06 ObamaNet: Photo-realistic lip-sync from text Rithesh Kumar et.al. 1801.01442 null
2017-07-30 Kernel Projection of Latent Structures Regression for Facial Animation Retargeting Christos Ouzounis et.al. 1707.09629 null
2017-07-26 Fast Deep Matting for Portrait Animation on Mobile Phone Bingke Zhu et.al. 1707.08289 null
2017-07-21 Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking Rahul Sharma et.al. 1707.06830 null
2017-07-18 You said that? Joon Son Chung et.al. 1705.02966 null
2017-01-30 Lip Reading Sentences in the Wild Joon Son Chung et.al. 1611.05358 link
2016-10-28 Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei Johannes Buchner et.al. 1610.09380 link
2016-07-11 Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video Shaoshi Yang et.al. 1601.06684 null
2016-05-22 Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression David Rim et.al. 1512.08212 null
2016-02-08 Automatic Face Reenactment Pablo Garrido et.al. 1602.02651 null
2015-11-20 ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication Ali Mollahosseini et.al. 1511.06502 null
2014-09-03 Visual Speech Recognition Ahmad B. A. Hassanat et.al. 1409.1411 null
2012-09-22 Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis Ingmar Steiner et.al. 1209.4982 null
2012-03-30 Face Expression Recognition and Analysis: The State of the Art Vinay Bettadapura et.al. 1203.6722 null
2012-01-19 Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis Ingmar Steiner et.al. 1201.4080 null
2010-03-01 Re-verification of a Lip Synchronization Protocol using Robust Reachability Piotr Kordy et.al. 1003.0431 null


Image Animation

Publish Date Title Authors PDF Code
2025-05-30 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 link
2025-05-29 HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Shuolin Xu et.al. 2505.22977 link
2025-05-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-05-18 DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation Haoyu Zhao et.al. 2503.21246 link
2025-05-13 TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection Wenkui Yang et.al. 2505.08437 link
2025-04-28 AnimateAnywhere: Rouse the Background in Human Image Animation Xiaoyu Liu et.al. 2504.19834 null
2025-04-20 DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance Yuxuan Luo et.al. 2504.01724 null
2025-04-15 UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer Xiang Wang et.al. 2504.11289 link
2025-04-15 Taming Consistency Distillation for Accelerated Human Image Animation Xiang Wang et.al. 2504.11143 null
2025-04-05 Multi-identity Human Image Animation with Structural Video Diffusion Zhenzhi Wang et.al. 2504.04126 null
2025-04-04 Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images In-Hwan Jin et.al. 2504.05458 link
2025-04-01 VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer Xinyu Liu et.al. 2502.05979 null
2025-03-23 MotiF: Making Text Count in Image Animation with Motion Focal Loss Shijie Wang et.al. 2412.16153 null
2025-03-13 Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer Jiahao Cui et.al. 2412.00733 link
2025-03-10 Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation Yingjie Chen et.al. 2501.05020 null
2025-02-25 DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Hongxiang Li et.al. 2412.09349 link
2025-02-24 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-10 Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance Li Hu et.al. 2502.06145 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-01-30 Every Image Listens, Every Image Dances: Music-Driven Image Animation Zhikang Dong et.al. 2501.18801 null
2025-01-20 X-Dyna: Expressive Dynamic Human Image Animation Di Chang et.al. 2501.10021 link
2025-01-15 Joint Learning of Depth and Appearance for Portrait Image Animation Xinya Ji et.al. 2501.08649 null
2024-12-11 Animate-X: Universal Character Image Animation with Enhanced Motion Representation Shuai Tan et.al. 2410.10306 null
2024-12-04 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Taekyung Ki et.al. 2412.01064 null
2024-11-30 DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses Yatian Pang et.al. 2412.00397 null
2024-11-28 JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation Xuyang Cao et.al. 2411.09209 link
2024-11-27 StableAnimator: High-Quality Identity-Preserving Human Image Animation Shuyuan Tu et.al. 2411.17697 link
2024-11-24 LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis Haojie Zhang et.al. 2411.16748 null
2024-11-21 HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Zhenzhi Wang et.al. 2407.17438 link
2024-10-31 TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation Sunjae Yoon et.al. 2410.24037 null
2024-10-20 FrameBridge: Improving Image-to-Video Generation with Bridge Models Yuji Wang et.al. 2410.15371 null
2024-10-14 Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation Jiahao Cui et.al. 2410.07718 link
2024-09-30 Illustrious: an Open Advanced Illustration Model Sang Hyun Park et.al. 2409.19946 null
2024-09-29 High Quality Human Image Animation using Regional Supervision and Motion Blur Condition Zhongcong Xu et.al. 2409.19580 null
2024-09-22 Dormant: Defending against Pose-driven Human Image Animation Jiachen Zhou et.al. 2409.14424 link
2024-07-23 Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Xin Ma et.al. 2407.15642 link
2024-07-12 TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models Jeongho Kim et.al. 2407.09012 null
2024-07-12 EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions Zhiyuan Chen et.al. 2407.08136 link
2024-07-11 MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Muyao Niu et.al. 2405.20222 link
2024-06-16 Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation Mingwang Xu et.al. 2406.08801 null
2024-06-13 Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control Jingyun Xue et.al. 2406.03035 null
2024-06-03 UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation Xiang Wang et.al. 2406.01188 null
2024-06-01 Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance Shenhao Zhu et.al. 2403.14781 link
2024-05-29 Evaluating the effectiveness of sonification in science education using Edukoi Lucrezia Guiotto Nai Fovino et.al. 2405.18908 null
2024-05-28 VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation Qilin Wang et.al. 2405.18156 null
2024-05-28 Controllable Longer Image Animation with Diffusion Models Qiang Wang et.al. 2405.17306 null
2024-03-25 PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models Yiming Zhang et.al. 2312.13964 link
2024-03-13 Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts Yue Ma et.al. 2403.08268 link
2024-03-08 Audio-Synchronized Visual Animation Lin Zhang et.al. 2403.05659 link
2024-03-05 Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation Weijie Li et.al. 2403.02827 null
2024-01-17 Continuous Piecewise-Affine Based Motion Model for Image Animation Hexiang Wang et.al. 2401.09146 link
2024-01-03 Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions David Junhao Zhang et.al. 2401.01827 link
2023-12-06 AnimateZero: Video Diffusion Models are Zero-Shot Image Animators Jiwen Yu et.al. 2312.03793 link
2023-12-05 LivePhoto: Real Image Animation with Text-guided Motion Control Xi Chen et.al. 2312.02928 null
2023-12-04 AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance Zuozhuo Dai et.al. 2311.12886 link
2023-11-30 Motion-Conditioned Image Animation for Video Editing Wilson Yan et.al. 2311.18827 null
2023-11-27 MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model Zhongcong Xu et.al. 2311.16498 null
2023-11-27 DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors Jinbo Xing et.al. 2310.12190 link
2023-11-19 Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation Peirong Liu et.al. 2110.04658 null
2023-10-16 LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation Ruiqi Wu et.al. 2310.10769 link
2023-10-11 LEO: Generative Latent Image Animator for Human Video Synthesis Yaohui Wang et.al. 2305.03989 link
2023-09-26 Text-Guided Synthesis of Eulerian Cinemagraphs Aniruddha Mahapatra et.al. 2307.03190 link
2023-09-25 Automatic Animation of Hair Blowing in Still Portrait Photos Wenpeng Xiao et.al. 2309.14207 null
2023-07-10 AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning Yuwei Guo et.al. 2307.04725 link
2023-07-09 Predictive Coding For Animation-Based Video Compression Goluck Konuko et.al. 2307.04187 null
2023-04-12 VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs Moayed Haji Ali et.al. 2304.06020 null
2023-03-10 3D Cinemagraphy from a Single Image Xingyi Li et.al. 2303.05724 null
2023-02-02 Dreamix: Video Diffusion Models are General Video Editors Eyal Molad et.al. 2302.01329 null
2023-01-14 Continuous odor profile monitoring to study olfactory navigation in small animals Kevin S. Chen et.al. 2301.05905 null
2022-11-30 NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation Yu Yin et.al. 2211.17235 null
2022-10-04 Implicit Warping for Animation with Image Sets Arun Mallya et.al. 2210.01794 null
2022-09-28 Motion Transformer for Unsupervised Image Animation Jiale Tao et.al. 2209.14024 link
2022-07-19 Single Stage Virtual Try-on via Deformable Attention Flows Shuai Bai et.al. 2207.09161 link
2022-07-08 Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation Yucheng Suo et.al. 2207.03714 null
2022-06-11 Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification Mengdi Gao et.al. 2106.12284 link
2022-04-05 Neural Fields in Visual Computing and Beyond Yiheng Xie et.al. 2111.11426 null
2022-03-29 Thin-Plate Spline Motion Model for Image Animation Jian Zhao et.al. 2203.14367 link
2022-03-29 Image Animation with Perturbed Masks Yoav Shalev et.al. 2011.06922 link
2022-03-25 3D GAN Inversion for Controllable Portrait Image Animation Connor Z. Lin et.al. 2203.13441 null
2022-03-17 Latent Image Animator: Learning to Animate Images via Latent Space Navigation Yaohui Wang et.al. 2203.09043 null
2021-12-21 Image Animation with Keypoint Mask Or Toledano et.al. 2112.10457 link
2021-12-19 Move As You Like: Image Animation in E-Commerce Scenario Borun Xu et.al. 2112.13647 null
2021-12-17 AI-Empowered Persuasive Video Generation: A Survey Chang Liu et.al. 2112.09401 null
2021-10-26 Incremental Learning for Animal Pose Estimation using RBF k-DPP Gaurav Kumar Nayak et.al. 2110.13598 null
2021-09-03 Sparse to Dense Motion Transfer for Face Image Animation Ruiqi Zhao et.al. 2109.00471 null
2021-08-18 DeepFake MNIST+: A DeepFake Facial Animation Dataset Jiajun Huang et.al. 2108.07949 link
2021-06-23 Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0 Adellia et.al. 2106.15342 null
2021-04-07 Single Source One Shot Reenactment using Weighted motion From Paired Feature Points Soumya Tripathy et.al. 2104.03117 null
2021-03-22 PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation Wai Ting Cheung et.al. 2103.11600 null
2020-12-01 Ultra-low bitrate video conferencing using deep image animation Goluck Konuko et.al. 2012.00346 null
2020-10-01 First Order Motion Model for Image Animation Aliaksandr Siarohin et.al. 2003.00196 link
2020-08-27 Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation Yurui Ren et.al. 2008.12606 link
2019-08-30 Animating Arbitrary Objects via Deep Motion Transfer Aliaksandr Siarohin et.al. 1812.08861 link
2018-10-09 3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping Guillaume Caron et.al. 1810.03956 null
2018-06-24 A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic Jiaming Lu et.al. 1806.09117 null
2018-01-31 RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes Thomas Bronzwaer et.al. 1801.10452 null
2016-06-23 Gender and Interest Targeting for Sponsored Post Advertising at Tumblr Mihajlo Grbovic et.al. 1606.07189 null
2015-03-16 Use of Effective Audio in E-learning Courseware Kisor Ray et.al. 1503.04837 null
2015-02-04 Multimedia-Video for Learning Kah Hean Chua et.al. 1502.01090 null
2013-01-25 Measurements of Martian Dust Devil Winds with HiRISE David S. Choi et.al. 1301.6130 null
2010-01-04 Tutoring System for Dance Learning Rajkumar Kannan et.al. 1001.0440 null

(back to top)

Video Generation

Publish Date Title Authors PDF Code
2025-06-25 Video Perception Models for 3D Scene Synthesis Rui Huang et.al. 2506.20601 null
2025-06-25 BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos Jiahao Lin et.al. 2506.20103 null
2025-06-24 Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation Xingyang Li et.al. 2506.19852 null
2025-06-24 GenHSI: Controllable Generation of Human-Scene Interaction Videos Zekun Li et.al. 2506.19840 null
2025-06-24 SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Liangbin Xie et.al. 2506.19838 null
2025-06-24 Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router Yubo Huang et.al. 2506.19833 null
2025-06-24 Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation Jintao Rong et.al. 2506.19348 null
2025-06-23 VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory Runjia Li et.al. 2506.18903 null
2025-06-23 From Virtual Games to Real-World Play Wenqiang Sun et.al. 2506.18901 null
2025-06-23 FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation Kaiyi Huang et.al. 2506.18899 null
2025-06-23 MinD: Unified Visual Imagination and Control via Hierarchical World Models Xiaowei Chi et.al. 2506.18897 null
2025-06-23 OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation Qijun Gan et.al. 2506.18866 null
2025-06-23 Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset Zhuowei Chen et.al. 2506.18851 null
2025-06-23 Matrix-Game: Interactive World Foundation Model Yifan Zhang et.al. 2506.18701 null
2025-06-23 RDPO: Real Data Preference Optimization for Physics Consistency Video Generation Wenxu Qian et.al. 2506.18655 null
2025-06-23 BulletGen: Improving 4D Reconstruction with Bullet-Time Generation Denys Rozumnyi et.al. 2506.18601 null
2025-06-23 VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning Xuanyu Zhang et.al. 2506.18564 null
2025-06-23 Emergent Temporal Correspondences from Video Diffusion Transformers Jisu Nam et.al. 2506.17220 link
2025-06-21 STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation Jiamin Wang et.al. 2506.13138 null
2025-06-20 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition Jiaqi Li et.al. 2506.17201 null
2025-06-20 Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation Riccardo Corvi et.al. 2506.16802 null
2025-06-20 Sekai: A Video Dataset towards World Exploration Zhen Li et.al. 2506.15675 null
2025-06-20 Show-o2: Improved Native Unified Multimodal Models Jinheng Xie et.al. 2506.15564 link
2025-06-19 VideoGAN-based Trajectory Proposal for Automated Vehicles Annajoyce Mariani et.al. 2506.16209 link
2025-06-19 FastInit: Fast Noise Initialization for Temporally Consistent Video Generation Chengyu Bai et.al. 2506.16119 null
2025-06-19 PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models Tianchen Zhao et.al. 2506.16054 null
2025-06-19 Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization Cong Wang et.al. 2506.15980 link
2025-06-18 VideoMAR: Autoregressive Video Generation with Continuous Tokens Hu Yu et.al. 2506.14168 null
2025-06-18 Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models Xuanchi Ren et.al. 2506.09042 link
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 link
2025-06-17 CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation Jia-Chen Zhang et.al. 2506.14206 null
2025-06-16 EchoShot: Multi-Shot Portrait Video Generation Jiahao Wang et.al. 2506.15838 null
2025-06-16 UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions Zhucun Xue et.al. 2506.13691 null
2025-06-15 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer Zhelun Shen et.al. 2506.12847 null
2025-06-13 SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation Xu Wang et.al. 2506.11621 null
2025-06-12 GenWorld: Towards Detecting AI-generated Real-world Simulation Videos Weiliang Chen et.al. 2506.10975 null
2025-06-12 M4V: Multi-Modal Mamba for Text-to-Video Generation Jiancheng Huang et.al. 2506.10915 null
2025-06-12 GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning Xiaoyi Bao et.al. 2506.10639 null
2025-06-12 DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers Lizhen Wang et.al. 2506.10568 null
2025-06-12 AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation Haoyuan Shi et.al. 2506.10540 null
2025-06-11 AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation Chao Liang et.al. 2506.11144 null
2025-06-11 PlayerOne: Egocentric World Simulator Yuanpeng Tu et.al. 2506.09995 null
2025-06-11 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Zhenzhi Wang et.al. 2506.09984 null
2025-06-11 ReSim: Reliable World Simulation for Autonomous Driving Jiazhi Yang et.al. 2506.09981 null
2025-06-11 DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning Dongxu Liu et.al. 2506.09644 null
2025-06-11 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Shanchuan Lin et.al. 2506.09350 null
2025-06-10 Seedance 1.0: Exploring the Boundaries of Video Generation Models Yu Gao et.al. 2506.09113 null
2025-06-10 FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation Zheqi He et.al. 2506.09081 null
2025-06-10 VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks Xinlong Chen et.al. 2506.09079 null
2025-06-10 MagCache: Fast Video Generation with Magnitude-Aware Cache Zehong Ma et.al. 2506.09045 link
2025-06-10 Product of Experts for Visual Generation Yunzhi Zhang et.al. 2506.08894 null
2025-06-10 HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation Ziyao Huang et.al. 2506.08797 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Yang Bai et.al. 2506.08632 null
2025-06-10 How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models Huixuan Zhang et.al. 2506.08351 null
2025-06-10 From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models Pablo Acuaviva et.al. 2506.07280 null
2025-06-09 Seeing Voices: Generating A-Roll Video from Audio with Mirage Aditi Sundararaman et.al. 2506.08279 null
2025-06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Xun Huang et.al. 2506.08009 null
2025-06-09 Dreamland: Controllable World Creation with Simulator and Generative Models Sicheng Mo et.al. 2506.08006 null
2025-06-09 Audio-Sync Video Generation with Multi-Stream Temporal Control Shuchen Weng et.al. 2506.08003 null
2025-06-09 Generative Modeling of Weights: Generalization or Memorization? Boya Zeng et.al. 2506.07998 link
2025-06-09 Video Unlearning via Low-Rank Refusal Vector Simone Facchiano et.al. 2506.07891 null
2025-06-09 EgoM2P: Egocentric Multimodal Multitask Pretraining Gen Li et.al. 2506.07886 null
2025-06-09 PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement Teng Hu et.al. 2506.07848 null
2025-06-09 Consistent Video Editing as Flow-Driven Image-to-Video Generation Ge Wang et.al. 2506.07713 null
2025-06-09 Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation Boris Martirosyan et.al. 2506.07706 null
2025-06-09 Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers Haosong Liu et.al. 2506.05096 null
2025-06-08 TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation Min-Jung Kim et.al. 2506.07205 null
2025-06-08 Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Sangwon Jang et.al. 2506.07177 null
2025-06-08 Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion Huaize Liu et.al. 2506.07136 null
2025-06-07 Self-Adapting Improvement Loops for Robotic Learning Calvin Luo et.al. 2506.06658 null
2025-06-06 Restereo: Diffusion stereo video generation and restoration Xingchang Huang et.al. 2506.06023 null
2025-06-06 LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models Haojie Yu et.al. 2506.05806 null
2025-06-06 FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion Akide Liu et.al. 2506.04648 null
2025-06-05 EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh Tao Hu et.al. 2506.05554 null
2025-06-05 ContentV: Efficient Training of Video Generation Models with Limited Compute Wenfeng Lin et.al. 2506.05343 null
2025-06-05 FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation Huihan Wang et.al. 2506.04956 null
2025-06-05 DualX-VSR: Dual Axial Spatial $\times$ Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation Shuo Cao et.al. 2506.04830 null
2025-06-05 Follow-Your-Creation: Empowering 4D Creation through Video Inpainting Yue Ma et.al. 2506.04590 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-05 SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios Lingwei Dang et.al. 2506.02444 link
2025-06-04 LayerFlow: A Unified Model for Layer-aware Video Generation Sihui Ji et.al. 2506.04228 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-04 DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Ziyi Wu et.al. 2506.03517 null
2025-06-03 Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas Austin Silveria et.al. 2506.03275 null
2025-06-03 IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation Yuanze Lin et.al. 2506.03150 null
2025-06-03 Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval Jiwen Yu et.al. 2506.03141 null
2025-06-03 CamCloneMaster: Enabling Reference-based Camera Control for Video Generation Yawen Luo et.al. 2506.03140 null
2025-06-03 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Lu Qiu et.al. 2506.03126 null
2025-06-03 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Zhengyao Lv et.al. 2506.03123 null
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099 null
2025-06-03 SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis Ssharvien Kumar Sivakumar et.al. 2506.03082 null
2025-06-03 ORV: 4D Occupancy-centric Robot Video Generation Xiuyu Yang et.al. 2506.03079 link
2025-06-03 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Pengtao Chen et.al. 2506.03065 null
2025-06-03 LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering Xiaoyi Feng et.al. 2506.02733 null
2025-06-03 LumosFlow: Motion-Guided Long Video Generation Jiahao Chen et.al. 2506.02497 null
2025-06-02 Motion aware video generative model Bowen Xue et.al. 2506.02244 null
2025-06-02 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Xiao Fu et.al. 2506.01943 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-02 Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks Tao Yang et.al. 2506.01758 null
2025-06-02 Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents Shuting Wang et.al. 2506.01689 null
2025-06-02 LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model Xiaodong Wang et.al. 2506.01546 null
2025-06-02 Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark Shuyu Yang et.al. 2506.01466 null
2025-06-02 DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion Geunmin Hwang et.al. 2506.01454 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-30 DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds Jiaxu Zhang et.al. 2505.24733 null
2025-05-30 UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation Yang-Tian Sun et.al. 2505.24521 null
2025-05-30 Interactive Video Generation via Domain Adaptation Ishaan Rawal et.al. 2505.24253 null
2025-05-30 STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models Zheng Tan et.al. 2505.24210 link
2025-05-29 MAGREF: Masked Guidance for Any-Reference Video Generation Yufan Deng et.al. 2505.23742 link
2025-05-29 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Tingyu Song et.al. 2505.23693 link
2025-05-29 VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models Xiangdong Zhang et.al. 2505.23656 link
2025-05-29 VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation Shi-Xue Zhang et.al. 2505.23484 link
2025-05-29 Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis Hengyuan Cao et.al. 2505.23325 null
2025-05-29 RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer Liu Liu et.al. 2505.23171 null
2025-05-29 Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing Tongtong Su et.al. 2505.23134 link
2025-05-29 MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation Siyuan Wang et.al. 2505.23120 link
2025-05-29 GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion Gwanghyun Kim et.al. 2505.23085 null
2025-05-29 MOVi: Training-free Text-conditioned Multi-Object Video Generation Aimon Rahman et.al. 2505.22980 null
2025-05-29 HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Shuolin Xu et.al. 2505.22977 link
2025-05-29 Minute-Long Videos with Dual Parallelisms Zeqing Wang et.al. 2505.21070 link
2025-05-28 ATI: Any Trajectory Instruction for Controllable Video Generation Angtian Wang et.al. 2505.22944 null
2025-05-28 Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation Zhe Kong et.al. 2505.22647 link
2025-05-28 Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers Weilun Feng et.al. 2505.22167 null
2025-05-28 FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing Guanwen Feng et.al. 2505.22141 null
2025-05-28 LatentMove: Towards Complex Human Movement Video Generation Ashkan Taghipour et.al. 2505.22046 null
2025-05-28 PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms Yifei Xia et.al. 2505.22016 null
2025-05-28 Learning World Models for Interactive Video Generation Taiye Chen et.al. 2505.21996 null
2025-05-28 SageAttention2++: A More Efficient Implementation of SageAttention2 Jintao Zhang et.al. 2505.21136 link
2025-05-28 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Shenghai Yuan et.al. 2505.20292 link
2025-05-27 HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation Bowen Chen et.al. 2505.21831 null
2025-05-27 Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation Ke Zhang et.al. 2505.21653 null
2025-05-27 VideoMarkBench: Benchmarking Robustness of Video Watermarking Zhengyuan Jiang et.al. 2505.21620 link
2025-05-27 Frame In-N-Out: Unbounded Controllable Image-to-Video Generation Boyang Wang et.al. 2505.21491 null
2025-05-27 Dynamic Vision from EEG Brain Recordings: How much does EEG know? Prajwal Singh et.al. 2505.21385 null
2025-05-27 RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy Aiyue Chen et.al. 2505.21036 null
2025-05-27 Frame-Level Captions for Long Video Generation with Complex Multi Scenes Guangcong Zheng et.al. 2505.20827 null
2025-05-27 Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt Xiang Zhu et.al. 2505.20795 null
2025-05-27 Photography Perspective Composition: Towards Aesthetic Perspective Recommendation Lujian Yao et.al. 2505.20655 null
2025-05-27 Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training Bolin Lai et.al. 2505.20629 null
2025-05-27 Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM Peng Liu et.al. 2505.19901 null
2025-05-26 MotionPro: A Precise Motion Controller for Image-to-Video Generation Zhongwei Zhang et.al. 2505.20287 null
2025-05-26 DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving Wenchao Sun et.al. 2505.19692 link
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-26 The Role of Video Generation in Enhancing Data-Limited Action Understanding Wei Li et.al. 2505.19495 null
2025-05-26 Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals Nate Gillman et.al. 2505.19386 null
2025-05-25 From Single Images to Motion Policies via Video-Generation Environment Representations Weiming Zhi et.al. 2505.19306 null
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-05-25 WorldEval: World Model as Real-World Robot Policies Evaluator Yaxuan Li et.al. 2505.19017 null
2025-05-25 Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency Hyunho Ha et.al. 2505.18932 null
2025-05-25 Interspatial Attention for Efficient 4D Human Video Generation Ruizhi Shao et.al. 2505.15800 null
2025-05-24 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation Shuo Yang et.al. 2505.18875 null
2025-05-24 VORTA: Efficient Video Diffusion via Routing Sparse Attention Wenhao Sun et.al. 2505.18809 link
2025-05-24 DVD-Quant: Data-free Video Diffusion Transformers Quantization Zhiteng Li et.al. 2505.18663 link
2025-05-24 ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos Xiaodong Wang et.al. 2505.18650 null
2025-05-23 WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions Zizhang Li et.al. 2505.18151 null
2025-05-23 DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation Junhao Chen et.al. 2505.18078 null
2025-05-23 SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain Jiawei Zhou et.al. 2505.17727 null
2025-05-23 Scaling Image and Video Generation via Test-Time Evolutionary Search Haoran He et.al. 2505.17618 null
2025-05-23 InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO Xueji Fang et.al. 2505.17574 link
2025-05-23 Challenger: Affordable Adversarial Driving Video Generation Zhiyuan Xu et.al. 2505.15880 null
2025-05-22 Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis Xin You et.al. 2505.17333 null
2025-05-22 Training-Free Efficient Video Generation via Dynamic Token Carving Yuechen Zhang et.al. 2505.16864 link
2025-05-22 Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts Taewon Kang et.al. 2505.16819 null
2025-05-22 MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM Siwei Meng et.al. 2505.16456 null
2025-05-21 Generative AI for Autonomous Driving: A Review Katharina Winter et.al. 2505.15863 null
2025-05-21 AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection Zhipei Xu et.al. 2505.15173 null
2025-05-21 CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation Xinran Wang et.al. 2505.15145 link
2025-05-21 BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation Haiquan Wen et.al. 2505.12620 link
2025-05-21 Video-GPT via Next Clip Diffusion Shaobin Zhuang et.al. 2505.12489 null
2025-05-20 Programmatic Video Prediction Using Large Language Models Hao Tang et.al. 2505.14948 link
2025-05-20 Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers Sucheng Ren et.al. 2505.14687 link
2025-05-20 LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer Changgu Chen et.al. 2505.14167 null
2025-05-20 Hunyuan-Game: Industrial-grade Intelligent Game Creation Model Ruihuang Li et.al. 2505.14135 null
2025-05-20 MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation Yanbo Ding et.al. 2505.10238 link
2025-05-19 FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance Dian Shao et.al. 2505.13437 null
2025-05-19 MAGI-1: Autoregressive Video Generation at Scale Sand.ai et.al. 2505.13211 link
2025-05-19 DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories Joel Jang et.al. 2505.12705 link
2025-05-19 Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking Zihan Su et.al. 2505.12667 null
2025-05-18 EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models Hu Yue et.al. 2505.09694 link
2025-05-17 FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge Xuan Shen et.al. 2505.14709 link
2025-05-17 DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance Xuan Shen et.al. 2505.14708 link
2025-05-17 LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation Jiarui Wang et.al. 2505.12098 link
2025-05-17 VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption Tianxiong Zhong et.al. 2505.12053 null
2025-05-17 STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives Bo Wang et.al. 2505.08350 null
2025-05-16 QVGen: Pushing the Limit of Quantized Video Generative Models Yushi Huang et.al. 2505.11497 null
2025-05-16 Face Consistency Benchmark for GenAI Video Michal Podstawski et.al. 2505.11425 null
2025-05-16 Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model Wei Li et.al. 2505.07449 link
2025-05-15 ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars Rui-Yang Ju et.al. 2505.10072 null
2025-05-15 Generating time-consistent dynamics with discriminator-guided image diffusion models Philipp Hess et.al. 2505.09089 null
2025-05-15 Generative Pre-trained Autoregressive Diffusion Transformer Yuan Zhang et.al. 2505.07344 null
2025-05-14 Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios Huafeng Shi et.al. 2505.10584 null
2025-05-13 Generative AI for Autonomous Driving: Frontiers and Opportunities Yuping Wang et.al. 2505.08854 link
2025-05-13 Symbolically-Guided Visual Plan Inference from Uncurated Video Data Wenyan Yang et.al. 2505.08444 null
2025-05-12 DanceGRPO: Unleashing GRPO on Visual Generation Zeyue Xue et.al. 2505.07818 null
2025-05-12 ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models Ozgur Kara et.al. 2505.07652 null
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-11 BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation Panwen Hu et.al. 2505.06985 null
2025-05-10 Jailbreaking the Text-to-Video Generative Models Jiayang Liu et.al. 2505.06679 null
2025-05-10 ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images Xianghao Kong et.al. 2505.06537 null
2025-05-08 3D Scene Generation: A Survey Beichen Wen et.al. 2505.05474 link
2025-05-08 T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models Xuyang Guo et.al. 2505.04946 null
2025-05-08 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Teng Hu et.al. 2505.04512 null
2025-05-06 Real-Time Person Image Synthesis Using a Flow Matching Model Jiwoo Jeong et.al. 2505.03562 link
2025-05-06 Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights Zhaiming Shen et.al. 2505.03205 null
2025-05-04 DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization Wenchuan Wang et.al. 2505.02192 null
2025-05-03 GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting Anushka Agarwal et.al. 2505.01928 null
2025-05-03 PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth Bu Jin et.al. 2505.01729 null
2025-05-02 VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos Zongxia Li et.al. 2505.01481 link
2025-05-02 FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis Jiangtong Tan et.al. 2505.01172 link
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation Xuyang Guo et.al. 2505.00337 null
2025-04-30 Direct Motion Models for Assessing Generated Videos Kelsey Allen et.al. 2505.00209 null
2025-04-30 Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis Michal Geyer et.al. 2505.00135 null
2025-04-30 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction Qihao Liu et.al. 2504.21855 null
2025-04-30 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation Haiyang Zhou et.al. 2504.21650 link
2025-04-30 Simple Visual Artifact Detection in Sora-Generated Videos Misora Sugiyama et.al. 2504.21334 null
2025-04-30 Capturing Conditional Dependence via Auto-regressive Diffusion Models Xunpeng Huang et.al. 2504.21314 null
2025-04-29 TesserAct: Learning 4D Embodied World Models Haoyu Zhen et.al. 2504.20995 null
2025-04-29 DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs Hao Luan et.al. 2504.20754 null
2025-04-29 Advance Fake Video Detection via Vision Transformers Joy Battocchio et.al. 2504.20669 null
2025-04-28 CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition Quynh Phung et.al. 2504.19894 null
2025-04-28 DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer Junpeng Jiang et.al. 2504.19614 null
2025-04-26 Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning Yifan Xie et.al. 2504.18810 null
2025-04-26 Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation Jong Inn Park et.al. 2504.18805 null
2025-04-25 NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration Haotian Dong et.al. 2504.18448 null
2025-04-25 We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback Minkyu Choi et.al. 2504.17180 null
2025-04-24 Dynamic Camera Poses and Where to Find Them Chris Rockwell et.al. 2504.17788 null
2025-04-24 MV-Crafter: An Intelligent System for Music-guided Video Generation Chuer Chen et.al. 2504.17267 null
2025-04-24 DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks Yinqi Li et.al. 2504.17253 link
2025-04-23 Subject-driven Video Generation via Disentangled Identity and Motion Daneul Kim et.al. 2504.17816 null
2025-04-23 BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation Ruotong Wang et.al. 2504.16907 null
2025-04-23 ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance Ying Li et.al. 2504.16464 null
2025-04-23 VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models Xuming Hu et.al. 2504.16359 null
2025-04-22 DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment Xiaofan Li et.al. 2504.18576 link
2025-04-22 Survey of Video Diffusion Models: Foundations, Implementations, and Applications Yimu Wang et.al. 2504.16081 link
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning Wang Lin et.al. 2504.15932 null
2025-04-22 Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views Ningli Xu et.al. 2504.15786 null
2025-04-22 DiTPainter: Efficient Video Inpainting with Diffusion Transformers Xian Wu et.al. 2504.15661 null
2025-04-21 Solving New Tasks by Adapting Internet Video Knowledge Calvin Luo et.al. 2504.15369 null
2025-04-21 Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform Xianpan Zhou et.al. 2504.15182 null
2025-04-21 DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation Weijie He et.al. 2504.15032 null
2025-04-21 Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Chenjie Cao et.al. 2504.14899 link
2025-04-21 SkyReels-V2: Infinite-length Film Generative Model Guibin Chen et.al. 2504.13074 link
2025-04-21 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Lvmin Zhang et.al. 2504.12626 link
2025-04-20 Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis Jingjing Ren et.al. 2504.14470 null
2025-04-19 SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation Minho Park et.al. 2504.14396 link
2025-04-18 Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting Jiaxin Huang et.al. 2504.11092 null
2025-04-17 Understanding Attention Mechanism in Video Diffusion Models Bingyan Liu et.al. 2504.12027 null
2025-04-17 VideoPanda: Video Panoramic Diffusion with Multi-view Attention Kevin Xie et.al. 2504.11389 null
2025-04-16 VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate Zhihang Yuan et.al. 2504.12259 link
2025-04-16 Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM Zirui Pan et.al. 2504.12048 null
2025-04-16 The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation Bingjie Gao et.al. 2504.11739 null
2025-04-15 InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation Yukang Lin et.al. 2504.10905 null
2025-04-15 OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding Dianbing Xi et.al. 2504.10825 null
2025-04-14 H-MoRe: Learning Human-centric Motion Representation for Action Analysis Zhanbo Huang et.al. 2504.10676 link
2025-04-14 H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models Yushu Wu et.al. 2504.10567 null
2025-04-14 FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos Rui Chen et.al. 2504.10358 null
2025-04-14 Aligning Anime Video Generation with Human Feedback Bingwen Zhu et.al. 2504.10044 null
2025-04-14 EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise Chao Liu et.al. 2504.09789 null
2025-04-13 CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models Pooja Guhan et.al. 2504.09472 null
2025-04-11 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Team Seawead et.al. 2504.08685 null
2025-04-11 Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Jialu Li et.al. 2504.08641 null
2025-04-11 Diffusion Models for Robotic Manipulation: A Survey Rosa Wolf et.al. 2504.08438 null
2025-04-11 EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model Renda Li et.al. 2504.08344 null
2025-04-11 RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements Guangcong Zheng et.al. 2504.08212 link
2025-04-11 TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation Ruineng Li et.al. 2504.08181 null
2025-04-10 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction Zeren Jiang et.al. 2504.07961 link
2025-04-10 Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos Rundong Luo et.al. 2504.07940 null
2025-04-10 Diffusion Transformers for Tabular Data Time Series Generation Fabrizio Garuti et.al. 2504.07566 link
2025-04-09 EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation Diljeet Jagpal et.al. 2504.06861 null
2025-04-09 DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation Wangbo Zhao et.al. 2504.06803 link
2025-04-09 RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism Elia Peruzzo et.al. 2504.06672 null
2025-04-09 Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception Ruotian Peng et.al. 2504.06666 null
2025-04-08 CamContextI2V: Context-aware Controllable Video Generation Luis Denninger et.al. 2504.06022 link
2025-04-08 Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants Nikolaj T. Mücke et.al. 2504.05852 link
2025-04-07 One-Minute Video Generation with Test-Time Training Karan Dalal et.al. 2504.05298 null
2025-04-07 Video-Bench: Human-Aligned Video Generation Benchmark Hui Han et.al. 2504.04907 null
2025-04-07 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-05 Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization Yikai Wang et.al. 2504.04153 link
2025-04-05 Multi-identity Human Image Animation with Structural Video Diffusion Zhenzhi Wang et.al. 2504.04126 null
2025-04-05 Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models Xuyang Guo et.al. 2504.04051 null
2025-04-05 DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Maksim Siniukov et.al. 2504.04010 null
2025-04-04 Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models Xuran Ma et.al. 2504.03140 link
2025-04-04 MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition Takahiro Shirakawa et.al. 2504.02361 null
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments Chenyu Zhang et.al. 2504.02918 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Chuning Zhu et.al. 2504.02792 null
2025-04-03 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model Shengjun Zhang et.al. 2504.02764 null
2025-04-03 ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer Jiayi Gao et.al. 2504.02451 link
2025-04-03 SkyReels-A2: Compose Anything in Video Diffusion Transformers Zhengcong Fei et.al. 2504.02436 link
2025-04-03 OmniCam: Unified Multimodal Video Generation via Camera Control Xiaoda Yang et.al. 2504.02312 null
2025-04-03 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Hanyang Wang et.al. 2504.01956 null
2025-04-02 Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet Sebastian Barros et.al. 2504.03752 null
2025-04-02 WorldPrompter: Traversable Text-to-Scene Generation Zhaoyang Zhang et.al. 2504.02045 null
2025-04-02 Towards Physically Plausible Video Generation via VLM Planning Xindi Yang et.al. 2503.23368 null
2025-04-01 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Junhao Cheng et.al. 2504.01014 link
2025-04-01 WorldScore: A Unified Evaluation Benchmark for World Generation Haoyi Duan et.al. 2504.00983 null
2025-04-01 DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding Chong Li et.al. 2504.00432 null
2025-04-01 HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation Boyuan Wang et.al. 2503.24026 null
2025-04-01 On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices Bosung Kim et.al. 2503.23796 link
2025-03-31 GazeLLM: Multimodal LLMs incorporating Human Visual Attention Jun Rekimoto et.al. 2504.00221 null
2025-03-31 Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Shengqiong Wu et.al. 2503.24379 null
2025-03-31 JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation Fangda Chen et.al. 2503.23951 null
2025-03-31 HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation Kun Liu et.al. 2503.23715 null
2025-03-30 VideoGen-Eval: Agent-based System for Video Generation Evaluation Yuhang Yang et.al. 2503.23452 link
2025-03-30 JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Kai Liu et.al. 2503.23377 null
2025-03-30 MoCha: Towards Movie-Grade Talking Character Synthesis Cong Wei et.al. 2503.23307 null
2025-03-30 SketchVideo: Sketch-based Video Generation and Editing Feng-Lin Liu et.al. 2503.23284 null
2025-03-29 Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models Prin Phunyaphibarn et.al. 2503.20240 null
2025-03-28 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Jangho Park et.al. 2503.22622 null
2025-03-28 EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation Hadrien Reynaud et.al. 2503.22357 null
2025-03-28 CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving Yishen Ji et.al. 2503.22231 null
2025-03-27 VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Chi-Pin Huang et.al. 2503.21781 null
2025-03-27 Exploring the Evolution of Physics Cognition in Video Generation: A Survey Minghui Lin et.al. 2503.21765 link
2025-03-27 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness Dian Zheng et.al. 2503.21755 link
2025-03-27 Audio-driven Gesture Generation via Deviation Feature in the Latent Space Jiahui Chen et.al. 2503.21616 null
2025-03-27 ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Jinwei Qi et.al. 2503.21144 null
2025-03-26 Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations Haitong Liu et.al. 2503.21824 link
2025-03-26 Synthetic Video Enhances Physical Fidelity in Video Synthesis Qi Zhao et.al. 2503.20822 null
2025-03-26 RecTable: Fast Modeling Tabular Data with Rectified Flow Masane Fuchi et.al. 2503.20731 link
2025-03-26 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports Xiangwen Zhang et.al. 2503.20654 null
2025-03-26 GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving Lloyd Russell et.al. 2503.20523 null
2025-03-26 VPO: Aligning Text-to-Video Generation Models with Prompt Optimization Jiale Cheng et.al. 2503.20491 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 Video Motion Graphs Haiyang Liu et.al. 2503.20218 null
2025-03-26 Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing Jaihoon Kim et.al. 2503.19385 null
2025-03-26 EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models Yufei Cai et.al. 2503.19369 link
2025-03-25 Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors Yuke Lou et.al. 2503.20118 null
2025-03-25 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals Stefan Stojanov et.al. 2503.19953 null
2025-03-25 FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling Qiusheng Huang et.al. 2503.19940 null
2025-03-25 FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Xuan Ju et.al. 2503.19907 null
2025-03-25 Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation Tianhao Qi et.al. 2503.19881 null
2025-03-25 AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers Jiazhi Guan et.al. 2503.19824 null
2025-03-25 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Haiyu Zhang et.al. 2503.19462 null
2025-03-25 MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation Yukang Lin et.al. 2503.19383 null
2025-03-25 Long-Context Autoregressive Video Modeling with Next-Frame Prediction Yuchao Gu et.al. 2503.19325 link
2025-03-25 Aether: Geometric-Aware Unified World Modeling Aether Team et.al. 2503.18945 null
2025-03-25 AMD-Hummingbird: Towards an Efficient Text-to-Video Model Takashi Isobe et.al. 2503.18559 link
2025-03-25 Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model Yingying Fan et.al. 2503.16942 null
2025-03-24 Video-T1: Test-Time Scaling for Video Generation Fangfu Liu et.al. 2503.18942 null
2025-03-24 Training-free Diffusion Acceleration with Bottleneck Sampling Ye Tian et.al. 2503.18940 null
2025-03-24 EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation Qiang Qu et.al. 2503.18552 null
2025-03-24 Can Text-to-Video Generation help Video-Language Alignment? Luca Zanella et.al. 2503.18507 null
2025-03-24 Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Dingcheng Zhen et.al. 2503.18429 null
2025-03-24 Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance Sicong Feng et.al. 2503.18386 null
2025-03-23 LongDiff: Training-Free Long Video Generation in One Go Zhuoling Li et.al. 2503.18150 null
2025-03-23 TransAnimate: Taming Layer Diffusion to Generate RGBA Video Xuewei Chen et.al. 2503.17934 null
2025-03-22 RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation Zhiqiang Yuan et.al. 2503.17735 null
2025-03-21 Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks Bhishma Dedhia et.al. 2503.17539 null
2025-03-21 Position: Interactive Generative Video as Next-Generation Game Engine Jiwen Yu et.al. 2503.17359 null
2025-03-21 AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process Junjie Hu et.al. 2503.17029 null
2025-03-21 Enabling Versatile Controls for Video Diffusion Models Xu Zhang et.al. 2503.16983 link
2025-03-21 SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation Chun-Han Yao et.al. 2503.16396 null
2025-03-20 A Recipe for Generating 3D Worlds From a Single Image Katja Schwarz et.al. 2503.16611 null
2025-03-20 XAttention: Block Sparse Attention with Antidiagonal Scoring Ruyi Xu et.al. 2503.16428 link
2025-03-20 MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance Quanhao Li et.al. 2503.16421 null
2025-03-20 ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos Haolin Yang et.al. 2503.16400 null
2025-03-20 PoseTraj: Pose-Aware Trajectory Control in Video Diffusion Longbin Ji et.al. 2503.16068 null
2025-03-20 Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models Marc Benedí San Millán et.al. 2503.15996 null
2025-03-20 MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving Haiguang Wang et.al. 2503.15875 link
2025-03-20 VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling Hyojun Go et.al. 2503.15855 null
2025-03-20 VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention Mingzhe Zheng et.al. 2503.15138 null
2025-03-19 Temporal Regularization Makes Your Video Generator Stronger Harold Haodong Chen et.al. 2503.15417 null
2025-03-19 Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models Tingxiu Chen et.al. 2503.14966 link
2025-03-18 MusicInfuser: Making Video Diffusion Listen and Dance Susung Hong et.al. 2503.14505 null
2025-03-18 MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation Hongyu Zhang et.al. 2503.14428 null
2025-03-18 Impossible Videos Zechen Bai et.al. 2503.14378 null
2025-03-18 LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models Yu Cheng et.al. 2503.14325 link
2025-03-18 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-03-18 Fast Autoregressive Video Generation with Diagonal Decoding Yang Ye et.al. 2503.14070 null
2025-03-18 AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark Xinhao Xiang et.al. 2503.14064 link
2025-03-17 MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis Shitong Shao et.al. 2503.13319 null
2025-03-17 Language-guided Open-world Video Anomaly Detection Zihao Liu et.al. 2503.13160 null
2025-03-17 Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction Zheyuan Liu et.al. 2503.12953 null
2025-03-17 AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations Quang Trung Truong et.al. 2503.12828 null
2025-03-17 Long-Video Audio Synthesis with Multi-Agent Collaboration Yehang Zhang et.al. 2503.10719 null
2025-03-16 SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs Guibiao Liao et.al. 2503.12535 null
2025-03-16 VMBench: A Benchmark for Perception-Aligned Video Motion Generation Xinran Ling et.al. 2503.10076 link
2025-03-15 ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis Yu Fang et.al. 2503.14526 null
2025-03-15 A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI Paula Andrea Pérez-Toro et.al. 2503.12102 null
2025-03-15 SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering Byeongjun Park et.al. 2503.12024 link
2025-03-14 ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Jianhong Bai et.al. 2503.11647 null
2025-03-14 HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models Ziqin Zhou et.al. 2503.11513 null
2025-03-14 TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation Hongxiang Zhao et.al. 2503.11423 null
2025-03-14 Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model Haoyang Huang et.al. 2503.11251 link
2025-03-14 Cross-Modal Learning for Music-to-Music-Video Description Generation Zhuoyuan Mao et.al. 2503.11190 null
2025-03-14 On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis et.al. 2503.09837 null
2025-03-13 CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models Hao He et.al. 2503.10592 null
2025-03-13 Long Context Tuning for Video Generation Yuwei Guo et.al. 2503.10589 null
2025-03-13 CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance Yufan Deng et.al. 2503.10391 null
2025-03-13 Semantic Latent Motion for Portrait Video Generation Qiyuan Zhang et.al. 2503.10096 null
2025-03-13 UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? Yuanxin Liu et.al. 2503.09949 link
2025-03-13 Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers Yasheng Sun et.al. 2503.09942 null
2025-03-13 VideoMerge: Towards Training-free Long Video Generation Siyang Zhang et.al. 2503.09926 null
2025-03-13 WonderVerse: Extendable 3D Scene Generation with Video Generative Models Hao Feng et.al. 2503.09160 null
2025-03-12 Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework Jing Wang et.al. 2503.10704 null
2025-03-12 LuciBot: Automated Robot Policy Learning from Generated Videos Xiaowen Qiu et.al. 2503.09871 null
2025-03-12 I2V3D: Controllable image-to-video generation with 3D guidance Zhiyuan Zhang et.al. 2503.09733 null
2025-03-12 Accelerating Diffusion Sampling via Exploiting Local Transition Coherence Shangwen Zhu et.al. 2503.09675 null
2025-03-12 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k Xiangyu Peng et.al. 2503.09642 link
2025-03-12 PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Chenyu Li et.al. 2503.09595 link
2025-03-12 Unified Dense Prediction of Video Diffusion Lehan Yang et.al. 2503.09344 null
2025-03-12 Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space Jian Zhu et.al. 2503.09215 null
2025-03-12 SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video Chengshu Zhao et.al. 2503.09154 link
2025-03-12 Reangle-A-Video: 4D Video Generation as Video-to-Video Translation Hyeonho Jeong et.al. 2503.09151 null
2025-03-12 $^R$FLAV: Rolling Flow matching for infinite Audio Video generation Alex Ergasti et.al. 2503.08307 link
2025-03-12 Object-Centric World Model for Language-Guided Manipulation Youngjoon Jeong et.al. 2503.06170 null
2025-03-11 V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video Jianqi Chen et.al. 2503.09631 null
2025-03-11 REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder Yitian Zhang et.al. 2503.08665 null
2025-03-11 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Subin Kim et.al. 2503.08605 null
2025-03-11 WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation Jing Wang et.al. 2503.08153 null
2025-03-11 ObjectMover: Generative Object Movement with Video Prior Xin Yu et.al. 2503.08037 null
2025-03-11 How Can Video Generative AI Transform K-12 Education? Examining Teachers' Perspectives through TPACK and TAM Unggi Lee et.al. 2503.08003 null
2025-03-11 VACE: All-in-One Video Creation and Editing Zeyinzi Jiang et.al. 2503.07598 null
2025-03-11 LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation Quanjian Song et.al. 2503.06508 link
2025-03-10 DreamRelation: Relation-Centric Video Customization Yujie Wei et.al. 2503.07602 null
2025-03-10 AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion Mingzhen Sun et.al. 2503.07418 null
2025-03-10 Automated Movie Generation via Multi-Agent CoT Planning Weijia Wu et.al. 2503.07314 link
2025-03-10 From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers Jiacheng Liu et.al. 2503.06923 link
2025-03-09 VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation Hritik Bansal et.al. 2503.06800 null
2025-03-09 TR-DQ: Time-Rotation Diffusion Quantization Yihua Shao et.al. 2503.06564 null
2025-03-09 QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation Junyi Wu et.al. 2503.06545 link
2025-03-09 Generative Video Bi-flow Chen Liu et.al. 2503.06364 null
2025-03-08 Text2Story: Advancing Video Storytelling with Text Guidance Taewon Kang et.al. 2503.06310 null
2025-03-08 ROCM: RLHF on consistency models Shivanshu Shekhar et.al. 2503.06171 null
2025-03-08 VACT: A Video Automatic Causal Testing System and a Benchmark Haotong Yang et.al. 2503.06163 null
2025-03-08 GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation Ye Tao et.al. 2503.06136 null
2025-03-08 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Runze Zhang et.al. 2503.06053 null
2025-03-08 The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Aoxiong Yin et.al. 2503.04606 link
2025-03-08 Rethinking Video Tokenization: A Conditioned Diffusion-based Approach Nianzu Yang et.al. 2503.03708 link
2025-03-07 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Hongwei Yi et.al. 2503.05978 null
2025-03-07 MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio Xuenan Xu et.al. 2503.05242 link
2025-03-07 Unified Reward Model for Multimodal Understanding and Generation Yibin Wang et.al. 2503.05236 null
2025-03-07 Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos Zhiyu Tan et.al. 2502.21314 null
2025-03-06 Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation Alexey Buzovkin et.al. 2503.04871 link
2025-03-06 FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video Yue Gao et.al. 2503.04720 null
2025-03-06 What Are You Doing? A Closer Look at Controllable Human Video Generation Emanuele Bugliarello et.al. 2503.04666 null
2025-03-05 ProReflow: Progressive Reflow with Decomposed Velocity Lei Ke et.al. 2503.04824 null
2025-03-05 GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Xuanchi Ren et.al. 2503.03751 link
2025-03-05 DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance Zhao Yang et.al. 2503.03689 link
2025-03-05 High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights Yuna Kato et.al. 2503.03558 link
2025-03-05 Video Super-Resolution: All You Need is a Video Diffusion Model Zhihao Zhan et.al. 2503.03355 null
2025-03-04 GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning Zhun Mou et.al. 2503.02341 null
2025-03-04 Unified Video Action Model Shuang Li et.al. 2503.00200 null
2025-03-03 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation Wenhao Wang et.al. 2503.01739 link
2025-03-03 VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo et.al. 2503.01107 null
2025-03-03 TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis Menghao Li et.al. 2502.19454 null
2025-03-02 Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think Jie Tian et.al. 2503.00948 link
2025-03-01 Learning to Animate Images from A Few Videos to Portray Delicate Human Actions Haoxin Li et.al. 2503.00276 null
2025-02-28 Training-free and Adaptive Sparse Attention for Efficient Long Video Generation Yifei Xia et.al. 2502.21079 null
2025-02-28 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Xiao Wang et.al. 2502.20811 null
2025-02-28 WorldModelBench: Judging Video Generation Models As World Models Dacheng Li et.al. 2502.20694 null
2025-02-28 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Ke Cao et.al. 2502.14377 null
2025-02-27 Mobius: Text to Seamless Looping Video Generation via Latent Shift Xiuli Bi et.al. 2502.20307 link
2025-02-27 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Sotiris Anagnostidis et.al. 2502.20126 null
2025-02-27 C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation Yuhao Li et.al. 2502.19868 link
2025-02-26 Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis Long Cheng et.al. 2503.01873 null
2025-02-26 Glad: A Streaming Scene Generator for Autonomous Driving Bin Xie et.al. 2503.00045 null
2025-02-26 FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model Lingzhou Mu et.al. 2502.19455 null
2025-02-25 SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Jintao Zhang et.al. 2502.18137 link
2025-02-25 A Survey: Spatiotemporal Consistency in Video Generation Zhiyu Yin et.al. 2502.17863 null
2025-02-24 X-Dancer: Expressive Music to Human Dance Video Generation Zeyuan Chen et.al. 2502.17414 null
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-24 Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions Zhong Li et.al. 2502.17119 link
2025-02-21 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers Min Zhao et.al. 2502.15894 null
2025-02-21 VaViM and VaVAM: Autonomous Driving through Video Generative Modeling Florent Bartoccioni et.al. 2502.15672 link
2025-02-21 LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities Florian Sestak et.al. 2502.12128 link
2025-02-20 Hardware-Friendly Static Quantization Method for Video Diffusion Transformers Sanghyun Yi et.al. 2502.15077 null
2025-02-20 LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection Qingyuan Liu et.al. 2502.14994 null
2025-02-20 Improving the Diffusability of Autoencoders Ivan Skorokhodov et.al. 2502.14831 null
2025-02-20 Designing Parameter and Compute Efficient Diffusion Transformers using Distillation Vignesh Sundaresha et.al. 2502.14226 null
2025-02-19 FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation Yunpeng Zhang et.al. 2502.13995 link
2025-02-19 LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation Junchen Fu et.al. 2502.12945 null
2025-02-18 VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation Xinlong Chen et.al. 2502.12782 link
2025-02-18 MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation Sihyun Yu et.al. 2502.12632 null
2025-02-17 DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation Zhihang Yuan et.al. 2502.11897 link
2025-02-17 Object-Centric Image to Video Generation with Language Guidance Angel Villar-Corrales et.al. 2502.11655 null
2025-02-17 Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Guoqing Ma et.al. 2502.10248 link
2025-02-17 Magic 1-For-1: Generating One Minute Video Clips within One Minute Hongwei Yi et.al. 2502.07701 link
2025-02-16 MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation Michael Fuest et.al. 2502.11234 null
2025-02-16 Phantom: Subject-consistent video generation via cross-modal alignment Lijie Liu et.al. 2502.11079 null
2025-02-15 SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers Di Qiu et.al. 2502.10841 link
2025-02-14 RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control Teng Li et.al. 2502.10059 null
2025-02-14 GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation Hongyin Zhang et.al. 2502.09268 null
2025-02-13 Enhance-A-Video: Better Generated Video for Free Yang Luo et.al. 2502.07508 link
2025-02-12 CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Qinghe Wang et.al. 2502.08639 null
2025-02-12 FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis Wonjoon Jin et.al. 2502.08244 null
2025-02-12 Learning Human Skill Generators at Key-Step Levels Yilu Wu et.al. 2502.08234 null
2025-02-12 AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance Zhao Wang et.al. 2502.08189 null
2025-02-12 Next Block Prediction: Video Generation via Semi-Autoregressive Modeling Shuhuai Ren et.al. 2502.07737 null
2025-02-12 VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Sixiao Zheng et.al. 2502.07531 null

TryOn

Publish Date Title Authors PDF Code
2025-06-23 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-06-14 Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments Zaiqiang Wu et.al. 2506.12348 link
2025-06-13 HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment Ming Meng et.al. 2505.19638 null
2025-06-12 Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On Zaiqiang Wu et.al. 2506.10468 link
2025-06-06 ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On Jinjuan Wang et.al. 2506.05858 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-01 DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation Xianbing Sun et.al. 2506.00908 null
2025-05-29 VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration Ben Li et.al. 2505.23439 link
2025-05-28 MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on Guangyuan Li et.al. 2505.21325 null
2025-05-27 Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals Davide Lobba et.al. 2505.21062 link
2025-05-26 VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models Hu Xiaobin et.al. 2505.19571 link
2025-05-22 Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction Dong Li et.al. 2505.16980 null
2025-05-22 Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On Siqi Wan et.al. 2505.16977 link
2025-05-15 Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates Ren Li et.al. 2504.08353 link
2025-04-29 Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting Hanxi Liu et.al. 2504.20403 null
2025-04-24 FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model Kaicheng Pang et.al. 2504.17826 null
2025-04-24 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Min Wei et.al. 2504.17414 null
2025-04-21 Shape-Guided Clothing Warping for Virtual Try-On Xiaoyu Han et.al. 2504.15232 link
2025-04-21 Insert Anything: Image Insertion via In-Context Editing in DiT Wensong Song et.al. 2504.15009 null
2025-04-19 Flux Already Knows -- Activating Subject-Driven Image Generation without Training Hao Kang et.al. 2504.11478 link
2025-04-19 Concat-ID: Towards Universal Identity-Preserving Video Synthesis Yong Zhong et.al. 2503.14151 null
2025-04-18 Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation Fulvio Sanguigni et.al. 2504.14011 null
2025-04-17 Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off Riza Velioglu et.al. 2504.13078 link
2025-04-15 ReZero: Enhancing LLM search ability by trying one-more-time Alan Dao et.al. 2504.11001 null
2025-04-11 VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction Zijian He et.al. 2503.12165 null
2025-04-04 From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images Maliheh Toozandehjani et.al. 2504.03807 null
2025-04-03 MAD: Makeup All-in-One with Cross-Domain Diffusion Model Bo-Kai Ruan et.al. 2504.02545 null
2025-04-01 Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method Shufang Zhang et.al. 2504.00562 null
2025-03-26 ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On Ji Woo Hong et.al. 2503.20418 null
2025-03-26 Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks Hailong Guo et.al. 2501.15891 null
2025-03-25 Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage Zhengwentai Sun et.al. 2503.19486 null
2025-03-20 Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model Yingmao Miao et.al. 2503.16065 null
2025-03-18 Limb-Aware Virtual Try-On Network with Progressive Clothing Warping Shengping Zhang et.al. 2503.14074 link
2025-03-16 Progressive Limb-Aware Virtual Try-On Xiaoyu Han et.al. 2503.12588 link
2025-03-15 ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text Haifeng Ni et.al. 2501.16757 null
2025-03-11 MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input Zhenchen Wan et.al. 2503.08650 null
2025-03-11 RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency Siqi Li et.al. 2501.08682 null
2025-02-20 CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors Donghao Luo et.al. 2502.14373 null
2025-02-05 Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics Xuan Li et.al. 2502.03449 null
2025-02-03 MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer Le Shen et.al. 2502.01626 null
2025-01-26 IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter Xiaojing Zhong et.al. 2501.15616 null
2025-01-26 Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models Spencer Ramsey et.al. 2501.15571 null
2025-01-20 EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process Mostafa Atef et.al. 2501.11776 null
2025-01-20 CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation Zheng Chong et.al. 2501.11325 link
2025-01-17 Disharmony: Forensics using Reverse Lighting Harmonization Philip Wootaek Shin et.al. 2501.10212 null
2025-01-12 ODPG: Outfitting Diffusion with Pose Guided Condition Seohyun Lee et.al. 2501.06769 null
2025-01-10 MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer Junsheng Luan et.al. 2501.03630 null
2025-01-09 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On Shuliang Ning et.al. 2501.05369 null
2025-01-08 Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling Nannan Li et.al. 2501.04666 null
2025-01-07 HYB-VITON: A Hybrid Approach to Virtual Try-On Combining Explicit and Implicit Warping Kosuke Takemoto et.al. 2501.03910 link
2025-01-07 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427 null
2024-12-25 DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images Enbo Huang et.al. 2412.18797 null
2024-12-22 PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask Jeongho Kim et.al. 2412.16978 link
2024-12-19 DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On Wengyi Zhan et.al. 2412.14465 null
2024-12-19 FashionComposer: Compositional Fashion Image Generation Sihui Ji et.al. 2412.14168 null

Visual Edit

Publish Date Title Authors PDF Code
2025-06-25 EditP23: 3D Editing via Propagation of Image Prompts to Multi-View Roi Bar-On et.al. 2506.20652 null
2025-06-25 Towards Efficient Exemplar Based Image Editing with Multimodal VLMs Avadhoot Jadhav et.al. 2506.20155 null
2025-06-25 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-24 SceneCrafter: Controllable Multi-View Driving Scene Editing Zehao Zhu et.al. 2506.19488 null
2025-06-24 LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning Chenjian Gao et.al. 2506.10082 null
2025-06-23 Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models Ilia Beletskii et.al. 2506.19103 null
2025-06-23 Let Your Video Listen to Your Music! Xinyu Zhang et.al. 2506.18881 null
2025-06-23 CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing Dinh-Khoi Vo et.al. 2506.18438 null
2025-06-23 Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction Han Zhang et.al. 2506.18290 null
2025-06-20 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Jiacheng Chen et.al. 2506.17450 null
2025-06-20 FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation Fan Yang et.al. 2506.16806 null
2025-06-19 Arch-Router: Aligning LLM Routing with Human Preferences Co Tran et.al. 2506.16655 null
2025-06-18 VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics Josef Kuchař et.al. 2506.15903 null
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Nikos Spyrou et.al. 2506.14404 link
2025-06-16 AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing Biao Yang et.al. 2506.13301 null
2025-06-15 Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing Zhuoying Li et.al. 2506.13827 null
2025-06-15 ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies Chenglin Wang et.al. 2506.12830 null
2025-06-14 Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts Saemee Choi et.al. 2506.12520 null
2025-06-13 SphereDrag: Spherical Geometry-Aware Panoramic Image Editing Zhiao Feng et.al. 2506.11863 null
2025-06-13 Consistent Video Editing as Flow-Driven Image-to-Video Generation Ge Wang et.al. 2506.07713 null
2025-06-12 VINCIE: Unlocking In-context Image Editing from Video Leigang Qu et.al. 2506.10941 null
2025-06-12 Edit360: 2D Image Edits to 3D Assets from Any Angle Junchao Huang et.al. 2506.10507 null
2025-06-12 Towards Reliable Identification of Diffusion-based Image Manipulations Alex Costanzino et.al. 2506.05466 null
2025-06-11 EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits Ron Yosef et.al. 2506.09988 null
2025-06-11 ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models Qin Zhou et.al. 2506.09740 null
2025-06-11 Ming-Omni: A Unified Multimodal Model for Perception and Generation Inclusion AI et.al. 2506.09344 link
2025-06-11 Fine-Grained Spatially Varying Material Selection in Images Julia Guerrero-Viu et.al. 2506.09023 null
2025-06-10 Do Concept Replacement Techniques Really Erase Unacceptable Concepts? Anudeep Das et.al. 2506.08991 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Yang Bai et.al. 2506.08632 null
2025-06-09 Highly Compressed Tokenizer Can Generate Without Training L. Lao Beyer et.al. 2506.08257 link
2025-06-09 PairEdit: Learning Semantic Variations for Exemplar-based Image Editing Haoguang Lu et.al. 2506.07992 link
2025-06-09 Diffusion Counterfactual Generation with Semantic Abduction Rajat Rasal et.al. 2506.07883 link
2025-06-09 DragNeXt: Rethinking Drag-Based Image Editing Yuan Zhou et.al. 2506.07611 null
2025-06-09 Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding Boyu Chen et.al. 2506.07576 null
2025-06-08 Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning Tianyi Bai et.al. 2506.07227 null
2025-06-08 TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation Min-Jung Kim et.al. 2506.07205 null
2025-06-06 Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models Yifu Qiu et.al. 2506.06006 link
2025-06-06 FADE: Frequency-Aware Diffusion Model Factorization for Video Editing Yixuan Zhu et.al. 2506.05934 link
2025-06-06 SeedEdit 3.0: Fast and High-Quality Generative Image Editing Peng Wang et.al. 2506.05083 null
2025-06-05 FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing Guangzhao Li et.al. 2506.05046 null
2025-06-05 Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking Yu-Feng Chen et.al. 2506.04879 null
2025-06-05 FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers Xuanhua He et.al. 2506.04213 null
2025-06-04 HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation Hermann Kumbong et.al. 2506.04421 null
2025-06-04 Is Perturbation-Based Image Protection Disruptive to Image Editing? Qiuyu Tang et.al. 2506.04394 null
2025-06-04 UNIC: Unified In-Context Video Editing Zixuan Ye et.al. 2506.04216 null
2025-06-04 Image Editing As Programs with Diffusion Models Yujia Hu et.al. 2506.04158 null
2025-06-04 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Bin Lin et.al. 2506.03147 null
2025-06-04 MedEBench: Revisiting Text-instructed Image Editing on Medical Domain Minghao Liu et.al. 2506.01921 null
2025-06-03 RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions Bimsara Pathiraja et.al. 2506.03448 null
2025-06-03 ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions Di Chang et.al. 2506.03107 null
2025-06-03 DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing Zixiang Li et.al. 2506.02560 null
2025-06-03 RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers Yan Gong et.al. 2506.02528 null
2025-06-02 IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout Fei Shen et.al. 2506.01949 null
2025-06-02 OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation Sen Liang et.al. 2506.01801 null
2025-06-02 Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation Kaihang Pan et.al. 2506.01480 null
2025-06-02 DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing Chenxi Xie et.al. 2506.01430 null
2025-06-01 Motion-Aware Concept Alignment for Consistent Video Editing Tong Zhang et.al. 2506.01004 null
2025-05-31 Concept-Centric Token Interpretation for Vector-Quantized Generative Models Tianze Yang et.al. 2506.00698 null
2025-05-30 MiniMax-Remover: Taming Bad Noise Helps Video Object Removal Bojia Zi et.al. 2505.24873 null
2025-05-29 Cora: Correspondence-aware image editing using few step diffusion Amirhossein Almohammadi et.al. 2505.23907 null
2025-05-29 LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers Yusuf Dalva et.al. 2505.23758 null
2025-05-29 Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features Ziyong Wang et.al. 2505.23586 null
2025-05-29 Video Editing for Audio-Visual Dubbing Binyamin Manela et.al. 2505.23406 link
2025-05-29 FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing Jeongsol Kim et.al. 2505.23145 link
2025-05-29 Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing Tongtong Su et.al. 2505.23134 link
2025-05-28 HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer Qi Cai et.al. 2505.22705 link
2025-05-28 VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use Mingyuan Wu et.al. 2505.19255 null
2025-05-27 Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion Yang Yang et.al. 2505.21593 null
2025-05-27 Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks Kyzyl Monteiro et.al. 2505.20916 null
2025-05-27 InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling Xiaoxiao Jiang et.al. 2505.20600 null
2025-05-26 What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models Lorenzo Baraldi et.al. 2505.20405 null
2025-05-26 ImgEdit: A Unified Image Editing Dataset and Benchmark Yang Ye et.al. 2505.20275 link
2025-05-26 StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation Yi Wu et.al. 2505.19874 null
2025-05-26 TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs Juntong Wang et.al. 2505.19535 null
2025-05-26 Understanding Generative AI Capabilities in Everyday Image Editing Tasks Mohammad Reza Taesiri et.al. 2505.16181 null
2025-05-25 Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions Chenrui Ma et.al. 2505.19352 null
2025-05-25 SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation Shenggan Cheng et.al. 2505.19151 null
2025-05-25 MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection Shuyu Wang et.al. 2505.19149 null
2025-05-24 REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing Weihan Xu et.al. 2505.18880 null
2025-05-24 Affective Image Editing: Shaping Emotional Factors via Text Descriptions Peixuan Zhang et.al. 2505.18699 null
2025-05-24 Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility Yiheng Li et.al. 2505.18521 link
2025-05-23 DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval Yuxin Yang et.al. 2505.17796 null
2025-05-23 R-Genie: Reasoning-Guided Generative Image Editing Dong Zhang et.al. 2505.17768 null
2025-05-22 KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Yongliang Wu et.al. 2505.16707 null
2025-05-21 FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models Zhen Sun et.al. 2505.15644 link
2025-05-20 DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model Siwei Xia et.al. 2505.12427 link
2025-05-20 CompBench: Benchmarking Complex Instruction-guided Image Editing Bohan Jia et.al. 2505.12200 null
2025-05-18 From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations Yuzhi Li et.al. 2505.12237 null
2025-05-16 X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models Valentina Bazyleva et.al. 2505.11753 null
2025-05-16 GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing Yusu Qian et.al. 2505.11493 null
2025-05-15 3D-Fixup: Advancing Photo Editing with 3D Priors Yen-Chi Cheng et.al. 2505.10566 null
2025-05-15 IntrinsicEdit: Precise generative image manipulation in intrinsic space Linjie Lyu et.al. 2505.08889 null
2025-05-14 Don't Forget your Inverse DDIM for Image Editing Guillermo Gomez-Trenado et.al. 2505.09571 null
2025-05-12 MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models Hongyang Zhu et.al. 2505.05101 null
2025-05-11 DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models Junhao Xia et.al. 2505.07057 null
2025-05-11 Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation Chao Liao et.al. 2505.05472 null
2025-05-08 GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing Tong Wang et.al. 2505.04915 null
2025-05-07 Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers Divyansh Srivastava et.al. 2505.04718 null
2025-05-07 Multi-turn Consistent Image Editing Zijun Zhou et.al. 2505.04320 null
2025-05-07 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Inclusion AI et.al. 2505.02471 link
2025-05-06 MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models Jhon Lopez et.al. 2505.15822 null
2025-05-06 Step1X-Edit: A Practical Framework for General Image Editing Shiyu Liu et.al. 2504.17761 link
2025-05-05 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Ming Li et.al. 2505.02370 link
2025-05-04 Video Forgery Detection for Surveillance Cameras: A Review Noor B. Tayfor et.al. 2505.03832 null
2025-05-02 Improving Editability in Image Generation with Layer-wise Memory Daneul Kim et.al. 2505.01079 null
2025-05-02 A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories Ziqi Ding et.al. 2505.01067 null
2025-05-02 Photoshop Batch Rendering Using Actions for Stylistic Video Editing Tessa De La Fuente et.al. 2505.01001 null
2025-05-01 InstructAttribute: Fine-grained Object Attributes editing with Instruction Xingxi Yin et.al. 2505.00751 null
2025-05-01 Controllable Weather Synthesis and Removal with Video Diffusion Models Chih-Hao Lin et.al. 2505.00704 null
2025-05-01 Towards Scalable Human-aligned Benchmark for Text-guided Image Editing Suho Ryu et.al. 2505.00502 link
2025-04-30 PixelHacker: Image Inpainting with Structural and Semantic Consistency Ziyang Xu et.al. 2504.20438 null
2025-04-29 In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Zechuan Zhang et.al. 2504.20690 null
2025-04-27 CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes Tuan Nguyen et.al. 2504.19212 null
2025-04-26 REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models Gal Almog et.al. 2504.18989 link
2025-04-24 DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing Aniruddha Bala et.al. 2504.17894 null
2025-04-24 VEU-Bench: Towards Comprehensive Understanding of Video Editing Bozheng Li et.al. 2504.17828 null
2025-04-24 Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields Zhuo He et.al. 2504.17712 null
2025-04-24 Enhancing Variational Autoencoders with Smooth Robust Latent Encoding Hyomin Lee et.al. 2504.17219 null
2025-04-24 Vidi: Large Multimodal Models for Video Understanding and Editing Vidi Team et.al. 2504.15681 null
2025-04-22 Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework Xinyuan Song et.al. 2504.16016 null
2025-04-22 Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models Dasol Jeong et.al. 2504.15723 null
2025-04-21 MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World Ankit Dhiman et.al. 2504.15397 null
2025-04-21 Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach Lvpan Cai et.al. 2504.11922 link
2025-04-20 MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation Siyi Jiao et.al. 2504.14606 null
2025-04-19 Visual Prompting for One-shot Controllable Video Editing without Inversion Zhengbo Zhang et.al. 2504.14335 null
2025-04-19 PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling Alara Dirik et.al. 2504.14219 null
2025-04-18 Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation Fulvio Sanguigni et.al. 2504.14011 null
2025-04-18 Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing Joowon Kim et.al. 2504.13490 null
2025-04-17 Image Editing with Diffusion Models: A Survey Jia Wang et.al. 2504.13226 null
2025-04-17 Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark Siwei Yang et.al. 2504.13143 null
2025-04-17 UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models Guanlong Jiao et.al. 2504.13109 null
2025-04-17 Image-Editing Specialists: An RLAIF Approach for Diffusion Models Elior Benarous et.al. 2504.12833 link
2025-04-17 SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding Qianqian Sun et.al. 2504.12704 null
2025-04-17 DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency Mengshi Qi et.al. 2504.12080 link
2025-04-17 Understanding Attention Mechanism in Video Diffusion Models Bingyan Liu et.al. 2504.12027 null
2025-04-14 Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing Taihang Hu et.al. 2504.10434 link
2025-04-14 Analysis of Attention in Video Diffusion Transformers Yuxin Wen et.al. 2504.10317 null
2025-04-14 TAPNext: Tracking Any Point (TAP) as Next Token Prediction Artem Zholus et.al. 2504.05579 null
2025-04-13 SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow Kenan Tang et.al. 2504.09697 link
2025-04-13 CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models Pooja Guhan et.al. 2504.09472 null
2025-04-11 CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model Ruohao Zhan et.al. 2504.08259 null
2025-04-10 POEM: Precise Object-level Editing via MLLM control Marco Schouten et.al. 2504.08111 null
2025-04-10 Learning Universal Features for Generalizable Image Forgery Localization Hengrun Zhao et.al. 2504.07462 link
2025-04-10 Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing Chenxi Sun et.al. 2504.07424 null
2025-04-09 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution Gene Chou et.al. 2504.07093 link
2025-04-08 VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing Juan Luis Gonzalez Bello et.al. 2504.07146 null
2025-04-08 Transfer between Modalities with MetaQueries Xichen Pan et.al. 2504.06256 null
2025-04-08 Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Qi Mao et.al. 2504.05594 null
2025-04-08 Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Xiangyu Zhao et.al. 2504.02826 link
2025-04-07 CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models Kavana Venkatesh et.al. 2504.05306 null
2025-04-07 Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing Hui Liu et.al. 2504.04784 null
2025-04-07 MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models Wulin Xie et.al. 2504.03641 null
2025-04-04 Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices Yang He et.al. 2504.03155 null
2025-04-03 How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models Pascal Chang et.al. 2504.03072 null
2025-04-03 VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Xianwei Zhuang et.al. 2504.02949 link
2025-04-03 Concept Lancet: Image Editing with Compositional Representation Transplant Jinqi Luo et.al. 2504.02828 null
2025-04-03 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Zhiyuan Yan et.al. 2504.02782 link
2025-04-03 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement Runhui Huang et.al. 2504.01934 null
2025-04-02 FreSca: Unveiling the Scaling Space in Diffusion Models Chao Huang et.al. 2504.02154 null
2025-04-02 A Diffusion-Based Framework for Occluded Object Movement Zheng-Peng Duan et.al. 2504.01873 null
2025-03-31 AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents Jiaxiang Chen et.al. 2503.23948 link
2025-03-31 Training-Free Text-Guided Image Editing with Visual Autoregressive Model Yufei Wang et.al. 2503.23897 link
2025-03-30 Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging Amar Kumar et.al. 2503.23618 null
2025-03-30 ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 Tianming Liang et.al. 2503.23509 link
2025-03-30 SketchVideo: Sketch-based Video Generation and Editing Feng-Lin Liu et.al. 2503.23284 null
2025-03-29 FreeInv: Free Lunch for Improving DDIM Inversion Yuxiang Bao et.al. 2503.23035 null
2025-03-29 FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model Jun Zhou et.al. 2503.19839 null
2025-03-28 Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance Haijie Yang et.al. 2503.22225 null
2025-03-28 LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing Achint Soni et.al. 2503.21541 link
2025-03-26 Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising Yan-Bo Lin et.al. 2503.20782 null
2025-03-26 EditCLIP: Representation Learning for Image Editing Qian Wang et.al. 2503.20318 link
2025-03-26 Wan: Open and Advanced Large-Scale Video Generative Models WanTeam et.al. 2503.20314 link
2025-03-26 InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction Yuhui Wu et.al. 2503.20287 link
2025-03-25 Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning Sherry X. Chen et.al. 2503.18406 link
2025-03-25 Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods Yuzhi Li et.al. 2503.17975 null
2025-03-24 FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing Yufan Ren et.al. 2503.19191 null
2025-03-24 Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance Sicong Feng et.al. 2503.18386 null
2025-03-24 MaSS13K: A Matting-level Semantic Segmentation Benchmark Chenxi Xie et.al. 2503.18364 link
2025-03-23 Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance Harang Ju et.al. 2503.18238 link
2025-03-23 What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images Dongheng Lin et.al. 2503.17899 null
2025-03-23 Multi-focal Conditioned Latent Diffusion for Person Image Synthesis Jiaqi Liu et.al. 2503.15686 link
2025-03-22 InstructVEdit: A Holistic Approach for Instructional Video Editing Chi Zhang et.al. 2503.17641 null
2025-03-22 Guidance Free Image Editing via Explicit Conditioning Mehdi Noroozi et.al. 2503.17593 null
2025-03-21 HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks Maria Pilligua et.al. 2503.17276 null
2025-03-21 DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics Yihan Hu et.al. 2503.16795 null
2025-03-20 FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing Tianyi Wei et.al. 2503.16153 null
2025-03-20 Single Image Iterative Subject-driven Generation and Editing Yair Shpitzer et.al. 2503.16025 link
2025-03-19 VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation Shoubin Yu et.al. 2503.14350 null
2025-03-18 ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing Yulin Pan et.al. 2503.14482 null
2025-03-18 TarPro: Targeted Protection against Malicious Image Editing Kaixin Shen et.al. 2503.13994 null
2025-03-17 FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models Minghan Li et.al. 2503.13684 null
2025-03-17 Unified Autoregressive Visual Generation and Understanding with Continuous Tokens Lijie Fan et.al. 2503.13436 null
2025-03-17 Edit Transfer: Learning Image Editing via Vision In-Context Relations Lan Chen et.al. 2503.13327 null
2025-03-17 GIFT: Generated Indoor video frames for Texture-less point tracking Jianzheng Huang et.al. 2503.12944 null
2025-03-17 DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model Junjia Huang et.al. 2503.12838 null
2025-03-16 UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing Tsu-Jui Fu et.al. 2503.12652 null
2025-03-16 Personalize Anything for Free with Diffusion Transformer Haoran Feng et.al. 2503.12590 null
2025-03-14 Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities Ruchika Chavhan et.al. 2503.11905 null
2025-03-14 RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing Tianrui Pan et.al. 2503.11571 null
2025-03-14 LUSD: Localized Update Score Distillation for Text-Guided Image Editing Worameth Chinchuthakun et.al. 2503.11054 link
2025-03-14 V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes Yanming Zhang et.al. 2503.10634 null
2025-03-14 On the Limitations of Vision-Language Models in Understanding Image Transforms Ahmad Mustafa Anis et.al. 2503.09837 null
2025-03-13 Fine-Tuning Diffusion Generative Models via Rich Preference Optimization Hanyang Zhao et.al. 2503.11720 null
2025-03-13 CoSTA$\ast$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Advait Gupta et.al. 2503.10613 link
2025-03-13 EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing Zexuan Yan et.al. 2503.10270 link
2025-03-13 MoEdit: On Learning Quantity Perception for Multi-object Image Editing Yanfeng Li et.al. 2503.10112 link
2025-03-13 Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models Armando Fortes et.al. 2503.08434 null
2025-03-12 Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space Yifan Zhou et.al. 2503.09419 link
2025-03-12 InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images Jiun Tian Hoe et.al. 2503.09130 null
2025-03-12 OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting Yongsheng Yu et.al. 2503.08677 null
2025-03-11 Aligning Text to Image in Diffusion Models is Easier Than You Think Jaa-Yeon Lee et.al. 2503.08250 link
2025-03-11 ObjectMover: Generative Object Movement with Video Prior Xin Yu et.al. 2503.08037 null
2025-03-11 CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement Chenrui Ma et.al. 2503.07938 null
2025-03-11 VACE: All-in-One Video Creation and Editing Zeyinzi Jiang et.al. 2503.07598 null
2025-03-10 Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Lixue Gong et.al. 2503.07703 null
2025-03-10 TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation Victor Shea-Jay Huang et.al. 2503.07050 null
2025-03-10 Interactive Tumor Progression Modeling via Sketch-Based Image Editing Gexin Huang et.al. 2503.06809 null
2025-03-10 VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Yuxuan Bian et.al. 2503.05639 link
2025-03-09 Consistent Image Layout Editing with Diffusion Models Tao Xia et.al. 2503.06419 null
2025-03-08 Get In Video: Add Anything You Want to the Video Shaobin Zhuang et.al. 2503.06268 null
2025-03-08 X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation Jian Ma et.al. 2503.06134 link
2025-03-07 Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients Niklas Penzel et.al. 2503.05424 null
2025-03-06 Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models Rui Jiang et.al. 2503.04215 null
2025-03-05 GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors Yaopei Zeng et.al. 2503.03944 null
2025-03-04 h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform Toan Nguyen et.al. 2503.02187 link
2025-03-03 VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors Juil Koo et.al. 2503.01107 null
2025-03-01 GenVDM: Generating Vector Displacement Maps From a Single Image Yuezhi Yang et.al. 2503.00605 null
2025-02-27 Tight Inversion: Image-Conditioned Inversion for Real Image Editing Edo Kadosh et.al. 2502.20376 null
2025-02-27 Identity-preserving Distillation Sampling by Fixed-Point Iterator SeonHwa Kim et.al. 2502.19930 null
2025-02-26 SVGEditBench V2: A Benchmark for Instruction-based SVG Editing Kunato Nishina et.al. 2502.19453 link
2025-02-26 Bayesian Optimization for Controlled Image Editing via LLMs Chengkun Cai et.al. 2502.18116 null
2025-02-25 KV-Edit: Training-Free Image Editing for Precise Background Preservation Tianrui Zhu et.al. 2502.17363 link
2025-02-24 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing Xiangpeng Yang et.al. 2502.17258 null
2025-02-23 PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Shijie Huang et.al. 2502.14397 link
2025-02-22 DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation Yuxuan Xiong et.al. 2502.16302 null
2025-02-18 AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks Ming Xie et.al. 2502.11158 null
2025-02-14 PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control Kunal Swami et.al. 2502.10258 null
2025-02-14 VideoDiff: Human-AI Video Co-Creation with Alternatives Mina Huh et.al. 2502.10190 null
2025-02-14 Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training Rodrigo Santos et.al. 2502.10064 null
2025-02-14 SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment Tica Lin et.al. 2502.08621 null
2025-02-10 Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists Bojia Zi et.al. 2502.06734 null
2025-02-10 Predictive Red Teaming: Breaking Policies Without Breaking Robots Anirudha Majumdar et.al. 2502.06575 null
2025-02-08 AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection Shuheng Zhang et.al. 2502.05433 null
2025-02-06 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Jinbo Xing et.al. 2502.04299 null
2025-02-06 PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models Aleksandar Cvejic et.al. 2502.04050 null
2025-02-06 DICE: Distilling Classifier-Free Guidance into Text Embeddings Zhenyu Zhou et.al. 2502.03726 null
2025-02-05 Lost in Edits? A $\lambda$-Compass for AIGC Provenance Wenhao You et.al. 2502.04364 null
2025-02-05 REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations Peter Sushko et.al. 2502.03629 null
2025-02-04 Exploring the latent space of diffusion models directly through singular value decomposition Li Wang et.al. 2502.02225 null
2025-02-04 EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues Rohit Girmaji et.al. 2502.02172 null
2025-02-04 Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation JooHyun Kwon et.al. 2502.02091 null
2025-01-30 DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models Ruofan Liang et.al. 2501.18590 null
2025-01-24 MATCHA: Towards Matching Anything Fei Xue et.al. 2501.14945 null
2025-01-24 Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* Ludovica Schaerf et.al. 2501.14524 null
2025-01-23 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models Jiayi Lei et.al. 2501.13920 null

(back to top)

Others

Publish Date Title Authors PDF Code
2025-06-25 A Computationally Aware Multi Objective Framework for Camera LiDAR Calibration Venkat Karramreddy et.al. 2506.20636 null
2025-06-25 Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings Ankit Shah et.al. 2506.20609 null
2025-06-25 Learning-Based Distance Estimation for 360° Single-Sensor Setups Yitong Quan et.al. 2506.20586 null
2025-06-25 Communication-Aware Map Compression for Online Path-Planning: A Rate-Distortion Approach Ali Reza Pedram et.al. 2506.20579 null
2025-06-25 HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction Zhonghao Shi et.al. 2506.20566 null
2025-06-25 Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control Andrew Mole et.al. 2506.20554 null
2025-06-25 Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos Yitong Quan et.al. 2506.20550 null
2025-06-25 RapidGBM: An Efficient Tool for Fermi-GBM Visibility Checking and Data Analysis with a Case Study of EP240617a Yun Wang et.al. 2506.20532 null
2025-06-25 Comparison between Causal and Acausal Diffusion: a Schwinger-Keldysh Effective Field Theory Perspective Navid Abbasi et.al. 2506.20500 null
2025-06-25 Learning-based safety lifting monitoring system for cranes on construction sites Hao Chen et.al. 2506.20475 null
2025-06-25 Enhanced Robotic Navigation in Deformable Environments using Learning from Demonstration and Dynamic Modulation Lingyun Chen et.al. 2506.20376 null
2025-06-25 Producer-Fairness in Sequential Bundle Recommendation Alexandre Rio et.al. 2506.20329 null
2025-06-25 Finding the Easy Way Through -- the Probabilistic Gap Planner for Social Robot Navigation Malte Probst et.al. 2506.20320 null
2025-06-25 Computed tomography of propagating microwave photons Qi-Ming Chen et.al. 2506.20318 null
2025-06-25 Real-Time Obstacle Avoidance Algorithms for Unmanned Aerial and Ground Vehicles Jingwen Wei et.al. 2506.20311 null
2025-06-25 Analog OFDM based on Real-Time Fourier Transformation Xiaolu Yang et.al. 2506.20287 null
2025-06-25 Dynamic Bandwidth Allocation for Hybrid Event-RGB Transmission Pujing Yang et.al. 2506.20222 null
2025-06-25 Personalized Mental State Evaluation in Human-Robot Interaction using Federated Learning Andrea Bussolan et.al. 2506.20212 null
2025-06-25 RaRa Clipper: A Clipper for Gaussian Splatting Based on Ray Tracer and Rasterizer Da Li et.al. 2506.20202 null
2025-06-25 First experimental demonstration of plasma shape control in a tokamak through Model Predictive Control Adriano Mele et.al. 2506.20096 null
2025-06-25 Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees Shi Chang et.al. 2506.19677 null
2025-06-24 The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series Pola Bereta et.al. 2506.19759 null
2025-06-24 MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control Shihao Li et.al. 2506.19744 null
2025-06-24 NEAR$^2$: A Nested Embedding Approach to Efficient Product Retrieval and Ranking Shenbin Qian et.al. 2506.19743 null
2025-06-24 Dual-energy extraction for proton therapy and imaging: validation on a clinical synchrotron-based facility Alexander A. Pryanichnikov et.al. 2506.19736 null
2025-06-24 SIP-IFVM: An observation-based magnetohydrodynamic model of coronal mass ejection Haopeng Wang et.al. 2506.19711 null
2025-06-24 Health Sentinel: An AI Pipeline For Real-time Disease Outbreak Detection Devesh Pant et.al. 2506.19548 null
2025-06-24 NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons Carlo Romeo et.al. 2506.19530 null
2025-06-24 MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications Aleksandr Algazinov et.al. 2506.19502 null
2025-06-24 An analytical model of depth-dose distributions for carbon-ion beams Fulya Halıcılar et.al. 2506.19479 null
2025-06-24 Can Movable Antenna-enabled Micro-Mobility Replace UAV-enabled Macro-Mobility? A Physical Layer Security Perspective Kaixuan Li et.al. 2506.19456 null
2025-06-24 Enhanced Fault Ride-Through Grid Forming with Transient Synchronisation Stability and Current Saturation Youcefa Brahim Elkhalil et.al. 2506.19444 null
2025-06-24 Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System Lixuan He et.al. 2506.19433 null
2025-06-24 Virtual Memory for 3D Gaussian Splatting Jonathan Haberl et.al. 2506.19415 null
2025-06-24 Can theory-driven learning analytics dashboard enhance human-AI collaboration in writing learning? Insights from an empirical experiment Angxuan Chen et.al. 2506.19364 null
2025-06-24 OpticalAging: Real-time Presbyopia Simulation for Inclusive Design via Tunable Lenses Qing Zhang et.al. 2506.19307 null
2025-06-24 Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control Jaehong Oh et.al. 2506.19277 null
2025-06-24 High-throughput spin-bath characterization of spin-defects in semiconductors Abigail N. Poteshman et.al. 2506.19259 null
2025-06-24 Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning Renzi Meng et.al. 2506.19246 null
2025-06-24 PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications Pietro Bonazzi et.al. 2506.18807 null
2025-06-23 PRISM: Perceptual Recognition for Identifying Standout Moments in Human-Centric Keyframe Extraction Mert Can Cakmak et.al. 2506.19168 null
2025-06-23 MinD: Unified Visual Imagination and Control via Hierarchical World Models Xiaowei Chi et.al. 2506.18897 null
2025-06-23 OmniGen2: Exploration to Advanced Multimodal Generation Chenyuan Wu et.al. 2506.18871 null
2025-06-23 LIGHTHOUSE: Fast and precise distance to shoreline calculations from anywhere on earth Patrick Beukema et.al. 2506.18842 null
2025-06-23 STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning Aryasomayajula Ram Bharadwaj et.al. 2506.18831 null
2025-06-23 MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation task Jorge Iranzo-Sánchez et.al. 2506.18828 null
2025-06-23 ModeliHub: A Web-based, Federated Analytics Platform for Modelica-centric, Model-based Systems Engineering Mohamad Omar Nachawati et.al. 2506.18790 null
2025-06-23 Flow-Aware Diffusion for Real-Time VR Restoration: Enhancing Spatiotemporal Coherence and Efficiency Yitong Zhu et.al. 2506.18786 null
2025-06-23 NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments Alessandro Saviolo et.al. 2506.18689 null
2025-06-23 Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models Jiangyu Han et.al. 2506.18623 null
2025-06-23 New Power Decoupling Method for Grid Forming Inverter Based on Adaptive Virtual-Synchronous Machine in Weak Grids Waleed Breesam et.al. 2506.18619 null
2025-06-23 Frequency Control in Microgrids: An Adaptive Fuzzy-Neural-Network Virtual Synchronous Generator Waleed Breesam et.al. 2506.18611 null
2025-06-23 PG-LIO: Photometric-Geometric fusion for Robust LiDAR-Inertial Odometry Nikhil Khedekar et.al. 2506.18583 null
2025-06-23 Multi-Rank Subspace Change-Point Detection for Monitoring Robotic Swarms Jonghyeok Lee et.al. 2506.18562 null
2025-06-23 Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning Jiexin Zhang et.al. 2506.18560 null
2025-06-23 ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction Marco Aruta et.al. 2506.18396 null
2025-06-23 Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots Imene Tarakli et.al. 2506.18365 null
2025-06-23 TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations Kawser Ahmed et.al. 2506.18343 null
2025-06-23 Programmable electro-optic frequency comb empowers integrated parallel convolution processing Jinze He et.al. 2506.18310 null
2025-06-23 LLM-Integrated Digital Twins for Hierarchical Resource Allocation in 6G Networks Majumder Haider et.al. 2506.18293 null
2025-06-20 Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition Jiaqi Li et.al. 2506.17201 null
2025-06-20 Judo: A User-Friendly Open-Source Package for Sampling-Based Model Predictive Control Albert H. Li et.al. 2506.17184 null
2025-06-20 A Set-valued Impact Law Approach for Modeling and Analysis of Rigid Contact Universal Joint with Clearance Junaid Ali et.al. 2506.17183 null
2025-06-20 A tutorial overview of model predictive control for continuous crystallization: current possibilities and future perspectives Collin R. Johnson et.al. 2506.17146 null
2025-06-20 Real-time Broadband RFI Excision for the Upgraded GMRT Ruta Kale et.al. 2506.17131 null
2025-06-20 Rapid and Continuous Trust Evaluation for Effective Task Collaboration Through Siamese Model Botao Zhu et.al. 2506.17128 null
2025-06-20 RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking Teng Guo et.al. 2506.17119 null
2025-06-20 JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Cross-Facility Scientific Workflows Vladislav Esaulov et.al. 2506.17084 null
2025-06-20 Opportunities for real-time process control of electrode properties in lithium-ion battery manufacturing Noël Hallemans et.al. 2506.17048 null
2025-06-20 Probing dynamical axion quasiparticles with two-photon correlations Daniel Boyanovsky et.al. 2506.17013 null
2025-06-20 Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments Yasir Ali Farrukh et.al. 2506.16994 null
2025-06-20 Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point Zisheng Wang et.al. 2506.16957 null
2025-06-20 Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning Jiaqi Chen et.al. 2506.16931 null
2025-06-20 Single-shot thermometry of simulated Bose-Einstein condensates using artificial intelligence Jack Griffiths et.al. 2506.16925 null
2025-06-20 Real-Time Black-Box Optimization for Dynamic Discrete Environments Using Embedded Ising Machines Tomoya Kashimata et.al. 2506.16924 null
2025-06-20 ROS 2 Agnocast: Supporting Unsized Message Types for True Zero-Copy Publish/Subscribe IPC Takahiro Ishikawa-Aso et.al. 2506.16882 null
2025-06-20 Revolutionizing Validation and Verification: Explainable Testing Methodologies for Intelligent Automotive Decision-Making Systems Halit Eris et.al. 2506.16876 null
2025-06-20 RS-Coded Adaptive Dynamic Network for Reliable Long-Term Information Transmission in Disturbed Multimode Fiber Yang Hu et.al. 2506.16859 null
2025-06-20 Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning Chengpeng Hu et.al. 2506.16795 null
2025-06-20 Reinforcement learning for hybrid charging stations planning and operation considering fixed and mobile chargers Yanchen Zhu et.al. 2506.16764 null
2025-06-18 Vision in Action: Learning Active Perception from Human Demonstrations Haoyu Xiong et.al. 2506.15666 null
2025-06-18 BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion Yuqing Lan et.al. 2506.15610 null
2025-06-18 MicroRicci: A Greedy and Local Ricci Flow Solver for Self-Tuning Mesh Smoothing Le Vu Anh et.al. 2506.15571 null
2025-06-18 PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction Shufan Li et.al. 2506.15556 null
2025-06-18 Real-Time Initialization of Unknown Anchors for UWB-aided Navigation Giulio Delama et.al. 2506.15518 null
2025-06-18 Model Predictive Path-Following Control for a Quadrotor David Leprich et.al. 2506.15447 null
2025-06-18 A Real-time Endoscopic Image Denoising System Yu Xing et.al. 2506.15395 null
2025-06-18 Evaluation Pipeline for systematically searching for Anomaly Detection Systems Florian Rokohl et.al. 2506.15388 null
2025-06-18 Efficient Navigation Among Movable Obstacles using a Mobile Manipulator via Hierarchical Policy Learning Taegeun Yang et.al. 2506.15380 null
2025-06-18 J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor Benoit Tain et.al. 2506.15316 null
2025-06-18 AI-driven visual monitoring of industrial assembly tasks Mattia Nardon et.al. 2506.15285 null
2025-06-18 Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study Mohamad A. Hady et.al. 2506.15207 null
2025-06-18 In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory Matteo Zecchin et.al. 2506.15176 null
2025-06-18 Human Locomotion Implicit Modeling Based Real-Time Gait Phase Estimation Yuanlong Ji et.al. 2506.15150 null
2025-06-18 I Know You're Listening: Adaptive Voice for HRI Paige Tuttösí et.al. 2506.15107 null
2025-06-18 EmojiVoice: Towards long-term controllable expressivity in robot speech Paige Tuttösí et.al. 2506.15085 null
2025-06-18 Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks Yimian Ding et.al. 2506.15082 null
2025-06-18 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies Jinyan Yuan et.al. 2506.14315 null
2025-06-17 GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC Eman Alqudah et.al. 2506.15011 null
2025-06-17 Mixed Traffic: A Perspective from Long Duration Autonomy Filippos Tzortzoglou et.al. 2506.15004 null
2025-06-17 CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC Eman Alqudah et.al. 2506.14987 null
2025-06-17 CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion Jiahua Ma et.al. 2506.14769 null
2025-06-17 Technosignature Searches with Real-time Alert Brokers Eleanor M. Gallay et.al. 2506.14744 null
2025-06-17 Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models Huihan Liu et.al. 2506.14727 null
2025-06-17 SkinCells: Sparse Skinning using Voronoi Cells Egor Larionov et.al. 2506.14714 null
2025-06-17 Deep Learning-Based Prediction of High Explosive Induced Fluid Dynamics Francis G. VanGessel et.al. 2506.14710 null
2025-06-17 Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers Daniel D'souza et.al. 2506.14702 null
2025-06-17 Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach Yingchao Li et.al. 2506.14677 null
2025-06-17 ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors Jongin Choi et.al. 2506.14657 null
2025-06-17 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting Yuke Xing et.al. 2506.14642 null
2025-06-17 Low-code to fight climate change: the Climaborough project Aaron Conrardy et.al. 2506.14623 null
2025-06-17 Deep Learning Surrogates for Real-Time Gas Emission Inversion Thomas Newman et.al. 2506.14597 null
2025-06-17 Review of Machine Learning for Real-Time Analysis at the Large Hadron Collider experiments ALICE, ATLAS, CMS and LHCb Laura Boggia et.al. 2506.14578 null
2025-06-17 GAMORA: A Gesture Articulated Meta Operative Robotic Arm for Hazardous Material Handling in Containment-Level Environments Farha Abdul Wasay et.al. 2506.14513 null
2025-06-17 SimSpark: Interactive Simulation of Social Media Behaviors Ziyue Lin et.al. 2506.14476 null
2025-06-17 MalGuard: Towards Real-Time, Accurate, and Actionable Detection of Malicious Packages in PyPI Ecosystem Xingan Gao et.al. 2506.14466 null
2025-06-17 Active Digital Twins via Active Inference Matteo Torzoni et.al. 2506.14453 null
2025-06-17 Socially Aware Robot Crowd Navigation via Online Uncertainty-Driven Risk Adaptation Zhirui Sun et.al. 2506.14305 null
2025-06-17 Whole-Body Control Framework for Humanoid Robots with Heavy Limbs: A Model-Based Approach Tianlin Zhang et.al. 2506.14278 null
2025-06-17 GHz spiking neuromorphic photonic chip with in-situ training Jinlong Xiang et.al. 2506.14272 null
2025-06-16 Compact representation and long-time extrapolation of real-time data for quantum systems Andre Erpenbeck et.al. 2506.13760 null
2025-06-16 Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks Haoqing Li et.al. 2506.13733 null
2025-06-16 BanditWare: A Contextual Bandit-based Framework for Hardware Prediction Tainã Coleman et.al. 2506.13730 null
2025-06-16 How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection Kaiyuan Tan et.al. 2506.13722 null
2025-06-16 Direct visualization of visible-light hyperbolic plasmon polaritons in real space and time Atreyie Ghosh et.al. 2506.13719 null
2025-06-16 HARMONI: Haptic-Guided Assistance for Unified Robotic Tele-Manipulation and Tele-Navigation V. Sripada et.al. 2506.13704 null
2025-06-16 Photomagnetic-Chiral Anisotropy mediated by Chirality-Driven Asymmetric Spin Splitting Tianwei Ouyang et.al. 2506.13696 null
2025-06-16 Integrated Pipeline for Monocular 3D Reconstruction and Finite Element Simulation in Industrial Applications Bowen Zheng et.al. 2506.13573 null
2025-06-16 Controlled manipulation of solitons in a recirculating fiber loop using external potentials François Copie et.al. 2506.13544 null
2025-06-16 UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data Vasiliki Balaska et.al. 2506.13505 null
2025-06-16 Leveraging active learning-enhanced machine-learned interatomic potential for efficient infrared spectra prediction Nitik Bhatia et.al. 2506.13486 null
2025-06-16 From Flat to Feeling: A Feasibility and Impact Study on Dynamic Facial Emotions in AI-Generated Avatars Pegah Salehi et.al. 2506.13477 null
2025-06-16 SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer Zerui Gong et.al. 2506.13465 link
2025-06-16 Block-wise Adaptive Caching for Accelerating Diffusion Policy Kangye Ji et.al. 2506.13456 null
2025-06-16 Towards real-time additive-free dopamine detection at $10^{-8}$ mM with hardware accelerated platform integrated on camera Ning Li et.al. 2506.13447 null
2025-06-16 Training Neural Networks by Optimizing Neuron Positions Laura Erb et.al. 2506.13410 null
2025-06-16 HELENA: High-Efficiency Learning-based channel Estimation using dual Neural Attention Miguel Camelo Botero et.al. 2506.13408 link
2025-06-16 A Model-Free Detection Method for Internal Short Circuits in Single Lithium-ion Cells Using Pseudo Open-Circuit Voltage Difference Yangyang Xu et.al. 2506.13394 null
2025-06-16 Joint Optimization of Multi-UAV Deployment and 3D Positioning in Traffic-Aware Aerial Networks Kamran Shafafi et.al. 2506.13287 null
2025-06-16 SONIC: Sound Optimization for Noise In Crowds Pranav M N et.al. 2506.13272 null
2025-06-13 Reimagining Dance: Real-time Music Co-creation between Dancers and AI Olga Vechtomova et.al. 2506.12008 null
2025-06-13 Robustness of Floquet topological phase at room temperature: a first-principles dynamics study Ruiyi Zhou et.al. 2506.12005 null
2025-06-13 Learning Before Filtering: Real-Time Hardware Learning at the Detector Level Boštjan Maček et.al. 2506.11981 null
2025-06-13 Secure API-Driven Research Automation to Accelerate Scientific Discovery Tyler J. Skluzacek et.al. 2506.11950 null
2025-06-13 Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients Chapa Sirithunge et.al. 2506.11906 null
2025-06-13 DMRS-Based Uplink Channel Estimation for MU-MIMO Systems with Location-Specific SCSI Acquisition Jiawei Zhuang et.al. 2506.11899 null
2025-06-13 Enter: Graduated Realism: A Pedagogical Framework for AI-Powered Avatars in Virtual Reality Teacher Training Judson Leroy Dean Haynes IV et.al. 2506.11890 null
2025-06-13 An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing Haochen Sun et.al. 2506.11882 null
2025-06-13 Bistable random momentum transfer in a linear on-chip resonator Tingyi Gu et.al. 2506.11859 null
2025-06-13 Framework of a multiscale data-driven digital twin of the muscle-skeletal system Martina Paccini et.al. 2506.11821 null
2025-06-13 Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection Tae-Seong Han et.al. 2506.11815 link
2025-06-13 SSPINNpose: A Self-Supervised PINN for Inertial Pose and Dynamics Estimation Markus Gambietz et.al. 2506.11786 null
2025-06-13 Real-Time Feedback and Benchmark Dataset for Isometric Pose Evaluation Abhishek Jaiswal et.al. 2506.11774 null
2025-06-13 Dynamic Collaborative Material Distribution System for Intelligent Robots In Smart Manufacturing Ziren Xiao et.al. 2506.11723 null
2025-06-13 Modeling Urban Air Quality Using Taxis as Sensors Anastasios Noulas et.al. 2506.11720 null
2025-06-13 Generalised Rate Control Approach For Stream Processing Applications Ziren Xiao et.al. 2506.11710 null
2025-06-13 DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation Libin Lan et.al. 2506.11691 null
2025-06-13 GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news Abdul Haque et.al. 2506.11600 null
2025-06-13 Camera-based method for the detection of lifted truck axles using convolutional neural networks Bachir Tchana Tankeu et.al. 2506.11574 null
2025-06-13 Scheduling Agile Earth Observation Satellites with Onboard Processing and Real-Time Monitoring Antonio M. Mercado-Martínez et.al. 2506.11556 null
2025-06-12 InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model Junqi You et.al. 2506.10980 null
2025-06-12 Discovery and Localization of the Swift-Observed FRB 20241228A in a Star-forming Host Galaxy Alice P. Curtin et.al. 2506.10961 null
2025-06-12 Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors Chen Yueh-Han et.al. 2506.10949 link
2025-06-12 Execution Guided Line-by-Line Code Generation Boaz Lavon et.al. 2506.10948 link
2025-06-12 Non-Abelian dynamics on a cube: improving quantum compilation through qudit-based simulations Jacky Jiang et.al. 2506.10945 null
2025-06-12 Building a Media Ecosystem Observatory from Scratch: Infrastructure, Methodology, and Insights Zeynep Pehlivan et.al. 2506.10942 null
2025-06-12 MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem Melina Soysal et.al. 2506.10931 null
2025-06-12 Agentic Semantic Control for Autonomous Wireless Space Networks: Extending Space-O-RAN with MCP-Driven Distributed Intelligence Eduardo Baena et.al. 2506.10925 null
2025-06-12 Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning Waylon Luo et.al. 2506.10889 null
2025-06-12 S3Mirror: Making Genomic Data Transfers Fast, Reliable, and Observable with DBOS Steven Vasquez-Grinnell et.al. 2506.10886 null
2025-06-12 Modeling Trust Dynamics in Robot-Assisted Delivery: Impact of Trust Repair Strategies Dong Hae Mangalindan et.al. 2506.10884 null
2025-06-12 Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment Hongda Sun et.al. 2506.10877 link
2025-06-12 General Reference Frame Identification and Transformation in Unbalanced Power Systems Francisco G. Montoya et.al. 2506.10835 null
2025-06-12 A novel visual data-based diagnostic approach for estimation of regime transition in pool boiling Pranay Nirapure et.al. 2506.10832 null
2025-06-12 Efficiency Robustness of Dynamic Deep Learning Systems Ravishka Rathnasuriya et.al. 2506.10831 link
2025-06-12 Grasp Prediction based on Local Finger Motion Dynamics Dimitar Valkov et.al. 2506.10818 null
2025-06-12 Human-Robot Navigation using Event-based Cameras and Reinforcement Learning Ignacio Bugueno-Cordova et.al. 2506.10790 null
2025-06-12 Hazel Deriver: A Live Editor for Constructing Rule-Based Derivations Zhiyao Zhong et.al. 2506.10781 null
2025-06-12 Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction Bao Zhang et.al. 2506.10762 null
2025-06-12 Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding Yuhang Zhang et.al. 2506.10756 null
2025-06-11 DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos Chieh Hubert Lin et.al. 2506.09997 null
2025-06-11 Locomotion on Constrained Footholds via Layered Architectures and Model Predictive Control Zachary Olkin et.al. 2506.09979 null
2025-06-11 SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance Wentao Ge et.al. 2506.09968 null
2025-06-11 Mechanism of Conductivity Enhancement of Polymers Employing Microbubble Lithography Anand Dev Ranjan et.al. 2506.09957 null
2025-06-11 Microservices and Real-Time Processing in Retail IT: A Review of Open-Source Toolchains and Deployment Strategies Aaditaa Vashisht et.al. 2506.09938 null
2025-06-11 Repeated ancilla reuse for logical computation on a neutral atom quantum computer J. A. Muniz et.al. 2506.09936 null
2025-06-11 TransGI: Real-Time Dynamic Global Illumination With Object-Centric Neural Transfer Model Yijie Deng et.al. 2506.09909 null
2025-06-11 Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing Amit Baran Dey et.al. 2506.09867 null
2025-06-11 Multi-FPGA Synchronization and Data Communication for Quantum Control and Measurement Yilun Xu et.al. 2506.09856 null
2025-06-11 Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets Md. Yeasin Rahat et.al. 2506.09851 null
2025-06-11 Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment Amritha Premkumar et.al. 2506.09795 null
2025-06-11 Human-robot collaborative transport personalization via Dynamic Movement Primitives and velocity scaling Paolo Franceschi et.al. 2506.09697 null
2025-06-11 Searching for sub-TeV IceCube neutrinos correlated to sub-threshold GW events Tista Mukherjee et.al. 2506.09694 null
2025-06-11 Early and Accurate Recession Detection Using Classifiers on the Anticipation-Precision Frontier Pascal Michaillat et.al. 2506.09664 null
2025-06-11 Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach Lei Deng et.al. 2506.09647 null
2025-06-11 VAULT: A Mobile Mapping System for ROS 2-based Autonomous Robots Miguel Á. González-Santamarta et.al. 2506.09583 null
2025-06-11 Real-time adaptive tracking of fluctuating relaxation rates in superconducting qubits Fabrizio Berritta et.al. 2506.09576 null
2025-06-11 HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene Jianing Chen et.al. 2506.09518 null
2025-06-11 A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications Donglin Wang et.al. 2506.09512 null
2025-06-11 ArcNeural: A Multi-Modal Database for the Gen-AI Era Wu Min et.al. 2506.09467 null
2025-06-10 Rapid cardiac activation prediction for cardiac resynchronization therapy planning using geometric deep learning Ehsan Naghavi et.al. 2506.08987 link
2025-06-10 Online Learning Control Strategies for Industrial Processes with Application for Loosening and Conditioning Yue Wu et.al. 2506.08983 null
2025-06-10 Rethinking Range-View LiDAR Segmentation in Adverse Weather Longyu Yang et.al. 2506.08979 null
2025-06-10 Yau-YauAL: A computer tool for solving nonlinear filtering problems Yu Wang et.al. 2506.08976 null
2025-06-10 WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis Liangliang Chen et.al. 2506.08962 null
2025-06-10 CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks Yixuan Li et.al. 2506.08931 null
2025-06-10 Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU Petar Jakuš et.al. 2506.08911 null
2025-06-10 HabSim: Architecture for modelling disruptions, propagation, detection and repair in deep space habitats Luca Vaccino et.al. 2506.08903 null
2025-06-10 Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning Kai Zhou et.al. 2506.08893 null
2025-06-10 Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams Tauhid Tanjim et.al. 2506.08892 null
2025-06-10 SmartAttack: Air-Gap Attack via Smartwatches Mordechai Guri et.al. 2506.08866 null
2025-06-10 StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams Zike Wu et.al. 2506.08862 link
2025-06-10 Fast Estimation of Globally Optimal Independent Contact Regions for Robust Grasping and Manipulation Jonathan P. King et.al. 2506.08856 null
2025-06-10 Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing Amin Avan et.al. 2506.08850 link
2025-06-10 FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Yifei Su et.al. 2506.08822 null
2025-06-10 Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment Maximilian Tschuchnig et.al. 2506.08716 null
2025-06-10 Balancing Fixed Number of Nodes Among Multiple Fixed Clusters Paritosh Ranjan et.al. 2506.08715 null
2025-06-10 Industrial Flexibility Investment Under Uncertainty: A Multi-Stage Stochastic Framework Considering Energy and Reserve Market Participation Amund Norland et.al. 2506.08638 null
2025-06-10 Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models Srinivasan Kidambi et.al. 2506.08520 link
2025-06-10 One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World Xingshuo Han et.al. 2506.08482 null
2025-06-10 Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch Prarabdh Shukla et.al. 2506.07667 link
2025-06-09 Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Xun Huang et.al. 2506.08009 null
2025-06-09 MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation Junhao Chen et.al. 2506.07999 null
2025-06-09 Unraveling Ethereum's Mempool: The Impact of Fee Fairness, Transaction Prioritization, and Consensus Efficiency S M Mostaq Hossain et.al. 2506.07988 null
2025-06-09 Real-time Localization of a Soccer Ball from a Single Camera Dmitrii Vorobev et.al. 2506.07981 null
2025-06-09 Low-Complexity Super-Resolution Signature Estimation of XL-MIMO FMCW Radar Chandrashekhar Rai et.al. 2506.07979 null
2025-06-09 Predicting Situation Awareness from Physiological Signals Kieran J. Smith et.al. 2506.07930 null
2025-06-09 LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement Dimitris Panagopoulos et.al. 2506.07915 null
2025-06-09 GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution Shuja Khalid et.al. 2506.07897 null
2025-06-09 CrosswalkNet: An Optimized Deep Learning Framework for Pedestrian Crosswalk Detection in Aerial Images with High-Performance Computing Zubin Bhuyan et.al. 2506.07885 null
2025-06-09 Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow Muhammad Ahmed Humais et.al. 2506.07878 link
2025-06-09 Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction Ivan Alberico et.al. 2506.07860 link
2025-06-09 SAM2Auto: Auto Annotation Using FLASH Arash Rocky et.al. 2506.07850 null
2025-06-09 R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation William Ljungbergh et.al. 2506.07826 null
2025-06-09 On-The-Fly Symbolic Algorithm for Timed ATL with Abstractions Nicolaj Ø. Jensen et.al. 2506.07802 null
2025-06-09 Novel software for continuous wavelet analysis enable EEG real-time analysis on portable computers Shoichiro Nakanishi et.al. 2506.07793 null
2025-06-09 Language-Vision Planner and Executor for Text-to-Visual Reasoning Yichang Xu et.al. 2506.07778 null
2025-06-09 ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models Shadi Hamdan et.al. 2506.07725 null
2025-06-09 CommSense: A Rapid and Accurate ISAC Paradigm Sandip Jana et.al. 2506.07685 null
2025-06-09 QUITE: A Query Rewrite System Beyond Rules with LLM Agents Yuyang Song et.al. 2506.07675 null
2025-06-06 RecGPT: A Foundation Model for Sequential Recommendation Yangqin Jiang et.al. 2506.06270 link
2025-06-06 Integrating Complexity and Biological Realism: High-Performance Spiking Neural Networks for Breast Cancer Detection Zofia Rudnicka et.al. 2506.06265 null
2025-06-06 Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens Jihwan Jeong et.al. 2506.06261 null
2025-06-06 From NLVO to NAO: Reactive Robot Navigation using Velocity and Acceleration Obstacles Asher Stern et.al. 2506.06255 null
2025-06-06 PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time Weizhi Zhang et.al. 2506.06254 null
2025-06-06 Correlated Structural and Optical Characterization during Van der Waals Epitaxy of PbI2 on Graphene C. P. Sonny Tsotezem et.al. 2506.06241 null
2025-06-06 Initial stage jet momentum broadening in tBLFQ formalism Dana Avramescu et.al. 2506.06206 null
2025-06-06 NAT: Neural Acoustic Transfer for Interactive Scenes in Real Time Xutong Jin et.al. 2506.06190 null
2025-06-06 Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations Luis Kin Miyatake et.al. 2506.06188 null
2025-06-06 Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge Constantin Patsch et.al. 2506.06174 null
2025-06-06 Stream DaQ: Stream-First Data Quality Monitoring Vasileios Papastergios et.al. 2506.06147 link
2025-06-06 On the Suitability of Wi-Fi for Interconnecting Moving Equipment in Industrial Environments Pietro Chiavassa et.al. 2506.06074 null
2025-06-06 Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction Ruochen Ji et.al. 2506.06066 null
2025-06-06 Direct Integration of Recursive Gaussian Process Regression Into Extended Kalman Filters With Application to Vapor Compression Cycle Control Ricus Husmann et.al. 2506.06065 null
2025-06-06 Enhanced Trust Region Sequential Convex Optimization for Multi-Drone Thermal Screening Trajectory Planning in Urban Environments Kaiyuan Chen et.al. 2506.06012 link
2025-06-06 MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation Dongjie Fu et.al. 2506.05952 null
2025-06-06 Neural Visibility Cache for Real-Time Light Sampling Jakub Bokšanský et.al. 2506.05930 null
2025-06-06 Proactive Assistant Dialogue Generation from Streaming Egocentric Videos Yichi Zhang et.al. 2506.05904 null
2025-06-06 The Online Data Filter for the KM3NeT Neutrino Telescopes O. Adriani et.al. 2506.05881 null
2025-06-06 Towards Next-Generation Intelligent Maintenance: Collaborative Fusion of Large and Small Models Xiaoyi Yuan et.al. 2506.05854 null
2025-06-06 FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction Yifan Wang et.al. 2506.05348 null
2025-06-05 SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs Jiahui Wang et.al. 2506.05344 link
2025-06-05 Generalizable, real-time neural decoding with hybrid state-space models Avery Hee-Woon Ryoo et.al. 2506.05320 null
2025-06-05 Fast-DataShapley: Neural Modeling for Training Data Valuation Haifeng Sun et.al. 2506.05281 null
2025-06-05 Vision-Based Autonomous MM-Wave Reflector Using ArUco-Driven Angle-of-Arrival Estimation Josue Marroquin et.al. 2506.05195 null
2025-06-05 Noise-Driven AI Sensors: Secure Healthcare Monitoring with PUFs Christiana Chamon et.al. 2506.05135 null
2025-06-05 EDEN: Efficient Dual-Layer Exploration Planning for Fast UAV Autonomous Exploration in Large 3-D Environments Qianli Dong et.al. 2506.05106 link
2025-06-05 Cloud-Based Interoperability in Residential Energy Systems Darren Leniston et.al. 2506.05076 null
2025-06-05 PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning Azizul Zahid et.al. 2506.05056 null
2025-06-05 Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning Mehdi Azarafza et.al. 2506.04998 null
2025-06-05 En Route Path-planning for Partially Occupied Vehicles in Ride-pooling Systems Pengbo Zhu et.al. 2506.04968 null
2025-06-05 Organic Crystal Active Waveguide as an All-Angle Signal Receiver and Transmission Platform for Visible Light Communication Ankur Khapre et.al. 2506.04874 null
2025-06-05 Beyond the Desktop: XR-Driven Segmentation with Meta Quest 3 and MX Ink Lisle Faray de Paiva et.al. 2506.04858 null
2025-06-05 Deep learning image burst stacking to reconstruct high-resolution ground-based solar observations Christoph Schirninger et.al. 2506.04781 null
2025-06-05 A high-sensitivity frequency counter for free-induction-decay signals Tong Gong et.al. 2506.04780 null
2025-06-05 Tire Wear Aware Trajectory Tracking Control for Multi-axle Swerve-drive Autonomous Mobile Robots Tianxin Hu et.al. 2506.04752 null
2025-06-05 SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs Shuhan Xu et.al. 2506.04743 null
2025-06-05 Real-Time LPV-Based Non-Linear Model Predictive Control for Robust Trajectory Tracking in Autonomous Vehicles Nitish Kumar et.al. 2506.04684 null
2025-06-05 Application of SDRE to Achieve Gait Control in a Bipedal Robot for Knee-Type Exoskeleton Testing Ping-Kong Huang et.al. 2506.04680 null
2025-06-05 Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Yuyang Wanyan et.al. 2506.04614 null
2025-06-05 Construction of Urban Greenland Resources Collaborative Management Platform Dongyang Lyu et.al. 2506.03830 null
2025-06-05 MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection Xiaochun Lei et.al. 2506.03654 null
2025-06-04 MS-YOLO: A Multi-Scale Model for Accurate and Efficient Blood Cell Detection Guohua Wu et.al. 2506.03972 null
2025-06-04 FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review Cédric Léonard et.al. 2506.03938 link
2025-06-04 Forecasting Seasonal Influenza Epidemics with Physics-Informed Neural Networks Martina Rama et.al. 2506.03897 null
2025-06-04 JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting Yang Xiao et.al. 2506.03872 null
2025-06-04 Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models Gonçalo Mesquita et.al. 2506.03752 null
2025-06-04 Probabilistic measures afford fair comparisons of AIWP and NWP model output Tilmann Gneiting et.al. 2506.03744 link
2025-06-04 Accelerating SfM-based Pose Estimation with Dominating Set Joji Joseph et.al. 2506.03667 null
2025-06-04 Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI Wing Man Casca Kwok et.al. 2506.03607 null
2025-06-04 SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting Shengjie Lin et.al. 2506.03594 link
2025-06-04 SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models Meng Li et.al. 2506.03574 null
2025-06-04 Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments Reo Yoneyama et.al. 2506.03554 null
2025-06-04 A Threat Intelligence Event Extraction Conceptual Model for Cyber Threat Intelligence Feeds Jamal H. Al-Yasiri et.al. 2506.03551 null
2025-06-04 Topology-Aware Graph Neural Network-based State Estimation for PMU-Unobservable Power Systems Shiva Moshtagh et.al. 2506.03493 null
2025-06-04 Adaptive Configuration Selection for Multi-Model Inference Pipelines in Edge Computing Jinhao Sheng et.al. 2506.02814 null
2025-06-04 Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone Zheng Liu et.al. 2506.02774 null
2025-06-03 StARS DCM: A Sleep Stage-Decoding Forehead EEG Patch for Real-time Modulation of Sleep Physiology William G. Coon et.al. 2506.03442 null
2025-06-03 Online multi-layer FDR control Runqiu Wang et.al. 2506.03406 null
2025-06-03 A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation Zihui Ma et.al. 2506.03360 link
2025-06-03 Spatial Association Between Near-Misses and Accident Blackspots in Sydney, Australia: A Getis-Ord $G_i^*$ Analysis Artur Grigorev et.al. 2506.03356 null
2025-06-03 Structural Vibration Monitoring with Diffractive Optical Processors Yuntian Wang et.al. 2506.03317 null
2025-06-03 TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models Chetwin Low et.al. 2506.03099 null
2025-06-03 InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba Zizhao Wu et.al. 2506.03084 null
2025-06-03 LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM Roman Titkov et.al. 2506.03073 null
2025-06-03 Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency Bunlong Lay et.al. 2506.02908 null
2025-06-03 When Blockchain Meets Crawlers: Real-time Market Analytics in Solana NFT Markets Chengxin Shen et.al. 2506.02892 null
2025-06-03 OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis Jiewen Hu et.al. 2506.02891 null
2025-06-03 CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge Chunlin Tian et.al. 2506.02847 null
2025-06-03 Process Mining on Distributed Data Sources Maximilian Weisenseel et.al. 2506.02830 null
2025-06-03 Target Sensing Performance in Disaster-Specific ISAC Networks Ahmet Burak Ozyurt et.al. 2506.02828 null
2025-06-03 AI-Driven Vehicle Condition Monitoring with Cell-Aware Edge Service Migration Charalampos Kalalas et.al. 2506.02785 null
2025-06-03 SAMJ: Fast Image Annotation on ImageJ/Fiji via Segment Anything Model Carlos Garcia-Lopez-de-Haro et.al. 2506.02783 null
2025-06-03 RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS Chuanyu Fu et.al. 2506.02751 null
2025-06-03 Collective Intelligence Outperforms Individual Talent: A Case Study in League of Legends Angelo Josey Caldeira et.al. 2506.02706 null
2025-06-03 A Pretrained Probabilistic Transformer for City-Scale Traffic Volume Prediction Shiyu Shen et.al. 2506.02654 null
2025-06-03 From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV Yousef Emami et.al. 2506.02649 null
2025-06-03 Phase Topology Stability of an Optical Vortex via an Electrically Controlled Twist-Planar Oriented Liquid Crystal Fresnel Lens Elena Melnikova et.al. 2506.02632 null
2025-06-03 HORUS: A Mixed Reality Interface for Managing Teams of Mobile Robots Omotoye Shamsudeen Adekoya et.al. 2506.02622 null
2025-06-03 Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models Safaa Abdullahi Moallim Mohamud et.al. 2506.02615 null
2025-05-30 PB&J: Peanut Butter and Joints for Damped Articulation Avery S. Williamson et.al. 2505.24860 link
2025-05-30 Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation Yingchaojie Feng et.al. 2505.24754 link
2025-05-30 Neural Network-based Universal Formulas for Control Pol Mestres et.al. 2505.24744 null
2025-05-30 Efficient Text Encoders for Labor Market Analysis Jens-Joris Decorte et.al. 2505.24640 null
2025-05-30 Co-designed Quantum Discrete Adiabatic Linear System Solver Via Dynamic Circuits Boxuan Ai et.al. 2505.24626 null
2025-05-30 Frequency-Domain Joint Monitoring of Differential Group Delay and Dependent Loss of Optical Single- and Few-Mode Fiber Channels Based on CAZAC Sequences Linsheng Fan et.al. 2505.24589 null
2025-05-30 Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning Jinbao Wang et.al. 2505.24572 null
2025-05-30 Airborne Neural Network Paritosh Ranjan et.al. 2505.24513 null
2025-05-30 How can AI reduce fall injuries in the workplace? Nicholas Cartocci et.al. 2505.24507 null
2025-05-30 Enhancing the Accuracy of Spatio-Temporal Models for Wind Speed Prediction by Incorporating Bias-Corrected Crowdsourced Data Eamonn Organ et.al. 2505.24506 link
2025-05-30 Real-time Fall Prevention system for the Next-generation of Workers Nicholas Cartocci et.al. 2505.24487 null
2025-05-30 Boosting Automatic Exercise Evaluation Through Musculoskeletal Simulation-Based IMU Data Augmentation Andreas Spilz et.al. 2505.24415 null
2025-05-30 SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation Yuqi Fan et.al. 2505.24390 link
2025-05-30 Spatiotemporal Analysis of Forest Machine Operations Using 3D Video Classification Maciej Wielgosz et.al. 2505.24375 null
2025-05-30 A Novel Coronary Artery Registration Method Based on Super-pixel Particle Swarm Optimization Peng Qi et.al. 2505.24351 null
2025-05-30 A 3D Mobile Crowdsensing Framework for Sustainable Urban Digital Twins Taku Yamazaki et.al. 2505.24348 null
2025-05-30 DTR: Delaunay Triangulation-based Racing for Scaled Autonomous Racing Luca Tognoni et.al. 2505.24320 null
2025-05-30 A Novel Discrete Memristor-Coupled Heterogeneous Dual-Neuron Model and Its Application in Multi-Scenario Image Encryption Yi Zou et.al. 2505.24294 null
2025-05-30 Proactive Guidance of Multi-Turn Conversation in Industrial Search Xiaoyu Li et.al. 2505.24251 null
2025-05-30 MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition Chengxi Deng et.al. 2505.24224 null
2025-05-29 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views Lihan Jiang et.al. 2505.23716 null
2025-05-29 From Connectivity to Autonomy: The Dawn of Self-Evolving Communication Systems Zeinab Nezami et.al. 2505.23710 null
2025-05-29 Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Danny Driess et.al. 2505.23705 null
2025-05-29 DiCoFlex: Model-agnostic diverse counterfactuals with flexible control Oleksii Furman et.al. 2505.23700 null
2025-05-29 Errors in Stereo Geometry Induce Distance Misperception Raffles Xingqi Zhu et.al. 2505.23685 null
2025-05-29 Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model Rachel Cummings et.al. 2505.23682 null
2025-05-29 Performance Analysis of Wireless Communication Systems Assisted by Fluid Reconfigurable Intelligent Surfaces Farshad Rostami Ghadi et.al. 2505.23680 null
2025-05-29 Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Qingyu Shi et.al. 2505.23606 link
2025-05-29 MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning Linqiang Guo et.al. 2505.23596 null
2025-05-29 Scalable decoding protocols for fast transversal logic in the surface code Mark L. Turner et.al. 2505.23567 null
2025-05-29 The CASE Framework -- A New Architecture for Participatory Research and Digital Health Surveillance Marco Hirsch et.al. 2505.23516 null
2025-05-29 DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration Sanberk Serbest et.al. 2505.23515 null
2025-05-29 Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents Zhejian Yang et.al. 2505.23450 null
2025-05-29 Enhanced DACER Algorithm with High Diffusion Efficiency Yinuo Wang et.al. 2505.23426 null
2025-05-29 When water phase matters: its effect on the stopping cross section for proton therapy and astrophysics F. Matias et.al. 2505.23396 null
2025-05-29 CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection Woojin Shin et.al. 2505.23317 null
2025-05-29 Investigating A Geometrical Solution to the Vergence-Accommodation Conflict for Targeted Movements in Virtual Reality Xiaoye Michael Wang et.al. 2505.23310 null
2025-05-29 MathArena: Evaluating LLMs on Uncontaminated Math Competitions Mislav Balunović et.al. 2505.23281 link
2025-05-29 Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception Guangyuan Liu et.al. 2505.23275 null
2025-05-29 Context-Aware Semantic Communication for the Wireless Networks Guangyuan Liu et.al. 2505.23249 null
2025-05-29 The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector Aixuan Li et.al. 2505.22499 null
2025-05-29 YH-MINER: Multimodal Intelligent System for Natural Ecological Reef Metric Extraction Mingzhuang Wang et.al. 2505.22250 null
2025-05-28 VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models Ce Zhang et.al. 2505.22654 null
2025-05-28 VR-Based Control of Multi-Copter Operation Jack T. Hughes et.al. 2505.22599 null
2025-05-28 A Graph-Based Laser Path Solver Algorithm for Virtual Reality Laboratory Simulations Andreas Müller et.al. 2505.22540 null
2025-05-28 AI instructional agent improves student's perceived learner control and learning outcome: empirical evidence from a randomized controlled trial Fei Qin et.al. 2505.22526 null
2025-05-28 CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs Mohamed R. Elshamy et.al. 2505.22469 null
2025-05-28 STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering Zehao Li et.al. 2505.22400 null
2025-05-28 Learning to Pursue AC Optimal Power Flow Solutions with Feasibility Guarantees Damola Ajeyemi et.al. 2505.22399 null
2025-05-28 UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments Wancai Zheng et.al. 2505.22335 null
2025-05-28 Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer Zehua Chen et.al. 2505.22306 null
2025-05-28 Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device Zixuan Li et.al. 2505.22229 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Jiawen Yu et.al. 2505.22159 null
2025-05-28 Streaming Remote rendering services: Comparison of QUIC-based and WebRTC Protocols Daniel Mejías et.al. 2505.22132 null
2025-05-28 Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach Alejandro D. Mousist et.al. 2505.22128 null
2025-05-28 Leveraging 5G Physical Layer Monitoring for Adaptive Remote Rendering in XR Applications Inhar Yeregui et.al. 2505.22123 null
2025-05-28 A simulation framework for autonomous lunar construction work Mattias Linde et.al. 2505.22091 null
2025-05-28 High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models Tristan S. W. Stevens et.al. 2505.22090 null
2025-05-28 Cognitively-Inspired Emergent Communication via Knowledge Graphs for Assisting the Visually Impaired Ruxiao Chen et.al. 2505.22087 null
2025-05-28 On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition Shujie HU et.al. 2505.22072 null
2025-05-27 Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations Yue Li Du et.al. 2505.21454 null
2025-05-27 Hume: Introducing System-2 Thinking in Visual-Language-Action Model Haoming Song et.al. 2505.21432 null
2025-05-27 Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery Lina Zhao et.al. 2505.21418 null
2025-05-27 AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs Xuanwen Ding et.al. 2505.21389 link
2025-05-27 A first look at ROS 2 applications written in asynchronous Rust Martin Škoudlil et.al. 2505.21323 null
2025-05-27 Assured Autonomy with Neuro-Symbolic Perception R. Spencer Hallyburton et.al. 2505.21322 null
2025-05-27 Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning Mohamed Benzaghta et.al. 2505.21249 null
2025-05-27 Towards Quantum Simulation of Meson Scattering in a Z2 Lattice Gauge Theory Yahui Chai et.al. 2505.21240 null
2025-05-27 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics-Based Appearance-Medium Decoupling Jieyu Yuan et.al. 2505.21238 null
2025-05-27 Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models Xudong Tan et.al. 2505.21200 null
2025-05-27 Constructive community race: full-density spiking neural network model drives neuromorphic computing Johanna Senk et.al. 2505.21185 null
2025-05-27 Hybrid Machine Learning and Mathematical Modeling for Tumor Dynamics Prediction: Comparing SPIONs against mNP-FDG Amit K Chattopadhyay et.al. 2505.21094 null
2025-05-27 All-optical discrete illumination-based compressed ultrafast photography Long Cheng et.al. 2505.21086 null
2025-05-27 Modeling of Water Evaporation in Hydrogels from Aspect of Mechanical Analytics Zehua Yu et.al. 2505.21075 null
2025-05-27 Nonreciprocal and long-range three-body interactions in Bose-Einstein condensates induced by optical feedback Yi-Qing Zhang et.al. 2505.21044 null
2025-05-27 CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians Weihang Liu et.al. 2505.21041 null
2025-05-27 ClearSphere: Multi-Earphone Synergy for Enhanced Conversational Clarity Lixing He et.al. 2505.21004 null
2025-05-27 CNN-Based Channel Map Estimation for Movable Antenna Systems Yitai Huang et.al. 2505.21001 null
2025-05-27 SCALOFT: An Initial Approach for Situation Coverage-Based Safety Analysis of an Autonomous Aerial Drone in a Mine Environment Nawshin Mannan Proma et.al. 2505.20969 null
2025-05-27 YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation Weichao Pan et.al. 2505.20884 null
2025-05-26 Understanding and Supporting Co-viewing Comedy in VR with Embodied Expressive Avatars Ryo Ohara et.al. 2505.20082 null
2025-05-26 M3DHMR: Monocular 3D Hand Mesh Recovery Yihong Lin et.al. 2505.20058 null
2025-05-26 Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion Zheqi Lv et.al. 2505.20053 link
2025-05-26 Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs Artem Vazhentsev et.al. 2505.20045 null
2025-05-26 Optimizing edge AI models on HPC systems with the edge in the loop Marcel Aach et.al. 2505.19995 link
2025-05-26 A Cooperative Aerial System of A Payload Drone Equipped with Dexterous Rappelling End Droid for Cluttered Space Pickup Wenjing Ren et.al. 2505.19980 null
2025-05-26 Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees Herbert Woisetschläger et.al. 2505.19947 null
2025-05-26 Weather-Magician: Reconstruction and Rendering Framework for 4D Weather Synthesis In Real Time Chen Sang et.al. 2505.19919 null
2025-05-26 EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM Shuang Ao et.al. 2505.19905 null
2025-05-26 Adaptive Indexing for Approximate Query Processing in Exploratory Data Analysis Stavros Maroulis et.al. 2505.19872 null
2025-05-26 PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints Shuo Wang et.al. 2505.19842 link
2025-05-26 A Cost-efficient Credit-Based Shaper Deployment Framework for Time-Sensitive Networks Santiago Torres-Borda et.al. 2505.19771 null
2025-05-26 GeoPF: Infusing Geometry into Potential Fields for Reactive Planning in Non-trivial Environments Yuhe Gong et.al. 2505.19688 null
2025-05-26 A Fluorescent Material Model for Non-Spectral Editing & Rendering Belcour Laurent et.al. 2505.19672 null
2025-05-26 Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling Haiyang Sun et.al. 2505.19669 null
2025-05-26 Autonomous Flights inside Narrow Tunnels Luqi Wang et.al. 2505.19657 link
2025-05-26 Software Engineering for Self-Adaptive Robotics: A Research Agenda Shaukat Ali et.al. 2505.19629 null
2025-05-26 Indoor Air Quality Detection Robot Model Based on the Internet of Things (IoT) Anggiat Mora Simamora et.al. 2505.19600 link
2025-05-26 Situationally-Aware Dynamics Learning Alejandro Murillo-Gonzalez et.al. 2505.19574 null
2025-05-26 LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer Rasoul Zahedifar et.al. 2505.19567 null
2025-05-23 VideoGameBench: Can Vision-Language Models complete popular video games? Alex L. Zhang et.al. 2505.18134 null
2025-05-23 ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework Lisheng Huang et.al. 2505.18105 link
2025-05-23 SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios Simon Malzard et.al. 2505.18048 null
2025-05-23 Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation Li Zhong et.al. 2505.18039 null
2025-05-23 Clinical Validation of Deep Learning for Real-Time Tissue Oxygenation Estimation Using Spectral Imaging Jens De Winne et.al. 2505.18010 null
2025-05-23 Re-evaluation of Logical Specification in Behavioural Verification Radoslaw Klimek et.al. 2505.17979 null
2025-05-23 Evolving Machine Learning: A Survey Ignacio Cabrera Martin et.al. 2505.17902 null
2025-05-23 Geometric Shape Modelling and Volume Estimation of Dry Bulk Cargo Piles using a Single Image Debanshu Ratha et.al. 2505.17896 null
2025-05-23 Toward Optimal ANC: Establishing Mutual Information Lower Bound François Derrida et.al. 2505.17877 null
2025-05-23 Light-Driven Bound State of Interacting Impurities in a Dirac-Like Bath Vinayak M. Kulkarni et.al. 2505.17811 null
2025-05-23 DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors Tazeek Bin Abdur Rakib et.al. 2505.17795 null
2025-05-23 Real-time calibrations for future detectors at FAIR Valentin Kladov et.al. 2505.17781 null
2025-05-23 Sec5GLoc: Securing 5G Indoor Localization via Adversary-Resilient Deep Learning Architecture Ildi Alla et.al. 2505.17776 link
2025-05-23 HRSim: An agent-based simulation platform for high-capacity ride-sharing services Wang Chen et.al. 2505.17758 link
2025-05-23 Instruct2See: Learning to Remove Any Obstructions Across Distributions Junhang Li et.al. 2505.17649 null
2025-05-23 MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation Jihan Yao et.al. 2505.17613 null
2025-05-23 A Unified Multi-Scale Attention-Based Network for Automatic 3D Segmentation of Lung Parenchyma & Nodules In Thoracic CT Images Muhammad Abdullah et.al. 2505.17602 link
2025-05-23 JELAI: Integrating AI and Learning Analytics in Jupyter Notebooks Manuel Valle Torre et.al. 2505.17593 null
2025-05-23 Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras Masataka Kobayashi et.al. 2505.17582 null
2025-05-23 Direct Feature Access -- Scaling Network Traffic Feature Collection to Terabit Speed Lukas Froschauer et.al. 2505.17573 null
2025-05-22 Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models Junjie Xiong et.al. 2505.16957 null
2025-05-22 From Reality to Virtual Worlds: The Role of Photogrammetry in Game Development Santiago Berrezueta-Guzman et.al. 2505.16951 null
2025-05-22 Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype Nikola Tankovic et.al. 2505.16918 null
2025-05-22 Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships Kerem Oktar et.al. 2505.16899 null
2025-05-22 FlashBack: Consistency Model-Accelerated Shared Autonomy Luzhe Sun et.al. 2505.16892 null
2025-05-22 Arbor-TVB: A Novel Multi-Scale Co-Simulation Framework with a Case Study on Neural-Level Seizure Generation and Whole-Brain Propagation Thorsten Hater et.al. 2505.16861 null
2025-05-22 Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate Hanglei Zhang et.al. 2505.16845 null
2025-05-22 SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving Xuesong Chen et.al. 2505.16805 null
2025-05-22 Detecting Fake News Belief via Skin and Blood Flow Signals Gennie Nguyen et.al. 2505.16730 null
2025-05-22 SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding Sushant Gautam et.al. 2505.16630 null
2025-05-22 Recursive Offloading for LLM Serving in Multi-tier Networks Zhiyuan Wu et.al. 2505.16502 link
2025-05-22 Human-like Semantic Navigation for Autonomous Driving using Knowledge Representation and Large Language Models Augusto Luis Ballardini et.al. 2505.16498 null
2025-05-22 InspectionV3: Enhancing Tobacco Quality Assessment with Deep Convolutional Neural Networks for Automated Workshop Management Yao Wei et.al. 2505.16485 null
2025-05-22 Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems Song Jin et.al. 2505.16429 null
2025-05-22 Dynamic Caustics by Ultrasonically Modulated Liquid Surface Koki Nagakura et.al. 2505.16397 null
2025-05-22 Quantum-Driven Multihead Inland Waterbody Detection With Transformer-Encoded CYGNSS Delay-Doppler Map Data Chia-Hsiang Lin et.al. 2505.16391 null
2025-05-22 Observing dynamics of distinct structural transitions in trapped-ion clusters Akhil Ayyadevara et.al. 2505.16378 null
2025-05-22 Multimodal Generative AI for Story Point Estimation in Software Development Mohammad Rubyet Islam et.al. 2505.16290 null
2025-05-22 Energy Spectra of Secondary Particles Induced by Solar Energetic Proton Events and Magnetospheric Effects A. Chilingarian et.al. 2505.16269 null
2025-05-22 Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models Kalindi Singh et.al. 2505.16261 null
2025-05-21 Direct Detection of Cosmic Walls with Paleo Detectors Wen Yin et.al. 2505.15764 null
2025-05-21 Majorana Zero Modes in a Heterogenous Structure of Topological and Trivial Domains in FeSe$_{1-x}$Te$_x$ Prashant Gupta et.al. 2505.15745 null
2025-05-21 iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification Sarfraz Ahmad et.al. 2505.15730 link
2025-05-21 Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model Ke Hu et.al. 2505.15670 null
2025-05-21 Lithium Intercalation in the Anisotropic van der Waals Magnetic Semiconductor CrSBr Kseniia Mosina et.al. 2505.15663 null
2025-05-21 Self-powered smart contact lenses: a multidisciplinary approach to micro-scale energy and 900 MHz - 1.1 GHz bandwidth microfabricated loop antennas communication systems Patrice Salzenstein et.al. 2505.15593 null
2025-05-21 VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation Niccolo Avogaro et.al. 2505.15592 null
2025-05-21 Decreasing Utilization of Systems with Multi-Rate Cause-Effect Chains While Reducing End-to-End Latencies Luiz Maia et.al. 2505.15546 null
2025-05-21 Exploiting Age of Information in Network Digital Twins for AI-driven Real-Time Link Blockage Detection Michele Zhu et.al. 2505.15519 null
2025-05-21 AI-empowered Real-Time Line-of-Sight Identification via Network Digital Twins Michele Zhu et.al. 2505.15478 null
2025-05-21 FAV-NSS: An HIL Framework for Accelerating Validation of Automotive Network Security Strategies Changhong Li et.al. 2505.15393 null
2025-05-21 EVA: Expressive Virtual Avatars from Multi-view Videos Hendrik Junkawitsch et.al. 2505.15385 null
2025-05-21 Real-Time Detection of Insider Threats Using Behavioral Analytics and Deep Evidential Clustering Anas Ali et.al. 2505.15383 null
2025-05-21 RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation Naman Patel et.al. 2505.15373 null
2025-05-21 AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals Stefan Pasch et.al. 2505.15365 null
2025-05-21 Human in the Loop Adaptive Optimization for Improved Time Series Forecasting Malik Tiomoko et.al. 2505.15354 link
2025-05-21 Subgap pumping of antiferromagnetic Mott insulators: photoexcitation mechanisms and applications Radu Andrei et.al. 2505.15343 null
2025-05-21 High-Throughput Mechanical Characterization of Giant Unilamellar Vesicles by Real-Time Deformability Cytometry Maximilian Kloppe et.al. 2505.15341 null
2025-05-21 LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models Qianyue Hao et.al. 2505.15293 null
2025-05-21 LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval Zhenyu Ning et.al. 2505.15269 null
2025-05-20 Emerging Properties in Unified Multimodal Pretraining Chaorui Deng et.al. 2505.14683 null
2025-05-20 NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search Sunhao Dai et.al. 2505.14680 null
2025-05-20 Beyond Words: Multimodal LLM Knows When to Speak Zikai Liao et.al. 2505.14654 null
2025-05-20 Separatrix configurations in holomorphic flows Nicolas Kainz et.al. 2505.14594 null
2025-05-20 Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities Parthasaarathy Sudarsanam et.al. 2505.14562 null
2025-05-20 Automated, Cross-Layer Root Cause Analysis of 5G Video-Conferencing Quality Degradation Fan Yi et.al. 2505.14540 null
2025-05-20 PAST: Phonetic-Acoustic Speech Tokenizer Nadav Har-Tuv et.al. 2505.14470 null
2025-05-20 Efficient Configuration-Constrained Tube MPC via Variables Restriction and Template Selection Filippo Badalamenti et.al. 2505.14440 link
2025-05-20 Two Empirical Studies on Audiovisual Semiotics of Uncertainty Sita Vriend et.al. 2505.14379 null
2025-05-20 Information-optimal measurement: From fixed sampling protocols to adaptive spectroscopy J. Schroeder et.al. 2505.14364 null
2025-05-20 Local Minima Prediction using Dynamic Bayesian Filtering for UGV Navigation in Unstructured Environments Seung Hun Lee et.al. 2505.14337 null
2025-05-20 Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach Umberto Cappellazzo et.al. 2505.14336 null
2025-05-20 Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion Tiehan Cui et.al. 2505.14316 null
2025-05-20 Timely CPU Scheduling for Computation-intensive Status Updates Mengqiu Zhou et.al. 2505.14307 null
2025-05-20 SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors Maheep Chaudhary et.al. 2505.14300 null
2025-05-20 AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis Eirini Panteli et.al. 2505.14285 null
2025-05-20 Hybrid Adaptive Modeling in Process Monitoring: Leveraging Sequence Encoders and Physics-Informed Neural Networks Mouad Elaarabi et.al. 2505.14252 null
2025-05-20 Visual Agentic Reinforcement Fine-Tuning Ziyu Liu et.al. 2505.14246 link
2025-05-20 Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks Sizhe Yuen et.al. 2505.14212 null
2025-05-20 Dynamic Replanning for Improved Public Transport Routing Abdallah Abuaisha et.al. 2505.14193 null
2025-05-20 QSVM-QNN: Quantum Support Vector Machine Based Quantum Neural Network Learning Algorithm for Brain-Computer Interfacing Systems Bikash K. Behera et.al. 2505.14192 null
2025-05-20 Gaming Strategies in European Imbalance Settlement Mechanisms Seyed Soroush Karimi Madahi et.al. 2505.14133 null
2025-05-20 Place Recognition: A Comprehensive Review, Current Challenges and Future Directions Zhenyu Li et.al. 2505.14068 link
2025-05-19 Quantum Hardware-in-the-Loop for Optimal Power Flow in Renewable-Integrated Power Systems Zeynab Kaseb et.al. 2505.13356 null
2025-05-19 Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity Sharanya Venkatesh et.al. 2505.13350 null
2025-05-19 Level Generation with Quantum Reservoir Computing João S. Ferreira et.al. 2505.13287 null
2025-05-19 MAGI-1: Autoregressive Video Generation at Scale Sand.ai et.al. 2505.13211 link
2025-05-19 Combinatorial Sample-and Back-Focal-Plane (BFP) Imaging. Pt. I: Instrument and acquisition parameters affecting BFP images and their analysis Omer Shavit et.al. 2505.13190 null
2025-05-19 ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models Zihao Cheng et.al. 2505.13176 null
2025-05-19 A conformally mapped numerical wave tank supporting piston and flap wavemakers Andreas Holm Akselsen et.al. 2505.13154 null
2025-05-19 Ocean wave spectrum reconstruction from HF radar data and its application to wave height estimation Kaede Watanabe et.al. 2505.13132 null
2025-05-19 Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing Hao Ma et.al. 2505.13131 null
2025-05-19 Adaptive Image Restoration for Video Surveillance: A Real-Time Approach Muhammad Awais Amin et.al. 2505.13130 null
2025-05-19 Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation Guo Chen et.al. 2505.13094 null
2025-05-19 PPTNet: A Hybrid Periodic Pattern-Transformer Architecture for Traffic Flow Prediction and Congestion Identification Hongrui Kou et.al. 2505.13047 link
2025-05-19 Ultrafast Laser Induces Macroscopic Symmetry-Breaking of Diamond Color Centers Yang Gao et.al. 2505.12989 null
2025-05-19 Regularized Model Predictive Control Komeil Nosrati et.al. 2505.12977 null
2025-05-19 Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models Mahta Fetrat Qharabagh et.al. 2505.12973 link
2025-05-19 Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection Zihan Xiong et.al. 2505.12966 null
2025-05-19 Effects of the Auto-Correlation of Delays on the Age of Information: A Gaussian Process Framework Atsushi Inoie et.al. 2505.12885 null
2025-05-19 Optimization of Hybrid Quantum-Classical Algorithms Lian Remme et.al. 2505.12853 null
2025-05-19 Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs Zhuo Yang et.al. 2505.12833 null
2025-05-19 Rethinking Features-Fused-Pyramid-Neck for Object Detection Hulin Li et.al. 2505.12820 link
2025-05-16 msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML Zhaolan Huang et.al. 2505.11483 link
2025-05-16 REACT: Runtime-Enabled Active Collision-avoidance Technique for Autonomous Driving Heye Huang et.al. 2505.11474 null
2025-05-16 Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space Ali Rabiee et.al. 2505.11366 link
2025-05-16 Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models Keunwoo Peter Yu et.al. 2505.11326 link
2025-05-16 Time-dependent Hole States in Multiconfigurational Time-Dependent Hartree-Fock Approaches: Applications in Photoionization of Water Molecule Zhao-Han Zhang et.al. 2505.11319 null
2025-05-16 Diffusion Learning with Partial Agent Participation and Local Updates Elsa Rizk et.al. 2505.11307 null
2025-05-16 MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection Shrutarv Awasthi et.al. 2505.11282 link
2025-05-16 Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models Camille Couturier et.al. 2505.11271 null
2025-05-16 Learning traffic flows: Graph Neural Networks for Metamodelling Traffic Assignment Oskar Bohn Lassen et.al. 2505.11230 null
2025-05-16 Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition Bo Yue et.al. 2505.11175 null
2025-05-16 Maximizing Asynchronicity in Event-based Neural Networks Haiqing Hao et.al. 2505.11165 null
2025-05-16 Sonification of entanglement dynamics in many-qubit systems Juliette Tudoce et.al. 2505.11159 null
2025-05-16 Open-Source Multi-Viewpoint Surgical Telerobotics Guido Caccianiga et.al. 2505.11142 null
2025-05-16 A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware Rui Wang et.al. 2505.11066 null
2025-05-16 Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking Changlun Li et.al. 2505.11065 link
2025-05-16 Leveraging Real-Time Data Analysis and Multiple Kernel Learning for Manufacturing of Innovative Steels Wolfgang Rannetbauer et.al. 2505.11024 null
2025-05-16 DRL-Based Injection Molding Process Parameter Optimization for Adaptive and Profitable Production Joon-Young Kim et.al. 2505.10988 null
2025-05-16 GROQLoco: Generalist and RObot-agnostic Quadruped Locomotion Control using Offline Datasets Narayanan PP et.al. 2505.10973 null
2025-05-16 Vaiage: A Multi-Agent Solution to Personalized Travel Planning Binwen Liu et.al. 2505.10922 null
2025-05-16 Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions Muntasir Hoq et.al. 2505.10913 null
2025-05-15 An AI-driven framework for the prediction of personalised health response to air pollution Nazanin Zounemat Kermani et.al. 2505.10556 null
2025-05-15 Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning Milan Ganai et.al. 2505.10547 null
2025-05-15 LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps Filippo Olimpieri et.al. 2505.10537 link
2025-05-15 Internal State Estimation in Groups via Active Information Gathering Xuebo Ji et.al. 2505.10415 null
2025-05-15 Two-Stage Generative Model for Intracranial Aneurysm Meshes with Morphological Marker Conditioning Wenhao Ding et.al. 2505.10407 link
2025-05-15 Schreier-Coset Graph Propagation Aryan Mishra et.al. 2505.10392 null
2025-05-15 Arbitrarily Small Execution-Time Certificate: What was Missed in Analog Optimization Liang Wu et.al. 2505.10366 link
2025-05-15 FactsR: A Safer Method for Producing High Quality Healthcare Documentation Victor Petrén Bach Hansen et.al. 2505.10360 null
2025-05-15 Optimizing Electric Bus Charging Scheduling with Uncertainties Using Hierarchical Deep Reinforcement Learning Jiaju Qi et.al. 2505.10296 null
2025-05-15 From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making Dubai Li et.al. 2505.10282 link
2025-05-15 AttentionGuard: Transformer-based Misbehavior Detection for Secure Vehicular Platoons Hexu Li et.al. 2505.10273 null
2025-05-15 Defect Detection in Photolithographic Patterns Using Deep Learning Models Trained on Synthetic Data Prashant P. Shinde et.al. 2505.10192 null
2025-05-15 KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems Jieke Lin et.al. 2505.10183 null
2025-05-15 Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence Xiang He et.al. 2505.10176 link
2025-05-15 High-performance local automaton decoders for defect matching in 1D Louis Paletta et.al. 2505.10162 null
2025-05-15 CFARNet: Learning-Based High-Resolution Multi-Target Detection for Rainbow Beam Radar Qiushi Liang et.al. 2505.10150 null
2025-05-15 VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality Xuechang Tu et.al. 2505.10144 link
2025-05-15 IMITATE: Image Registration with Context for unknown time frame recovery Ziad Kheil et.al. 2505.10124 link
2025-05-15 Learning Virtual Machine Scheduling in Cloud Computing through Language Agents JieHao Wu et.al. 2505.10117 null
2025-05-15 LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2 Jongmin Jung et.al. 2505.10101 null
2025-05-14 UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing Yung-Hsuan Lai et.al. 2505.09615 link
2025-05-14 Quantum simulation of bubble nucleation across a quantum phase transition De Luo et.al. 2505.09607 null
2025-05-14 Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net Dongyi He et.al. 2505.09521 link
2025-05-14 Wearable Tracking of Eye and Body Movements During Breaching Training: Towards Real-Time Blast Injury Monitoring Jeremy P. Kemmerer et.al. 2505.09508 null
2025-05-14 Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput Bo Zhang et.al. 2505.09498 null
2025-05-14 Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments Nuthasith Gerdpratoom et.al. 2505.09434 null
2025-05-14 UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units Huakun Liu et.al. 2505.09393 link
2025-05-14 Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform Qinghui Liu et.al. 2505.09380 link
2025-05-14 ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections H. T. Rüdisser et.al. 2505.09365 link
2025-05-14 APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression Srinivas Ravuri et.al. 2505.09356 link
2025-05-14 Neural Video Compression using 2D Gaussian Splatting Lakshya Gupta et.al. 2505.09324 null
2025-05-14 Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging Hongjin Qian et.al. 2505.09316 null
2025-05-14 Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control Yimou Wu et.al. 2505.09145 null
2025-05-14 Quantum Error-Corrected Computation of Molecular Energies Kentaro Yamamoto et.al. 2505.09133 null
2025-05-14 Non-equilibrium scalar fields at finite temperature and density Sebastian Mendizabal et.al. 2505.09104 null
2025-05-14 OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions Yuhang Wang et.al. 2505.09092 link
2025-05-14 Modeling Interdependent Cybersecurity Threats Using Bayesian Networks: A Case Study on In-Vehicle Infotainment Systems Sangita Sridar et.al. 2505.09048 null
2025-05-14 RT-cache: Efficient Robot Trajectory Retrieval System Owen Kwon et.al. 2505.09040 null
2025-05-14 Multiparty Selective Disclosure using Attribute-Based Encryption Shigenori Ohashi et.al. 2505.09034 null
2025-05-13 Enhancing Aerial Combat Tactics through Hierarchical Multi-Agent Reinforcement Learning Ardian Selmonaj et.al. 2505.08995 null
2025-05-13 Aya Vision: Advancing the Frontier of Multilingual Multimodality Saurabh Dash et.al. 2505.08751 null
2025-05-13 A Study of Data-driven Methods for Inventory Optimization Lee Yeung Ping et.al. 2505.08673 null
2025-05-13 Claycode: Stylable and Deformable 2D Scannable Codes Marco Maida et.al. 2505.08666 null
2025-05-13 ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking Haofeng Liu et.al. 2505.08581 link
2025-05-13 End-to-End Multi-Task Policy Learning from NMPC for Quadruped Locomotion Anudeep Sajja et.al. 2505.08574 null
2025-05-13 Extract the Best, Discard the Rest: CSI Feedback with Offline Large AI Models Jialin Zhuang et.al. 2505.08566 null
2025-05-13 Towards Digital Twin in Flood Forecasting with Data Assimilation Satellite Earth Observations -- A Proof-of-Concept in the Alzette Catchment Thanh Huy Nguyen et.al. 2505.08553 null
2025-05-13 The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning Mohamed Lamine Mekhalfi et.al. 2505.08537 null
2025-05-13 Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation Linna Xu et.al. 2505.08535 null
2025-05-13 Towards Resilient SDA: Graph Theory and Cooperative Control in Distributed Network Architectures Nesrine Benchoubane et.al. 2505.08520 null
2025-05-13 Isolation Forest in Novelty Detection Scenario Adam Ulrich et.al. 2505.08489 null
2025-05-13 BAT: Benchmark for Auto-bidding Task Alexandra Khirianova et.al. 2505.08485 link
2025-05-13 Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions Lata Pangtey et.al. 2505.08464 null
2025-05-13 Measurements of molecular size and shape on a chip Xin Zhu et.al. 2505.08452 null
2025-05-13 Anisotropic fluctuations of momentum and angular momentum of heavy quarks in the pre-equilibrium stage of pA collisions at the LHC Gabriele Parisi et.al. 2505.08441 null
2025-05-13 MDF: Multi-Modal Data Fusion with CNN-Based Object Detection for Enhanced Indoor Localization Using LiDAR-SLAM Saqi Hussain Kalan et.al. 2505.08388 null
2025-05-13 FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units Jian Wang et.al. 2505.08294 null
2025-05-13 Ground-based Observations of Temporal Variation of Cosmic Ray Spectrum during Forbush Decreases W. Mitthumsiri et.al. 2505.08248 null
2025-05-13 Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning Yunyue Wei et.al. 2505.08238 null
2025-05-13 Online differentially private inference in stochastic gradient descent Jinhan Xie et.al. 2505.08227 null
2025-05-13 VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections Eason Chen et.al. 2505.07736 null
2025-05-12 MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering Rushi Qiang et.al. 2505.07782 link
2025-05-12 Robo-Taxi Fleet Coordination with Accelerated High-Capacity Ridepooling Xinling Li et.al. 2505.07776 null
2025-05-12 Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems Tomasz Szydlo et.al. 2505.07755 null
2025-05-12 Gameplay Highlights Generation Vignesh Edithal et.al. 2505.07721 null
2025-05-12 Hybrid Control Strategies for Safe and Adaptive Robot-Assisted Dressing Yasmin Rafiq et.al. 2505.07710 null
2025-05-12 Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications Biel Tura Vecino et.al. 2505.07701 null
2025-05-12 Verified Purely Functional Catenable Real-Time Deques Jules Viennot et.al. 2505.07681 null
2025-05-12 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models Hang Wu et.al. 2505.07680 null
2025-05-12 Intuitive Human-Robot Interfaces Leveraging on Autonomy Features for the Control of Highly-redundant Robots Davide Torielli et.al. 2505.07668 link
2025-05-12 Neural Brain: A Neuroscience-inspired Framework for Embodied Agents Jian Liu et.al. 2505.07634 link
2025-05-12 Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions Yi Zhang et.al. 2505.07611 null
2025-05-12 AgentFlow: Resilient Adaptive Cloud-Edge Framework for Multi-Agent Coordination Ching Han Chen et.al. 2505.07603 null
2025-05-12 Decoding Chess Puzzle Play and Standard Cognitive Tasks for BCI: A Low-Cost EEG Study Matthew Russell et.al. 2505.07592 null
2025-05-12 Privacy-Preserving Real-Time Vietnamese-English Translation on iOS using Edge AI Cong Le et.al. 2505.07583 null
2025-05-12 Superstring entanglement at finite temperature and its Hagedorn behavior Daniel Luiz Nedel et.al. 2505.07567 null
2025-05-12 Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs Kamil Jeziorek et.al. 2505.07556 null
2025-05-12 GIFStream: 4D Gaussian-based Immersive Video with Feature Stream Hao Li et.al. 2505.07539 null
2025-05-12 Convex Trajectory Optimization via Monomial Coordinates Transcription for Cislunar Rendezvous Omar Regantini et.al. 2505.07521 null
2025-05-12 Lightweight Multispectral Crop-Weed Segmentation for Precision Agriculture Zeynep Galymzhankyzy et.al. 2505.07444 null
2025-05-09 A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows Linjiang Cao et.al. 2505.06178 null
2025-05-09 Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework Alice Rueda et.al. 2505.06151 null
2025-05-09 S2MNet: Speckle-To-Mesh Net for Three-Dimensional Cardiac Morphology Reconstruction via Echocardiogram Xilin Gong et.al. 2505.06105 null
2025-05-09 HashKitty: Distributed Password Analysis Pedro Antunes et.al. 2505.06084 link
2025-05-09 Centralized Decision-Making for Platooning By Using SPaT-Driven Reference Speeds Melih Yazgan et.al. 2505.06071 null
2025-05-09 Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks Gabriel Gagné et.al. 2505.06064 link
2025-05-09 Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates Rodrigo Diaz et.al. 2505.05940 link
2025-05-09 Priority-Driven Safe Model Predictive Control Approach to Autonomous Driving Applications Francesco Prignoli et.al. 2505.05933 null
2025-05-09 Multi-armed Bandit for Stochastic Shortest Path in Mixed Autonomy Yu Bai et.al. 2505.05878 null
2025-05-09 DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning under Two-sided Incomplete Information Yun Xin et.al. 2505.05842 null
2025-05-09 Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency Xinyu Liang et.al. 2505.05796 null
2025-05-09 Quantitative Hardness Assessment with Vision-based Tactile Sensing for Fruit Classification and Grasping Zhongyuan Liao et.al. 2505.05725 null
2025-05-08 An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions Akram Aldroubi et.al. 2505.05676 null
2025-05-08 Adaptive Stress Testing Black-Box LLM Planners Neeloy Chakraborty et.al. 2505.05665 null
2025-05-08 UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes Mark C. Eid et.al. 2505.05643 null
2025-05-08 LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities Kalyan Nakka et.al. 2505.05619 link
2025-05-08 Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer Wenhao Guo et.al. 2505.05595 null
2025-05-08 Flight Validation of Learning-Based Trajectory Optimization for the Astrobee Free-Flyer Somrita Banerjee et.al. 2505.05588 null
2025-05-08 Steepest Descent Density Control for Compact 3D Gaussian Splatting Peihao Wang et.al. 2505.05587 null
2025-05-08 Quantum-network nodes with real-time noise mitigation using spectator qubits S. J. H. Loenen et.al. 2505.05582 null
2025-05-08 SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation Yonwoo Choi et.al. 2505.05475 link
2025-05-08 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant Haibo Wang et.al. 2505.05467 null
2025-05-08 EDmamba: A Simple yet Effective Event Denoising Method with State Space Model Ciyu Ruan et.al. 2505.05391 null
2025-05-08 OcularAge: A Comparative Study of Iris and Periocular Images for Pediatric Age Estimation Naveenkumar G Venkataswamy et.al. 2505.05374 null
2025-05-08 Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization Sooyoung Park et.al. 2505.05343 link
2025-05-08 Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors Zunjie Zhu et.al. 2505.05336 null
2025-05-08 Advanced Stock Market Prediction Using Long Short-Term Memory Networks: A Comprehensive Deep Learning Framework Rajneesh Chaudhary et.al. 2505.05325 null
2025-05-08 SmartTrap: Automated Precision Experiments with Optical Tweezers Martin Selin et.al. 2505.05290 null
2025-05-08 CV-MP: Max-Pressure Control in Heterogeneously Distributed and Partially Connected Vehicle Environments Chaopeng Tan et.al. 2505.05258 null
2025-05-08 Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network Changxiang Wu et.al. 2505.05231 null
2025-05-08 PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting Elad Feldman et.al. 2505.05183 null
2025-05-08 Multi-agent Embodied AI: Advances and Future Directions Zhaohan Feng et.al. 2505.05108 null
2025-05-08 Pairing Real-Time Piano Transcription with Symbol-level Tracking for Precise and Robust Score Following Silvan Peter et.al. 2505.05078 null
2025-05-08 xTrace: A Facial Expressive Behaviour Analysis Tool for Continuous Affect Recognition Mani Kumar Tellamekala et.al. 2505.05043 null
2025-05-08 Reality-infused Deep Learning Framework via Angle-resolved Metasurfaces Wei Chen et.al. 2505.05011 null
2025-05-08 StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps Lang Nie et.al. 2505.05001 link
2025-05-08 Robust Model-Based In-Hand Manipulation with Integrated Real-Time Motion-Contact Planning and Tracking Yongpeng Jiang et.al. 2505.04978 null
2025-05-08 The candidates of 2$\alpha$ condensate around the $^{16}$O nucleus studied by the real-time evolution method Y. M. Htet et.al. 2505.04975 null
2025-05-08 AI and Vision based Autonomous Navigation of Nano-Drones in Partially-Known Environments Mattia Sartori et.al. 2505.04972 null
2025-05-08 Real-Time Model Predictive Control of Vehicles with Convex-Polygon-Aware Collision Avoidance in Tight Spaces Haruki Kojima et.al. 2505.04935 null
2025-05-07 EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning Zhenghao Xing et.al. 2505.04623 link
2025-05-07 Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems Mohammad Merati et.al. 2505.04596 null
2025-05-07 Runtime Advocates: A Persona-Driven Framework for Requirements@Runtime Decision Support Demetrius Hernandez et.al. 2505.04551 null
2025-05-07 Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration Asma Baobaid et.al. 2505.04524 null
2025-05-07 Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition Asma Baobaid et.al. 2505.04502 null
2025-05-07 Estimating Dynamic Soft Continuum Robot States From Boundaries Tongjia Zheng et.al. 2505.04491 null
2025-05-07 "I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments Ziyi Zhang et.al. 2505.04488 null
2025-05-07 Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration Shigeki Karita et.al. 2505.04457 link
2025-05-07 Meta-Learning Driven Lightweight Phase Shift Compression for IRS-Assisted Wireless Systems Xianhua Yu et.al. 2505.04453 null
2025-05-07 Phase Shift Information Compression in IRS-aided Wireless Systems: Challenges and Opportunities Xianhua Yu et.al. 2505.04449 null
2025-05-07 SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer Young-Hu Park et.al. 2505.04394 null
2025-05-07 Predicting Road Surface Anomalies by Visual Tracking of a Preceding Vehicle Petr Jahoda et.al. 2505.04392 null
2025-05-07 Design and Evaluation of an NDN-Based Network for Distributed Digital Twins Chen Chen et.al. 2505.04326 null
2025-05-07 Massive MIMO: Instantaneous versus Statistical CSI-Based Power Allocation Zahra Mobini et.al. 2505.04294 null
2025-05-07 Integrated Airline Fleet and Crew Recovery through Local Search Philip de Bruin et.al. 2505.04274 null
2025-05-07 RGB-Event Fusion with Self-Attention for Collision Prediction Pietro Bonazzi et.al. 2505.04258 link
2025-05-07 Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections Taoyuan Yu et.al. 2505.04231 null
2025-05-07 An Enhanced YOLOv8 Model for Real-Time and Accurate Pothole Detection and Measurement Mustafa Yurdakul et.al. 2505.04207 null
2025-05-07 Spatial-Wavelength Multiplexing Reliable Photonic Integrated General-Purpose Analog Computing System Tao Zhu et.al. 2505.04197 null
2025-05-07 Beyond Task Performance: Human Experience in Human-Robot Collaboration Sean Kille et.al. 2505.04182 null
2025-05-06 VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model Zuwei Long et.al. 2505.03739 link
2025-05-06 AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control Jialong Li et.al. 2505.03738 null
2025-05-06 Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving Faizan M. Tariq et.al. 2505.03695 null
2025-05-06 RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration Huajie Tan et.al. 2505.03673 link
2025-05-06 PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model Y. B. Wang et.al. 2505.03603 null
2025-05-06 LlamaFirewall: An open source guardrail system for building secure AI agents Sahana Chennabasappa et.al. 2505.03574 null
2025-05-06 Real-Time Person Image Synthesis Using a Flow Matching Model Jiwoo Jeong et.al. 2505.03562 link
2025-05-06 Rapid AI-based generation of coverage paths for dispensing applications Simon Baeuerle et.al. 2505.03560 null
2025-05-06 Real-time small area estimation of food security in Zimbabwe: integrating mobile-phone and face-to-face surveys using joint multilevel regression and poststratification Sahoko Ishida et.al. 2505.03517 link
2025-05-06 Learning-based Homothetic Tube MPC Yulong Gao et.al. 2505.03482 link
2025-05-06 A generalised non-linear reconstructor for all Fourier-type wavefront sensors Victoria Laidlaw et.al. 2505.03477 null
2025-05-06 Simulation to Reality: Testbeds and Architectures for Connected and Automated Vehicles David Klüner et.al. 2505.03472 null
2025-05-06 Mitigating Backdoor Triggered and Targeted Data Poisoning Attacks in Voice Authentication Systems Alireza Mohammadi et.al. 2505.03455 null
2025-05-06 Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework Sanam Nayab et.al. 2505.03409 null
2025-05-06 Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath Chris Wise et.al. 2505.03397 link
2025-05-06 DroidRetriever: An Autonomous Navigation and Information Integration System Facilitating Mobile Sensemaking Yiheng Bian et.al. 2505.03364 null
2025-05-06 GUAVA: Generalizable Upper Body 3D Gaussian Avatar Dongbin Zhang et.al. 2505.03351 null
2025-05-06 Artificial Behavior Intelligence: Technology, Challenges, and Future Directions Kanghyun Jo et.al. 2505.03315 null
2025-05-06 An Active Inference perspective on Neurofeedback Training Côme Annicchiarico et.al. 2505.03308 null
2025-05-06 Model Predictive Fuzzy Control: A Hierarchical Multi-Agent Control Architecture for Outdoor Search-and-Rescue Robots Craig Maxwell et.al. 2505.03257 null
2025-05-05 Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow Jai Prakash Veerla et.al. 2505.02780 link
2025-05-05 Teaching the social media generation: rethinking learning without sacrificing quality Sepinoud Azimi et.al. 2505.02770 null
2025-05-05 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Yemin Shi et.al. 2505.02707 link
2025-05-05 Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation Haotian Chen et.al. 2505.02690 null
2025-05-05 Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints Shubham Vaishnav et.al. 2505.02640 null
2025-05-05 LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Qingkai Fang et.al. 2505.02625 link
2025-05-05 Wise Goose Chase: A Predictive Path Planning Algorithm for Dynamic Rebalancing in Ride-Hailing Systems Avalpreet Singh Brar et.al. 2505.02603 null
2025-05-05 Maximal Compatibility Matching for Preference-Aware Ride-Hailing Systems Avalpreet Singh Brar et.al. 2505.02599 null
2025-05-05 LiDAR-Inertial SLAM-Based Navigation and Safety-Oriented AI-Driven Control System for Skid-Steer Robots Mehdi Heydari Shahna et.al. 2505.02598 null
2025-05-05 Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things Yang Li et.al. 2505.02597 null
2025-05-05 HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction Muhammad Haris Khan et.al. 2505.02569 null
2025-05-05 Machine-Learning-Powered Neural Interfaces for Smart Prosthetics and Diagnostics MohammadAli Shaeri et.al. 2505.02516 null
2025-05-05 ReeM: Ensemble Building Thermodynamics Model for Efficient HVAC Control via Hierarchical Reinforcement Learning Yang Deng et.al. 2505.02439 null
2025-05-05 Towards Effective Issue Assignment using Online Machine Learning Athanasios Michailoudis et.al. 2505.02437 link
2025-05-05 Encrypted Federated Search Using Homomorphic Encryption Om Rathod et.al. 2505.02409 null
2025-05-05 A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints Jianye Xu et.al. 2505.02395 link
2025-05-05 Sloshing suppression with a controlled elastic baffle via deep reinforcement learning and SPH simulation Mai Ye et.al. 2505.02354 null
2025-05-05 Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering Jihao Zhao et.al. 2505.02311 link
2025-05-04 RNBF: Real-Time RGB-D Based Neural Barrier Functions for Safe Robotic Navigation Satyajeet Das et.al. 2505.02294 null
2025-05-04 Real-time Spatial Retrieval Augmented Generation for Urban Environments David Nazareno Campo et.al. 2505.02271 null
2025-05-02 FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research Yan Miao et.al. 2505.01383 null
2025-05-02 An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube Pengda Mao et.al. 2505.01380 null
2025-05-02 Closing the Loop: A Systematic Review of Experience-Driven Game Adaptation Phil Lopes et.al. 2505.01351 null
2025-05-02 How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios Satvik Venkatesh et.al. 2505.01338 null
2025-05-02 Early Detection of Patient Deterioration from Real-Time Wearable Monitoring System Lo Pang-Yun Ting et.al. 2505.01305 null
2025-05-02 Contactless pulse rate assessment: Results and insights for application in driving simulator Đorđe D. Nešković et.al. 2505.01299 null
2025-05-02 FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing Gaoxiang Cong et.al. 2505.01263 null
2025-05-02 CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Edson Araujo et.al. 2505.01237 link
2025-05-02 Efficient Vision-based Vehicle Speed Estimation Andrej Macko et.al. 2505.01203 null
2025-05-02 Machine learning-based prediction of species mass fraction and flame characteristics in partially premixed turbulent jet flame Amirali Shateri et.al. 2505.01201 null
2025-05-02 AGRO: An Autonomous AI Rover for Precision Agriculture Simar Ghumman et.al. 2505.01200 null
2025-05-02 A Secured Triad of IoT, Machine Learning, and Blockchain for Crop Forecasting in Agriculture Najmus Sakib Sizan et.al. 2505.01196 null
2025-05-02 Fast Flow-based Visuomotor Policies via Conditional Optimal Transport Couplings Andreas Sochopoulos et.al. 2505.01179 null
2025-05-02 Empirical Comparison of Lightweight Forecasting Models for Seasonal and Non-Seasonal Time Series Thanh Son Nguyen et.al. 2505.01163 null
2025-05-02 Machine Learning for Physical Simulation Challenge Results and Retrospective Analysis: Power Grid Use Case Milad Leyli-Abadi et.al. 2505.01156 null
2025-05-02 In-Situ Growth and Ionic Switching Behavior of Single-Crystalline Silver Iodide Nanoflakes Amir Parsi et.al. 2505.01062 null
2025-05-02 Model Tensor Planning An T. Le et.al. 2505.01059 link
2025-05-02 Kinetic roughening transition of ice crystals and its implications during recrystallization Jorge H. Melillo et.al. 2505.01055 null
2025-05-02 Tightly Coupled Range Inertial Odometry and Mapping with Exact Point Cloud Downsampling Kenji Koide et.al. 2505.01017 null
2025-05-02 Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks Jiaqi Zhang et.al. 2505.00990 link
2025-05-01 A Practical Framework for Simulating Time-Resolved Spectroscopy Based on a Real-time Dyson Expansion Cian Reeves et.al. 2505.00667 null
2025-05-01 Open-Source LLM-Driven Federated Transformer for Predictive IoV Management Yazan Otoum et.al. 2505.00651 null
2025-05-01 Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI Merve Gülle et.al. 2505.00643 null
2025-05-01 Fully passive quantum random number generation with untrusted light KaiWei Qiu et.al. 2505.00636 null
2025-05-01 A Novel Feature-Aware Chaotic Image Encryption Scheme For Data Security and Privacy in IoT and Edge Networks Muhammad Shahbaz Khan et.al. 2505.00593 null
2025-05-01 Bridging Cultural and Digital Divides: A Low-Latency JackTrip Framework for Equitable Music Education in the Global South Tiange Zhou et.al. 2505.00550 null
2025-05-01 Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks Xinyu Wang et.al. 2505.00530 null
2025-05-01 UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces Alaa Saleh et.al. 2505.00472 null
2025-05-01 HoneyWin: High-Interaction Windows Honeypot in Enterprise Environment Yan Lin Aung et.al. 2505.00465 null
2025-05-01 Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos Xia Yuan et.al. 2505.00421 null
2025-05-01 Multi-dimensional optical imaging on a chip Liheng Bian et.al. 2505.00408 link
2025-05-01 Stealth Signals: Multi-Discriminator GANs for Covert Communications Against Diverse Wardens Afan Ali et.al. 2505.00399 null
2025-05-01 Urban Air Mobility as a System of Systems: An LLM-Enhanced Holonic Approach Ahmed R. Sadik et.al. 2505.00368 null
2025-05-01 Edge Large AI Models: Revolutionizing 6G Networks Zixin Wang et.al. 2505.00321 null
2025-05-01 Avatar Communication Provides More Efficient Online Social Support Than Text Communication Masanori Takano et.al. 2505.00287 null
2025-05-01 Empowering Agentic Video Analytics Systems with Video Language Models Yuxuan Yan et.al. 2505.00254 null
2025-05-01 LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems Yazan Otoum et.al. 2505.00240 null
2025-04-30 Real-Time Brain-Computer Interface Control of Walking Exoskeleton with Bilateral Sensory Feedback Jeffrey Lim et.al. 2505.00219 null
2025-04-30 PSN Game: Game-theoretic Planning via a Player Selection Network Tianyu Qiu et.al. 2505.00213 null
2025-04-30 Generative Machine Learning in Adaptive Control of Dynamic Manufacturing Processes: A Review Suk Ki Lee et.al. 2505.00210 null
2025-04-30 A Survey of Interactive Generative Video Jiwen Yu et.al. 2504.21853 null
2025-04-30 Differentiable Room Acoustic Rendering with Multi-View Vision Priors Derong Jin et.al. 2504.21847 null
2025-04-30 Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields Yixin Gao et.al. 2504.21814 null
2025-04-30 WebThinker: Empowering Large Reasoning Models with Deep Research Capability Xiaoxi Li et.al. 2504.21776 link
2025-04-30 Smart Environmental Monitoring of Marine Pollution using Edge AI Mohamed Moursi et.al. 2504.21759 null
2025-04-30 TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training Shengqian Wang et.al. 2504.21735 null
2025-04-30 MovementVR: An open-source tool for the study of motor control and learning in virtual reality Cristina Rossi et.al. 2504.21696 null
2025-04-30 Enhancing Health Mention Classification Performance: A Study on Advancements in Parameter Efficient Tuning Reem Abdel-Salam et.al. 2504.21685 null
2025-04-30 Effect of eccentric mixing parameters on chaotic characteristics and mixing time for viscous liquid based on sound decibels Ronfgang Wang et.al. 2504.21621 null
2025-04-30 Real Time Semantic Segmentation of High Resolution Automotive LiDAR Scans Hannes Reichert et.al. 2504.21602 link
2025-04-30 Real-time Program Evaluation using Anytime-valid Rank Tests Sam van Meer et.al. 2504.21595 null
2025-04-30 Toward Realization of Low-Altitude Economy Networks: Core Architecture, Integrated Technologies, and Future Directions Yixian Wang et.al. 2504.21583 null
2025-04-30 Scientific Workflow Scheduling in Cloud Considering Cold Start and Variable Pricing Model Suvarthi Sarkar et.al. 2504.21536 null
2025-04-30 Efficient Conversational Search via Topical Locality in Dense Retrieval Cristina Ioana Muntean et.al. 2504.21507 link
2025-04-30 Turning a Disposable Bronchoscope into a Dynamic Speckle Imaging Tool: Yes, It Works Aurélien Plyer et.al. 2504.21469 null
2025-04-30 Integration of a Synthetic Molecular Motor Into a Rotary DNA Nanostructure: A Framework for Single-Molecule Actuation Seham Helmi et.al. 2504.21434 null
2025-04-30 Enhanced Semi-Supervised Stamping Process Monitoring with Physically-Informed Feature Extraction Jianyu Zhang et.al. 2504.21389 null
2025-04-30 DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion Yinfeng Yu et.al. 2504.21366 null
2025-04-30 ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality Jaewook Lee et.al. 2504.21360 null
2025-04-30 Generative QoE Modeling: A Lightweight Approach for Telecom Networks Vinti Nayar et.al. 2504.21353 null
2025-04-29 Real-Time Wayfinding Assistant for Blind and Low-Vision Users Dabbrata Das et.al. 2504.20976 null
2025-04-29 SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features Mete Erdogan et.al. 2504.20970 null
2025-04-29 AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security Zikui Cai et.al. 2504.20965 link
2025-04-29 Mìmir: A real-time interactive visualization library for CUDA programs Francisco Carter et.al. 2504.20937 null
2025-04-29 SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings Florian Vahl et.al. 2504.20808 null
2025-04-29 Integrating Human Feedback into a Reinforcement Learning-Based Framework for Adaptive User Interfaces Daniel Gaspar-Figueiredo et.al. 2504.20782 null
2025-04-29 An Online Cross-layered Defense Strategy with Bandwidth Allocation for Multi-channel Systems under DoS Attacks Liheng Wan et.al. 2504.20762 null
2025-04-29 Confidence-based Intent Prediction for Teleoperation in Bimanual Robotic Suturing Zhaoyang Jacopo Hu et.al. 2504.20761 null
2025-04-29 Graph-Based Fault Diagnosis for Rotating Machinery: Adaptive Segmentation and Structural Feature Integration Moirangthem Tiken Singh et.al. 2504.20756 null
2025-04-29 Formal and Empirical Study of Metadata-Based Profiling for Resource Management in the Computing Continuum Andrea Morichetta et.al. 2504.20740 link
2025-04-29 Intelligent Task Offloading in VANETs: A Hybrid AI-Driven Approach for Low-Latency and Energy Efficiency Tariq Qayyum et.al. 2504.20735 null
2025-04-29 A High-Granularity Proton CT Enhanced by Track Discrimination Huang-Chao Shi et.al. 2504.20698 null
2025-04-29 Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion Zesheng Wang et.al. 2504.20685 null
2025-04-29 Quantum Computation for Jets in Heavy Ion Collisions Wenyang Qian et.al. 2504.20683 null
2025-04-29 FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection Yao Xiao et.al. 2504.20670 null
2025-04-29 Quantum-Enhanced Hybrid Reinforcement Learning Framework for Dynamic Path Planning in Autonomous Systems Sahil Tomar et.al. 2504.20660 null
2025-04-29 PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval Zihan Niu et.al. 2504.20624 null
2025-04-29 Information Retrieval in the Age of Generative AI: The RGB Model Michele Garetto et.al. 2504.20610 link
2025-04-29 WakeLoc: An Ultra-Low Power, Accurate and Scalable On-Demand RTLS using Wake-Up Radios Silvano Cortesi et.al. 2504.20545 null
2025-04-29 Digital Twin-Empowered Cooperative Autonomous Car-sharing Services: Proof-of-Concept Kazuma Nonomura et.al. 2504.20542 null
2025-04-29 Feelbert: A Feedback Linearization-based Embedded Real-Time Quadrupedal Locomotion Framework Aristide Emanuele Casucci et.al. 2504.19965 null
2025-04-28 Learning Streaming Video Representation via Multitask Training Yibin Yan et.al. 2504.20041 null
2025-04-28 HJRNO: Hamilton-Jacobi Reachability with Neural Operators Yankai Li et.al. 2504.19989 null
2025-04-28 Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach Keyhan Rayati et.al. 2504.19985 null
2025-04-28 Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose Narges Rashvand et.al. 2504.19970 null
2025-04-28 Automated decision-making for dynamic task assignment at scale Riccardo Lo Bianco et.al. 2504.19933 link
2025-04-28 NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Chia-Yu Hung et.al. 2504.19854 null
2025-04-28 Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning Shadab Zakavati et.al. 2504.19840 null
2025-04-28 Optimal real-time dynamic treatment regimes with application to oxytocin use in preventing postpartum hemorrhage Haiyan Zhu et.al. 2504.19831 null
2025-04-28 Digital Twin-based Out-of-Distribution Detection in Autonomous Vessels Erblin Isaku et.al. 2504.19816 null
2025-04-28 Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model Muzammil Behzad et.al. 2504.19739 null
2025-04-28 The ATLAS of Traffic Lights: A Reliable Perception Framework for Autonomous Driving Rupert Polley et.al. 2504.19722 null
2025-04-28 Advances in Approximate Bayesian Inference for Models in Epidemiology Xiahui Li et.al. 2504.19698 null
2025-04-28 GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning Juyi Sheng et.al. 2504.19683 null
2025-04-28 Neuronal correlations shape the scaling behavior of memory capacity and nonlinear computational capability of recurrent neural networks Shotaro Takasu et.al. 2504.19657 null
2025-04-28 Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM Leon Davies et.al. 2504.19654 null
2025-04-28 GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM Leon Davies et.al. 2504.19653 null
2025-04-28 Robot Motion Planning using One-Step Diffusion with Noise-Optimized Approximate Motions Tomoharu Aizu et.al. 2504.19652 null
2025-04-28 QFDNN: A Resource-Efficient Variational Quantum Feature Deep Neural Networks for Fraud Detection and Loan Prediction Subham Das et.al. 2504.19632 null
2025-04-28 ARMOR: Adaptive Meshing with Reinforcement Optimization for Real-time 3D Monitoring in Unexposed Scenes Yizhe Zhang et.al. 2504.19624 null
2025-04-25 Automating Nanoindentation: Optimizing Workflows for Precision and Accuracy Vivek Chawla et.al. 2504.18525 null
2025-04-25 Online Distributed Queue Length Estimation Aditya Bhaskara et.al. 2504.18503 null
2025-04-25 A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression Muzaffar Qureshi et.al. 2504.18463 null
2025-04-25 Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning Tewodros Alemu Ayall et.al. 2504.18451 null
2025-04-25 Online learning to accelerate nonlinear PDE solvers: applied to multiphase porous media flow Vinicius L S Silva et.al. 2504.18414 null
2025-04-25 Virial theorem for rigidly rotating matter Sourav Dey et.al. 2504.18388 null
2025-04-25 Renewable-Colocated Green Hydrogen Production: Optimal Scheduling and Profitability Siying Li et.al. 2504.18368 null
2025-04-25 SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations Shuting Zhao et.al. 2504.18332 null
2025-04-25 STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting Yunze Deng et.al. 2504.18318 null
2025-04-25 Design and Evaluation of a UGV-Based Robotic Platform for Precision Soil Moisture Remote Sensing Ilektra Tsimpidi et.al. 2504.18284 null
2025-04-25 Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator Minjae Kang et.al. 2504.18283 null
2025-04-25 SecCityVR: Visualization and Collaborative Exploration of Software Vulnerabilities in Virtual Reality Dennis Wüppelman et.al. 2504.18238 null
2025-04-25 Time and Frequency Domain-based Anomaly Detection in Smart Meter Data for Distribution Network Studies Petar Labura et.al. 2504.18231 null
2025-04-25 Sampling-Based Grasp and Collision Prediction for Assisted Teleoperation Simon Manschitz et.al. 2504.18186 null
2025-04-25 PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models Michel Gokan Khan et.al. 2504.18165 link
2025-04-25 Evaluation of Distimation's Real-world Performance on a Superconducting Quantum Computer Hikaru Yokomori et.al. 2504.18141 null
2025-04-25 Study on Real-Time Road Surface Reconstruction Using Stereo Vision Deepak Ghimire et.al. 2504.18112 null
2025-04-25 Teleportation-based Speed Meter for Precision Measurement Yohei Nishino et.al. 2504.18111 null
2025-04-25 Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation Weipeng Tan et.al. 2504.18087 null
2025-04-25 Phonon-Assisted Radiative Lifetimes and Exciton Dynamics from First Principles Chunhao Guo et.al. 2504.18071 null
2025-04-24 Replay to Remember: Retaining Domain Knowledge in Streaming Language Models Sneh Pillai et.al. 2504.17780 null
2025-04-24 Disaggregated Deep Learning via In-Physics Computing at Radio Frequency Zhihui Gao et.al. 2504.17752 null
2025-04-24 BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring Asier Bikandi et.al. 2504.17693 null
2025-04-24 Optimized Cloud Resource Allocation Using Genetic Algorithms for Energy Efficiency and QoS Assurance Caroline Panggabean et.al. 2504.17675 null
2025-04-24 Unifying Complementarity Constraints and Control Barrier Functions for Safe Whole-Body Robot Control Rafael I. Cabral Muchacho et.al. 2504.17647 null
2025-04-24 Beyond Labels: Zero-Shot Diabetic Foot Ulcer Wound Segmentation with Self-attention Diffusion Models and the Potential for Text-Guided Customization Abderrachid Hamrani et.al. 2504.17628 null
2025-04-24 TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System Zheng Wei et.al. 2504.17598 null
2025-04-24 RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network Boyue Xu et.al. 2504.17595 null
2025-04-24 A Multi-Agent, Laxity-Based Aggregation Strategy for Cost-Effective Electric Vehicle Charging and Local Transformer Overload Prevention Kristoffer Christensen et.al. 2504.17575 null
2025-04-24 Flying through cluttered and dynamic environments with LiDAR Huajie Wu et.al. 2504.17569 null
2025-04-24 IRA: Adaptive Interest-aware Representation and Alignment for Personalized Multi-interest Retrieval Youngjune Lee et.al. 2504.17529 null
2025-04-24 Adaptive Orchestration of Modular Generative Information Access Systems Mohanna Hoveyda et.al. 2504.17454 link
2025-04-24 Storing and Querying Evolving Graphs in NoSQL Storage Models Alexandros Spitalas et.al. 2504.17438 null
2025-04-24 StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies Xu Wang et.al. 2504.17401 null
2025-04-24 Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems Jaegang Jo et.al. 2504.17368 null
2025-04-24 TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Linli Yao et.al. 2504.17343 link
2025-04-24 Bridging Optical Sensing and Wearable Health Monitoring: A Functionalized Plasmonic Nanopillar for Non-Invasive Sweat Glucose Detection Ling Liu et.al. 2504.17339 null
2025-04-24 EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy Haodi Yao et.al. 2504.17280 null
2025-04-24 MV-Crafter: An Intelligent System for Music-guided Video Generation Chuer Chen et.al. 2504.17267 null
2025-04-24 Symbolic Representation for Any-to-Any Generative Tasks Jiaqi Chen et.al. 2504.17261 null
2025-04-24 Fast Online Adaptive Neural MPC via Meta-Learning Yu Mei et.al. 2504.16369 link
2025-04-23 Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving Jacob Levy et.al. 2504.16923 null
2025-04-23 An Accelerated Camera 3DMA Framework for Efficient Urban GNSS Multipath Estimation Shiyao Lv et.al. 2504.16906 null
2025-04-23 Reconfigurable Intelligent Surface Control for a Moving Receiver Hamed Radpour et.al. 2504.16874 null
2025-04-23 Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation Tixiao Shan et.al. 2504.16782 null
2025-04-23 Evaluation Framework for AI Systems in "the Wild" Sarah Jabbour et.al. 2504.16778 null
2025-04-23 Deep photonic reservoir computer for nonlinear equalization of 16-level quadrature amplitude modulation signals Rui-Qian Li et.al. 2504.16769 null
2025-04-23 Beating the break-even point with autonomous quantum error correction Yi Li et.al. 2504.16746 null
2025-04-23 PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands Pei Lin et.al. 2504.16649 null
2025-04-23 Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models Fredy Pokou et.al. 2504.16635 null
2025-04-23 Data-Assimilated Model-Based Reinforcement Learning for Partially Observed Chaotic Flows Defne E. Ozan et.al. 2504.16588 null
2025-04-23 PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System Xianghe Liu et.al. 2504.16573 null
2025-04-23 A Collaborative Intrusion Detection System Using Snort IDS Nodes Tom Davies et.al. 2504.16550 null
2025-04-23 6G EdgeAI: Performance Evaluation and Analysis Chien-Sheng Yang et.al. 2504.16529 null
2025-04-23 Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition Zhenguang Zhong et.al. 2504.16504 null
2025-04-23 FeedQUAC: Quick Unobtrusive AI-Generated Commentary Tao Long et.al. 2504.16416 null
2025-04-23 Circinus: Efficient Query Planner for Compound ML Serving Banruo Liu et.al. 2504.16397 null
2025-04-23 Fast and Modular Whole-Body Lagrangian Dynamics of Legged Robots with Changing Morphology Sahand Farghdani et.al. 2504.16383 null
2025-04-23 SILM: A Subjective Intent Based Low-Latency Framework for Multiple Traffic Participants Joint Trajectory Prediction Qu Weiming et.al. 2504.16377 null
2025-04-23 Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection Linhua Kong et.al. 2504.16368 null
2025-04-22 PRIME: Fast Primal-Dual Feedback Optimization for Markets with Application to Optimal Power Flow Nicholas Julian Behr et.al. 2504.16048 link
2025-04-22 A Comparative and Measurement-Based Study on Real-Time Network KPI Extraction Methods for 5G and Beyond Applications Batuhan Kaplan et.al. 2504.16039 null
2025-04-22 LLMs meet Federated Learning for Scalable and Secure IoT Management Yazan Otoum et.al. 2504.16032 null
2025-04-22 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Joya Chen et.al. 2504.16030 null
2025-04-22 A UAV-Aided Digital Twin Framework for IoT Networks with High Accuracy and Synchronization Ghofran Khalaf et.al. 2504.15967 null
2025-04-22 FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation Zebin Yao et.al. 2504.15958 link
2025-04-22 Monocular inspection of spacecraft under illumination constraints and avoidance regions Tochukwu Elijah Ogri et.al. 2504.15954 null
2025-04-22 Real-time raw signal genomic analysis using fully integrated memristor hardware Peiyi He et.al. 2504.15934 link
2025-04-22 Learning the Spoofability of Limit Order Books With Interpretable Probabilistic Neural Networks Timothée Fabre et.al. 2504.15908 null
2025-04-22 RaSCL: Radar to Satellite Crossview Localization Blerim Abdullai et.al. 2504.15899 null
2025-04-22 An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search Karim Essalmi et.al. 2504.15869 null
2025-04-22 Adaptive PCA-Based Outlier Detection for Multi-Feature Time Series in Space Missions Jonah Ekelund et.al. 2504.15846 null
2025-04-22 Characterization and ex vivo application of flexible 2D scintillating coatings in ultra-high dose rate electron beams for FLASH radiotherapy Verdi Vanreusel et.al. 2504.15824 null
2025-04-22 Microstructure and Manipulation: Quantifying Pump-and-Dump Dynamics in Cryptocurrency Markets Mahya Karbalaii et.al. 2504.15790 null
2025-04-22 Enhancing Tennis Training with Real-Time Swing Data Visualisation in Immersive Virtual Reality Ryan Najami et.al. 2504.15746 null
2025-04-22 You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection Jun Dong et.al. 2504.15694 null
2025-04-22 Comparative Analysis of Evolutionary Algorithms for Energy-Aware Production Scheduling Sascha C Burmeister et.al. 2504.15672 null
2025-04-22 Symbolic Runtime Verification and Adaptive Decision-Making for Robot-Assisted Dressing Yasmin Rafiq et.al. 2504.15666 null
2025-04-22 Neural Kinematic Bases for Fluids Yibo Liu et.al. 2504.15657 null
2025-04-22 A Vision-Enabled Prosthetic Hand for Children with Upper Limb Disabilities Md Abdul Baset Sarker et.al. 2504.15654 null
2025-04-21 StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians Cailin Zhuang et.al. 2504.15281 null
2025-04-21 DRAWER: Digital Reconstruction and Articulation With Environment Realism Hongchi Xia et.al. 2504.15278 null
2025-04-21 Impulsive pattern recognition of a myoelectric hand via Dynamic Time Warping Mustafa Can Kadilar et.al. 2504.15256 null
2025-04-21 Scalable Discrete Event Simulation Tool for Large-Scale Cyber-Physical Energy Systems: Advancing System Efficiency and Scalability Khandaker Akramul Haque et.al. 2504.15198 null
2025-04-21 Time-Series Analysis on Edge-AI Hardware for Healthcare Monitoring Jinhai Hu et.al. 2504.15178 null
2025-04-21 Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture Meng Cui et.al. 2504.15171 null
2025-04-21 Neural ATTF: A Scalable Solution to Lifelong Multi-Agent Path Planning Kushal Shah et.al. 2504.15130 null
2025-04-21 A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment Kangyao Huang et.al. 2504.15129 null
2025-04-21 Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations Csongor Csanad Kariko et.al. 2504.15121 null
2025-04-21 Muon Imaging of Hydrotreatment Towers Rafael Armando Martínez-Rivero et.al. 2504.15103 null
2025-04-21 NeuGaze: Reshaping the future BCI Yiqian Yang et.al. 2504.15101 link
2025-04-21 VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation Mingxia Zhan et.al. 2504.15095 null
2025-04-21 Reconfiguration and Real-Time Operation of Networked Microgrids Under Load Uncertainty Hannah Moring et.al. 2504.15084 null
2025-04-21 Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides Jinghua Zhao et.al. 2504.15066 null
2025-04-21 Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects Benshan Wang et.al. 2504.15044 null
2025-04-21 Dual Utilization of Perturbation for Stream Data Publication under Local Differential Privacy Rong Du et.al. 2504.14993 null
2025-04-21 3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations Yating Wang et.al. 2504.14967 null
2025-04-21 Dynamic Graph-Like Learning with Contrastive Clustering on Temporally-Factored Ship Motion Data for Imbalanced Sea State Estimation in Autonomous Vessel Kexin Wang et.al. 2504.14907 null
2025-04-21 Distributed Time-Varying Gaussian Regression via Kalman Filtering Nicola Taddei et.al. 2504.14900 link
2025-04-21 Physics-Aware Compression of Plasma Distribution Functions with GPU-Accelerated Gaussian Mixture Models Andong Hu et.al. 2504.14897 null
2025-04-18 ChatNekoHacker: Real-Time Fan Engagement with Conversational Agents Takuya Sera et.al. 2504.13793 null
2025-04-18 Equi-Euler GraphNet: An Equivariant, Temporal-Dynamics Informed Graph Neural Network for Dual Force and Trajectory Prediction in Multi-Body Systems Vinay Sharma et.al. 2504.13768 null
2025-04-18 Realizing string breaking dynamics in a $Z_2$ lattice gauge theory on quantum hardware Constantia Alexandrou et.al. 2504.13760 null
2025-04-18 Intelligent Interaction Strategies for Context-Aware Cognitive Augmentation Xiangrong et.al. 2504.13684 null
2025-04-18 Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction Yushen He et.al. 2504.13647 link
2025-04-18 SupResDiffGAN a new approach for the Super-Resolution task Dawid Kopeć et.al. 2504.13622 null
2025-04-18 FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking Ying Wang et.al. 2504.13604 link
2025-04-18 Memristive chaotic circuit for information processing through time Manuel Escudero et.al. 2504.13600 null
2025-04-18 RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines Quentin Romero Lauro et.al. 2504.13587 null
2025-04-18 Estimating constraints on cosmological parameters via the canonical and the differential redshift drift with SKA HI 21-cm observations Jiangang Kang et.al. 2504.13583 null
2025-04-18 MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework Zhenkai Qin et.al. 2504.13574 null
2025-04-18 Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content Azmarah Rizvi et.al. 2504.13545 null
2025-04-18 Can Local Representation Alignment RNNs Solve Temporal Tasks? Nikolay Manchev et.al. 2504.13531 null
2025-04-18 Neural Ganglion Sensors: Learning Task-specific Event Cameras Inspired by the Neural Circuit of the Human Retina Haley M. So et.al. 2504.13457 null
2025-04-18 RT-HDIST: Ray-Tracing Core-based Hausdorff Distance Computation YoungWoo Kim et.al. 2504.13436 null
2025-04-18 POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation Evans Xu Han et.al. 2504.13392 null
2025-04-17 Multi-Sensor Fusion-Based Mobile Manipulator Remote Control for Intelligent Smart Home Assistance Xiao Jin et.al. 2504.13370 null
2025-04-17 AI-Empowered Integrated Sensing and Communications Mojtaba Vaezi et.al. 2504.13363 null
2025-04-17 Physical Reservoir Computing in Hook-Shaped Rover Wheel Spokes for Real-Time Terrain Identification Xiao Jin et.al. 2504.13348 null
2025-04-17 Adaptive AI decision interface for autonomous electronic material discovery Yahao Dai et.al. 2504.13344 null
2025-04-17 Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems Ivica Kostric et.al. 2504.13095 link
2025-04-17 EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance Yang Yue et.al. 2504.13065 link
2025-04-17 Pose and Facial Expression Transfer by using StyleGAN Petr Jahoda et.al. 2504.13021 null
2025-04-17 GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration Rendong Zhang et.al. 2504.12999 link
2025-04-17 New Frontiers in Muon-Spin Spectroscopy Using Si-Pixel Detectors Heiko Augustin et.al. 2504.12993 null
2025-04-17 Efficient Chebyshev Reconstruction for the Anisotropic Equilibrium Model in Magnetic Particle Imaging Christine Droigk et.al. 2504.12981 null
2025-04-17 Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs Youyi Zhan et.al. 2504.12909 link
2025-04-17 Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation Yuyang Li et.al. 2504.12908 link
2025-04-17 Market-Driven Flexibility Provision: A Tri-Level Optimization Approach for Carbon Reduction Shijie Pan et.al. 2504.12877 null
2025-04-17 AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering Michael Steiner et.al. 2504.12811 null
2025-04-17 Distributed Intelligent Sensing and Communications for 6G: Architecture and Use Cases Kyriakos Stylianopoulos et.al. 2504.12765 null
2025-04-17 Biasing the Driving Style of an Artificial Race Driver for Online Time-Optimal Maneuver Planning Sebastiano Taddei et.al. 2504.12744 null
2025-04-17 Chinese-Vicuna: A Chinese Instruction-following Llama-based Model Chenghao Fan et.al. 2504.12737 null
2025-04-17 Incorporating a Deep Neural Network into Moving Horizon Estimation for Embedded Thermal Torque Derating of an Electric Machine Alexander Winkler et.al. 2504.12736 null
2025-04-17 Embodied Neuromorphic Control Applied on a 7-DOF Robotic Manipulator Ziqi Wang et.al. 2504.12702 link
2025-04-17 Predicting Driver's Perceived Risk: a Model Based on Semi-Supervised Learning Strategy Siwei Huang et.al. 2504.12665 null
2025-04-17 Autonomous Drone for Dynamic Smoke Plume Tracking Srijan Kumar Pal et.al. 2504.12664 null
2025-04-17 AdaptoVision: A Multi-Resolution Image Recognition Model for Robust and Scalable Classification Md. Sanaullah Chowdhury Lameya Sabrin et.al. 2504.12652 null
2025-04-17 Observation of the Axion quasiparticle in 2D MnBi$_2$Te$_4$ Jian-Xiang Qiu et.al. 2504.12572 null
2025-04-17 Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions Yifei Dong et.al. 2504.11967 null
2025-04-17 Real-Time Reconstruction of Ground Motion During Small Magnitude Earthquakes: A Pilot Study Youngkyu Kim et.al. 2504.11752 null
2025-04-16 Decision-based AI Visual Navigation for Cardiac Ultrasounds Andy Dimnaku et.al. 2504.12535 null
2025-04-16 SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians Liam Schoneveld et.al. 2504.12292 null
2025-04-16 An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks Ling Zhang et.al. 2504.12251 link
2025-04-16 Data Assimilation for Robust UQ Within Agent-Based Simulation on HPC Systems Adam Spannaus et.al. 2504.12228 null
2025-04-16 Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging Tristan S. W. Stevens et.al. 2504.12154 null
2025-04-16 GripMap: An Efficient, Spatially Resolved Constraint Framework for Offline and Online Trajectory Planning in Autonomous Racing Frederik Werner et.al. 2504.12115 null
2025-04-16 Self-Supervised Traversability Learning with Online Prototype Adaptation for Off-Road Autonomous Driving Yafeng Bu et.al. 2504.12109 null
2025-04-16 A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions Rahima Khanam et.al. 2504.11995 null
2025-04-16 The Evolution of Zero Trust Architecture (ZTA) from Concept to Implementation Md Nasiruzzaman et.al. 2504.11984 null
2025-04-16 Flow Intelligence: Robust Feature Matching via Temporal Signature Correlation Jie Wang et.al. 2504.11949 null
2025-04-16 Mind2Matter: Creating 3D Models from EEG Signals Xia Deng et.al. 2504.11936 link
2025-04-16 Broadening Participation through Physical Computing: Replicating Sensor-Based Programming Workshops for Rural Students in Sri Lanka Poornima Meegammana et.al. 2504.11913 null
2025-04-16 Trajectory Dispersion Control for Precision Landing Guidance of Reusable Rockets Xinglun Chen et.al. 2504.11894 null
2025-04-16 Real-Time Shape Estimation of Tensegrity Structures Using Strut Inclination Angles Tufail Ahmad Bhat et.al. 2504.11868 null
2025-04-16 Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery Namitha Liyanage et.al. 2504.11805 null
2025-04-16 TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion Yiran Wang et.al. 2504.11773 null
2025-04-16 Polarisation-Inclusive Spiking Neural Networks for Real-Time RFI Detection in Modern Radio Telescopes Nicholas J. Pritchard et.al. 2504.11720 link
2025-04-16 A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models Kuiyuan Ding et.al. 2504.11696 null
2025-04-16 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians Zeming Wei et.al. 2504.11218 link
2025-04-16 Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance Shangyu Liu et.al. 2504.11197 null
2025-04-16 A Real-time Anomaly Detection Method for Robots based on a Flexible and Sparse Latent Space Taewook Kang et.al. 2504.11170 null
2025-04-15 Real-time Object and Event Detection Service through Computer Vision and Edge Computing Marcos Mendes et.al. 2504.11662 null
2025-04-15 TextArena Leon Guertler et.al. 2504.11442 link
2025-04-15 Predicting Wave Dynamics using Deep Learning with Multistep Integration Inspired Attention and Physics-Based Loss Decomposition Indu Kant Deo et.al. 2504.11433 null
2025-04-15 HeatSense: Intelligent Thermal Anomaly Detection for Securing NoC-Enabled MPSoCs Mahdi Hasanzadeh et.al. 2504.11421 null
2025-04-15 Sensitivity Analysis of State Space Models for Scrap Composition Estimation in EAF and BOF Yiqing Zhou et.al. 2504.11319 null
2025-04-15 Hybrid Compton-PET Imaging for ion-range verification: A Preclinical Study for Proton-, Helium-, and Carbon-Therapy at HIT Javier Balibrea-Correa et.al. 2504.11273 null
2025-04-15 Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach Xiaoxiao Ma et.al. 2504.11262 null
2025-04-15 Focal Split: Untethered Snapshot Depth from Differential Defocus Junjie Luo et.al. 2504.11202 null
2025-04-15 QAMA: Quantum annealing multi-head attention operator with classical deep learning framework Peng Du et.al. 2504.11083 null
2025-04-15 Intraoperative perfusion assessment by continuous, low-latency hyperspectral light-field imaging: development, methodology, and clinical application Stefan Kray et.al. 2504.10953 null
2025-04-15 A Signal Matrix-Based Local Flaw Detection Framework for Steel Wire Ropes Using Convolutional Neural Networks Siyu You et.al. 2504.10952 null
2025-04-15 Design and Verification of a Synchronus First In First Out (FIFO) Yatheeswar Penta et.al. 2504.10901 null
2025-04-15 ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping Shun Iwase et.al. 2504.10857 null
2025-04-15 Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition Naoto Nishida et.al. 2504.10849 null
2025-04-15 LightFormer: A lightweight and efficient decoder for remote sensing image segmentation Sihang Chen et.al. 2504.10834 null
2025-04-15 Hallucination-Aware Generative Pretrained Transformer for Cooperative Aerial Mobility Control Hyojun Ahn et.al. 2504.10831 null
2025-04-15 SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures Kuang Yuan et.al. 2504.10793 null
2025-04-15 ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge Mikolaj Walczak et.al. 2504.10784 null
2025-04-15 Diversity-Fair Online Selection Ming Hu et.al. 2504.10389 null
2025-04-15 LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis Hao Sun et.al. 2504.10331 null
2025-04-15 WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs Nguyen Ngoc Dat et.al. 2504.10165 null
2025-04-15 TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models Jaewoo Lee et.al. 2504.09897 link
2025-04-14 DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting Zeren Jiang et.al. 2504.10486 link
2025-04-14 HybridCollab: Unifying In-Person and Remote Collaboration for Cardiovascular Surgical Planning in Mobile Augmented Reality Pratham Darrpan Mehta et.al. 2504.10440 null
2025-04-14 Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone Pietro Bonazzi et.al. 2504.10400 link
2025-04-14 Patch and Shuffle: A Preprocessing Technique for Texture Classification in Autonomous Cementitious Fabrication Jeremiah Giordani et.al. 2504.10353 null
2025-04-14 SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model Zongcan Ding et.al. 2504.10320 null
2025-04-14 CAT: A Conditional Adaptation Tailor for Efficient and Effective Instance-Specific Pansharpening on Real-World Data Tianyu Xin et.al. 2504.10242 null
2025-04-14 ROSFD: Robust Online Streaming Fraud Detection with Resilience to Concept Drift in Data Streams Vivek Yelleti et.al. 2504.10229 null
2025-04-14 Unleashing Expert Opinion from Social Media for Stock Prediction Wanyun Zhou et.al. 2504.10078 link
2025-04-14 DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction Kiana Hoshanfar et.al. 2504.10070 null
2025-04-14 Time for Timed Monitorability Thomas M. Grosen et.al. 2504.10008 null
2025-04-14 VR MRI Training for Adolescents: A Comparative Study of Gamified VR, Passive VR, 360 Video, and Traditional Educational Video Yue Yang et.al. 2504.09955 null
2025-04-14 Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization Haiyong Yu et.al. 2504.09927 null
2025-04-14 Fusing Bluetooth with Pedestrian Dead Reckoning: A Floor Plan-Assisted Positioning Approach Wenxuan Pan et.al. 2504.09905 null
2025-04-14 LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking Mert Asim Karaoglu et.al. 2504.09904 null
2025-04-14 MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling Yunpeng Tan et.al. 2504.09878 null
2025-04-14 CKMImageNet: A Dataset for AI-Based Channel Knowledge Map Towards Environment-Aware Communication and Sensing Zijian Wu et.al. 2504.09849 null
2025-04-14 RINGO: Real-time Navigation with a Guiding Trajectory for Aerial Manipulators in Unknown Environments Zhaopeng Zhang et.al. 2504.08338 null
2025-04-11 TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning Hang Ni et.al. 2504.08694 null
2025-04-11 Safe Flow Matching: Robot Motion Planning with Control Barrier Functions Xiaobing Dai et.al. 2504.08661 null
2025-04-11 TinyCenterSpeed: Efficient Center-Based Object Detection for Autonomous Racing Neil Reichlin et.al. 2504.08655 link
2025-04-11 Enhancing Neutrino Reconstruction in Water-Cherenkov Air Shower Arrays Using Multi-Photosensors J. Alvarez-Muñiz et.al. 2504.08652 null
2025-04-11 TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration Matteo Spanio et.al. 2504.08624 link
2025-04-11 Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies Vineeth Sai Narajala et.al. 2504.08623 null
2025-04-11 FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment Sebastián Barbas Laina et.al. 2504.08603 null
2025-04-11 POD-Based Sparse Stochastic Estimation of Wind Turbine Blade Vibrations Lorenzo Schena et.al. 2504.08505 null
2025-04-11 AI-Driven Smart Sportswear for Real-Time Fitness Monitoring Using Textile Strain Sensors Chenyu Tang et.al. 2504.08500 null
2025-04-11 A Comparative Study of Recommender Systems under Big Data Constraints Arimondo Scrivano et.al. 2504.08457 null
2025-04-11 Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion Weiye Chen et.al. 2504.08451 link
2025-04-11 The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments Jiafan Lu et.al. 2504.08431 null
2025-04-11 Light-YOLOv8-Flame: A Lightweight High-Performance Flame Detection Algorithm Jiawei Lan et.al. 2504.08389 null
2025-04-11 MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Junliang Guo et.al. 2504.08388 null
2025-04-11 PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation Arman Khaledian et.al. 2504.08386 null
2025-04-11 DRIP: DRop unImportant data Points -- Enhancing Machine Learning Efficiency with Grad-CAM-Based Real-Time Data Prioritization for On-Device Training Marcus Rüb et.al. 2504.08364 null
2025-04-11 Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing Tobias Pfandzelter et.al. 2504.08337 link
2025-04-11 Towards a Digital Twin of Noisy Quantum Computers: Calibration-Driven Emulation of Transmon Qubits Ronny Müller et.al. 2504.08313 null
2025-04-11 Gigabit-rate Quantum Key Distribution on Integrated Photonic Chips Si Qi Ng et.al. 2504.08298 null
2025-04-10 A Review of HPC-Accelerated CFD in National Security and Defense James Afful et.al. 2504.07837 null
2025-04-10 A Hybrid Semantic RAN Protocol Stack Design for 6G System and Its Implementation Luhan Wang et.al. 2504.07829 null
2025-04-10 MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset Jenna Kline et.al. 2504.07744 null
2025-04-10 A Novel Deep Learning Approach for Emulating Computationally Expensive Postfire Debris Flows Palak Patel et.al. 2504.07736 null
2025-04-10 Finite-temperature real-time properties of magnetic polarons in two-dimensional quantum antiferromagnets Toni Guthardt et.al. 2504.07715 null
2025-04-10 Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases Andrés Bell-Navas et.al. 2504.07606 link
2025-04-10 Tuning chirality amplitude at ultrafast timescales Hiroki Ueda et.al. 2504.07599 null
2025-04-10 MUFFLER: Secure Tor Traffic Obfuscation with Dynamic Connection Shuffling and Splitting Minjae Seo et.al. 2504.07543 null
2025-04-10 Intelligent DoS and DDoS Detection: A Hybrid GRU-NTM Approach to Network Security Caroline Panggabean et.al. 2504.07478 null
2025-04-10 Nonlinear Optimal Guidance for Intercepting Moving Targets Han Wang et.al. 2504.07430 null
2025-04-10 ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement Anning Hu et.al. 2504.07418 null
2025-04-10 WK-Pnet: FM-Based Positioning via Wavelet Packet Decomposition and Knowledge Distillation Shilian Zheng et.al. 2504.07399 null
2025-04-10 MicroNAS: An Automated Framework for Developing a Fall Detection System Seyed Mojtaba Mohasel et.al. 2504.07397 null
2025-04-09 CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory William Andrew Simon et.al. 2504.07298 null
2025-04-09 Data-Enabled Neighboring Extremal: Case Study on Model-Free Trajectory Tracking for Robotic Arm Amin Vahidi-Moghaddam et.al. 2504.07292 null
2025-04-09 Enabling Continuous 5G Connectivity in Aircraft through Low Earth Orbit Satellites Raúl Parada et.al. 2504.07262 null
2025-04-09 Visual-Aware Speech Recognition for Noisy Scenarios Lakshmipathi Balaji et.al. 2504.07229 null
2025-04-09 Discovery of extreme Quasi-Periodic Eruptions in a newly accreting massive black hole Lorena Hernández-García et.al. 2504.07169 null
2025-04-09 OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Jiacheng Liu et.al. 2504.07096 null
2025-04-09 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution Gene Chou et.al. 2504.07093 link
2025-04-09 Cerebral blood flow monitoring using a deep learning implementation of the two-layer DCS analytical model with a 512 $\times$ 512 SPAD array Mingliang Pan et.al. 2504.06997 null
2025-04-09 Audio-visual Event Localization on Portrait Mode Short Videos Wuyang Liu et.al. 2504.06884 null
2025-04-09 Determining Fetal Orientations From Blind Sweep Ultrasound Video Jakub Maciej Wiśniewski et.al. 2504.06836 null
2025-04-09 Integrated Sensing and Communications Over the Years: An Evolution Perspective Di Zhang et.al. 2504.06830 null
2025-04-09 SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering Hanxiao Sun et.al. 2504.06815 link
2025-04-09 Modeling and analysis methods for early detection of leakage points in gas transmission systems Ilgar Aliyev et.al. 2504.06809 null
2025-04-09 How do Copilot Suggestions Impact Developers' Frustration and Productivity? Emanuela Guglielmi et.al. 2504.06808 null
2025-04-09 Controllable Automatic Foley Artist Roi Benita et.al. 2504.06778 link
2025-04-09 Bridging Research and Standardization: Innovations and Methodology for 6G Standard Contributions Francesca Conserva et.al. 2504.06682 null
2025-04-09 Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making Kaifeng Wang et.al. 2504.06670 null
2025-04-09 Robust and Noise-resilient Long-Term Prediction of Spatiotemporal Data Using Variational Mode Graph Neural Networks with 3D Attention Osama Ahmad et.al. 2504.06660 null
2025-04-09 InstantSticker: Realistic Decal Blending via Disentangled Object Reconstruction Yi Zhang et.al. 2504.06620 null
2025-04-09 InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features Sujay Khandagale et.al. 2504.06609 link
2025-04-09 Overcoming Dynamic Environments: A Hybrid Approach to Motion Planning for Manipulators Ho Minh Quang Ngo et.al. 2504.06596 null
2025-04-09 NAPER: Fault Protection for Real-Time Resource-Constrained Deep Neural Networks Rian Adam Rajagede et.al. 2504.06591 null
2025-04-09 A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication Xiao-Hang Jiang et.al. 2504.06561 link
2025-04-09 ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity Long Chen et.al. 2504.06512 null
2025-04-09 Equivalent Circuit Modeling of a Lumped-element Loaded Metasurface under Arbitrary Incidence and Polarization Athanasios Nousiou et.al. 2504.06501 null
2025-04-08 A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response Mark Timmons et.al. 2504.06241 null
2025-04-08 Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces Francisco J. Rodríguez Lera et.al. 2504.06189 link
2025-04-08 Efficient algorithms to solve atom reconfiguration problems. III. The bird and batching algorithms and other parallel implementations on GPUs Fouad Afiouni et.al. 2504.06182 null
2025-04-08 Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks Xufang Zhao et.al. 2504.06165 null
2025-04-08 Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms Ido Greenberg et.al. 2504.06126 null
2025-04-08 Safe Interaction via Monte Carlo Linear-Quadratic Games Benjamin A. Christie et.al. 2504.06124 link
2025-04-08 A Robust Real-Time Lane Detection Method with Fog-Enhanced Feature Fusion for Foggy Conditions Ronghui Zhang et.al. 2504.06121 null
2025-04-08 Real-Time LaCAM Runzhe Liang et.al. 2504.06091 null
2025-04-08 $L_\textrm{dT}$: An ionospheric activity index based on distributions in GNSS-derived TEC rates of change Paul Kinsler et.al. 2504.06056 null
2025-04-08 Modular Soft Wearable Glove for Real-Time Gesture Recognition and Dynamic 3D Shape Reconstruction Huazhi Dong et.al. 2504.05983 null
2025-04-08 An Empirical Study of GPT-4o Image Generation Capabilities Sixiang Chen et.al. 2504.05979 link
2025-04-08 Context-aware Rate Adaptation for Predictive Flying Networks using Contextual Bandits Ruben Queiros et.al. 2504.05964 null
2025-04-08 Hybrid Control as a Proxy for Detection and Mitigation of Sensor Attacks in Cooperative Driving Mischa Huisman et.al. 2504.05958 link
2025-04-08 InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Control Ruixiang Wu et.al. 2504.05946 null
2025-04-08 Réduire le bruit grâce à la réalité augmentée sonore -- Auditory Concealer Clara Boukhemia et.al. 2504.05847 null
2025-04-08 Negotiating Strict Latency Limits for Dynamic Real-Time Services in Vehicular Time-Sensitive Networks Timo Häckel et.al. 2504.05793 null
2025-04-08 Residual U-Net for accurate and efficient prediction of hemodynamics in two-dimensional asymmetric stenosis Xintong Zou et.al. 2504.05778 null
2025-04-08 A Lightweight Multi-Module Fusion Approach for Korean Character Recognition Inho Jake Park et.al. 2504.05770 null
2025-04-08 Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation Zhihua Xu et.al. 2504.05746 null
2025-04-08 Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting Jee Won Lee et.al. 2504.05740 null
2025-04-08 REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning Jihyun Lee et.al. 2504.04956 null
2025-04-07 Using Physiological Measures, Gaze, and Facial Expressions to Model Human Trust in a Robot Partner Haley N. Green et.al. 2504.05291 null
2025-04-07 RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception Hui Zhang et.al. 2504.05287 null
2025-04-07 A Telecentric Offset Reflective Imaging System (TORIS) for Terahertz Imaging and Spectroscopy Pouyan Rezapoor et.al. 2504.05267 null
2025-04-07 From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models German Barquero et.al. 2504.05265 null
2025-04-07 Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation Jiaming Chen et.al. 2504.05225 link
2025-04-07 LLM-Alignment Live-Streaming Recommendation Yueyang Liu et.al. 2504.05217 null
2025-04-07 Post-Training Language Models for Continual Relation Extraction Sefika Efeoglu et.al. 2504.05214 null
2025-04-07 Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification Yasuhiro Yao et.al. 2504.05148 link
2025-04-07 Decentralized Semantic Federated Learning for Real-Time Public Safety Tasks: Challenges, Methods, and Directions Baosheng Li et.al. 2504.05107 null
2025-04-07 SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation Stephen Brade et.al. 2504.05106 null
2025-04-07 Speech-to-Trajectory: Learning Human-Like Verbal Guidance for Robot Motion Eran Beeri Bamani et.al. 2504.05084 null
2025-04-07 AI-Driven Tactical Communications and Networking for Defense: A Survey and Emerging Trends Victor Monzon Baeza et.al. 2504.05071 null
2025-04-07 SILVIA: Ultra-precision formation flying demonstration for space-based interferometry Takahiro Ito et.al. 2504.05001 null
2025-04-07 Transforming Future Data Center Operations and Management via Physical AI Zhiwei Cao et.al. 2504.04982 null
2025-04-07 Boosting Relational Deep Learning with Pretrained Tabular Models Veronica Lachi et.al. 2504.04934 link
2025-04-07 Real-time tuneable bright bonding plasmonic modes in Ga nanostructures Renu Raman Sahu et.al. 2504.04922 null
2025-04-07 Parallelization is All System Identification Needs: End-to-end Vibration Diagnostics on a multi-core RISC-V edge device Amirhossein Kiamarzi et.al. 2504.04884 null
2025-04-07 Closed-Loop Neural Operator-Based Observer of Traffic Density Alice Harting et.al. 2504.04873 null
2025-04-07 Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM Zhicong Sun et.al. 2504.04844 link
2025-04-04 CAMINO: Cloud-native Autonomous Management and Intent-based Orchestrator Konstantinos Antonakoglou et.al. 2504.03586 null
2025-04-04 The building blocks of software work explain coding careers and language popularity Xiangnan Feng et.al. 2504.03581 null
2025-04-04 Online Traffic Density Estimation using Physics-Informed Neural Networks Dennis Wilkman et.al. 2504.03483 null
2025-04-04 DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models Sathish Kumar et.al. 2504.03423 null
2025-04-04 NeRFlex: Resource-aware Real-time High-quality Rendering of Complex Scenes on Mobile Devices Zhe Wang et.al. 2504.03415 null
2025-04-04 An Efficient GPU-based Implementation for Noise Robust Sound Source Localization Zirui Lin et.al. 2504.03373 null
2025-04-04 Stance-Driven Multimodal Controlled Statement Generation: New Dataset and Task Bingqian Wang et.al. 2504.03295 null
2025-04-04 Mitigating the Impact of Electrode Shift on Classification Performance in Electromyography-Based Motion Prediction Using Sliding-Window Normalization Taichi Tanaka et.al. 2504.03196 null
2025-04-04 Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion Zeyang Zheng et.al. 2504.03171 link
2025-04-04 Water Mapping and Change Detection Using Time Series Derived from the Continuous Monitoring of Land Disturbance Algorithm Huong Pham et.al. 2504.03170 null
2025-04-04 Performance-Aware Control of Modular Batteries For Fast Frequency Response Yutong He et.al. 2504.03150 null
2025-04-04 A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations Abdul Mannan Mohammed et.al. 2504.03147 null
2025-04-04 Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Fa-Ting Hong et.al. 2504.02542 link
2025-04-03 Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization Haishan Wang et.al. 2504.03059 link
2025-04-03 Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks Hyun-Ho Choi et.al. 2504.03052 link
2025-04-03 Emotion Recognition Using Convolutional Neural Networks Shaoyuan Xu et.al. 2504.03010 null
2025-04-03 Generating Diverse Audio-Visual 360 Soundscapes for Sound Event Localization and Detection Adrian S. Roman et.al. 2504.02988 link
2025-04-03 Level Up Peer Review in Education: Investigating genAI-driven Gamification system and its influence on Peer Feedback Effectiveness Rafal Wlodarski et.al. 2504.02962 null
2025-04-03 LiDAR-based Object Detection with Real-time Voice Specifications Anurag Kulkarni et.al. 2504.02920 link
2025-04-03 Bubbles in a box: Eliminating edge nucleation in cold-atom simulators of vacuum decay Alexander C. Jenkins et.al. 2504.02829 null
2025-04-03 Dynamic Directional Routing of Freight in the Physical Internet Sahrish Jaleel Shaikh et.al. 2504.02722 null
2025-04-03 UAV-Assisted 5G Networks: Mobility-Aware 3D Trajectory Optimization and Resource Allocation for Dynamic Environments Asad Mahmood et.al. 2504.02613 null
2025-04-03 Human-Centered Development of an Explainable AI Framework for Real-Time Surgical Risk Surveillance Andrea E Davidson et.al. 2504.02551 null
2025-04-03 Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting Simon Hirsch et.al. 2504.02518 link
2025-04-03 Industrial Internet Robot Collaboration System and Edge Computing Optimization Qian Zuo et.al. 2504.02492 null
2025-04-03 Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision Xiaofeng Han et.al. 2504.02477 null
2025-04-03 MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM Renwu Li et.al. 2504.02437 null
2025-04-03 OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication Zhongjian Wang et.al. 2504.02433 null
2025-04-03 **Life

About

🎓 Update HumanAIGC related papers from ArXiv daily
