- Talking Face
- Image Animation
- Video Generation
- TryOn
- Visual Edit
- Others
- Music2Dance and Co-speech
- Speech and Interaction
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-24 | Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router | Yubo Huang et.al. | 2506.19833 | null |
2025-06-23 | OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Qijun Gan et.al. | 2506.18866 | null |
2025-06-17 | SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting | Ziqiao Peng et.al. | 2506.14742 | null |
2025-06-17 | Compressed Video Super-Resolution based on Hierarchical Encoding | Yuxuan Jiang et.al. | 2506.14381 | null |
2025-06-16 | Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos | Riku Takahashi et.al. | 2506.13419 | null |
2025-06-15 | iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer | Zhelun Shen et.al. | 2506.12847 | null |
2025-06-13 | ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing | Babak Naderi et.al. | 2506.12269 | link |
2025-06-10 | HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation | Ziyao Huang et.al. | 2506.08797 | null |
2025-06-03 | NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results | Xiaohong Liu et.al. | 2506.02875 | null |
2025-06-02 | Cocktail-Party Audio-Visual Speech Recognition | Thai-Binh Nguyen et.al. | 2506.02178 | null |
2025-06-02 | Low-Rank Head Avatar Personalization with Registers | Sai Tanmay Reddy Chakkera et.al. | 2506.01935 | null |
2025-06-02 | Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation | Yuan Gan et.al. | 2506.01591 | link |
2025-06-01 | SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers | Zhengcong Fei et.al. | 2506.00830 | null |
2025-05-30 | TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection | Xinqi Xiong et.al. | 2505.24866 | null |
2025-05-29 | Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Jiahao Cui et.al. | 2505.23525 | link |
2025-05-29 | Video Editing for Audio-Visual Dubbing | Binyamin Manela et.al. | 2505.23406 | link |
2025-05-29 | Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation | Hao Li et.al. | 2505.23290 | link |
2025-05-29 | MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | Siyuan Wang et.al. | 2505.23120 | link |
2025-05-28 | Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Zhe Kong et.al. | 2505.22647 | link |
2025-05-28 | Tell me Habibi, is it Real or Fake? | Kartik Kuckreja et.al. | 2505.22581 | null |
2025-05-28 | Neural Face Skinning for Mesh-agnostic Facial Expression Cloning | Sihun Cha et.al. | 2505.22416 | null |
2025-05-28 | FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing | Guanwen Feng et.al. | 2505.22141 | null |
2025-05-28 | RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling | Long-Khanh Pham et.al. | 2505.22024 | null |
2025-05-27 | OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers | Ziqiao Peng et.al. | 2505.21448 | null |
2025-05-26 | Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting | Yizhou Zhao et.al. | 2505.20582 | null |
2025-05-26 | DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations | Ziqiao Peng et.al. | 2505.18096 | null |
2025-05-22 | Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis | Radek Daněček et.al. | 2504.13386 | null |
2025-05-14 | Test-Time Augmentation for Pose-invariant Face Recognition | Jaemin Jung et.al. | 2505.09256 | null |
2025-05-10 | VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback | Eason Chen et.al. | 2505.06676 | null |
2025-05-10 | OT-Talk: Animating 3D Talking Head with Optimal Transportation | Xinmu Wang et.al. | 2505.01932 | null |
2025-05-10 | MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance | Mengting Wei et.al. | 2504.21497 | link |
2025-05-08 | OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours | Hanie Moghaddasi et.al. | 2505.05531 | null |
2025-05-03 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Anushka Agarwal et.al. | 2505.01928 | null |
2025-05-02 | Model See Model Do: Speech-Driven Facial Animation with Style Control | Yifang Pan et.al. | 2505.01319 | null |
2025-05-02 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Gaoxiang Cong et.al. | 2505.01263 | null |
2025-05-01 | KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution | Antoni Bigata et.al. | 2505.00497 | null |
2025-04-29 | IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos | Yuan Li et.al. | 2504.19165 | null |
2025-04-27 | Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions | Mohammad Mahdi Abootorabi et.al. | 2504.19056 | link |
2025-04-26 | Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning | Yifan Xie et.al. | 2504.18810 | null |
2025-04-25 | Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation | Weipeng Tan et.al. | 2504.18087 | null |
2025-04-14 | SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models | Stathis Galanakis et.al. | 2504.10716 | null |
2025-04-10 | ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings | Astitva Srivastava et.al. | 2504.08022 | null |
2025-04-08 | VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing | Juan Luis Gonzalez Bello et.al. | 2504.07146 | null |
2025-04-08 | SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity | Yihuan Huang et.al. | 2504.05803 | null |
2025-04-08 | Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation | Zhihua Xu et.al. | 2504.05746 | null |
2025-04-08 | Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation | Tianshui Chen et.al. | 2504.05672 | null |
2025-04-07 | Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Fa-Ting Hong et.al. | 2504.02542 | link |
2025-04-06 | FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency | Shiyan Liu et.al. | 2504.04427 | null |
2025-04-04 | A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations | Abdul Mannan Mohammed et.al. | 2504.03147 | null |
2025-04-03 | OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication | Zhongjian Wang et.al. | 2504.02433 | null |
2025-04-03 | VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models | Kim Sung-Bin et.al. | 2504.02386 | null |
2025-04-02 | Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies | Soumyya Kanti Datta et.al. | 2504.01470 | link |
2025-04-02 | EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters | Xuli Shen et.al. | 2503.19416 | null |
2025-04-01 | Monocular and Generalizable Gaussian Talking Head Animation | Shengjie Gong et.al. | 2504.00665 | null |
2025-03-31 | Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics | Lee Chae-Yeon et.al. | 2503.20308 | null |
2025-03-30 | MoCha: Towards Movie-Grade Talking Character Synthesis | Cong Wei et.al. | 2503.23307 | null |
2025-03-29 | STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing | Zijun Ding et.al. | 2503.23039 | link |
2025-03-28 | Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis | Shuai Shen et.al. | 2503.22605 | null |
2025-03-28 | Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | Haijie Yang et.al. | 2503.22225 | null |
2025-03-27 | ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Jinwei Qi et.al. | 2503.21144 | null |
2025-03-26 | Dual Audio-Centric Modality Coupling for Talking Head Generation | Ao Fu et.al. | 2503.22728 | null |
2025-03-25 | AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Jiazhi Guan et.al. | 2503.19824 | null |
2025-03-25 | MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Yukang Lin et.al. | 2503.19383 | null |
2025-03-25 | HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation | Zunnan Xu et.al. | 2503.18860 | null |
2025-03-25 | Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Yingying Fan et.al. | 2503.16942 | null |
2025-03-24 | DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model | Kangwei Liu et.al. | 2503.19001 | null |
2025-03-24 | Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Dingcheng Zhen et.al. | 2503.18429 | null |
2025-03-23 | DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation | Peng Chen et.al. | 2503.18159 | link |
2025-03-21 | TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting | Jianchuan Chen et.al. | 2503.17032 | null |
2025-03-21 | From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech | Ji-Hoon Kim et.al. | 2503.16956 | null |
2025-03-20 | UniSync: A Unified Framework for Audio-Visual Synchronization | Tao Feng et.al. | 2503.16357 | null |
2025-03-20 | PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation | Baiqin Wang et.al. | 2503.14295 | null |
2025-03-19 | DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis | Yuming Gu et.al. | 2503.15667 | link |
2025-03-19 | KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation | Antoni Bigata et.al. | 2503.01715 | null |
2025-03-17 | SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization | Xulin Fan et.al. | 2503.13371 | null |
2025-03-17 | Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait | Chaolong Yang et.al. | 2503.12963 | link |
2025-03-16 | Versatile Multimodal Controls for Whole-Body Talking Human Animation | Zheng Qin et.al. | 2503.08714 | null |
2025-03-14 | Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control | Hejia Chen et.al. | 2503.14517 | null |
2025-03-14 | EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models | Yixuan Zhang et.al. | 2503.11028 | null |
2025-03-12 | StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation | An Yang et.al. | 2503.09852 | null |
2025-03-12 | Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos | Riku Takahashi et.al. | 2503.09787 | null |
2025-03-09 | Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter | Yanyu Zhu et.al. | 2503.06397 | null |
2025-03-07 | MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice | Hongwei Yi et.al. | 2503.05978 | null |
2025-03-06 | FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis | Ziqi Ni et.al. | 2503.04067 | null |
2025-03-02 | FaceShot: Bring Any Character into Life | Junyao Gao et.al. | 2503.00740 | null |
2025-03-01 | Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture | Xuanchen Li et.al. | 2503.00495 | null |
2025-02-28 | Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints | Masoumeh Chapariniya et.al. | 2502.20803 | null |
2025-02-28 | ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model | Xuangeng Chu et.al. | 2502.20323 | null |
2025-02-27 | InsTaG: Learning Personalized 3D Talking Head from Few-Second Video | Jiahe Li et.al. | 2502.20387 | link |
2025-02-27 | High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model | Mingtao Guo et.al. | 2502.19894 | link |
2025-02-26 | FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion mode | Lingzhou Mu et.al. | 2502.19455 | null |
2025-02-24 | Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation | Baptiste Chopin et.al. | 2502.17198 | null |
2025-02-20 | NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis | Xiaoxing Liu et.al. | 2502.14178 | null |
2025-02-18 | AV-Flow: Transforming Text to Audio-Visual Human-like Interactions | Aggelina Chatziagapi et.al. | 2502.13133 | null |
2025-02-17 | SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion | Junxian Ma et.al. | 2502.11515 | null |
2025-02-15 | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Di Qiu et.al. | 2502.10841 | link |
2025-02-13 | Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model | Fei Shen et.al. | 2502.09533 | null |
2025-02-13 | VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output | Eason Chen et.al. | 2502.04103 | null |
2025-02-11 | Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion | Xingpei Ma et.al. | 2502.07203 | null |
2025-02-07 | Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark | Han Zhang et.al. | 2502.04976 | null |
2025-02-02 | EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis | Junuk Cha et.al. | 2502.00654 | null |
2025-01-24 | SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation | Yujian Liu et.al. | 2501.14646 | null |
2025-01-21 | A Lightweight and Interpretable Deepfakes Detection Framework | Muhammad Umar Farooq et.al. | 2501.11927 | null |
2025-01-18 | EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | Linrui Tian et.al. | 2501.10687 | null |
2025-01-17 | TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation | Yixiang Zhuang et.al. | 2501.09921 | null |
2025-01-15 | Joint Learning of Depth and Appearance for Portrait Image Animation | Xinya Ji et.al. | 2501.08649 | null |
2025-01-15 | Make-A-Character 2: Animatable 3D Character Generation From a Single Image | Lin Liu et.al. | 2501.07870 | null |
2025-01-09 | Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding | Ji-Ha Park et.al. | 2501.14790 | null |
2025-01-09 | Identity-Preserving Video Dubbing Using Motion Warping | Runzhen Liu et.al. | 2501.04586 | null |
2025-01-09 | MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation | Huaize Liu et.al. | 2501.01808 | null |
2025-01-07 | Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools | Arash Dehghani et.al. | 2501.06227 | null |
2025-01-07 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427 | null |
2025-01-06 | RDD4D: 4D Attention-Guided Road Damage Detection And Classification | Asma Alkalbani et.al. | 2501.02822 | link |
2025-01-06 | Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study | Mykola Maslych et.al. | 2501.00168 | null |
2025-01-03 | JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Qili Wang et.al. | 2501.01798 | link |
2024-12-28 | DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Kaijun Deng et.al. | 2412.20148 | link |
2024-12-26 | UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control | Wenzhang Sun et.al. | 2412.19860 | null |
2024-12-26 | Generating Editable Head Avatars with 3D Gaussian GANs | Guohao Li et.al. | 2412.19149 | link |
2024-12-23 | FaceLift: Single Image to 3D Head with View Generation and GS-LRM | Weijie Lyu et.al. | 2412.17812 | null |
2024-12-22 | FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation | Tianyun Zhong et.al. | 2412.16915 | null |
2024-12-18 | Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters | Steven Hogue et.al. | 2412.14333 | link |
2024-12-18 | GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection | Xiaocan Chen et.al. | 2412.13656 | null |
2024-12-18 | Learning to Control an Android Robot Head for Facial Animation | Marcel Heisler et.al. | 2412.13641 | null |
2024-12-18 | Real-time One-Step Diffusion-based Expressive Portrait Videos Generation | Hanzhong Guo et.al. | 2412.13479 | link |
2024-12-18 | VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization | Tao Liu et.al. | 2412.09892 | null |
2024-12-16 | Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content | Rohit Kundu et.al. | 2412.12278 | null |
2024-12-13 | GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression | Ziqi Zhou et.al. | 2412.09296 | link |
2024-12-12 | LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync | Chunyu Li et.al. | 2412.09262 | link |
2024-12-12 | EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing | Gaoxiang Cong et.al. | 2412.08988 | null |
2024-12-11 | PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis | Yifan Xie et.al. | 2412.08504 | null |
2024-12-10 | PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation | Fatemeh Nazarieh et.al. | 2412.07754 | null |
2024-12-10 | IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation | Sejong Yang et.al. | 2412.04000 | null |
2024-12-05 | MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation | Longtao Zheng et.al. | 2412.04448 | null |
2024-12-05 | Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks | Jiahao Cui et.al. | 2412.00733 | link |
2024-12-04 | SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model | Yan Li et.al. | 2412.03430 | null |
2024-12-02 | One Shot, One Talk: Whole-body Talking Avatar from a Single Image | Jun Xiang et.al. | 2412.01106 | null |
2024-12-01 | Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation | Shuling Zhao et.al. | 2412.00719 | null |
2024-11-29 | LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis | Tianqi Li et.al. | 2411.19525 | null |
2024-11-29 | Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis | Tianqi Li et.al. | 2411.19509 | link |
2024-11-29 | V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow | Jeongsoo Choi et.al. | 2411.19486 | link |
2024-11-26 | Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey | Hong-Hanh Nguyen-Le et.al. | 2411.17911 | null |
2024-11-25 | Sonic: Shifting Focus to Global Audio Perception in Portrait Animation | Xiaozhong Ji et.al. | 2411.16331 | null |
2024-11-25 | ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations | Xulong Zhang et.al. | 2411.13089 | null |
2024-11-24 | LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Haojie Zhang et.al. | 2411.16748 | null |
2024-11-23 | EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion | Haotian Wang et.al. | 2411.16726 | null |
2024-11-23 | ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance | Haijie Yang et.al. | 2411.15436 | null |
2024-11-20 | Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis | Pegah Salehi et.al. | 2411.13209 | link |
2024-11-20 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Xuyang Cao et.al. | 2411.09209 | link |
2024-11-14 | LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space | Guanwen Feng et.al. | 2411.09268 | null |
2024-11-06 | Large Generative Model-assisted Talking-face Semantic Communication System | Feibo Jiang et.al. | 2411.03876 | null |
2024-10-31 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Xiang Deng et.al. | 2410.23836 | null |
2024-10-29 | Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing | Haonan Tong et.al. | 2410.22112 | null |
2024-10-24 | Real-time 3D-aware Portrait Video Relighting | Ziqi Cai et.al. | 2410.18355 | link |
2024-10-21 | Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions | Malte Prinzler et.al. | 2410.16395 | null |
2024-10-18 | Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization | Bin Lin et.al. | 2410.14283 | null |
2024-10-18 | DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation | Hanbo Cheng et.al. | 2410.13726 | link |
2024-10-16 | MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting | Yue Zhang et.al. | 2410.10122 | link |
2024-10-15 | Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck | Fevziye Irem Eyiokur et.al. | 2410.11434 | null |
2024-10-15 | MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes | Zhenhui Ye et.al. | 2410.06734 | null |
2024-10-14 | Character-aware audio-visual subtitling in context | Jaesung Huh et.al. | 2410.11068 | null |
2024-10-14 | Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads | Federico Nocentini et.al. | 2410.11041 | null |
2024-10-14 | TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model | Jiazhi Guan et.al. | 2410.10696 | null |
2024-10-14 | Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization | Shanzhi Yin et.al. | 2410.10171 | null |
2024-10-10 | MMHead: Towards Fine-grained Multi-modal 3D Facial Animation | Sijing Wu et.al. | 2410.07757 | null |
2024-10-09 | FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model | Feng Qiu et.al. | 2409.13180 | null |
2024-10-01 | LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details | Jian Yang et.al. | 2410.00990 | null |
2024-09-29 | Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation | Jingyi Xu et.al. | 2409.19501 | null |
2024-09-27 | Diverse Code Query Learning for Speech-Driven Facial Animation | Chunzhi Gu et.al. | 2409.19143 | null |
2024-09-26 | Stable Video Portraits | Mirela Ostrek et.al. | 2409.18083 | null |
2024-09-25 | ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE | Sichun Wu et.al. | 2409.07966 | link |
2024-09-24 | FastTalker: Jointly Generating Speech and Conversational Gestures from Text | Zixin Guo et.al. | 2409.16404 | null |
2024-09-23 | FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset | Donglin Di et.al. | 2410.07151 | null |
2024-09-23 | MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning | Yue Han et.al. | 2409.15179 | null |
2024-09-18 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation | Sai Tanmay Reddy Chakkera et.al. | 2409.12156 | null |
2024-09-18 | GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations | Kartik Teotia et.al. | 2409.11951 | null |
2024-09-17 | 3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy | Xuanmeng Sha et.al. | 2409.10848 | null |
2024-09-16 | DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis | Fa-Ting Hong et.al. | 2409.10281 | null |
2024-09-14 | StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads | Suzhen Wang et.al. | 2409.09292 | null |
2024-09-11 | DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | Steven Hogue et.al. | 2409.07649 | null |
2024-09-11 | EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion | Jian Zhang et.al. | 2409.07255 | link |
2024-09-09 | PersonaTalk: Bring Attention to Your Persona in Visual Dubbing | Longhao Zhang et.al. | 2409.05379 | null |
2024-09-09 | KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation | Hoang-Son Vo-Thanh et.al. | 2409.05330 | link |
2024-09-05 | SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing | Lingyu Xiong et.al. | 2409.03605 | null |
2024-09-05 | SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model | Weipeng Tan et.al. | 2409.03270 | null |
2024-09-04 | PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation | Jun Ling et.al. | 2409.02657 | null |
2024-09-02 | KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding | Zhihao Xu et.al. | 2409.01113 | link |
2024-08-28 | Micro and macro facial expressions by driven animations in realistic Virtual Humans | Rubens Halbig Montanha et.al. | 2408.16110 | null |
2024-08-27 | MegActor- | Shurong Yang et.al. | 2408.14975 | null |
2024-08-25 | TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation | Jack Saunders et.al. | 2408.13714 | null |
2024-08-23 | G3FA: Geometry-guided GAN for Face Animation | Alireza Javanmardi et.al. | 2408.13049 | null |
2024-08-21 | AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition | Minheng Ni et.al. | 2408.11564 | null |
2024-08-21 | EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention | Yihong Lin et.al. | 2408.11518 | null |
2024-08-20 | DEGAS: Detailed Expressions on Full-Body Gaussian Avatars | Zhijing Shao et.al. | 2408.10588 | link |
2024-08-18 | FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model | Ziyu Yao et.al. | 2408.09384 | null |
2024-08-18 | Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation | Xukun Zhou et.al. | 2408.09357 | null |
2024-08-18 | S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis | Dongze Li et.al. | 2408.09347 | null |
2024-08-16 | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | Yihong Lin et.al. | 2408.01826 | null |
2024-08-14 | Content and Style Aware Audio-Driven Facial Animation | Qingju Liu et.al. | 2408.07005 | null |
2024-08-12 | DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation | Jisoo Kim et.al. | 2408.06010 | null |
2024-08-10 | High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model | Weizhi Zhong et.al. | 2408.05416 | null |
2024-08-10 | Style-Preserving Lip Sync via Audio-Aware Style Reference | Weizhi Zhong et.al. | 2408.05412 | null |
2024-08-09 | DeepSpeak Dataset v1.0 | Sarah Barrington et.al. | 2408.05366 | null |
2024-08-06 | ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer | Jiazhi Guan et.al. | 2408.03284 | null |
2024-08-03 | Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation | Jintao Tan et.al. | 2408.01732 | null |
2024-08-03 | JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model | Farzaneh Jafari et.al. | 2408.01627 | null |
2024-08-01 | UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model | Xiangyu Fan et.al. | 2408.00762 | null |
2024-08-01 | Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion | Manuel Kansy et.al. | 2408.00458 | null |
2024-08-01 | EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head | Qianyun He et.al. | 2408.00297 | null |
2024-07-31 | Deformable 3D Shape Diffusion Model | Dengsheng Chen et.al. | 2407.21428 | null |
2024-07-26 | LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement | Rui Zhang et.al. | 2407.18595 | null |
2024-07-24 | A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation | Jose Geraldo Fernandes et.al. | 2407.17430 | null |
2024-07-24 | The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions | Rabab Algadhy et.al. | 2407.17253 | null |
2024-07-22 | PAV: Personalized Head Avatar from Unstructured Video Collection | Akin Caliskan et.al. | 2407.21047 | null |
2024-07-21 | Anchored Diffusion for Video Face Reenactment | Idan Kligvasser et.al. | 2407.15153 | null |
2024-07-20 | Text-based Talking Video Editing with Cascaded Conditional Diffusion | Bo Han et.al. | 2407.14841 | null |
2024-07-17 | Universal Facial Encoding of Codec Avatars from VR Headsets | Shaojie Bai et.al. | 2407.13038 | null |
2024-07-17 | EmoFace: Audio-driven Emotional 3D Face Animation | Chang Liu et.al. | 2407.12501 | link |
2024-07-13 | Learning Online Scale Transformation for Talking Head Video Generation | Fa-Ting Hong et.al. | 2407.09965 | null |
2024-07-12 | Real Face Video Animation Platform | Xiaokai Chen et.al. | 2407.18955 | null |
2024-07-12 | One-Shot Pose-Driving Face Animation Platform | He Feng et.al. | 2407.08949 | null |
2024-07-12 | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Zhiyuan Chen et.al. | 2407.08136 | link |
2024-07-08 | MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices | Jianwen Jiang et.al. | 2407.05712 | null |
2024-07-08 | Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN | Jiacheng Su et.al. | 2407.05577 | null |
2024-07-04 | Compressed Skinning for Facial Blendshapes | Ladislav Kavan et.al. | 2406.11597 | null |
2024-07-03 | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jianzhu Guo et.al. | 2407.03168 | link |
2024-07-01 | Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert | Han EunGi et.al. | 2407.01034 | null |
2024-06-26 | RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network | Xiaozhong Ji et.al. | 2406.18284 | null |
2024-06-24 | The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents | Sinan Sonlu et.al. | 2407.10993 | null |
2024-06-21 | EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot | Hao Fei et.al. | 2406.15177 | link |
2024-06-20 | MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset | Kim Sung-Bin et.al. | 2406.14272 | null |
2024-06-19 | DF40: Toward Next-Generation Deepfake Detection | Zhiyuan Yan et.al. | 2406.13495 | link |
2024-06-19 | AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models | Ken Chen et.al. | 2406.13272 | null |
2024-06-18 | RITA: A Real-time Interactive Talking Avatars Framework | Wuxinlin Cheng et.al. | 2406.13093 | null |
2024-06-18 | A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing | Ming Meng et.al. | 2406.10553 | null |
2024-06-17 | NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation | Niu Guanchen et.al. | 2406.11259 | null |
2024-06-17 | Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement | Runyi Yu et.al. | 2406.08096 | null |
2024-06-16 | Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Mingwang Xu et.al. | 2406.08801 | null |
2024-06-14 | DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details | Haitao Cao et.al. | 2405.19688 | null |
2024-06-13 | Talking Heads: Understanding Inter-layer Communication in Transformer Language Models | Jack Merullo et.al. | 2406.09519 | null |
2024-06-13 | DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing | Neha Sahipjohn et.al. | 2406.08802 | null |
2024-06-12 | Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation | Jiadong Liang et.al. | 2406.07895 | null |
2024-06-07 | Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation | Yue Ma et.al. | 2406.01900 | null |
2024-06-05 | Controllable Talking Face Generation by Implicit Facial Keypoints Editing | Dong Zhao et.al. | 2406.02880 | link |
2024-05-31 | MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses | Saif Mahmud et.al. | 2405.21004 | null |
2024-05-31 | MegActor: Harness the Power of Raw Video for Vivid Portrait Animation | Shurong Yang et.al. | 2405.20851 | link |
2024-05-30 | Audio2Rig: Artist-oriented deep learning tool for facial animation | Bastien Arcelin et.al. | 2405.20412 | null |
2024-05-28 | OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance | Shuheng Ge et.al. | 2405.14709 | null |
2024-05-24 | InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation | Yuchi Wang et.al. | 2405.15758 | link |
2024-05-22 | Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling | Yibo Wang et.al. | 2405.13701 | null |
2024-05-21 | Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control | Yue Han et.al. | 2405.12970 | null |
2024-05-16 | Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | Youngjoon Jang et.al. | 2405.10272 | null |
2024-05-14 | PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Yang Hou et.al. | 2405.08838 | link |
2024-05-12 | Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation | Changpeng Cai et.al. | 2405.07257 | null |
2024-05-10 | NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior | Gihoon Kim et.al. | 2405.05749 | null |
2024-05-09 | SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space | Zeren Zhang et.al. | 2405.05636 | null |
2024-05-08 | Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Ruijie Tao et.al. | 2404.18501 | link |
2024-05-07 | Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation | Dogucan Yaman et.al. | 2405.04327 | null |
2024-05-06 | AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | Tao Liu et.al. | 2405.03121 | link |
2024-04-29 | EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars | Nikita Drobyshev et.al. | 2404.19110 | null |
2024-04-29 | GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting | Bo Chen et.al. | 2404.19040 | null |
2024-04-29 | Embedded Representation Learning Network for Animating Styled Video Portrait | Tianyong Wang et.al. | 2404.19038 | null |
2024-04-29 | CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation | Xiangyu Liang et.al. | 2404.18604 | null |
2024-04-28 | GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting | Hongyun Yu et.al. | 2404.14037 | null |
2024-04-25 | GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting | Kyusun Cho et.al. | 2404.16012 | link |
2024-04-23 | TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting | Jiahe Li et.al. | 2404.15264 | link |
2024-04-19 | Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Yixiang Zhuang et.al. | 2404.12888 | null |
2024-04-16 | VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time | Sicheng Xu et.al. | 2404.10667 | null |
2024-04-15 | FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features | Andre Rochow et.al. | 2404.09736 | null |
2024-04-13 | THQA: A Perceptual Quality Assessment Database for Talking Heads | Yingjie Zhou et.al. | 2404.09003 | link |
2024-04-11 | EFHQ: Multi-purpose ExtremePose-Face-HQ dataset | Trung Tuan Dao et.al. | 2312.17205 | null |
2024-04-09 | Deepfake Generation and Detection: A Benchmark and Survey | Gan Pei et.al. | 2403.17881 | link |
2024-04-08 | SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation | Heyuan Li et.al. | 2404.05680 | null |
2024-04-07 | GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets | Dongjing Shan et.al. | 2404.04924 | null |
2024-04-07 | Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation | Renshuai Liu et.al. | 2401.01207 | null |
2024-04-03 | MI-NeRF: Learning a Single Face NeRF from Multiple Identities | Aggelina Chatziagapi et.al. | 2403.19920 | null |
2024-04-02 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | Shuai Tan et.al. | 2404.01647 | null |
2024-04-02 | Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation | Taekyung Ki et.al. | 2404.00636 | null |
2024-04-01 | FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio | Chao Xu et.al. | 2403.01901 | link |
2024-04-01 | Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation | Se Jin Park et.al. | 2305.19556 | null |
2024-03-29 | Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior | Jaehoon Ko et.al. | 2403.20153 | link |
2024-03-28 | MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation | Seyeon Kim et.al. | 2403.19144 | link |
2024-03-28 | GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response | Govind Mittal et.al. | 2210.06186 | link |
2024-03-27 | X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention | You Xie et.al. | 2403.15931 | null |
2024-03-26 | Superior and Pragmatic Talking Face Generation with Teacher-Student Framework | Chao Liang et.al. | 2403.17883 | null |
2024-03-26 | AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation | Huawei Wei et.al. | 2403.17694 | link |
2024-03-25 | DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment | Stella Bounareli et.al. | 2403.17217 | null |
2024-03-25 | AnimateMe: 4D Facial Expressions via Diffusion Models | Dimitrios Gerogiannis et.al. | 2403.17213 | null |
2024-03-25 | Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework | Ziyao Huang et.al. | 2403.16510 | link |
2024-03-23 | Adaptive Super Resolution For One-Shot Talking-Head Generation | Luchuan Song et.al. | 2403.15944 | link |
2024-03-23 | Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | Zhenhui Ye et.al. | 2401.08503 | link |
2024-03-22 | LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example | Soyeon Yoon et.al. | 2403.15227 | link |
2024-03-22 | Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing | Juan Zhang et.al. | 2403.11700 | null |
2024-03-19 | EmoVOCA: Speech-Driven Emotional 3D Talking Heads | Federico Nocentini et.al. | 2403.12886 | link |
2024-03-19 | ScanTalk: 3D Talking Heads from Unregistered Scans | Federico Nocentini et.al. | 2403.10942 | link |
2024-03-15 | StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | Dongchan Min et.al. | 2208.10922 | null |
2024-03-14 | GAIA: Zero-shot Talking Avatar Generation | Tianyu He et.al. | 2311.15230 | null |
2024-03-13 | Say Anything with Any Style | Shuai Tan et.al. | 2403.06363 | null |
2024-03-12 | FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization | Shuai Tan et.al. | 2403.06375 | null |
2024-03-12 | Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style | Shuai Tan et.al. | 2403.06365 | null |
2024-03-11 | A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos | Weixia Zhang et.al. | 2403.06421 | link |
2024-03-05 | Memories are One-to-Many Mapping Alleviators in Talking Face Generation | Anni Tang et.al. | 2212.05005 | null |
2024-03-02 | G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment | Juan Zhang et.al. | 2402.18122 | null |
2024-03-01 | DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder | Chenpeng Du et.al. | 2303.17550 | null |
2024-02-29 | Learning a Generalized Physical Face Model From Data | Lingchen Yang et.al. | 2402.19477 | null |
2024-02-28 | Context-aware Talking Face Video Generation | Meidai Xuanyuan et.al. | 2402.18092 | null |
2024-02-27 | EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Linrui Tian et.al. | 2402.17485 | null |
2024-02-27 | Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis | Zicheng Zhang et.al. | 2402.17364 | link |
2024-02-26 | Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields | Yifei Li et.al. | 2402.16599 | null |
2024-02-25 | AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation | Yasheng Sun et.al. | 2402.16124 | null |
2024-02-21 | Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters | Zechen Bai et.al. | 2402.13724 | link |
2024-02-21 | StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing | Gaoxiang Cong et.al. | 2402.12636 | link |
2024-02-12 | StyleLipSync: Style-based Personalized Lip-sync Video Generation | Taekyung Ki et.al. | 2305.00521 | null |
2024-02-08 | DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer | Zhiyuan Ma et.al. | 2402.05712 | link |
2024-02-05 | One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space | Stella Bounareli et.al. | 2402.03553 | null |
2024-02-02 | EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation | Guanwen Feng et.al. | 2402.01422 | null |
2024-01-31 | MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Wenhao Guan et.al. | 2312.10687 | null |
2024-01-30 | Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance | Qingcheng Zhao et.al. | 2401.15687 | null |
2024-01-28 | Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes | Weifeng Liu et.al. | 2401.15668 | link |
2024-01-27 | An Implicit Physical Face Model Driven by Expression and Style | Lingchen Yang et.al. | 2401.15414 | null |
2024-01-26 | Implicit Neural Representation for Physics-driven Actuated Soft Bodies | Lingchen Yang et.al. | 2401.14861 | null |
2024-01-25 | SAiD: Speech-driven Blendshape Facial Animation with Diffusion | Inkyu Park et.al. | 2401.08655 | link |
2024-01-23 | NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis | Chongke Bi et.al. | 2401.12568 | null |
2024-01-19 | Fast Registration of Photorealistic Avatars for VR Facial Animation | Chaitanya Patel et.al. | 2401.11002 | null |
2024-01-18 | Exposing Lip-syncing Deepfakes from Mouth Inconsistencies | Soumyya Kanti Datta et.al. | 2401.10113 | link |
2024-01-18 | Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models | Jeongsoo Choi et.al. | 2306.16003 | null |
2024-01-16 | EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model | Bingyuan Zhang et.al. | 2401.08049 | null |
2024-01-12 | DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder | Tao Liu et.al. | 2311.01811 | link |
2024-01-11 | Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors | Jack Saunders et.al. | 2401.06126 | null |
2024-01-11 | Jump Cut Smoothing for Talking Heads | Xiaojuan Wang et.al. | 2401.04718 | null |
2024-01-08 | AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation | Liyang Chen et.al. | 2310.07236 | null |
2024-01-07 | Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness | Sicheng Yang et.al. | 2401.03476 | null |
2024-01-04 | Expressive Speech-driven Facial Animation with controllable emotions | Yutong Chen et.al. | 2301.02008 | link |
2023-12-23 | TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation | Xize Cheng et.al. | 2312.15197 | null |
2023-12-21 | DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation | Chenxu Zhang et.al. | 2312.13578 | null |
2023-12-20 | FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability | Linze Li et.al. | 2312.03775 | null |
2023-12-19 | Learning Dense Correspondence for NeRF-Based Face Reenactment | Songlin Yang et.al. | 2312.10422 | null |
2023-12-19 | Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing | Yushi Lan et.al. | 2312.03763 | null |
2023-12-18 | VectorTalker: SVG Talking Face Generation with Progressive Vectorisation | Hao Hu et.al. | 2312.11568 | null |
2023-12-18 | AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis | Dongze Li et.al. | 2312.10921 | null |
2023-12-18 | Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation | Hui Fu et.al. | 2312.10877 | null |
2023-12-15 | DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models | Yifeng Ma et.al. | 2312.09767 | link |
2023-12-15 | Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars | Andre Rochow et.al. | 2312.09750 | null |
2023-12-13 | uTalk: Bridging the Gap Between Humans and AI | Hussam Azzuni et.al. | 2310.02739 | null |
2023-12-13 | MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation | Haozhe Wu et.al. | 2303.09797 | null |
2023-12-12 | GMTalker: Gaussian Mixture based Emotional talking video Portraits | Yibo Xia et.al. | 2312.07669 | null |
2023-12-12 | GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance | Haiming Zhang et.al. | 2312.07385 | null |
2023-12-11 | Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Georgios Milis et.al. | 2312.06613 | link |
2023-12-11 | Study of Non-Verbal Behavior in Conversational Agents | Camila Vicari Maccari et.al. | 2312.06530 | null |
2023-12-11 | DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | Aaron Mir et.al. | 2312.06400 | null |
2023-12-11 | Audio-driven Talking Face Generation by Overcoming Unintended Information Flow | Dogucan Yaman et.al. | 2307.09368 | null |
2023-12-10 | DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Fa-Ting Hong et.al. | 2305.06225 | link |
2023-12-09 | R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning | Zhiling Ye et.al. | 2312.05572 | null |
2023-12-09 | FT2TF: First-Person Statement Text-To-Talking Face Generation | Xingjian Diao et.al. | 2312.05430 | null |
2023-12-08 | SingingHead: A Large-scale 4D Dataset for Singing Head Animation | Sijing Wu et.al. | 2312.04369 | null |
2023-12-07 | VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior | Xusen Sun et.al. | 2312.01841 | null |
2023-12-05 | PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features | Tianshun Han et.al. | 2312.02781 | null |
2023-12-05 | MyPortrait: Morphable Prior-Guided Personalized Portrait Generation | Bo Ding et.al. | 2312.02703 | null |
2023-12-02 | DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser | Peng Chen et.al. | 2311.16565 | null |
2023-12-01 | 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing | Balamurugan Thambiraja et.al. | 2312.00870 | null |
2023-11-30 | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data | Yu Deng et.al. | 2311.18729 | null |
2023-11-30 | Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation | Pramook Khungurn et.al. | 2311.17409 | null |
2023-11-29 | SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | Ziqiao Peng et.al. | 2311.17590 | link |
2023-11-28 | THInImg: Cross-modal Steganography for Presenting Talking Heads in Images | Lin Zhao et.al. | 2311.17177 | null |
2023-11-28 | BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis | Hao-Bin Duan et.al. | 2311.05521 | link |
2023-11-28 | Continuously Controllable Facial Expression Editing in Talking Face Videos | Zhiyao Sun et.al. | 2209.08289 | null |
2023-11-20 | MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI | Lifei Zheng et.al. | 2311.14730 | null |
2023-11-15 | CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding | Jianzong Wang et.al. | 2311.08673 | null |
2023-11-13 | DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation | Guinan Su et.al. | 2311.04766 | null |
2023-11-12 | ChatAnything: Facetime Chat with LLM-Enhanced Personas | Yilin Zhao et.al. | 2311.06772 | null |
2023-11-08 | Synthetic Speaking Children -- Why We Need Them and How to Make Them | Muhammad Ali Farooq et.al. | 2311.06307 | null |
2023-11-06 | RADIO: Reference-Agnostic Dubbing Video Synthesis | Dongyeun Lee et.al. | 2309.01950 | null |
2023-11-05 | 3D-Aware Talking-Head Video Motion Transfer | Haomiao Ni et.al. | 2311.02549 | null |
2023-11-03 | Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading | Songtao Luo et.al. | 2310.05058 | link |
2023-11-02 | LaughTalk: Expressive 3D Talking Head Generation with Laughter | Kim Sung-Bin et.al. | 2311.00994 | null |
2023-11-02 | High-Fidelity and Freely Controllable Talking Head Video Generation | Yue Gao et.al. | 2304.10168 | null |
2023-10-31 | Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape | Wei Zhao et.al. | 2310.20240 | null |
2023-10-29 | On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models | Marija Ivanovska et.al. | 2307.05397 | null |
2023-10-25 | Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control | Elif Bozkurt et.al. | 2310.17011 | null |
2023-10-23 | The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills | Qingxiao Zheng et.al. | 2310.15112 | null |
2023-10-19 | Gemino: Practical and Robust Neural Compression for Video Conferencing | Vibhaalakshmi Sivaraman et.al. | 2209.10507 | null |
2023-10-17 | CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation | Zhaojie Chu et.al. | 2310.11295 | null |
2023-10-15 | HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation | Yaosen Chen et.al. | 2310.05720 | link |
2023-10-12 | CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity | Abdullah Hayajneh et.al. | 2310.07969 | link |
2023-10-12 | Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | Yuan Gan et.al. | 2309.04946 | link |
2023-10-08 | GestSync: Determining who is speaking without a talking head | Sindhu B Hegde et.al. | 2310.05304 | link |
2023-09-30 | DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models | Zhiyao Sun et.al. | 2310.00434 | null |
2023-09-28 | OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions | Jin Liu et.al. | 2309.16148 | null |
2023-09-26 | Emotional Speech-Driven Animation with Content-Emotion Disentanglement | Radek Daněček et.al. | 2306.08990 | null |
2023-09-20 | FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion | Stefan Stan et.al. | 2309.11306 | link |
2023-09-20 | Context-Aware Talking-Head Video Editing | Songlin Yang et.al. | 2308.00462 | null |
2023-09-18 | That's What I Said: Fully-Controllable Talking Face Generation | Youngjoon Jang et.al. | 2304.03275 | null |
2023-09-15 | Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech | Junjie Li et.al. | 2309.08408 | link |
2023-09-14 | DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | Yaoyu Su et.al. | 2309.07752 | null |
2023-09-14 | DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks | Zipeng Qi et.al. | 2309.07509 | null |
2023-09-14 | HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods | Yongyuan Li et.al. | 2309.07495 | link |
2023-09-13 | PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network | Qinghua Liu et.al. | 2309.06723 | null |
2023-09-12 | DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention | Aaditya Kharel et.al. | 2309.06511 | null |
2023-09-12 | Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos | Ekta Prashnani et.al. | 2305.03713 | null |
2023-09-11 | ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment | Yicheng Zhong et.al. | 2308.14448 | null |
2023-09-10 | MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment | Tina Behrouzi et.al. | 2309.05095 | null |
2023-09-09 | Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video | Xiuzhe Wu et.al. | 2309.04814 | link |
2023-09-01 | Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances | Wolfgang Paier et.al. | 2306.10006 | null |
2023-08-30 | From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications | Shreyank N Gowda et.al. | 2308.16041 | null |
2023-08-30 | SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces | Ziqiao Peng et.al. | 2306.10799 | link |
2023-08-30 | Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models | Antoni Bigata Casademunt et.al. | 2305.08854 | link |
2023-08-29 | Papeos: Augmenting Research Papers with Talk Videos | Tae Soo Kim et.al. | 2308.15224 | null |
2023-08-25 | EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation | Ziqiao Peng et.al. | 2303.11089 | link |
2023-08-24 | ToonTalker: Cross-Domain Face Reenactment | Yuan Gong et.al. | 2308.12866 | null |
2023-08-24 | Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | Jiahe Li et.al. | 2307.09323 | link |
2023-08-23 | DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion | Se Jin Park et.al. | 2310.05934 | null |
2023-08-21 | Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis | Tong Sha et.al. | 2109.02081 | null |
2023-08-18 | Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization | Soumik Mukhopadhyay et.al. | 2308.09716 | link |
2023-08-18 | Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation | Fa-Ting Hong et.al. | 2307.09906 | link |
2023-08-17 | A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation | Li Liu et.al. | 2308.08849 | link |
2023-08-16 | Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions | Yuqi Sun et.al. | 2306.10813 | null |
2023-08-12 | Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Zhichao Wang et.al. | 2308.06457 | link |
2023-08-12 | DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation | Yichao Yan et.al. | 2203.07931 | null |
2023-08-11 | Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space | Haoyu Wang et.al. | 2308.06076 | link |
2023-08-11 | VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer | Liyang Chen et.al. | 2308.04830 | null |
2023-08-10 | Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution | Hyojoon Park et.al. | 2305.03216 | null |
2023-08-02 | Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis | Zhenhui Ye et.al. | 2306.03504 | null |
2023-07-29 | Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation | Michał Stypułkowski et.al. | 2301.03396 | null |
2023-07-26 | Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation | Federico Nocentini et.al. | 2306.01415 | link |
2023-07-20 | HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces | Stella Bounareli et.al. | 2307.10797 | link |
2023-07-19 | MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions | Yunfei Liu et.al. | 2307.10008 | null |
2023-07-19 | Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline | Zhigang Chang et.al. | 2307.09821 | null |
2023-07-19 | OPHAvatars: One-shot Photo-realistic Head Avatars | Shaoxu Li et.al. | 2307.09153 | link |
2023-07-18 | FACTS: Facial Animation Creation using the Transfer of Styles | Jack Saunders et.al. | 2307.09480 | null |
2023-07-09 | Predictive Coding For Animation-Based Video Compression | Goluck Konuko et.al. | 2307.04187 | null |
2023-07-08 | FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction | Ganglai Wang et.al. | 2307.03990 | null |
2023-07-05 | Interactive Conversational Head Generation | Mohan Zhou et.al. | 2307.02090 | null |
2023-07-04 | A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation | Louis Airale et.al. | 2307.03270 | link |
2023-07-04 | Generating Animatable 3D Cartoon Faces from Single Portraits | Chuanyu Pan et.al. | 2307.01468 | null |
2023-07-03 | RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations | Neha Sahipjohn et.al. | 2307.01233 | null |
2023-06-20 | Audio-Driven 3D Facial Animation from In-the-Wild Videos | Liying Lu et.al. | 2306.11541 | null |
2023-06-13 | Parametric Implicit Face Representation for Audio-Driven Facial Reenactment | Ricong Huang et.al. | 2306.07579 | null |
2023-06-13 | AniFaceDrawing: Anime Portrait Exploration during Your Sketching | Zhengyu Huang et.al. | 2306.07476 | null |
2023-06-12 | NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection | Yu Chen et.al. | 2306.06885 | null |
2023-06-10 | StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | Yifeng Ma et.al. | 2301.01081 | link |
2023-06-08 | ReliableSwap: Boosting General Face Swapping Via Reliable Supervision | Ge Yuan et.al. | 2306.05356 | link |
2023-06-06 | Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks | Jianrong Wang et.al. | 2306.03594 | null |
2023-06-05 | Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions | Shaoxu Li et.al. | 2306.02903 | link |
2023-05-31 | High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | Chao Xu et.al. | 2305.02572 | null |
2023-05-23 | CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation | Jingning Xu et.al. | 2305.13962 | null |
2023-05-22 | RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars | Dongwei Pan et.al. | 2305.13353 | link |
2023-05-19 | UniFLG: Unified Facial Landmark Generator from Text or Speech | Kentaro Mitsui et.al. | 2302.14337 | null |
2023-05-18 | An Android Robot Head as Embodied Conversational Agent | Marcel Heisler et.al. | 2305.10945 | null |
2023-05-18 | Audio-Visual Person-of-Interest DeepFake Detection | Davide Cozzolino et.al. | 2204.03083 | link |
2023-05-17 | INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network | Shuang Chen et.al. | 2305.10589 | null |
2023-05-17 | LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model | Kwangho Lee et.al. | 2305.10456 | null |
2023-05-15 | Identity-Preserving Talking Face Generation with Landmark and Appearance Priors | Weizhi Zhong et.al. | 2305.08293 | link |
2023-05-09 | Zero-shot personalized lip-to-speech synthesis with face image based voice control | Zheng-Yan Sheng et.al. | 2305.14359 | null |
2023-05-09 | StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator | Jiazhi Guan et.al. | 2305.05445 | null |
2023-05-09 | Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator | Chao Xu et.al. | 2305.02594 | null |
2023-05-01 | StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video | Lizhen Wang et.al. | 2305.00942 | link |
2023-05-01 | GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation | Zhenhui Ye et.al. | 2305.00787 | null |
2023-04-28 | A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation | Bo-Kyeong Kim et.al. | 2304.00471 | null |
2023-04-27 | Controllable One-Shot Face Video Synthesis With Semantic Aware Prior | Kangning Liu et.al. | 2304.14471 | null |
2023-04-25 | AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | Rongjie Huang et.al. | 2304.12995 | link |
2023-04-24 | VR Facial Animation for Immersive Telepresence Avatars | Andre Rochow et.al. | 2304.12051 | null |
2023-04-21 | Implicit Neural Head Synthesis via Controllable Local Deformation Fields | Chuhan Chen et.al. | 2304.11113 | null |
2023-04-20 | DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation | Shuai Shen et.al. | 2301.03786 | link |
2023-04-18 | Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations | Rongliang Wu et.al. | 2304.08945 | null |
2023-04-17 | Autoregressive GAN for Semantic Unconditional Head Motion Generation | Louis Airale et.al. | 2211.00987 | link |
2023-04-11 | One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field | Weichuang Li et.al. | 2304.05097 | null |
2023-04-06 | Face Animation with an Attribute-Guided Diffusion Model | Bohan Zeng et.al. | 2304.03199 | link |
2023-04-06 | 4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios | Wei Chen et.al. | 2304.02814 | null |
2023-04-03 | CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior | Jinbo Xing et.al. | 2301.02379 | link |
2023-04-01 | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | Longwen Zhang et.al. | 2304.03117 | null |
2023-04-01 | TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles | Yifeng Ma et.al. | 2304.00334 | null |
2023-03-31 | FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions | Jin Liu et.al. | 2303.17789 | null |
2023-03-29 | Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | Jiadong Wang et.al. | 2303.17480 | link |
2023-03-27 | OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis | Hongyi Xu et.al. | 2303.15539 | null |
2023-03-27 | Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms | Stevo Racković et.al. | 2302.04843 | null |
2023-03-27 | MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation | Bowen Zhang et.al. | 2212.08062 | link |
2023-03-27 | A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation | Stevo Racković et.al. | 2205.04289 | null |
2023-03-26 | OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | Zhiyuan Ma et.al. | 2303.14662 | link |
2023-03-26 | Emotionally Enhanced Talking Face Generation | Sahil Goyal et.al. | 2303.11548 | link |
2023-03-26 | Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation | Stevo Racković et.al. | 2303.06370 | null |
2023-03-24 | Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | Siddarth Ravichandran et.al. | 2209.01320 | null |
2023-03-23 | PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360° | Sizhe An et.al. | 2303.13071 | null |
2023-03-22 | Style Transfer for 2D Talking Head Animation | Trong-Thang Pham et.al. | 2303.09799 | link |
2023-03-22 | MARLIN: Masked Autoencoder for facial video Representation LearnINg | Zhixi Cai et.al. | 2211.06627 | link |
2023-03-14 | DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions | Geumbyeol Hwang et.al. | 2303.07697 | link |
2023-03-13 | SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation | Wenxuan Zhang et.al. | 2211.12194 | link |
2023-03-09 | FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Kazi Injamamul Haque et.al. | 2303.05416 | link |
2023-03-09 | Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation | Qi Chen et.al. | 2303.05322 | link |
2023-03-07 | DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video | Zhimeng Zhang et.al. | 2303.03988 | link |
2023-03-05 | Cyber Vaccine for Deepfake Immunity | Ching-Chun Chang et.al. | 2303.02659 | null |
2023-03-04 | High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors | Yunpeng Bai et.al. | 2211.15064 | null |
2023-03-01 | DPE: Disentanglement of Pose and Expression for General Video Portrait Editing | Youxin Pang et.al. | 2301.06281 | link |
2023-02-27 | Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | Minsu Kim et.al. | 2303.08670 | null |
2023-02-27 | Memory-augmented Contrastive Learning for Talking Head Generation | Jianrong Wang et.al. | 2302.13469 | link |
2023-02-24 | Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention | Bin Liu et.al. | 2302.12532 | null |
2023-02-16 | OPT: One-shot Pose-Controllable Talking Head Generation | Jin Liu et.al. | 2302.08197 | null |
2023-02-14 | Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space | Trevine Oorloff et.al. | 2203.14512 | link |
2023-01-31 | GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | Zhenhui Ye et.al. | 2301.13430 | null |
2023-01-23 | Data standardization for robust lip sync | Chun Wang et.al. | 2202.06198 | null |
2023-01-20 | Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes | Nicolas Wagner et.al. | 2212.14784 | null |
2023-01-15 | Learning Audio-Driven Viseme Dynamics for 3D Face Animation | Linchao Bao et.al. | 2301.06059 | null |
2022-12-30 | Imitator: Personalized Speech-driven 3D Facial Animation | Balamurugan Thambiraja et.al. | 2301.00023 | null |
2022-12-28 | All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks | Carina Geldhauser et.al. | 2212.13810 | null |
2022-12-23 | Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing | William Brannon et.al. | 2212.12137 | null |
2022-12-09 | Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers | Yasheng Sun et.al. | 2212.04970 | null |
2022-12-07 | Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors | Zhentao Yu et.al. | 2212.04248 | null |
2022-12-07 | SPACE: Speech-driven Portrait Animation with Controllable Expression | Siddharth Gururani et.al. | 2211.09809 | null |
2022-11-30 | Extracting Semantic Knowledge from GANs with Unsupervised Learning | Jianjin Xu et.al. | 2211.16710 | null |
2022-11-27 | VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild | Kun Cheng et.al. | 2211.14758 | null |
2022-11-26 | Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis | Duomin Wang et.al. | 2211.14506 | link |
2022-11-22 | Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition | Jiaxiang Tang et.al. | 2211.12368 | null |
2022-11-10 | On the role of Lip Articulation in Visual Speech Perception | Zakaria Aldeneh et.al. | 2203.10117 | null |
2022-11-03 | SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | Se Jin Park et.al. | 2211.00924 | null |
2022-10-21 | Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection | Alexandros Haliassos et.al. | 2201.07131 | link |
2022-10-13 | Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Vladimir Iashin et.al. | 2210.07055 | link |
2022-10-13 | Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | Aolan Sun et.al. | 2210.06877 | null |
2022-10-07 | Compressing Video Calls using Synthetic Talking Heads | Madhav Agarwal et.al. | 2210.03692 | null |
2022-10-07 | A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis | Yichen Han et.al. | 2210.03335 | null |
2022-10-06 | Audio-Visual Face Reenactment | Madhav Agarwal et.al. | 2210.02755 | link |
2022-10-06 | Finding Directions in GAN's Latent Space for Neural Face Reenactment | Stella Bounareli et.al. | 2202.00046 | link |
2022-10-04 | Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale | Aditya Agarwal et.al. | 2208.09796 | null |
2022-09-29 | Facial Landmark Predictions with Applications to Metaverse | Qiao Han et.al. | 2209.14698 | link |
2022-09-27 | StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment | Stella Bounareli et.al. | 2209.13375 | link |
2022-09-23 | EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model | Xinya Ji et.al. | 2205.15278 | null |
2022-09-21 | FNeVR: Neural Volume Rendering for Face Animation | Bohan Zeng et.al. | 2209.10340 | link |
2022-09-19 | AutoLV: Automatic Lecture Video Generator | Wenbin Wang et.al. | 2209.08795 | null |
2022-09-09 | Talking Head from Speech Audio using a Pre-trained Image Generator | Mohammed M. Alghamdi et.al. | 2209.04252 | null |
2022-09-07 | Restructurable Activation Networks | Kartikeya Bhardwaj et.al. | 2208.08562 | link |
2022-08-29 | StableFace: Analyzing and Improving Motion Stability for Talking Face Generation | Jun Ling et.al. | 2208.13717 | null |
2022-08-17 | Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors | Sindhu B Hegde et.al. | 2208.08118 | link |
2022-08-03 | Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control | Michail Christos Doukas et.al. | 2208.02210 | null |
2022-08-02 | Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer | Ailin Huang et.al. | 2206.12837 | link |
2022-08-01 | A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip | Shuang Chen et.al. | 2208.01149 | link |
2022-07-27 | A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing | Goluck Konuko et.al. | 2207.13530 | null |
2022-07-24 | Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | Shuai Shen et.al. | 2207.11770 | link |
2022-07-22 | Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos | Panagiotis P. Filntisis et.al. | 2207.11094 | link |
2022-07-20 | NARRATE: A Normal Assisted Free-View Portrait Stylizer | Youjia Wang et.al. | 2207.00974 | null |
2022-07-20 | VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection | Joanna Hong et.al. | 2206.07458 | null |
2022-07-20 | Responsive Listening Head Generation: A Benchmark Dataset and Baseline | Mohan Zhou et.al. | 2112.13548 | null |
2022-07-13 | FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis | Yongqi Wang et.al. | 2207.03800 | link |
2022-06-29 | Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs | Bo-Kyeong Kim et.al. | 2206.14658 | null |
2022-06-09 | Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos | Alexander Waibel et.al. | 2206.04523 | null |
2022-05-31 | Text/Speech-Driven Full-Body Animation | Wenlin Zhuang et.al. | 2205.15573 | null |
2022-05-27 | Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast | Boqing Zhu et.al. | 2204.14057 | link |
2022-05-26 | One-Shot Face Reenactment on Megapixels | Wonjun Kang et.al. | 2205.13368 | null |
2022-05-24 | Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts | Debjoy Saha et.al. | 2205.12194 | link |
2022-05-20 | MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement | Alexander Richard et.al. | 2104.08223 | link |
2022-05-13 | Talking Face Generation with Multilingual TTS | Hyoung-Kyu Song et.al. | 2205.06421 | null |
2022-05-02 | Emotion-Controllable Generalized Talking Face Generation | Sanjana Sinha et.al. | 2205.01155 | null |
2022-05-02 | A Novel Speech-Driven Lip-Sync Model with CNN and LSTM | Xiaohong Li et.al. | 2205.00916 | null |
2022-04-27 | Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion | Sen Chen et.al. | 2204.12756 | null |
2022-04-25 | Fast Facial Landmark Detection and Applications: A Survey | Kostiantyn Khabarlak et.al. | 2101.10808 | null |
2022-04-13 | Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions | Zipeng Ye et.al. | 2204.06180 | null |
2022-04-06 | Transformer-S2A: Robust and Efficient Speech-to-Animation | Liyang Chen et.al. | 2111.09771 | null |
2022-04-03 | Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text | Pulkit Tandon et.al. | 2106.14014 | link |
2022-03-30 | End to End Lip Synchronization with a Temporal AutoEncoder | Yoav Shalev et.al. | 2203.16224 | link |
2022-03-29 | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao et.al. | 2203.14367 | link |
2022-03-17 | StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN | Fei Yin et.al. | 2203.04036 | link |
2022-03-17 | FaceFormer: Speech-Driven 3D Facial Animation with Transformers | Yingruo Fan et.al. | 2112.05329 | link |
2022-03-16 | Efficient conditioned face animation using frontally-viewed embedding | Maxime Oquab et.al. | 2203.08765 | null |
2022-03-15 | Depth-Aware Generative Adversarial Network for Talking Head Video Generation | Fa-Ting Hong et.al. | 2203.06605 | link |
2022-03-10 | An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection | Ganglai Wang et.al. | 2203.05178 | null |
2022-03-08 | Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild | Ganglai Wang et.al. | 2203.03984 | null |
2022-03-04 | Multi-modality Deep Restoration of Extremely Compressed Face Videos | Xi Zhang et.al. | 2107.05548 | null |
2022-03-01 | FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset | Hasam Khalid et.al. | 2108.05080 | link |
2022-02-25 | FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin et.al. | 2202.12972 | null |
2022-02-22 | Thinking the Fusion Strategy of Multi-reference Face Reenactment | Takuya Yashima et.al. | 2202.10758 | null |
2022-01-24 | Selective Listening by Synchronizing Speech with Lips | Zexu Pan et.al. | 2106.07150 | link |
2022-01-22 | Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary | Sibo Zhang et.al. | 2104.14631 | null |
2022-01-21 | Stitch it in Time: GAN-Based Facial Editing of Real Videos | Rotem Tzaban et.al. | 2201.08361 | link |
2022-01-17 | Towards Realistic Visual Dubbing with Heterogeneous Sources | Tianyi Xie et.al. | 2201.06260 | null |
2022-01-16 | Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels | Zipeng Ye et.al. | 2201.05986 | null |
2022-01-03 | DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering | Shunyu Yao et.al. | 2201.00791 | null |
2021-12-20 | Parallel and High-Fidelity Text-to-Lip Generation | Jinglin Liu et.al. | 2107.06831 | link |
2021-12-19 | Initiative Defense against Facial Manipulation | Qidong Huang et.al. | 2112.10098 | link |
2021-12-07 | Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation | Yingruo Fan et.al. | 2112.02214 | null |
2021-12-06 | One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning | Suzhen Wang et.al. | 2112.02749 | null |
2021-11-29 | Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates | Shenhan Qian et.al. | 2108.08020 | link |
2021-11-04 | FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation | Wei Gan et.al. | 2111.02751 | null |
2021-11-02 | BiosecurID: a multimodal biometric database | Julian Fierrez et.al. | 2111.03472 | null |
2021-10-30 | Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | Haozhe Wu et.al. | 2111.00203 | link |
2021-10-26 | Emotion recognition in talking-face videos using persistent entropy and neural networks | Eduardo Paluzo-Hidalgo et.al. | 2110.13571 | link |
2021-10-26 | ViDA-MAN: Visual Dialog with Digital Humans | Tong Shen et.al. | 2110.13384 | null |
2021-10-22 | Invertible Frowns: Video-to-Video Facial Emotion Translation | Ian Magnusson et.al. | 2109.08061 | null |
2021-10-19 | Talking Head Generation with Audio and Speech Related Facial Action Units | Sen Chen et.al. | 2110.09951 | null |
2021-10-16 | Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor | Anchit Gupta et.al. | 2110.08580 | null |
2021-10-12 | Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment | Haichao Zhang et.al. | 2110.04708 | null |
2021-10-07 | Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution | Yangyang Shi et.al. | 2110.05241 | null |
2021-09-24 | Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation | Yuanxun Lu et.al. | 2109.10595 | null |
2021-09-20 | Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach | Stevo Rackovic et.al. | 2109.08356 | null |
2021-09-17 | Detection of GAN-synthesized street videos | Omran Alamayreh et.al. | 2109.04991 | null |
2021-08-30 | Audiovisual Speech Synthesis using Tacotron2 | Ahmed Hussen Abdelaziz et.al. | 2008.00620 | null |
2021-08-23 | KoDF: A Large-scale Korean DeepFake Detection Dataset | Patrick Kwon et.al. | 2103.10094 | null |
2021-08-23 | HeadGAN: One-shot Neural Head Synthesis and Editing | Michail Christos Doukas et.al. | 2012.08261 | null |
2021-08-19 | AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | Yudong Guo et.al. | 2103.11078 | link |
2021-08-18 | DeepFake MNIST+: A DeepFake Facial Animation Dataset | Jiajun Huang et.al. | 2108.07949 | link |
2021-08-18 | FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | Chenxu Zhang et.al. | 2108.07938 | link |
2021-08-12 | UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing | Meng Cao et.al. | 2108.05650 | null |
2021-08-11 | AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Xinsheng Wang et.al. | 2108.04325 | null |
2021-08-06 | SofGAN: A Portrait Image Generator with Dynamic Styling | Anpei Chen et.al. | 2007.03780 | link |
2021-07-27 | Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations | Laurent Benaroya et.al. | 2107.12346 | null |
2021-07-21 | Speech Driven Talking Face Generation from a Single Image and an Emotion Condition | Sefik Emre Eskimez et.al. | 2008.03592 | link |
2021-07-20 | Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | Suzhen Wang et.al. | 2107.09293 | link |
2021-07-10 | Speech2Video: Cross-Modal Distillation for Speech to Video Generation | Shijing Si et.al. | 2107.04806 | null |
2021-07-07 | Egocentric Videoconferencing | Mohamed Elgharib et.al. | 2107.03109 | null |
2021-06-08 | LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization | Avisek Lahiri et.al. | 2106.04185 | null |
2021-05-20 | Audio-Driven Emotional Video Portraits | Xinya Ji et.al. | 2104.07452 | null |
2021-05-07 | Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | Lincheng Li et.al. | 2104.07995 | link |
2021-05-05 | A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors | Ruobing Zheng et.al. | 2002.08700 | null |
2021-04-29 | Learned Spatial Representations for Few-shot Talking-Head Synthesis | Moustafa Meshry et.al. | 2104.14557 | null |
2021-04-26 | One-shot Face Reenactment Using Appearance Adaptive Normalization | Guangming Yao et.al. | 2102.03984 | null |
2021-04-25 | 3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head | Qianyun Wang et.al. | 2104.12051 | null |
2021-04-22 | Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | Hang Zhou et.al. | 2104.11116 | link |
2021-04-07 | Single Source One Shot Reenactment using Weighted motion From Paired Feature Points | Soumya Tripathy et.al. | 2104.03117 | null |
2021-04-07 | Everything's Talkin': Pareidolia Face Reenactment | Linsen Song et.al. | 2104.03061 | link |
2021-04-07 | LI-Net: Large-Pose Identity-Preserving Face Reenactment Network | Jin Liu et.al. | 2104.02850 | null |
2021-04-02 | One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing | Ting-Chun Wang et.al. | 2011.15126 | null |
2021-03-20 | Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization | Komal Chugh et.al. | 2005.14405 | link |
2021-03-19 | End-to-End Lip Synchronisation Based on Pattern Classification | You Jin Kim et.al. | 2005.08606 | null |
2021-03-05 | Real-time RGBD-based Extended Body Pose Estimation | Renat Bashirov et.al. | 2103.03663 | link |
2021-03-03 | Estimating Uniqueness of I-Vector Representation of Human Voice | Erkam Sinan Tandogan et.al. | 2008.11985 | null |
2021-02-25 | MakeItTalk: Speaker-Aware Talking-Head Animation | Yang Zhou et.al. | 2004.12992 | null |
2021-02-19 | One Shot Audio to Animated Video Generation | Neeraj Kumar et.al. | 2102.09737 | null |
2021-02-18 | AudioVisual Speech Synthesis: A brief literature review | Efthymios Georgiou et.al. | 2103.03927 | null |
2020-12-14 | Robust One Shot Audio to Video Generation | Neeraj Kumar et.al. | 2012.07842 | null |
2020-12-14 | Multi Modal Adaptive Normalization for Audio to Video Generation | Neeraj Kumar et.al. | 2012.07304 | null |
2020-11-30 | Adaptive Compact Attention For Few-shot Video-to-video Translation | Risheng Huang et.al. | 2011.14695 | null |
2020-11-21 | Stochastic Talking Face Generation Using Latent Distribution Matching | Ravindra Yadav et.al. | 2011.10727 | link |
2020-11-21 | Iterative Text-based Editing of Talking-heads Using Neural Retargeting | Xinwei Yao et.al. | 2011.10688 | null |
2020-11-09 | FACEGAN: Facial Attribute Controllable rEenactment GAN | Soumya Tripathy et.al. | 2011.04439 | null |
2020-11-06 | Large-scale multilingual audio visual dubbing | Yi Yang et.al. | 2011.03530 | null |
2020-11-02 | Facial Keypoint Sequence Generation from Audio | Prateek Manocha et.al. | 2011.01114 | null |
2020-10-25 | APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment | Jiangning Zhang et.al. | 2010.13017 | link |
2020-10-12 | Intuitive Facial Animation Editing Based On A Generative RNN Framework | Eloïse Berson et.al. | 2010.05655 | null |
2020-10-05 | SMILE: Semantically-guided Multi-attribute Image and Layout Editing | Andrés Romero et.al. | 2010.02315 | link |
2020-10-05 | Dynamic Facial Asset and Rig Generation from a Single Scan | Jiaman Li et.al. | 2010.00560 | null |
2020-09-20 | An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management | Ruturaj Raval et.al. | 2009.09354 | null |
2020-09-18 | Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks | Guangming Yao et.al. | 2008.07783 | null |
2020-09-12 | DualLip: A System for Joint Lip Reading and Generation | Weicong Chen et.al. | 2009.05784 | null |
2020-09-02 | Seeing wake words: Audio-visual Keyword Spotting | Liliane Momeni et.al. | 2009.01225 | null |
2020-08-29 | "It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok | Daniel Klug et.al. | 2008.13040 | null |
2020-08-23 | A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild | K R Prajwal et.al. | 2008.10010 | link |
2020-08-11 | Audio- and Gaze-driven Facial Animation of Codec Avatars | Alexander Richard et.al. | 2008.05023 | null |
2020-08-04 | Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract | Tamás Gábor Csapó et.al. | 2008.02098 | link |
2020-08-04 | Real-Time Cleaning and Refinement of Facial Animation Signals | Eloïse Berson et.al. | 2008.01332 | null |
2020-08-02 | Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos | Yanhui Guo et.al. | 2008.01652 | null |
2020-07-29 | Neural Voice Puppetry: Audio-driven Facial Reenactment | Justus Thies et.al. | 1912.05566 | link |
2020-07-20 | Deformable Style Transfer | Sunnie S. Y. Kim et.al. | 2003.11038 | link |
2020-07-18 | A Robust Interactive Facial Animation Editing System | Eloïse Berson et.al. | 2007.09367 | null |
2020-07-16 | Talking-head Generation with Rhythmic Head Motion | Lele Chen et.al. | 2007.08547 | link |
2020-07-08 | Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision | Abhinav Shukla et.al. | 2007.04134 | null |
2020-06-20 | Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams | Huirong Huang et.al. | 2006.11610 | null |
2020-05-27 | Modality Dropout for Improved Performance-driven Talking Faces | Ahmed Hussen Abdelaziz et.al. | 2005.13616 | null |
2020-05-25 | Identity-Preserving Realistic Talking Face Generation | Sanjana Sinha et.al. | 2005.12318 | null |
2020-05-22 | Head2Head: Video-based Neural Head Synthesis | Mohammad Rami Koujan et.al. | 2005.10954 | null |
2020-05-16 | FReeNet: Multi-Identity Face Reenactment | Jiangning Zhang et.al. | 1905.11805 | null |
2020-05-13 | FaR-GAN for One-Shot Face Reenactment | Hanxiang Hao et.al. | 2005.06402 | null |
2020-05-13 | Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning | Hao Zhu et.al. | 1812.06589 | null |
2020-05-11 | Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok | Juan Carlos Medina Serrano et.al. | 2004.05478 | link |
2020-05-07 | What comprises a good talking-head video generation?: A Survey and Benchmark | Lele Chen et.al. | 2005.03201 | link |
2020-05-04 | Disentangled Speech Embeddings using Cross-modal Self-supervision | Arsha Nagrani et.al. | 2002.08742 | null |
2020-04-30 | APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals | Jiangning Zhang et.al. | 2004.14569 | null |
2020-03-30 | ActGAN: Flexible and Efficient One-shot Face Reenactment | Ivan Kosarevych et.al. | 2003.13840 | null |
2020-03-29 | Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose | Xianfang Zeng et.al. | 2003.12957 | null |
2020-03-26 | High-Accuracy Facial Depth Models derived from 3D Synthetic Data | Faisal Khan et.al. | 2003.06211 | null |
2020-03-05 | Talking-Heads Attention | Noam Shazeer et.al. | 2003.02436 | link |
2020-03-05 | Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose | Ran Yi et.al. | 2002.10137 | link |
2020-03-01 | Towards Automatic Face-to-Face Translation | Prajwal K R et.al. | 2003.00418 | link |
2020-02-19 | Speech-driven facial animation using polynomial fusion of features | Triantafyllos Kefalas et.al. | 1912.05833 | null |
2020-01-17 | ICface: Interpretable and Controllable Face Reenactment Using GANs | Soumya Tripathy et.al. | 1904.01909 | null |
2019-12-20 | Disentangling Style and Content in Anime Illustrations | Sitao Xiang et.al. | 1905.10742 | null |
2019-11-21 | FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis | Kuangxiao Gu et.al. | 1911.09224 | null |
2019-11-19 | MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets | Sungjoo Ha et.al. | 1911.08139 | null |
2019-10-28 | Few-shot Video-to-Video Synthesis | Ting-Chun Wang et.al. | 1910.12713 | null |
2019-10-19 | Real-Time Lip Sync for Live 2D Animation | Deepali Aneja et.al. | 1910.08685 | link |
2019-10-16 | Designing Style Matching Conversational Agents | Deepali Aneja et.al. | 1910.07514 | null |
2019-10-15 | A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities | Deepali Aneja et.al. | 1909.08766 | link |
2019-10-09 | EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos | Haipeng Zeng et.al. | 1907.12918 | null |
2019-10-02 | Animating Face using Disentangled Audio Representations | Gaurav Mittal et.al. | 1910.00726 | null |
2019-09-25 | Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | Egor Zakharov et.al. | 1905.08233 | null |
2019-09-06 | Neural Style-Preserving Visual Dubbing | Hyeongwoo Kim et.al. | 1909.02518 | null |
2019-08-29 | 3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach | Ngoc-Trung Tran et.al. | 1908.11039 | null |
2019-08-20 | Prosodic Phrase Alignment for Machine Dubbing | Alp Öktem et.al. | 1908.07226 | link |
2019-08-16 | FSGAN: Subject Agnostic Face Swapping and Reenactment | Yuval Nirkin et.al. | 1908.05932 | link |
2019-08-11 | Emotion Dependent Facial Animation from Affective Speech | Rizwan Sadiq et.al. | 1908.03904 | null |
2019-08-05 | One-shot Face Reenactment | Yunxuan Zhang et.al. | 1908.03251 | link |
2019-07-25 | Talking Face Generation by Conditional Recurrent Adversarial Network | Yang Song et.al. | 1804.04786 | link |
2019-07-24 | Data-Driven Physical Face Inversion | Yeara Kozlov et.al. | 1907.10402 | null |
2019-07-23 | A system for efficient 3D printed stop-motion face animation | Rinat Abdrashitov et.al. | 1907.10163 | null |
2019-06-14 | Realistic Speech-Driven Facial Animation with GANs | Konstantinos Vougioukas et.al. | 1906.06337 | null |
2019-06-04 | Text-based Editing of Talking-head Video | Ohad Fried et.al. | 1906.01524 | null |
2019-05-27 | Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks | Guanzhong Tian et.al. | 1905.11142 | null |
2019-05-09 | Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | Lele Chen et.al. | 1905.03820 | link |
2019-05-08 | Capture, Learning, and Synthesis of 3D Speaking Styles | Daniel Cudeiro et.al. | 1905.03079 | link |
2019-04-23 | Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | Hang Zhou et.al. | 1807.07860 | null |
2019-04-02 | FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation | Yanfu Yan et.al. | 1904.01509 | null |
2019-03-13 | Animating an Autonomous 3D Talking Avatar | Dominik Borer et.al. | 1903.05448 | null |
2018-12-22 | Deep Audio-Visual Speech Recognition | Triantafyllos Afouras et.al. | 1809.02108 | null |
2018-12-20 | DeepFakes: a New Threat to Face Recognition? Assessment and Detection | Pavel Korshunov et.al. | 1812.08685 | null |
2018-11-22 | Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos | Ying Tai et.al. | 1811.00342 | link |
2018-11-16 | Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters | Maartje M. E. Hendrikse et.al. | 1812.02088 | null |
2018-08-28 | GANimation: Anatomically-aware Facial Animation from a Single Image | Albert Pumarola et.al. | 1807.09251 | link |
2018-08-19 | Dynamic Temporal Alignment of Speech to Lips | Tavi Halperin et.al. | 1808.06250 | link |
2018-07-29 | ReenactGAN: Learning to Reenact Faces via Boundary Transfer | Wayne Wu et.al. | 1807.11079 | link |
2018-07-26 | Learnable PINs: Cross-Modal Embeddings for Person Identity | Arsha Nagrani et.al. | 1805.00833 | null |
2018-07-19 | End-to-End Speech-Driven Facial Animation with Temporal GANs | Konstantinos Vougioukas et.al. | 1805.09313 | null |
2018-05-29 | Deep Video Portraits | Hyeongwoo Kim et.al. | 1805.11714 | null |
2018-05-24 | VisemeNet: Audio-Driven Animator-Centric Speech Animation | Yang Zhou et.al. | 1805.09488 | null |
2018-05-21 | Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks | Sitao Xiang et.al. | 1805.07997 | null |
2018-04-23 | Generating Talking Face Landmarks from Speech | Sefik Emre Eskimez et.al. | 1803.09803 | null |
2018-03-28 | Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network | Hai X. Pham et.al. | 1803.07716 | null |
2018-03-20 | Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks | Seyed Ali Jalalifar et.al. | 1803.07461 | null |
2017-12-07 | End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech | Hai X. Pham et.al. | 1710.00920 | null |
2017-12-06 | ObamaNet: Photo-realistic lip-sync from text | Rithesh Kumar et.al. | 1801.01442 | null |
2017-07-30 | Kernel Projection of Latent Structures Regression for Facial Animation Retargeting | Christos Ouzounis et.al. | 1707.09629 | null |
2017-07-26 | Fast Deep Matting for Portrait Animation on Mobile Phone | Bingke Zhu et.al. | 1707.08289 | null |
2017-07-21 | Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking | Rahul Sharma et.al. | 1707.06830 | null |
2017-07-18 | You said that? | Joon Son Chung et.al. | 1705.02966 | null |
2017-01-30 | Lip Reading Sentences in the Wild | Joon Son Chung et.al. | 1611.05358 | link |
2016-10-28 | Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei | Johannes Buchner et.al. | 1610.09380 | link |
2016-07-11 | Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video | Shaoshi Yang et.al. | 1601.06684 | null |
2016-05-22 | Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression | David Rim et.al. | 1512.08212 | null |
2016-02-08 | Automatic Face Reenactment | Pablo Garrido et.al. | 1602.02651 | null |
2015-11-20 | ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication | Ali Mollahosseini et.al. | 1511.06502 | null |
2014-09-03 | Visual Speech Recognition | Ahmad B. A. Hassanat et.al. | 1409.1411 | null |
2012-09-22 | Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis | Ingmar Steiner et.al. | 1209.4982 | null |
2012-03-30 | Face Expression Recognition and Analysis: The State of the Art | Vinay Bettadapura et.al. | 1203.6722 | null |
2012-01-19 | Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis | Ingmar Steiner et.al. | 1201.4080 | null |
2010-03-01 | Re-verification of a Lip Synchronization Protocol using Robust Reachability | Piotr Kordy et.al. | 1003.0431 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-05-30 | MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Yanbo Ding et.al. | 2505.10238 | link |
2025-05-29 | HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Shuolin Xu et.al. | 2505.22977 | link |
2025-05-24 | EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Qiang Qu et.al. | 2503.18552 | null |
2025-05-18 | DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation | Haoyu Zhao et.al. | 2503.21246 | link |
2025-05-13 | TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection | Wenkui Yang et.al. | 2505.08437 | link |
2025-04-28 | AnimateAnywhere: Rouse the Background in Human Image Animation | Xiaoyu Liu et.al. | 2504.19834 | null |
2025-04-20 | DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance | Yuxuan Luo et.al. | 2504.01724 | null |
2025-04-15 | UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer | Xiang Wang et.al. | 2504.11289 | link |
2025-04-15 | Taming Consistency Distillation for Accelerated Human Image Animation | Xiang Wang et.al. | 2504.11143 | null |
2025-04-05 | Multi-identity Human Image Animation with Structural Video Diffusion | Zhenzhi Wang et.al. | 2504.04126 | null |
2025-04-04 | Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images | In-Hwan Jin et.al. | 2504.05458 | link |
2025-04-01 | VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer | Xinyu Liu et.al. | 2502.05979 | null |
2025-03-23 | MotiF: Making Text Count in Image Animation with Motion Focal Loss | Shijie Wang et.al. | 2412.16153 | null |
2025-03-13 | Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer | Jiahao Cui et.al. | 2412.00733 | link |
2025-03-10 | Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation | Yingjie Chen et.al. | 2501.05020 | null |
2025-02-25 | DisPose: Disentangling Pose Guidance for Controllable Human Image Animation | Hongxiang Li et.al. | 2412.09349 | link |
2025-02-24 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414 | null |
2025-02-15 | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Di Qiu et.al. | 2502.10841 | link |
2025-02-10 | Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance | Li Hu et.al. | 2502.06145 | null |
2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299 | null |
2025-01-30 | Every Image Listens, Every Image Dances: Music-Driven Image Animation | Zhikang Dong et.al. | 2501.18801 | null |
2025-01-20 | X-Dyna: Expressive Dynamic Human Image Animation | Di Chang et.al. | 2501.10021 | link |
2025-01-15 | Joint Learning of Depth and Appearance for Portrait Image Animation | Xinya Ji et.al. | 2501.08649 | null |
2024-12-11 | Animate-X: Universal Character Image Animation with Enhanced Motion Representation | Shuai Tan et.al. | 2410.10306 | null |
2024-12-04 | FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait | Taekyung Ki et.al. | 2412.01064 | null |
2024-11-30 | DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses | Yatian Pang et.al. | 2412.00397 | null |
2024-11-28 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | Xuyang Cao et.al. | 2411.09209 | link |
2024-11-27 | StableAnimator: High-Quality Identity-Preserving Human Image Animation | Shuyuan Tu et.al. | 2411.17697 | link |
2024-11-24 | LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis | Haojie Zhang et.al. | 2411.16748 | null |
2024-11-21 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438 | link |
2024-10-31 | TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation | Sunjae Yoon et.al. | 2410.24037 | null |
2024-10-20 | FrameBridge: Improving Image-to-Video Generation with Bridge Models | Yuji Wang et.al. | 2410.15371 | null |
2024-10-14 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation | Jiahao Cui et.al. | 2410.07718 | link |
2024-09-30 | Illustrious: an Open Advanced Illustration Model | Sang Hyun Park et.al. | 2409.19946 | null |
2024-09-29 | High Quality Human Image Animation using Regional Supervision and Motion Blur Condition | Zhongcong Xu et.al. | 2409.19580 | null |
2024-09-22 | Dormant: Defending against Pose-driven Human Image Animation | Jiachen Zhou et.al. | 2409.14424 | link |
2024-07-23 | Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models | Xin Ma et.al. | 2407.15642 | link |
2024-07-12 | TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models | Jeongho Kim et.al. | 2407.09012 | null |
2024-07-12 | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions | Zhiyuan Chen et.al. | 2407.08136 | link |
2024-07-11 | MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | Muyao Niu et.al. | 2405.20222 | link |
2024-06-16 | Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | Mingwang Xu et.al. | 2406.08801 | null |
2024-06-13 | Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control | Jingyun Xue et.al. | 2406.03035 | null |
2024-06-03 | UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Xiang Wang et.al. | 2406.01188 | null |
2024-06-01 | Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Shenhao Zhu et.al. | 2403.14781 | link |
2024-05-29 | Evaluating the effectiveness of sonification in science education using Edukoi | Lucrezia Guiotto Nai Fovino et.al. | 2405.18908 | null |
2024-05-28 | VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation | Qilin Wang et.al. | 2405.18156 | null |
2024-05-28 | Controllable Longer Image Animation with Diffusion Models | Qiang Wang et.al. | 2405.17306 | null |
2024-03-25 | PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models | Yiming Zhang et.al. | 2312.13964 | link |
2024-03-13 | Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Yue Ma et.al. | 2403.08268 | link |
2024-03-08 | Audio-Synchronized Visual Animation | Lin Zhang et.al. | 2403.05659 | link |
2024-03-05 | Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation | Weijie Li et.al. | 2403.02827 | null |
2024-01-17 | Continuous Piecewise-Affine Based Motion Model for Image Animation | Hexiang Wang et.al. | 2401.09146 | link |
2024-01-03 | Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions | David Junhao Zhang et.al. | 2401.01827 | link |
2023-12-06 | AnimateZero: Video Diffusion Models are Zero-Shot Image Animators | Jiwen Yu et.al. | 2312.03793 | link |
2023-12-05 | LivePhoto: Real Image Animation with Text-guided Motion Control | Xi Chen et.al. | 2312.02928 | null |
2023-12-04 | AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance | Zuozhuo Dai et.al. | 2311.12886 | link |
2023-11-30 | Motion-Conditioned Image Animation for Video Editing | Wilson Yan et.al. | 2311.18827 | null |
2023-11-27 | MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | Zhongcong Xu et.al. | 2311.16498 | null |
2023-11-27 | DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors | Jinbo Xing et.al. | 2310.12190 | link |
2023-11-19 | Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation | Peirong Liu et.al. | 2110.04658 | null |
2023-10-16 | LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Ruiqi Wu et.al. | 2310.10769 | link |
2023-10-11 | LEO: Generative Latent Image Animator for Human Video Synthesis | Yaohui Wang et.al. | 2305.03989 | link |
2023-09-26 | Text-Guided Synthesis of Eulerian Cinemagraphs | Aniruddha Mahapatra et.al. | 2307.03190 | link |
2023-09-25 | Automatic Animation of Hair Blowing in Still Portrait Photos | Wenpeng Xiao et.al. | 2309.14207 | null |
2023-07-10 | AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Yuwei Guo et.al. | 2307.04725 | link |
2023-07-09 | Predictive Coding For Animation-Based Video Compression | Goluck Konuko et.al. | 2307.04187 | null |
2023-04-12 | VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs | Moayed Haji Ali et.al. | 2304.06020 | null |
2023-03-10 | 3D Cinemagraphy from a Single Image | Xingyi Li et.al. | 2303.05724 | null |
2023-02-02 | Dreamix: Video Diffusion Models are General Video Editors | Eyal Molad et.al. | 2302.01329 | null |
2023-01-14 | Continuous odor profile monitoring to study olfactory navigation in small animals | Kevin S. Chen et.al. | 2301.05905 | null |
2022-11-30 | NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation | Yu Yin et.al. | 2211.17235 | null |
2022-10-04 | Implicit Warping for Animation with Image Sets | Arun Mallya et.al. | 2210.01794 | null |
2022-09-28 | Motion Transformer for Unsupervised Image Animation | Jiale Tao et.al. | 2209.14024 | link |
2022-07-19 | Single Stage Virtual Try-on via Deformable Attention Flows | Shuai Bai et.al. | 2207.09161 | link |
2022-07-08 | Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation | Yucheng Suo et.al. | 2207.03714 | null |
2022-06-11 | Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification | Mengdi Gao et.al. | 2106.12284 | link |
2022-04-05 | Neural Fields in Visual Computing and Beyond | Yiheng Xie et.al. | 2111.11426 | null |
2022-03-29 | Thin-Plate Spline Motion Model for Image Animation | Jian Zhao et.al. | 2203.14367 | link |
2022-03-29 | Image Animation with Perturbed Masks | Yoav Shalev et.al. | 2011.06922 | link |
2022-03-25 | 3D GAN Inversion for Controllable Portrait Image Animation | Connor Z. Lin et.al. | 2203.13441 | null |
2022-03-17 | Latent Image Animator: Learning to Animate Images via Latent Space Navigation | Yaohui Wang et.al. | 2203.09043 | null |
2021-12-21 | Image Animation with Keypoint Mask | Or Toledano et.al. | 2112.10457 | link |
2021-12-19 | Move As You Like: Image Animation in E-Commerce Scenario | Borun Xu et.al. | 2112.13647 | null |
2021-12-17 | AI-Empowered Persuasive Video Generation: A Survey | Chang Liu et.al. | 2112.09401 | null |
2021-10-26 | Incremental Learning for Animal Pose Estimation using RBF k-DPP | Gaurav Kumar Nayak et.al. | 2110.13598 | null |
2021-09-03 | Sparse to Dense Motion Transfer for Face Image Animation | Ruiqi Zhao et.al. | 2109.00471 | null |
2021-08-18 | DeepFake MNIST+: A DeepFake Facial Animation Dataset | Jiajun Huang et.al. | 2108.07949 | link |
2021-06-23 | Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0 | Adellia et.al. | 2106.15342 | null |
2021-04-07 | Single Source One Shot Reenactment using Weighted motion From Paired Feature Points | Soumya Tripathy et.al. | 2104.03117 | null |
2021-03-22 | PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation | Wai Ting Cheung et.al. | 2103.11600 | null |
2020-12-01 | Ultra-low bitrate video conferencing using deep image animation | Goluck Konuko et.al. | 2012.00346 | null |
2020-10-01 | First Order Motion Model for Image Animation | Aliaksandr Siarohin et.al. | 2003.00196 | link |
2020-08-27 | Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation | Yurui Ren et.al. | 2008.12606 | link |
2019-08-30 | Animating Arbitrary Objects via Deep Motion Transfer | Aliaksandr Siarohin et.al. | 1812.08861 | link |
2018-10-09 | 3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping | Guillaume Caron et.al. | 1810.03956 | null |
2018-06-24 | A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic | Jiaming Lu et.al. | 1806.09117 | null |
2018-01-31 | RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes | Thomas Bronzwaer et.al. | 1801.10452 | null |
2016-06-23 | Gender and Interest Targeting for Sponsored Post Advertising at Tumblr | Mihajlo Grbovic et.al. | 1606.07189 | null |
2015-03-16 | Use of Effective Audio in E-learning Courseware | Kisor Ray et.al. | 1503.04837 | null |
2015-02-04 | Multimedia-Video for Learning | Kah Hean Chua et.al. | 1502.01090 | null |
2013-01-25 | Measurements of Martian Dust Devil Winds with HiRISE | David S. Choi et.al. | 1301.6130 | null |
2010-01-04 | Tutoring System for Dance Learning | Rajkumar Kannan et.al. | 1001.0440 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-25 | Video Perception Models for 3D Scene Synthesis | Rui Huang et.al. | 2506.20601 | null |
2025-06-25 | BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos | Jiahao Lin et.al. | 2506.20103 | null |
2025-06-24 | Radial Attention: | Xingyang Li et.al. | 2506.19852 | null |
2025-06-24 | GenHSI: Controllable Generation of Human-Scene Interaction Videos | Zekun Li et.al. | 2506.19840 | null |
2025-06-24 | SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution | Liangbin Xie et.al. | 2506.19838 | null |
2025-06-24 | Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router | Yubo Huang et.al. | 2506.19833 | null |
2025-06-24 | Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation | Jintao Rong et.al. | 2506.19348 | null |
2025-06-23 | VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory | Runjia Li et.al. | 2506.18903 | null |
2025-06-23 | From Virtual Games to Real-World Play | Wenqiang Sun et.al. | 2506.18901 | null |
2025-06-23 | FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation | Kaiyi Huang et.al. | 2506.18899 | null |
2025-06-23 | MinD: Unified Visual Imagination and Control via Hierarchical World Models | Xiaowei Chi et.al. | 2506.18897 | null |
2025-06-23 | OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation | Qijun Gan et.al. | 2506.18866 | null |
2025-06-23 | Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset | Zhuowei Chen et.al. | 2506.18851 | null |
2025-06-23 | Matrix-Game: Interactive World Foundation Model | Yifan Zhang et.al. | 2506.18701 | null |
2025-06-23 | RDPO: Real Data Preference Optimization for Physics Consistency Video Generation | Wenxu Qian et.al. | 2506.18655 | null |
2025-06-23 | BulletGen: Improving 4D Reconstruction with Bullet-Time Generation | Denys Rozumnyi et.al. | 2506.18601 | null |
2025-06-23 | VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning | Xuanyu Zhang et.al. | 2506.18564 | null |
2025-06-23 | Emergent Temporal Correspondences from Video Diffusion Transformers | Jisu Nam et.al. | 2506.17220 | link |
2025-06-21 | STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation | Jiamin Wang et.al. | 2506.13138 | null |
2025-06-20 | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jiaqi Li et.al. | 2506.17201 | null |
2025-06-20 | Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation | Riccardo Corvi et.al. | 2506.16802 | null |
2025-06-20 | Sekai: A Video Dataset towards World Exploration | Zhen Li et.al. | 2506.15675 | null |
2025-06-20 | Show-o2: Improved Native Unified Multimodal Models | Jinheng Xie et.al. | 2506.15564 | link |
2025-06-19 | VideoGAN-based Trajectory Proposal for Automated Vehicles | Annajoyce Mariani et.al. | 2506.16209 | link |
2025-06-19 | FastInit: Fast Noise Initialization for Temporally Consistent Video Generation | Chengyu Bai et.al. | 2506.16119 | null |
2025-06-19 | PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models | Tianchen Zhao et.al. | 2506.16054 | null |
2025-06-19 | Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization | Cong Wang et.al. | 2506.15980 | link |
2025-06-18 | VideoMAR: Autoregressive Video Generatio with Continuous Tokens | Hu Yu et.al. | 2506.14168 | null |
2025-06-18 | Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models | Xuanchi Ren et.al. | 2506.09042 | link |
2025-06-17 | Causally Steered Diffusion for Automated Video Counterfactual Generation | Nikos Spyrou et.al. | 2506.14404 | link |
2025-06-17 | CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation | Jia-Chen Zhang et.al. | 2506.14206 | null |
2025-06-16 | EchoShot: Multi-Shot Portrait Video Generation | Jiahao Wang et.al. | 2506.15838 | null |
2025-06-16 | UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions | Zhucun Xue et.al. | 2506.13691 | null |
2025-06-15 | iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer | Zhelun Shen et.al. | 2506.12847 | null |
2025-06-13 | SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation | Xu Wang et.al. | 2506.11621 | null |
2025-06-12 | GenWorld: Towards Detecting AI-generated Real-world Simulation Videos | Weiliang Chen et.al. | 2506.10975 | null |
2025-06-12 | M4V: Multi-Modal Mamba for Text-to-Video Generation | Jiancheng Huang et.al. | 2506.10915 | null |
2025-06-12 | GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning | Xiaoyi Bao et.al. | 2506.10639 | null |
2025-06-12 | DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers | Lizhen Wang et.al. | 2506.10568 | null |
2025-06-12 | AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation | Haoyuan Shi et.al. | 2506.10540 | null |
2025-06-11 | AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation | Chao Liang et.al. | 2506.11144 | null |
2025-06-11 | PlayerOne: Egocentric World Simulator | Yuanpeng Tu et.al. | 2506.09995 | null |
2025-06-11 | InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions | Zhenzhi Wang et.al. | 2506.09984 | null |
2025-06-11 | ReSim: Reliable World Simulation for Autonomous Driving | Jiazhi Yang et.al. | 2506.09981 | null |
2025-06-11 | DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning | Dongxu Liu et.al. | 2506.09644 | null |
2025-06-11 | Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation | Shanchuan Lin et.al. | 2506.09350 | null |
2025-06-10 | Seedance 1.0: Exploring the Boundaries of Video Generation Models | Yu Gao et.al. | 2506.09113 | null |
2025-06-10 | FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Zheqi He et.al. | 2506.09081 | null |
2025-06-10 | VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | Xinlong Chen et.al. | 2506.09079 | null |
2025-06-10 | MagCache: Fast Video Generation with Magnitude-Aware Cache | Zehong Ma et.al. | 2506.09045 | link |
2025-06-10 | Product of Experts for Visual Generation | Yunzhi Zhang et.al. | 2506.08894 | null |
2025-06-10 | HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation | Ziyao Huang et.al. | 2506.08797 | null |
2025-06-10 | RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping | Yang Bai et.al. | 2506.08632 | null |
2025-06-10 | How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models | Huixuan Zhang et.al. | 2506.08351 | null |
2025-06-10 | From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Pablo Acuaviva et.al. | 2506.07280 | null |
2025-06-09 | Seeing Voices: Generating A-Roll Video from Audio with Mirage | Aditi Sundararaman et.al. | 2506.08279 | null |
2025-06-09 | Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Xun Huang et.al. | 2506.08009 | null |
2025-06-09 | Dreamland: Controllable World Creation with Simulator and Generative Models | Sicheng Mo et.al. | 2506.08006 | null |
2025-06-09 | Audio-Sync Video Generation with Multi-Stream Temporal Control | Shuchen Weng et.al. | 2506.08003 | null |
2025-06-09 | Generative Modeling of Weights: Generalization or Memorization? | Boya Zeng et.al. | 2506.07998 | link |
2025-06-09 | Video Unlearning via Low-Rank Refusal Vector | Simone Facchiano et.al. | 2506.07891 | null |
2025-06-09 | EgoM2P: Egocentric Multimodal Multitask Pretraining | Gen Li et.al. | 2506.07886 | null |
2025-06-09 | PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement | Teng Hu et.al. | 2506.07848 | null |
2025-06-09 | Consistent Video Editing as Flow-Driven Image-to-Video Generation | Ge Wang et.al. | 2506.07713 | null |
2025-06-09 | Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation | Boris Martirosyan et.al. | 2506.07706 | null |
2025-06-09 | Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers | Haosong Liu et.al. | 2506.05096 | null |
2025-06-08 | TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation | Min-Jung Kim et.al. | 2506.07205 | null |
2025-06-08 | Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models | Sangwon Jang et.al. | 2506.07177 | null |
2025-06-08 | Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion | Huaize Liu et.al. | 2506.07136 | null |
2025-06-07 | Self-Adapting Improvement Loops for Robotic Learning | Calvin Luo et.al. | 2506.06658 | null |
2025-06-06 | Restereo: Diffusion stereo video generation and restoration | Xingchang Huang et.al. | 2506.06023 | null |
2025-06-06 | LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models | Haojie Yu et.al. | 2506.05806 | null |
2025-06-06 | FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion | Akide Liu et.al. | 2506.04648 | null |
2025-06-05 | EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh | Tao Hu et.al. | 2506.05554 | null |
2025-06-05 | ContentV: Efficient Training of Video Generation Models with Limited Compute | Wenfeng Lin et.al. | 2506.05343 | null |
2025-06-05 | FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation | Huihan Wang et.al. | 2506.04956 | null |
2025-06-05 | DualX-VSR: Dual Axial Spatial | Shuo Cao et.al. | 2506.04830 | null |
2025-06-05 | Follow-Your-Creation: Empowering 4D Creation through Video Inpainting | Yue Ma et.al. | 2506.04590 | null |
2025-06-05 | FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Xuanhua He et.al. | 2506.04213 | null |
2025-06-05 | SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios | Lingwei Dang et.al. | 2506.02444 | link |
2025-06-04 | LayerFlow: A Unified Model for Layer-aware Video Generation | Sihui Ji et.al. | 2506.04228 | null |
2025-06-04 | UNIC: Unified In-Context Video Editing | Zixuan Ye et.al. | 2506.04216 | null |
2025-06-04 | DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models | Ziyi Wu et.al. | 2506.03517 | null |
2025-06-03 | Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas | Austin Silveria et.al. | 2506.03275 | null |
2025-06-03 | IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation | Yuanze Lin et.al. | 2506.03150 | null |
2025-06-03 | Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval | Jiwen Yu et.al. | 2506.03141 | null |
2025-06-03 | CamCloneMaster: Enabling Reference-based Camera Control for Video Generation | Yawen Luo et.al. | 2506.03140 | null |
2025-06-03 | AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation | Lu Qiu et.al. | 2506.03126 | null |
2025-06-03 | DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation | Zhengyao Lv et.al. | 2506.03123 | null |
2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099 | null |
2025-06-03 | SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis | Ssharvien Kumar Sivakumar et.al. | 2506.03082 | null |
2025-06-03 | ORV: 4D Occupancy-centric Robot Video Generation | Xiuyu Yang et.al. | 2506.03079 | link |
2025-06-03 | Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers | Pengtao Chen et.al. | 2506.03065 | null |
2025-06-03 | LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering | Xiaoyi Feng et.al. | 2506.02733 | null |
2025-06-03 | LumosFlow: Motion-Guided Long Video Generation | Jiahao Chen et.al. | 2506.02497 | null |
2025-06-02 | Motion aware video generative model | Bowen Xue et.al. | 2506.02244 | null |
2025-06-02 | Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control | Xiao Fu et.al. | 2506.01943 | null |
2025-06-02 | OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Sen Liang et.al. | 2506.01801 | null |
2025-06-02 | Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks | Tao Yang et.al. | 2506.01758 | null |
2025-06-02 | Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents | Shuting Wang et.al. | 2506.01689 | null |
2025-06-02 | LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model | Xiaodong Wang et.al. | 2506.01546 | null |
2025-06-02 | Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark | Shuyu Yang et.al. | 2506.01466 | null |
2025-06-02 | DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion | Geunmin Hwang et.al. | 2506.01454 | null |
2025-05-30 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | Bojia Zi et.al. | 2505.24873 | null |
2025-05-30 | DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds | Jiaxu Zhang et.al. | 2505.24733 | null |
2025-05-30 | UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation | Yang-Tian Sun et.al. | 2505.24521 | null |
2025-05-30 | Interactive Video Generation via Domain Adaptation | Ishaan Rawal et.al. | 2505.24253 | null |
2025-05-30 | STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models | Zheng Tan et.al. | 2505.24210 | link |
2025-05-29 | MAGREF: Masked Guidance for Any-Reference Video Generation | Yufan Deng et.al. | 2505.23742 | link |
2025-05-29 | VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos | Tingyu Song et.al. | 2505.23693 | link |
2025-05-29 | VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Xiangdong Zhang et.al. | 2505.23656 | link |
2025-05-29 | VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation | Shi-Xue Zhang et.al. | 2505.23484 | link |
2025-05-29 | Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis | Hengyuan Cao et.al. | 2505.23325 | null |
2025-05-29 | RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer | Liu Liu et.al. | 2505.23171 | null |
2025-05-29 | Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing | Tongtong Su et.al. | 2505.23134 | link |
2025-05-29 | MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation | Siyuan Wang et.al. | 2505.23120 | link |
2025-05-29 | GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion | Gwanghyun Kim et.al. | 2505.23085 | null |
2025-05-29 | MOVi: Training-free Text-conditioned Multi-Object Video Generation | Aimon Rahman et.al. | 2505.22980 | null |
2025-05-29 | HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions | Shuolin Xu et.al. | 2505.22977 | link |
2025-05-29 | Minute-Long Videos with Dual Parallelisms | Zeqing Wang et.al. | 2505.21070 | link |
2025-05-28 | ATI: Any Trajectory Instruction for Controllable Video Generation | Angtian Wang et.al. | 2505.22944 | null |
2025-05-28 | Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation | Zhe Kong et.al. | 2505.22647 | link |
2025-05-28 | Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers | Weilun Feng et.al. | 2505.22167 | null |
2025-05-28 | FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing | Guanwen Feng et.al. | 2505.22141 | null |
2025-05-28 | LatentMove: Towards Complex Human Movement Video Generation | Ashkan Taghipour et.al. | 2505.22046 | null |
2025-05-28 | PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms | Yifei Xia et.al. | 2505.22016 | null |
2025-05-28 | Learning World Models for Interactive Video Generation | Taiye Chen et.al. | 2505.21996 | null |
2025-05-28 | SageAttention2++: A More Efficient Implementation of SageAttention2 | Jintao Zhang et.al. | 2505.21136 | link |
2025-05-28 | OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | Shenghai Yuan et.al. | 2505.20292 | link |
2025-05-27 | HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation | Bowen Chen et.al. | 2505.21831 | null |
2025-05-27 | Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation | Ke Zhang et.al. | 2505.21653 | null |
2025-05-27 | VideoMarkBench: Benchmarking Robustness of Video Watermarking | Zhengyuan Jiang et.al. | 2505.21620 | link |
2025-05-27 | Frame In-N-Out: Unbounded Controllable Image-to-Video Generation | Boyang Wang et.al. | 2505.21491 | null |
2025-05-27 | Dynamic Vision from EEG Brain Recordings: How much does EEG know? | Prajwal Singh et.al. | 2505.21385 | null |
2025-05-27 | RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy | Aiyue Chen et.al. | 2505.21036 | null |
2025-05-27 | Frame-Level Captions for Long Video Generation with Complex Multi Scenes | Guangcong Zheng et.al. | 2505.20827 | null |
2025-05-27 | Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt | Xiang Zhu et.al. | 2505.20795 | null |
2025-05-27 | Photography Perspective Composition: Towards Aesthetic Perspective Recommendation | Lujian Yao et.al. | 2505.20655 | null |
2025-05-27 | Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training | Bolin Lai et.al. | 2505.20629 | null |
2025-05-27 | Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM | Peng Liu et.al. | 2505.19901 | null |
2025-05-26 | MotionPro: A Precise Motion Controller for Image-to-Video Generation | Zhongwei Zhang et.al. | 2505.20287 | null |
2025-05-26 | DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving | Wenchao Sun et.al. | 2505.19692 | link |
2025-05-26 | TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | Juntong Wang et.al. | 2505.19535 | null |
2025-05-26 | The Role of Video Generation in Enhancing Data-Limited Action Understanding | Wei Li et.al. | 2505.19495 | null |
2025-05-26 | Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals | Nate Gillman et.al. | 2505.19386 | null |
2025-05-25 | From Single Images to Motion Policies via Video-Generation Environment Representations | Weiming Zhi et.al. | 2505.19306 | null |
2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
2025-05-25 | WorldEval: World Model as Real-World Robot Policies Evaluator | Yaxuan Li et.al. | 2505.19017 | null |
2025-05-25 | Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency | Hyunho Ha et.al. | 2505.18932 | null |
2025-05-25 | Interspatial Attention for Efficient 4D Human Video Generation | Ruizhi Shao et.al. | 2505.15800 | null |
2025-05-24 | Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation | Shuo Yang et.al. | 2505.18875 | null |
2025-05-24 | VORTA: Efficient Video Diffusion via Routing Sparse Attention | Wenhao Sun et.al. | 2505.18809 | link |
2025-05-24 | DVD-Quant: Data-free Video Diffusion Transformers Quantization | Zhiteng Li et.al. | 2505.18663 | link |
2025-05-24 | ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos | Xiaodong Wang et.al. | 2505.18650 | null |
2025-05-23 | WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions | Zizhang Li et.al. | 2505.18151 | null |
2025-05-23 | DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation | Junhao Chen et.al. | 2505.18078 | null |
2025-05-23 | SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain | Jiawei Zhou et.al. | 2505.17727 | null |
2025-05-23 | Scaling Image and Video Generation via Test-Time Evolutionary Search | Haoran He et.al. | 2505.17618 | null |
2025-05-23 | InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO | Xueji Fang et.al. | 2505.17574 | link |
2025-05-23 | Challenger: Affordable Adversarial Driving Video Generation | Zhiyuan Xu et.al. | 2505.15880 | null |
2025-05-22 | Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis | Xin You et.al. | 2505.17333 | null |
2025-05-22 | Training-Free Efficient Video Generation via Dynamic Token Carving | Yuechen Zhang et.al. | 2505.16864 | link |
2025-05-22 | Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts | Taewon Kang et.al. | 2505.16819 | null |
2025-05-22 | MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM | Siwei Meng et.al. | 2505.16456 | null |
2025-05-21 | Generative AI for Autonomous Driving: A Review | Katharina Winter et.al. | 2505.15863 | null |
2025-05-21 | AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection | Zhipei Xu et.al. | 2505.15173 | null |
2025-05-21 | CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation | Xinran Wang et.al. | 2505.15145 | link |
2025-05-21 | BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation | Haiquan Wen et.al. | 2505.12620 | link |
2025-05-21 | Video-GPT via Next Clip Diffusion | Shaobin Zhuang et.al. | 2505.12489 | null |
2025-05-20 | Programmatic Video Prediction Using Large Language Models | Hao Tang et.al. | 2505.14948 | link |
2025-05-20 | Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | Sucheng Ren et.al. | 2505.14687 | link |
2025-05-20 | LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | Changgu Chen et.al. | 2505.14167 | null |
2025-05-20 | Hunyuan-Game: Industrial-grade Intelligent Game Creation Model | Ruihuang Li et.al. | 2505.14135 | null |
2025-05-20 | MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Yanbo Ding et.al. | 2505.10238 | link |
2025-05-19 | FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance | Dian Shao et.al. | 2505.13437 | null |
2025-05-19 | MAGI-1: Autoregressive Video Generation at Scale | Sand.ai et.al. | 2505.13211 | link |
2025-05-19 | DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories | Joel Jang et.al. | 2505.12705 | link |
2025-05-19 | Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking | Zihan Su et.al. | 2505.12667 | null |
2025-05-18 | EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models | Hu Yue et.al. | 2505.09694 | link |
2025-05-17 | FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge | Xuan Shen et.al. | 2505.14709 | link |
2025-05-17 | DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance | Xuan Shen et.al. | 2505.14708 | link |
2025-05-17 | LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Jiarui Wang et.al. | 2505.12098 | link |
2025-05-17 | VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption | Tianxiong Zhong et.al. | 2505.12053 | null |
2025-05-17 | STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives | Bo Wang et.al. | 2505.08350 | null |
2025-05-16 | QVGen: Pushing the Limit of Quantized Video Generative Models | Yushi Huang et.al. | 2505.11497 | null |
2025-05-16 | Face Consistency Benchmark for GenAI Video | Michal Podstawski et.al. | 2505.11425 | null |
2025-05-16 | Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model | Wei Li et.al. | 2505.07449 | link |
2025-05-15 | ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars | Rui-Yang Ju et.al. | 2505.10072 | null |
2025-05-15 | Generating time-consistent dynamics with discriminator-guided image diffusion models | Philipp Hess et.al. | 2505.09089 | null |
2025-05-15 | Generative Pre-trained Autoregressive Diffusion Transformer | Yuan Zhang et.al. | 2505.07344 | null |
2025-05-14 | Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios | Huafeng Shi et.al. | 2505.10584 | null |
2025-05-13 | Generative AI for Autonomous Driving: Frontiers and Opportunities | Yuping Wang et.al. | 2505.08854 | link |
2025-05-13 | Symbolically-Guided Visual Plan Inference from Uncurated Video Data | Wenyan Yang et.al. | 2505.08444 | null |
2025-05-12 | DanceGRPO: Unleashing GRPO on Visual Generation | Zeyue Xue et.al. | 2505.07818 | null |
2025-05-12 | ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models | Ozgur Kara et.al. | 2505.07652 | null |
2025-05-11 | DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | Junhao Xia et.al. | 2505.07057 | null |
2025-05-11 | BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | Panwen Hu et.al. | 2505.06985 | null |
2025-05-10 | Jailbreaking the Text-to-Video Generative Models | Jiayang Liu et.al. | 2505.06679 | null |
2025-05-10 | ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images | Xianghao Kong et.al. | 2505.06537 | null |
2025-05-08 | 3D Scene Generation: A Survey | Beichen Wen et.al. | 2505.05474 | link |
2025-05-08 | T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | Xuyang Guo et.al. | 2505.04946 | null |
2025-05-08 | HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Teng Hu et.al. | 2505.04512 | null |
2025-05-06 | Real-Time Person Image Synthesis Using a Flow Matching Model | Jiwoo Jeong et.al. | 2505.03562 | link |
2025-05-06 | Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights | Zhaiming Shen et.al. | 2505.03205 | null |
2025-05-04 | DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | Wenchuan Wang et.al. | 2505.02192 | null |
2025-05-03 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Anushka Agarwal et.al. | 2505.01928 | null |
2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729 | null |
2025-05-02 | VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos | Zongxia Li et.al. | 2505.01481 | link |
2025-05-02 | FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis | Jiangtong Tan et.al. | 2505.01172 | link |
2025-05-01 | Controllable Weather Synthesis and Removal with Video Diffusion Models | Chih-Hao Lin et.al. | 2505.00704 | null |
2025-05-01 | T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | Xuyang Guo et.al. | 2505.00337 | null |
2025-04-30 | Direct Motion Models for Assessing Generated Videos | Kelsey Allen et.al. | 2505.00209 | null |
2025-04-30 | Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | Michal Geyer et.al. | 2505.00135 | null |
2025-04-30 | ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction | Qihao Liu et.al. | 2504.21855 | null |
2025-04-30 | HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Haiyang Zhou et.al. | 2504.21650 | link |
2025-04-30 | Simple Visual Artifact Detection in Sora-Generated Videos | Misora Sugiyama et.al. | 2504.21334 | null |
2025-04-30 | Capturing Conditional Dependence via Auto-regressive Diffusion Models | Xunpeng Huang et.al. | 2504.21314 | null |
2025-04-29 | TesserAct: Learning 4D Embodied World Models | Haoyu Zhen et.al. | 2504.20995 | null |
2025-04-29 | DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs | Hao Luan et.al. | 2504.20754 | null |
2025-04-29 | Advance Fake Video Detection via Vision Transformers | Joy Battocchio et.al. | 2504.20669 | null |
2025-04-28 | CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition | Quynh Phung et.al. | 2504.19894 | null |
2025-04-28 | DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | Junpeng Jiang et.al. | 2504.19614 | null |
2025-04-26 | Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning | Yifan Xie et.al. | 2504.18810 | null |
2025-04-26 | Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation | Jong Inn Park et.al. | 2504.18805 | null |
2025-04-25 | NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration | Haotian Dong et.al. | 2504.18448 | null |
2025-04-25 | We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Minkyu Choi et.al. | 2504.17180 | null |
2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788 | null |
2025-04-24 | MV-Crafter: An Intelligent System for Music-guided Video Generation | Chuer Chen et.al. | 2504.17267 | null |
2025-04-24 | DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks | Yinqi Li et.al. | 2504.17253 | link |
2025-04-23 | Subject-driven Video Generation via Disentangled Identity and Motion | Daneul Kim et.al. | 2504.17816 | null |
2025-04-23 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | Ruotong Wang et.al. | 2504.16907 | null |
2025-04-23 | ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Ying Li et.al. | 2504.16464 | null |
2025-04-23 | VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models | Xuming Hu et.al. | 2504.16359 | null |
2025-04-22 | DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment | Xiaofan Li et.al. | 2504.18576 | link |
2025-04-22 | Survey of Video Diffusion Models: Foundations, Implementations, and Applications | Yimu Wang et.al. | 2504.16081 | link |
2025-04-22 | Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework | Xinyuan Song et.al. | 2504.16016 | null |
2025-04-22 | Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Wang Lin et.al. | 2504.15932 | null |
2025-04-22 | Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views | Ningli Xu et.al. | 2504.15786 | null |
2025-04-22 | DiTPainter: Efficient Video Inpainting with Diffusion Transformers | Xian Wu et.al. | 2504.15661 | null |
2025-04-21 | Solving New Tasks by Adapting Internet Video Knowledge | Calvin Luo et.al. | 2504.15369 | null |
2025-04-21 | Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Xianpan Zhou et.al. | 2504.15182 | null |
2025-04-21 | DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Weijie He et.al. | 2504.15032 | null |
2025-04-21 | Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Chenjie Cao et.al. | 2504.14899 | link |
2025-04-21 | SkyReels-V2: Infinite-length Film Generative Model | Guibin Chen et.al. | 2504.13074 | link |
2025-04-21 | Packing Input Frame Context in Next-Frame Prediction Models for Video Generation | Lvmin Zhang et.al. | 2504.12626 | link |
2025-04-20 | Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis | Jingjing Ren et.al. | 2504.14470 | null |
2025-04-19 | SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Minho Park et.al. | 2504.14396 | link |
2025-04-18 | Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting | Jiaxin Huang et.al. | 2504.11092 | null |
2025-04-17 | Understanding Attention Mechanism in Video Diffusion Models | Bingyan Liu et.al. | 2504.12027 | null |
2025-04-17 | VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Kevin Xie et.al. | 2504.11389 | null |
2025-04-16 | VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate | Zhihang Yuan et.al. | 2504.12259 | link |
2025-04-16 | Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM | Zirui Pan et.al. | 2504.12048 | null |
2025-04-16 | The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation | Bingjie Gao et.al. | 2504.11739 | null |
2025-04-15 | InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation | Yukang Lin et.al. | 2504.10905 | null |
2025-04-15 | OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | Dianbing Xi et.al. | 2504.10825 | null |
2025-04-14 | H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Zhanbo Huang et.al. | 2504.10676 | link |
2025-04-14 | H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models | Yushu Wu et.al. | 2504.10567 | null |
2025-04-14 | FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Rui Chen et.al. | 2504.10358 | null |
2025-04-14 | Aligning Anime Video Generation with Human Feedback | Bingwen Zhu et.al. | 2504.10044 | null |
2025-04-14 | EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise | Chao Liu et.al. | 2504.09789 | null |
2025-04-13 | CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Pooja Guhan et.al. | 2504.09472 | null |
2025-04-11 | Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Team Seawead et.al. | 2504.08685 | null |
2025-04-11 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Jialu Li et.al. | 2504.08641 | null |
2025-04-11 | Diffusion Models for Robotic Manipulation: A Survey | Rosa Wolf et.al. | 2504.08438 | null |
2025-04-11 | EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model | Renda Li et.al. | 2504.08344 | null |
2025-04-11 | RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Guangcong Zheng et.al. | 2504.08212 | link |
2025-04-11 | TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation | Ruineng Li et.al. | 2504.08181 | null |
2025-04-10 | Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction | Zeren Jiang et.al. | 2504.07961 | link |
2025-04-10 | Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Rundong Luo et.al. | 2504.07940 | null |
2025-04-10 | Diffusion Transformers for Tabular Data Time Series Generation | Fabrizio Garuti et.al. | 2504.07566 | link |
2025-04-09 | EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation | Diljeet Jagpal et.al. | 2504.06861 | null |
2025-04-09 | DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation | Wangbo Zhao et.al. | 2504.06803 | link |
2025-04-09 | RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism | Elia Peruzzo et.al. | 2504.06672 | null |
2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
2025-04-08 | CamContextI2V: Context-aware Controllable Video Generation | Luis Denninger et.al. | 2504.06022 | link |
2025-04-08 | Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants | Nikolaj T. Mücke et.al. | 2504.05852 | link |
2025-04-07 | One-Minute Video Generation with Test-Time Training | Karan Dalal et.al. | 2504.05298 | null |
2025-04-07 | Video-Bench: Human-Aligned Video Generation Benchmark | Hui Han et.al. | 2504.04907 | null |
2025-04-07 | Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Fa-Ting Hong et.al. | 2504.02542 | link |
2025-04-05 | Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Yikai Wang et.al. | 2504.04153 | link |
2025-04-05 | Multi-identity Human Image Animation with Structural Video Diffusion | Zhenzhi Wang et.al. | 2504.04126 | null |
2025-04-05 | Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models | Xuyang Guo et.al. | 2504.04051 | null |
2025-04-05 | DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion | Maksim Siniukov et.al. | 2504.04010 | null |
2025-04-04 | Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models | Xuran Ma et.al. | 2504.03140 | link |
2025-04-04 | MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition | Takahiro Shirakawa et.al. | 2504.02361 | null |
2025-04-03 | How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Pascal Chang et.al. | 2504.03072 | null |
2025-04-03 | Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments | Chenyu Zhang et.al. | 2504.02918 | null |
2025-04-03 | Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets | Chuning Zhu et.al. | 2504.02792 | null |
2025-04-03 | Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model | Shengjun Zhang et.al. | 2504.02764 | null |
2025-04-03 | ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer | Jiayi Gao et.al. | 2504.02451 | link |
2025-04-03 | SkyReels-A2: Compose Anything in Video Diffusion Transformers | Zhengcong Fei et.al. | 2504.02436 | link |
2025-04-03 | OmniCam: Unified Multimodal Video Generation via Camera Control | Xiaoda Yang et.al. | 2504.02312 | null |
2025-04-03 | VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step | Hanyang Wang et.al. | 2504.01956 | null |
2025-04-02 | Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet | Sebastian Barros et.al. | 2504.03752 | null |
2025-04-02 | WorldPrompter: Traversable Text-to-Scene Generation | Zhaoyang Zhang et.al. | 2504.02045 | null |
2025-04-02 | Towards Physically Plausible Video Generation via VLM Planning | Xindi Yang et.al. | 2503.23368 | null |
2025-04-01 | AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | Junhao Cheng et.al. | 2504.01014 | link |
2025-04-01 | WorldScore: A Unified Evaluation Benchmark for World Generation | Haoyi Duan et.al. | 2504.00983 | null |
2025-04-01 | DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding | Chong Li et.al. | 2504.00432 | null |
2025-04-01 | HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation | Boyuan Wang et.al. | 2503.24026 | null |
2025-04-01 | On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Bosung Kim et.al. | 2503.23796 | link |
2025-03-31 | GazeLLM: Multimodal LLMs incorporating Human Visual Attention | Jun Rekimoto et.al. | 2504.00221 | null |
2025-03-31 | Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | Shengqiong Wu et.al. | 2503.24379 | null |
2025-03-31 | JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation | Fangda Chen et.al. | 2503.23951 | null |
2025-03-31 | HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation | Kun Liu et.al. | 2503.23715 | null |
2025-03-30 | VideoGen-Eval: Agent-based System for Video Generation Evaluation | Yuhang Yang et.al. | 2503.23452 | link |
2025-03-30 | JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization | Kai Liu et.al. | 2503.23377 | null |
2025-03-30 | MoCha: Towards Movie-Grade Talking Character Synthesis | Cong Wei et.al. | 2503.23307 | null |
2025-03-30 | SketchVideo: Sketch-based Video Generation and Editing | Feng-Lin Liu et.al. | 2503.23284 | null |
2025-03-29 | Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models | Prin Phunyaphibarn et.al. | 2503.20240 | null |
2025-03-28 | Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model | Jangho Park et.al. | 2503.22622 | null |
2025-03-28 | EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | Hadrien Reynaud et.al. | 2503.22357 | null |
2025-03-28 | CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving | Yishen Ji et.al. | 2503.22231 | null |
2025-03-27 | VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models | Chi-Pin Huang et.al. | 2503.21781 | null |
2025-03-27 | Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Minghui Lin et.al. | 2503.21765 | link |
2025-03-27 | VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness | Dian Zheng et.al. | 2503.21755 | link |
2025-03-27 | Audio-driven Gesture Generation via Deviation Feature in the Latent Space | Jiahui Chen et.al. | 2503.21616 | null |
2025-03-27 | ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model | Jinwei Qi et.al. | 2503.21144 | null |
2025-03-26 | Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations | Haitong Liu et.al. | 2503.21824 | link |
2025-03-26 | Synthetic Video Enhances Physical Fidelity in Video Synthesis | Qi Zhao et.al. | 2503.20822 | null |
2025-03-26 | RecTable: Fast Modeling Tabular Data with Rectified Flow | Masane Fuchi et.al. | 2503.20731 | link |
2025-03-26 | AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports | Xiangwen Zhang et.al. | 2503.20654 | null |
2025-03-26 | GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving | Lloyd Russell et.al. | 2503.20523 | null |
2025-03-26 | VPO: Aligning Text-to-Video Generation Models with Prompt Optimization | Jiale Cheng et.al. | 2503.20491 | link |
2025-03-26 | Wan: Open and Advanced Large-Scale Video Generative Models | WanTeam et.al. | 2503.20314 | link |
2025-03-26 | Video Motion Graphs | Haiyang Liu et.al. | 2503.20218 | null |
2025-03-26 | Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing | Jaihoon Kim et.al. | 2503.19385 | null |
2025-03-26 | EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models | Yufei Cai et.al. | 2503.19369 | link |
2025-03-25 | Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Yuke Lou et.al. | 2503.20118 | null |
2025-03-25 | Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals | Stefan Stojanov et.al. | 2503.19953 | null |
2025-03-25 | FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling | Qiusheng Huang et.al. | 2503.19940 | null |
2025-03-25 | FullDiT: Multi-Task Video Generative Foundation Model with Full Attention | Xuan Ju et.al. | 2503.19907 | null |
2025-03-25 | Mask | Tianhao Qi et.al. | 2503.19881 | null |
2025-03-25 | AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers | Jiazhi Guan et.al. | 2503.19824 | null |
2025-03-25 | AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset | Haiyu Zhang et.al. | 2503.19462 | null |
2025-03-25 | MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation | Yukang Lin et.al. | 2503.19383 | null |
2025-03-25 | Long-Context Autoregressive Video Modeling with Next-Frame Prediction | Yuchao Gu et.al. | 2503.19325 | link |
2025-03-25 | Aether: Geometric-Aware Unified World Modeling | Aether Team et.al. | 2503.18945 | null |
2025-03-25 | AMD-Hummingbird: Towards an Efficient Text-to-Video Model | Takashi Isobe et.al. | 2503.18559 | link |
2025-03-25 | Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model | Yingying Fan et.al. | 2503.16942 | null |
2025-03-24 | Video-T1: Test-Time Scaling for Video Generation | Fangfu Liu et.al. | 2503.18942 | null |
2025-03-24 | Training-free Diffusion Acceleration with Bottleneck Sampling | Ye Tian et.al. | 2503.18940 | null |
2025-03-24 | EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation | Qiang Qu et.al. | 2503.18552 | null |
2025-03-24 | Can Text-to-Video Generation help Video-Language Alignment? | Luca Zanella et.al. | 2503.18507 | null |
2025-03-24 | Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation | Dingcheng Zhen et.al. | 2503.18429 | null |
2025-03-24 | Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Sicong Feng et.al. | 2503.18386 | null |
2025-03-23 | LongDiff: Training-Free Long Video Generation in One Go | Zhuoling Li et.al. | 2503.18150 | null |
2025-03-23 | TransAnimate: Taming Layer Diffusion to Generate RGBA Video | Xuewei Chen et.al. | 2503.17934 | null |
2025-03-22 | RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation | Zhiqiang Yuan et.al. | 2503.17735 | null |
2025-03-21 | Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks | Bhishma Dedhia et.al. | 2503.17539 | null |
2025-03-21 | Position: Interactive Generative Video as Next-Generation Game Engine | Jiwen Yu et.al. | 2503.17359 | null |
2025-03-21 | AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process | Junjie Hu et.al. | 2503.17029 | null |
2025-03-21 | Enabling Versatile Controls for Video Diffusion Models | Xu Zhang et.al. | 2503.16983 | link |
2025-03-21 | SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation | Chun-Han Yao et.al. | 2503.16396 | null |
2025-03-20 | A Recipe for Generating 3D Worlds From a Single Image | Katja Schwarz et.al. | 2503.16611 | null |
2025-03-20 | XAttention: Block Sparse Attention with Antidiagonal Scoring | Ruyi Xu et.al. | 2503.16428 | link |
2025-03-20 | MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | Quanhao Li et.al. | 2503.16421 | null |
2025-03-20 | ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos | Haolin Yang et.al. | 2503.16400 | null |
2025-03-20 | PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | Longbin Ji et.al. | 2503.16068 | null |
2025-03-20 | Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models | Marc Benedí San Millán et.al. | 2503.15996 | null |
2025-03-20 | MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving | Haiguang Wang et.al. | 2503.15875 | link |
2025-03-20 | VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling | Hyojun Go et.al. | 2503.15855 | null |
2025-03-20 | VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention | Mingzhe Zheng et.al. | 2503.15138 | null |
2025-03-19 | Temporal Regularization Makes Your Video Generator Stronger | Harold Haodong Chen et.al. | 2503.15417 | null |
2025-03-19 | Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models | Tingxiu Chen et.al. | 2503.14966 | link |
2025-03-18 | MusicInfuser: Making Video Diffusion Listen and Dance | Susung Hong et.al. | 2503.14505 | null |
2025-03-18 | MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation | Hongyu Zhang et.al. | 2503.14428 | null |
2025-03-18 | Impossible Videos | Zechen Bai et.al. | 2503.14378 | null |
2025-03-18 | LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models | Yu Cheng et.al. | 2503.14325 | link |
2025-03-18 | Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Yong Zhong et.al. | 2503.14151 | null |
2025-03-18 | Fast Autoregressive Video Generation with Diagonal Decoding | Yang Ye et.al. | 2503.14070 | null |
2025-03-18 | AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark | Xinhao Xiang et.al. | 2503.14064 | link |
2025-03-17 | MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis | Shitong Shao et.al. | 2503.13319 | null |
2025-03-17 | Language-guided Open-world Video Anomaly Detection | Zihao Liu et.al. | 2503.13160 | null |
2025-03-17 | Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction | Zheyuan Liu et.al. | 2503.12953 | null |
2025-03-17 | AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations | Quang Trung Truong et.al. | 2503.12828 | null |
2025-03-17 | Long-Video Audio Synthesis with Multi-Agent Collaboration | Yehang Zhang et.al. | 2503.10719 | null |
2025-03-16 | SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs | Guibiao Liao et.al. | 2503.12535 | null |
2025-03-16 | VMBench: A Benchmark for Perception-Aligned Video Motion Generation | Xinran Ling et.al. | 2503.10076 | link |
2025-03-15 | ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis | Yu Fang et.al. | 2503.14526 | null |
2025-03-15 | A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI | Paula Andrea Pérez-Toro et.al. | 2503.12102 | null |
2025-03-15 | SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering | Byeongjun Park et.al. | 2503.12024 | link |
2025-03-14 | ReCamMaster: Camera-Controlled Generative Rendering from A Single Video | Jianhong Bai et.al. | 2503.11647 | null |
2025-03-14 | HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models | Ziqin Zhou et.al. | 2503.11513 | null |
2025-03-14 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation | Hongxiang Zhao et.al. | 2503.11423 | null |
2025-03-14 | Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model | Haoyang Huang et.al. | 2503.11251 | link |
2025-03-14 | Cross-Modal Learning for Music-to-Music-Video Description Generation | Zhuoyuan Mao et.al. | 2503.11190 | null |
2025-03-14 | On the Limitations of Vision-Language Models in Understanding Image Transforms | Ahmad Mustafa Anis et.al. | 2503.09837 | null |
2025-03-13 | CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models | Hao He et.al. | 2503.10592 | null |
2025-03-13 | Long Context Tuning for Video Generation | Yuwei Guo et.al. | 2503.10589 | null |
2025-03-13 | CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance | Yufan Deng et.al. | 2503.10391 | null |
2025-03-13 | Semantic Latent Motion for Portrait Video Generation | Qiyuan Zhang et.al. | 2503.10096 | null |
2025-03-13 | UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? | Yuanxin Liu et.al. | 2503.09949 | link |
2025-03-13 | Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers | Yasheng Sun et.al. | 2503.09942 | null |
2025-03-13 | VideoMerge: Towards Training-free Long Video Generation | Siyang Zhang et.al. | 2503.09926 | null |
2025-03-13 | WonderVerse: Extendable 3D Scene Generation with Video Generative Models | Hao Feng et.al. | 2503.09160 | null |
2025-03-12 | Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework | Jing Wang et.al. | 2503.10704 | null |
2025-03-12 | LuciBot: Automated Robot Policy Learning from Generated Videos | Xiaowen Qiu et.al. | 2503.09871 | null |
2025-03-12 | I2V3D: Controllable image-to-video generation with 3D guidance | Zhiyuan Zhang et.al. | 2503.09733 | null |
2025-03-12 | Accelerating Diffusion Sampling via Exploiting Local Transition Coherence | Shangwen Zhu et.al. | 2503.09675 | null |
2025-03-12 | Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k | Xiangyu Peng et.al. | 2503.09642 | link |
2025-03-12 | PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop | Chenyu Li et.al. | 2503.09595 | link |
2025-03-12 | Unified Dense Prediction of Video Diffusion | Lehan Yang et.al. | 2503.09344 | null |
2025-03-12 | Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space | Jian Zhu et.al. | 2503.09215 | null |
2025-03-12 | SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video | Chengshu Zhao et.al. | 2503.09154 | link |
2025-03-12 | Reangle-A-Video: 4D Video Generation as Video-to-Video Translation | Hyeonho Jeong et.al. | 2503.09151 | null |
2025-03-12 | | Alex Ergasti et.al. | 2503.08307 | link |
2025-03-12 | Object-Centric World Model for Language-Guided Manipulation | Youngjoon Jeong et.al. | 2503.06170 | null |
2025-03-11 | V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video | Jianqi Chen et.al. | 2503.09631 | null |
2025-03-11 | REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder | Yitian Zhang et.al. | 2503.08665 | null |
2025-03-11 | Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling | Subin Kim et.al. | 2503.08605 | null |
2025-03-11 | WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation | Jing Wang et.al. | 2503.08153 | null |
2025-03-11 | ObjectMover: Generative Object Movement with Video Prior | Xin Yu et.al. | 2503.08037 | null |
2025-03-11 | How Can Video Generative AI Transform K-12 Education? Examining Teachers' Perspectives through TPACK and TAM | Unggi Lee et.al. | 2503.08003 | null |
2025-03-11 | VACE: All-in-One Video Creation and Editing | Zeyinzi Jiang et.al. | 2503.07598 | null |
2025-03-11 | LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation | Quanjian Song et.al. | 2503.06508 | link |
2025-03-10 | DreamRelation: Relation-Centric Video Customization | Yujie Wei et.al. | 2503.07602 | null |
2025-03-10 | AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion | Mingzhen Sun et.al. | 2503.07418 | null |
2025-03-10 | Automated Movie Generation via Multi-Agent CoT Planning | Weijia Wu et.al. | 2503.07314 | link |
2025-03-10 | From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers | Jiacheng Liu et.al. | 2503.06923 | link |
2025-03-09 | VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation | Hritik Bansal et.al. | 2503.06800 | null |
2025-03-09 | TR-DQ: Time-Rotation Diffusion Quantization | Yihua Shao et.al. | 2503.06564 | null |
2025-03-09 | QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation | Junyi Wu et.al. | 2503.06545 | link |
2025-03-09 | Generative Video Bi-flow | Chen Liu et.al. | 2503.06364 | null |
2025-03-08 | Text2Story: Advancing Video Storytelling with Text Guidance | Taewon Kang et.al. | 2503.06310 | null |
2025-03-08 | ROCM: RLHF on consistency models | Shivanshu Shekhar et.al. | 2503.06171 | null |
2025-03-08 | VACT: A Video Automatic Causal Testing System and a Benchmark | Haotong Yang et.al. | 2503.06163 | null |
2025-03-08 | GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation | Ye Tao et.al. | 2503.06136 | null |
2025-03-08 | DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation | Runze Zhang et.al. | 2503.06053 | null |
2025-03-08 | The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation | Aoxiong Yin et.al. | 2503.04606 | link |
2025-03-08 | Rethinking Video Tokenization: A Conditioned Diffusion-based Approach | Nianzu Yang et.al. | 2503.03708 | link |
2025-03-07 | MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice | Hongwei Yi et.al. | 2503.05978 | null |
2025-03-07 | MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | Xuenan Xu et.al. | 2503.05242 | link |
2025-03-07 | Unified Reward Model for Multimodal Understanding and Generation | Yibin Wang et.al. | 2503.05236 | null |
2025-03-07 | Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos | Zhiyu Tan et.al. | 2502.21314 | null |
2025-03-06 | Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation | Alexey Buzovkin et.al. | 2503.04871 | link |
2025-03-06 | FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video | Yue Gao et.al. | 2503.04720 | null |
2025-03-06 | What Are You Doing? A Closer Look at Controllable Human Video Generation | Emanuele Bugliarello et.al. | 2503.04666 | null |
2025-03-05 | ProReflow: Progressive Reflow with Decomposed Velocity | Lei Ke et.al. | 2503.04824 | null |
2025-03-05 | GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Xuanchi Ren et.al. | 2503.03751 | link |
2025-03-05 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | Zhao Yang et.al. | 2503.03689 | link |
2025-03-05 | High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights | Yuna Kato et.al. | 2503.03558 | link |
2025-03-05 | Video Super-Resolution: All You Need is a Video Diffusion Model | Zhihao Zhan et.al. | 2503.03355 | null |
2025-03-04 | GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning | Zhun Mou et.al. | 2503.02341 | null |
2025-03-04 | Unified Video Action Model | Shuang Li et.al. | 2503.00200 | null |
2025-03-03 | VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation | Wenhao Wang et.al. | 2503.01739 | link |
2025-03-03 | VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors | Juil Koo et.al. | 2503.01107 | null |
2025-03-03 | TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis | Menghao Li et.al. | 2502.19454 | null |
2025-03-02 | Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think | Jie Tian et.al. | 2503.00948 | link |
2025-03-01 | Learning to Animate Images from A Few Videos to Portray Delicate Human Actions | Haoxin Li et.al. | 2503.00276 | null |
2025-02-28 | Training-free and Adaptive Sparse Attention for Efficient Long Video Generation | Yifei Xia et.al. | 2502.21079 | null |
2025-02-28 | HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | Xiao Wang et.al. | 2502.20811 | null |
2025-02-28 | WorldModelBench: Judging Video Generation Models As World Models | Dacheng Li et.al. | 2502.20694 | null |
2025-02-28 | RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers | Ke Cao et.al. | 2502.14377 | null |
2025-02-27 | Mobius: Text to Seamless Looping Video Generation via Latent Shift | Xiuli Bi et.al. | 2502.20307 | link |
2025-02-27 | FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute | Sotiris Anagnostidis et.al. | 2502.20126 | null |
2025-02-27 | C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation | Yuhao Li et.al. | 2502.19868 | link |
2025-02-26 | Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Long Cheng et.al. | 2503.01873 | null |
2025-02-26 | Glad: A Streaming Scene Generator for Autonomous Driving | Bin Xie et.al. | 2503.00045 | null |
2025-02-26 | FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model | Lingzhou Mu et.al. | 2502.19455 | null |
2025-02-25 | SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference | Jintao Zhang et.al. | 2502.18137 | link |
2025-02-25 | A Survey: Spatiotemporal Consistency in Video Generation | Zhiyu Yin et.al. | 2502.17863 | null |
2025-02-24 | X-Dancer: Expressive Music to Human Dance Video Generation | Zeyuan Chen et.al. | 2502.17414 | null |
2025-02-24 | VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Xiangpeng Yang et.al. | 2502.17258 | null |
2025-02-24 | Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions | Zhong Li et.al. | 2502.17119 | link |
2025-02-21 | RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers | Min Zhao et.al. | 2502.15894 | null |
2025-02-21 | VaViM and VaVAM: Autonomous Driving through Video Generative Modeling | Florent Bartoccioni et.al. | 2502.15672 | link |
2025-02-21 | LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities | Florian Sestak et.al. | 2502.12128 | link |
2025-02-20 | Hardware-Friendly Static Quantization Method for Video Diffusion Transformers | Sanghyun Yi et.al. | 2502.15077 | null |
2025-02-20 | LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection | Qingyuan Liu et.al. | 2502.14994 | null |
2025-02-20 | Improving the Diffusability of Autoencoders | Ivan Skorokhodov et.al. | 2502.14831 | null |
2025-02-20 | Designing Parameter and Compute Efficient Diffusion Transformers using Distillation | Vignesh Sundaresha et.al. | 2502.14226 | null |
2025-02-19 | FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation | Yunpeng Zhang et.al. | 2502.13995 | link |
2025-02-19 | LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Junchen Fu et.al. | 2502.12945 | null |
2025-02-18 | VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation | Xinlong Chen et.al. | 2502.12782 | link |
2025-02-18 | MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation | Sihyun Yu et.al. | 2502.12632 | null |
2025-02-17 | DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Zhihang Yuan et.al. | 2502.11897 | link |
2025-02-17 | Object-Centric Image to Video Generation with Language Guidance | Angel Villar-Corrales et.al. | 2502.11655 | null |
2025-02-17 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Guoqing Ma et.al. | 2502.10248 | link |
2025-02-17 | Magic 1-For-1: Generating One Minute Video Clips within One Minute | Hongwei Yi et.al. | 2502.07701 | link |
2025-02-16 | MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation | Michael Fuest et.al. | 2502.11234 | null |
2025-02-16 | Phantom: Subject-consistent video generation via cross-modal alignment | Lijie Liu et.al. | 2502.11079 | null |
2025-02-15 | SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers | Di Qiu et.al. | 2502.10841 | link |
2025-02-14 | RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | Teng Li et.al. | 2502.10059 | null |
2025-02-14 | GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Hongyin Zhang et.al. | 2502.09268 | null |
2025-02-13 | Enhance-A-Video: Better Generated Video for Free | Yang Luo et.al. | 2502.07508 | link |
2025-02-12 | CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Qinghe Wang et.al. | 2502.08639 | null |
2025-02-12 | FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis | Wonjoon Jin et.al. | 2502.08244 | null |
2025-02-12 | Learning Human Skill Generators at Key-Step Levels | Yilu Wu et.al. | 2502.08234 | null |
2025-02-12 | AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | Zhao Wang et.al. | 2502.08189 | null |
2025-02-12 | Next Block Prediction: Video Generation via Semi-Autoregressive Modeling | Shuhuai Ren et.al. | 2502.07737 | null |
2025-02-12 | VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Sixiao Zheng et.al. | 2502.07531 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-23 | InstructAttribute: Fine-grained Object Attributes editing with Instruction | Xingxi Yin et.al. | 2505.00751 | null |
2025-06-14 | Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments | Zaiqiang Wu et.al. | 2506.12348 | link |
2025-06-13 | HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment | Ming Meng et.al. | 2505.19638 | null |
2025-06-12 | Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On | Zaiqiang Wu et.al. | 2506.10468 | link |
2025-06-06 | ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On | Jinjuan Wang et.al. | 2506.05858 | null |
2025-06-02 | OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Sen Liang et.al. | 2506.01801 | null |
2025-06-01 | DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation | Xianbing Sun et.al. | 2506.00908 | null |
2025-05-29 | VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration | Ben Li et.al. | 2505.23439 | link |
2025-05-28 | MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on | Guangyuan Li et.al. | 2505.21325 | null |
2025-05-27 | Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals | Davide Lobba et.al. | 2505.21062 | link |
2025-05-26 | VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models | Hu Xiaobin et.al. | 2505.19571 | link |
2025-05-22 | Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction | Dong Li et.al. | 2505.16980 | null |
2025-05-22 | Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On | Siqi Wan et.al. | 2505.16977 | link |
2025-05-15 | Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates | Ren Li et.al. | 2504.08353 | link |
2025-04-29 | Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting | Hanxi Liu et.al. | 2504.20403 | null |
2025-04-24 | FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model | Kaicheng Pang et.al. | 2504.17826 | null |
2025-04-24 | 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models | Min Wei et.al. | 2504.17414 | null |
2025-04-21 | Shape-Guided Clothing Warping for Virtual Try-On | Xiaoyu Han et.al. | 2504.15232 | link |
2025-04-21 | Insert Anything: Image Insertion via In-Context Editing in DiT | Wensong Song et.al. | 2504.15009 | null |
2025-04-19 | Flux Already Knows -- Activating Subject-Driven Image Generation without Training | Hao Kang et.al. | 2504.11478 | link |
2025-04-19 | Concat-ID: Towards Universal Identity-Preserving Video Synthesis | Yong Zhong et.al. | 2503.14151 | null |
2025-04-18 | Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation | Fulvio Sanguigni et.al. | 2504.14011 | null |
2025-04-17 | Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off | Riza Velioglu et.al. | 2504.13078 | link |
2025-04-15 | ReZero: Enhancing LLM search ability by trying one-more-time | Alan Dao et.al. | 2504.11001 | null |
2025-04-11 | VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction | Zijian He et.al. | 2503.12165 | null |
2025-04-04 | From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images | Maliheh Toozandehjani et.al. | 2504.03807 | null |
2025-04-03 | MAD: Makeup All-in-One with Cross-Domain Diffusion Model | Bo-Kai Ruan et.al. | 2504.02545 | null |
2025-04-01 | Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method | Shufang Zhang et.al. | 2504.00562 | null |
2025-03-26 | ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On | Ji Woo Hong et.al. | 2503.20418 | null |
2025-03-26 | Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks | Hailong Guo et.al. | 2501.15891 | null |
2025-03-25 | Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage | Zhengwentai Sun et.al. | 2503.19486 | null |
2025-03-20 | Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model | Yingmao Miao et.al. | 2503.16065 | null |
2025-03-18 | Limb-Aware Virtual Try-On Network with Progressive Clothing Warping | Shengping Zhang et.al. | 2503.14074 | link |
2025-03-16 | Progressive Limb-Aware Virtual Try-On | Xiaoyu Han et.al. | 2503.12588 | link |
2025-03-15 | ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text | Haifeng Ni et.al. | 2501.16757 | null |
2025-03-11 | MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input | Zhenchen Wan et.al. | 2503.08650 | null |
2025-03-11 | RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency | Siqi Li et.al. | 2501.08682 | null |
2025-02-20 | CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors | Donghao Luo et.al. | 2502.14373 | null |
2025-02-05 | Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics | Xuan Li et.al. | 2502.03449 | null |
2025-02-03 | MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer | Le Shen et.al. | 2502.01626 | null |
2025-01-26 | IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter | Xiaojing Zhong et.al. | 2501.15616 | null |
2025-01-26 | Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models | Spencer Ramsey et.al. | 2501.15571 | null |
2025-01-20 | EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process | Mostafa Atef et.al. | 2501.11776 | null |
2025-01-20 | CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Zheng Chong et.al. | 2501.11325 | link |
2025-01-17 | Disharmony: Forensics using Reverse Lighting Harmonization | Philip Wootaek Shin et.al. | 2501.10212 | null |
2025-01-12 | ODPG: Outfitting Diffusion with Pose Guided Condition | Seohyun Lee et.al. | 2501.06769 | null |
2025-01-10 | MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer | Junsheng Luan et.al. | 2501.03630 | null |
2025-01-09 | 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On | Shuliang Ning et.al. | 2501.05369 | null |
2025-01-08 | Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling | Nannan Li et.al. | 2501.04666 | null |
2025-01-07 | HYB-VITON: A Hybrid Approach to Virtual Try-On Combining Explicit and Implicit Warping | Kosuke Takemoto et.al. | 2501.03910 | link |
2025-01-07 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427 | null |
2024-12-25 | DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images | Enbo Huang et.al. | 2412.18797 | null |
2024-12-22 | PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask | Jeongho Kim et.al. | 2412.16978 | link |
2024-12-19 | DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On | Wengyi Zhan et.al. | 2412.14465 | null |
2024-12-19 | FashionComposer: Compositional Fashion Image Generation | Sihui Ji et.al. | 2412.14168 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-06-25 | EditP23: 3D Editing via Propagation of Image Prompts to Multi-View | Roi Bar-On et.al. | 2506.20652 | null |
2025-06-25 | Towards Efficient Exemplar Based Image Editing with Multimodal VLMs | Avadhoot Jadhav et.al. | 2506.20155 | null |
2025-06-25 | OmniGen2: Exploration to Advanced Multimodal Generation | Chenyuan Wu et.al. | 2506.18871 | null |
2025-06-24 | SceneCrafter: Controllable Multi-View Driving Scene Editing | Zehao Zhu et.al. | 2506.19488 | null |
2025-06-24 | LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning | Chenjian Gao et.al. | 2506.10082 | null |
2025-06-23 | Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models | Ilia Beletskii et.al. | 2506.19103 | null |
2025-06-23 | Let Your Video Listen to Your Music! | Xinyu Zhang et.al. | 2506.18881 | null |
2025-06-23 | CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing | Dinh-Khoi Vo et.al. | 2506.18438 | null |
2025-06-23 | Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction | Han Zhang et.al. | 2506.18290 | null |
2025-06-20 | BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing | Jiacheng Chen et.al. | 2506.17450 | null |
2025-06-20 | FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation | Fan Yang et.al. | 2506.16806 | null |
2025-06-19 | Arch-Router: Aligning LLM Routing with Human Preferences | Co Tran et.al. | 2506.16655 | null |
2025-06-18 | VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics | Josef Kuchař et.al. | 2506.15903 | null |
2025-06-17 | Causally Steered Diffusion for Automated Video Counterfactual Generation | Nikos Spyrou et.al. | 2506.14404 | link |
2025-06-16 | AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing | Biao Yang et.al. | 2506.13301 | null |
2025-06-15 | Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing | Zhuoying Li et.al. | 2506.13827 | null |
2025-06-15 | ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies | Chenglin Wang et.al. | 2506.12830 | null |
2025-06-14 | Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts | Saemee Choi et.al. | 2506.12520 | null |
2025-06-13 | SphereDrag: Spherical Geometry-Aware Panoramic Image Editing | Zhiao Feng et.al. | 2506.11863 | null |
2025-06-13 | Consistent Video Editing as Flow-Driven Image-to-Video Generation | Ge Wang et.al. | 2506.07713 | null |
2025-06-12 | VINCIE: Unlocking In-context Image Editing from Video | Leigang Qu et.al. | 2506.10941 | null |
2025-06-12 | Edit360: 2D Image Edits to 3D Assets from Any Angle | Junchao Huang et.al. | 2506.10507 | null |
2025-06-12 | Towards Reliable Identification of Diffusion-based Image Manipulations | Alex Costanzino et.al. | 2506.05466 | null |
2025-06-11 | EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | Ron Yosef et.al. | 2506.09988 | null |
2025-06-11 | ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models | Qin Zhou et.al. | 2506.09740 | null |
2025-06-11 | Ming-Omni: A Unified Multimodal Model for Perception and Generation | Inclusion AI et.al. | 2506.09344 | link |
2025-06-11 | Fine-Grained Spatially Varying Material Selection in Images | Julia Guerrero-Viu et.al. | 2506.09023 | null |
2025-06-10 | Do Concept Replacement Techniques Really Erase Unacceptable Concepts? | Anudeep Das et.al. | 2506.08991 | null |
2025-06-10 | RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping | Yang Bai et.al. | 2506.08632 | null |
2025-06-09 | Highly Compressed Tokenizer Can Generate Without Training | L. Lao Beyer et.al. | 2506.08257 | link |
2025-06-09 | PairEdit: Learning Semantic Variations for Exemplar-based Image Editing | Haoguang Lu et.al. | 2506.07992 | link |
2025-06-09 | Diffusion Counterfactual Generation with Semantic Abduction | Rajat Rasal et.al. | 2506.07883 | link |
2025-06-09 | DragNeXt: Rethinking Drag-Based Image Editing | Yuan Zhou et.al. | 2506.07611 | null |
2025-06-09 | Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding | Boyu Chen et.al. | 2506.07576 | null |
2025-06-08 | Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Tianyi Bai et.al. | 2506.07227 | null |
2025-06-08 | TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation | Min-Jung Kim et.al. | 2506.07205 | null |
2025-06-06 | Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models | Yifu Qiu et.al. | 2506.06006 | link |
2025-06-06 | FADE: Frequency-Aware Diffusion Model Factorization for Video Editing | Yixuan Zhu et.al. | 2506.05934 | link |
2025-06-06 | SeedEdit 3.0: Fast and High-Quality Generative Image Editing | Peng Wang et.al. | 2506.05083 | null |
2025-06-05 | FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing | Guangzhao Li et.al. | 2506.05046 | null |
2025-06-05 | Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking | Yu-Feng Chen et.al. | 2506.04879 | null |
2025-06-05 | FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers | Xuanhua He et.al. | 2506.04213 | null |
2025-06-04 | HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation | Hermann Kumbong et.al. | 2506.04421 | null |
2025-06-04 | Is Perturbation-Based Image Protection Disruptive to Image Editing? | Qiuyu Tang et.al. | 2506.04394 | null |
2025-06-04 | UNIC: Unified In-Context Video Editing | Zixuan Ye et.al. | 2506.04216 | null |
2025-06-04 | Image Editing As Programs with Diffusion Models | Yujia Hu et.al. | 2506.04158 | null |
2025-06-04 | UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | Bin Lin et.al. | 2506.03147 | null |
2025-06-04 | MedEBench: Revisiting Text-instructed Image Editing on Medical Domain | Minghao Liu et.al. | 2506.01921 | null |
2025-06-03 | RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions | Bimsara Pathiraja et.al. | 2506.03448 | null |
2025-06-03 | ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions | Di Chang et.al. | 2506.03107 | null |
2025-06-03 | DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing | Zixiang Li et.al. | 2506.02560 | null |
2025-06-03 | RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers | Yan Gong et.al. | 2506.02528 | null |
2025-06-02 | IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout | Fei Shen et.al. | 2506.01949 | null |
2025-06-02 | OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Sen Liang et.al. | 2506.01801 | null |
2025-06-02 | Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation | Kaihang Pan et.al. | 2506.01480 | null |
2025-06-02 | DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing | Chenxi Xie et.al. | 2506.01430 | null |
2025-06-01 | Motion-Aware Concept Alignment for Consistent Video Editing | Tong Zhang et.al. | 2506.01004 | null |
2025-05-31 | Concept-Centric Token Interpretation for Vector-Quantized Generative Models | Tianze Yang et.al. | 2506.00698 | null |
2025-05-30 | MiniMax-Remover: Taming Bad Noise Helps Video Object Removal | Bojia Zi et.al. | 2505.24873 | null |
2025-05-29 | Cora: Correspondence-aware image editing using few step diffusion | Amirhossein Almohammadi et.al. | 2505.23907 | null |
2025-05-29 | LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers | Yusuf Dalva et.al. | 2505.23758 | null |
2025-05-29 | Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features | Ziyong Wang et.al. | 2505.23586 | null |
2025-05-29 | Video Editing for Audio-Visual Dubbing | Binyamin Manela et.al. | 2505.23406 | link |
2025-05-29 | FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing | Jeongsol Kim et.al. | 2505.23145 | link |
2025-05-29 | Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing | Tongtong Su et.al. | 2505.23134 | link |
2025-05-28 | HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Qi Cai et.al. | 2505.22705 | link |
2025-05-28 | VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | Mingyuan Wu et.al. | 2505.19255 | null |
2025-05-27 | Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion | Yang Yang et.al. | 2505.21593 | null |
2025-05-27 | Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks | Kyzyl Monteiro et.al. | 2505.20916 | null |
2025-05-27 | InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling | Xiaoxiao Jiang et.al. | 2505.20600 | null |
2025-05-26 | What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models | Lorenzo Baraldi et.al. | 2505.20405 | null |
2025-05-26 | ImgEdit: A Unified Image Editing Dataset and Benchmark | Yang Ye et.al. | 2505.20275 | link |
2025-05-26 | StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation | Yi Wu et.al. | 2505.19874 | null |
2025-05-26 | TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | Juntong Wang et.al. | 2505.19535 | null |
2025-05-26 | Understanding Generative AI Capabilities in Everyday Image Editing Tasks | Mohammad Reza Taesiri et.al. | 2505.16181 | null |
2025-05-25 | Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions | Chenrui Ma et.al. | 2505.19352 | null |
2025-05-25 | SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | Shenggan Cheng et.al. | 2505.19151 | null |
2025-05-25 | MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection | Shuyu Wang et.al. | 2505.19149 | null |
2025-05-24 | REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing | Weihan Xu et.al. | 2505.18880 | null |
2025-05-24 | Affective Image Editing: Shaping Emotional Factors via Text Descriptions | Peixuan Zhang et.al. | 2505.18699 | null |
2025-05-24 | Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility | Yiheng Li et.al. | 2505.18521 | link |
2025-05-23 | DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval | Yuxin Yang et.al. | 2505.17796 | null |
2025-05-23 | R-Genie: Reasoning-Guided Generative Image Editing | Dong Zhang et.al. | 2505.17768 | null |
2025-05-22 | KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models | Yongliang Wu et.al. | 2505.16707 | null |
2025-05-21 | FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models | Zhen Sun et.al. | 2505.15644 | link |
2025-05-20 | DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model | Siwei Xia et.al. | 2505.12427 | link |
2025-05-20 | CompBench: Benchmarking Complex Instruction-guided Image Editing | Bohan Jia et.al. | 2505.12200 | null |
2025-05-18 | From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations | Yuzhi Li et.al. | 2505.12237 | null |
2025-05-16 | X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models | Valentina Bazyleva et.al. | 2505.11753 | null |
2025-05-16 | GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing | Yusu Qian et.al. | 2505.11493 | null |
2025-05-15 | 3D-Fixup: Advancing Photo Editing with 3D Priors | Yen-Chi Cheng et.al. | 2505.10566 | null |
2025-05-15 | IntrinsicEdit: Precise generative image manipulation in intrinsic space | Linjie Lyu et.al. | 2505.08889 | null |
2025-05-14 | Don't Forget your Inverse DDIM for Image Editing | Guillermo Gomez-Trenado et.al. | 2505.09571 | null |
2025-05-12 | MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models | Hongyang Zhu et.al. | 2505.05101 | null |
2025-05-11 | DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | Junhao Xia et.al. | 2505.07057 | null |
2025-05-11 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | Chao Liao et.al. | 2505.05472 | null |
2025-05-08 | GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing | Tong Wang et.al. | 2505.04915 | null |
2025-05-07 | Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers | Divyansh Srivastava et.al. | 2505.04718 | null |
2025-05-07 | Multi-turn Consistent Image Editing | Zijun Zhou et.al. | 2505.04320 | null |
2025-05-07 | Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction | Inclusion AI et.al. | 2505.02471 | link |
2025-05-06 | MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models | Jhon Lopez et.al. | 2505.15822 | null |
2025-05-06 | Step1X-Edit: A Practical Framework for General Image Editing | Shiyu Liu et.al. | 2504.17761 | link |
2025-05-05 | SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing | Ming Li et.al. | 2505.02370 | link |
2025-05-04 | Video Forgery Detection for Surveillance Cameras: A Review | Noor B. Tayfor et.al. | 2505.03832 | null |
2025-05-02 | Improving Editability in Image Generation with Layer-wise Memory | Daneul Kim et.al. | 2505.01079 | null |
2025-05-02 | A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories | Ziqi Ding et.al. | 2505.01067 | null |
2025-05-02 | Photoshop Batch Rendering Using Actions for Stylistic Video Editing | Tessa De La Fuente et.al. | 2505.01001 | null |
2025-05-01 | InstructAttribute: Fine-grained Object Attributes editing with Instruction | Xingxi Yin et.al. | 2505.00751 | null |
2025-05-01 | Controllable Weather Synthesis and Removal with Video Diffusion Models | Chih-Hao Lin et.al. | 2505.00704 | null |
2025-05-01 | Towards Scalable Human-aligned Benchmark for Text-guided Image Editing | Suho Ryu et.al. | 2505.00502 | link |
2025-04-30 | PixelHacker: Image Inpainting with Structural and Semantic Consistency | Ziyang Xu et.al. | 2504.20438 | null |
2025-04-29 | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | Zechuan Zhang et.al. | 2504.20690 | null |
2025-04-27 | CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes | Tuan Nguyen et.al. | 2504.19212 | null |
2025-04-26 | REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models | Gal Almog et.al. | 2504.18989 | link |
2025-04-24 | DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing | Aniruddha Bala et.al. | 2504.17894 | null |
2025-04-24 | VEU-Bench: Towards Comprehensive Understanding of Video Editing | Bozheng Li et.al. | 2504.17828 | null |
2025-04-24 | Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields | Zhuo He et.al. | 2504.17712 | null |
2025-04-24 | Enhancing Variational Autoencoders with Smooth Robust Latent Encoding | Hyomin Lee et.al. | 2504.17219 | null |
2025-04-24 | Vidi: Large Multimodal Models for Video Understanding and Editing | Vidi Team et.al. | 2504.15681 | null |
2025-04-22 | Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework | Xinyuan Song et.al. | 2504.16016 | null |
2025-04-22 | Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models | Dasol Jeong et.al. | 2504.15723 | null |
2025-04-21 | MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World | Ankit Dhiman et.al. | 2504.15397 | null |
2025-04-21 | Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach | Lvpan Cai et.al. | 2504.11922 | link |
2025-04-20 | MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation | Siyi Jiao et.al. | 2504.14606 | null |
2025-04-19 | Visual Prompting for One-shot Controllable Video Editing without Inversion | Zhengbo Zhang et.al. | 2504.14335 | null |
2025-04-19 | PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling | Alara Dirik et.al. | 2504.14219 | null |
2025-04-18 | Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation | Fulvio Sanguigni et.al. | 2504.14011 | null |
2025-04-18 | Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing | Joowon Kim et.al. | 2504.13490 | null |
2025-04-17 | Image Editing with Diffusion Models: A Survey | Jia Wang et.al. | 2504.13226 | null |
2025-04-17 | | Siwei Yang et.al. | 2504.13143 | null |
2025-04-17 | UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models | Guanlong Jiao et.al. | 2504.13109 | null |
2025-04-17 | Image-Editing Specialists: An RLAIF Approach for Diffusion Models | Elior Benarous et.al. | 2504.12833 | link |
2025-04-17 | SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Qianqian Sun et.al. | 2504.12704 | null |
2025-04-17 | DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Mengshi Qi et.al. | 2504.12080 | link |
2025-04-17 | Understanding Attention Mechanism in Video Diffusion Models | Bingyan Liu et.al. | 2504.12027 | null |
2025-04-14 | Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing | Taihang Hu et.al. | 2504.10434 | link |
2025-04-14 | Analysis of Attention in Video Diffusion Transformers | Yuxin Wen et.al. | 2504.10317 | null |
2025-04-14 | TAPNext: Tracking Any Point (TAP) as Next Token Prediction | Artem Zholus et.al. | 2504.05579 | null |
2025-04-13 | SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow | Kenan Tang et.al. | 2504.09697 | link |
2025-04-13 | CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Pooja Guhan et.al. | 2504.09472 | null |
2025-04-11 | CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model | Ruohao Zhan et.al. | 2504.08259 | null |
2025-04-10 | POEM: Precise Object-level Editing via MLLM control | Marco Schouten et.al. | 2504.08111 | null |
2025-04-10 | Learning Universal Features for Generalizable Image Forgery Localization | Hengrun Zhao et.al. | 2504.07462 | link |
2025-04-10 | Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing | Chenxi Sun et.al. | 2504.07424 | null |
2025-04-09 | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Gene Chou et.al. | 2504.07093 | link |
2025-04-08 | VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing | Juan Luis Gonzalez Bello et.al. | 2504.07146 | null |
2025-04-08 | Transfer between Modalities with MetaQueries | Xichen Pan et.al. | 2504.06256 | null |
2025-04-08 | Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model | Qi Mao et.al. | 2504.05594 | null |
2025-04-08 | Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing | Xiangyu Zhao et.al. | 2504.02826 | link |
2025-04-07 | CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models | Kavana Venkatesh et.al. | 2504.05306 | null |
2025-04-07 | Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing | Hui Liu et.al. | 2504.04784 | null |
2025-04-07 | MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Wulin Xie et.al. | 2504.03641 | null |
2025-04-04 | Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices | Yang He et.al. | 2504.03155 | null |
2025-04-03 | How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models | Pascal Chang et.al. | 2504.03072 | null |
2025-04-03 | VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | Xianwei Zhuang et.al. | 2504.02949 | link |
2025-04-03 | Concept Lancet: Image Editing with Compositional Representation Transplant | Jinqi Luo et.al. | 2504.02828 | null |
2025-04-03 | GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation | Zhiyuan Yan et.al. | 2504.02782 | link |
2025-04-03 | ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | Runhui Huang et.al. | 2504.01934 | null |
2025-04-02 | FreSca: Unveiling the Scaling Space in Diffusion Models | Chao Huang et.al. | 2504.02154 | null |
2025-04-02 | A Diffusion-Based Framework for Occluded Object Movement | Zheng-Peng Duan et.al. | 2504.01873 | null |
2025-03-31 | AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents | Jiaxiang Chen et.al. | 2503.23948 | link |
2025-03-31 | Training-Free Text-Guided Image Editing with Visual Autoregressive Model | Yufei Wang et.al. | 2503.23897 | link |
2025-03-30 | Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging | Amar Kumar et.al. | 2503.23618 | null |
2025-03-30 | ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Tianming Liang et.al. | 2503.23509 | link |
2025-03-30 | SketchVideo: Sketch-based Video Generation and Editing | Feng-Lin Liu et.al. | 2503.23284 | null |
2025-03-29 | FreeInv: Free Lunch for Improving DDIM Inversion | Yuxiang Bao et.al. | 2503.23035 | null |
2025-03-29 | FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model | Jun Zhou et.al. | 2503.19839 | null |
2025-03-28 | Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | Haijie Yang et.al. | 2503.22225 | null |
2025-03-28 | LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing | Achint Soni et.al. | 2503.21541 | link |
2025-03-26 | Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising | Yan-Bo Lin et.al. | 2503.20782 | null |
2025-03-26 | EditCLIP: Representation Learning for Image Editing | Qian Wang et.al. | 2503.20318 | link |
2025-03-26 | Wan: Open and Advanced Large-Scale Video Generative Models | WanTeam et.al. | 2503.20314 | link |
2025-03-26 | InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction | Yuhui Wu et.al. | 2503.20287 | link |
2025-03-25 | Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning | Sherry X. Chen et.al. | 2503.18406 | link |
2025-03-25 | Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods | Yuzhi Li et.al. | 2503.17975 | null |
2025-03-24 | FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing | Yufan Ren et.al. | 2503.19191 | null |
2025-03-24 | Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance | Sicong Feng et.al. | 2503.18386 | null |
2025-03-24 | MaSS13K: A Matting-level Semantic Segmentation Benchmark | Chenxi Xie et.al. | 2503.18364 | link |
2025-03-23 | Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance | Harang Ju et.al. | 2503.18238 | link |
2025-03-23 | What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images | Dongheng Lin et.al. | 2503.17899 | null |
2025-03-23 | Multi-focal Conditioned Latent Diffusion for Person Image Synthesis | Jiaqi Liu et.al. | 2503.15686 | link |
2025-03-22 | InstructVEdit: A Holistic Approach for Instructional Video Editing | Chi Zhang et.al. | 2503.17641 | null |
2025-03-22 | Guidance Free Image Editing via Explicit Conditioning | Mehdi Noroozi et.al. | 2503.17593 | null |
2025-03-21 | HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks | Maria Pilligua et.al. | 2503.17276 | null |
2025-03-21 | DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics | Yihan Hu et.al. | 2503.16795 | null |
2025-03-20 | FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing | Tianyi Wei et.al. | 2503.16153 | null |
2025-03-20 | Single Image Iterative Subject-driven Generation and Editing | Yair Shpitzer et.al. | 2503.16025 | link |
2025-03-19 | VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation | Shoubin Yu et.al. | 2503.14350 | null |
2025-03-18 | ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing | Yulin Pan et.al. | 2503.14482 | null |
2025-03-18 | TarPro: Targeted Protection against Malicious Image Editing | Kaixin Shen et.al. | 2503.13994 | null |
2025-03-17 | FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models | Minghan Li et.al. | 2503.13684 | null |
2025-03-17 | Unified Autoregressive Visual Generation and Understanding with Continuous Tokens | Lijie Fan et.al. | 2503.13436 | null |
2025-03-17 | Edit Transfer: Learning Image Editing via Vision In-Context Relations | Lan Chen et.al. | 2503.13327 | null |
2025-03-17 | GIFT: Generated Indoor video frames for Texture-less point tracking | Jianzheng Huang et.al. | 2503.12944 | null |
2025-03-17 | DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode | Junjia Huang et.al. | 2503.12838 | null |
2025-03-16 | UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing | Tsu-Jui Fu et.al. | 2503.12652 | null |
2025-03-16 | Personalize Anything for Free with Diffusion Transformer | Haoran Feng et.al. | 2503.12590 | null |
2025-03-14 | Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities | Ruchika Chavhan et.al. | 2503.11905 | null |
2025-03-14 | RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing | Tianrui Pan et.al. | 2503.11571 | null |
2025-03-14 | LUSD: Localized Update Score Distillation for Text-Guided Image Editing | Worameth Chinchuthakun et.al. | 2503.11054 | link |
2025-03-14 | V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes | Yanming Zhang et.al. | 2503.10634 | null |
2025-03-14 | On the Limitations of Vision-Language Models in Understanding Image Transforms | Ahmad Mustafa Anis et.al. | 2503.09837 | null |
2025-03-13 | Fine-Tuning Diffusion Generative Models via Rich Preference Optimization | Hanyang Zhao et.al. | 2503.11720 | null |
2025-03-13 | CoSTA | Advait Gupta et.al. | 2503.10613 | link |
2025-03-13 | EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing | Zexuan Yan et.al. | 2503.10270 | link |
2025-03-13 | MoEdit: On Learning Quantity Perception for Multi-object Image Editing | Yanfeng Li et.al. | 2503.10112 | link |
2025-03-13 | Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models | Armando Fortes et.al. | 2503.08434 | null |
2025-03-12 | Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space | Yifan Zhou et.al. | 2503.09419 | link |
2025-03-12 | InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images | Jiun Tian Hoe et.al. | 2503.09130 | null |
2025-03-12 | OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting | Yongsheng Yu et.al. | 2503.08677 | null |
2025-03-11 | Aligning Text to Image in Diffusion Models is Easier Than You Think | Jaa-Yeon Lee et.al. | 2503.08250 | link |
2025-03-11 | ObjectMover: Generative Object Movement with Video Prior | Xin Yu et.al. | 2503.08037 | null |
2025-03-11 | CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement | Chenrui Ma et.al. | 2503.07938 | null |
2025-03-11 | VACE: All-in-One Video Creation and Editing | Zeyinzi Jiang et.al. | 2503.07598 | null |
2025-03-10 | Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Lixue Gong et.al. | 2503.07703 | null |
2025-03-10 | TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation | Victor Shea-Jay Huang et.al. | 2503.07050 | null |
2025-03-10 | Interactive Tumor Progression Modeling via Sketch-Based Image Editing | Gexin Huang et.al. | 2503.06809 | null |
2025-03-10 | VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control | Yuxuan Bian et.al. | 2503.05639 | link |
2025-03-09 | Consistent Image Layout Editing with Diffusion Models | Tao Xia et.al. | 2503.06419 | null |
2025-03-08 | Get In Video: Add Anything You Want to the Video | Shaobin Zhuang et.al. | 2503.06268 | null |
2025-03-08 | X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation | Jian Ma et.al. | 2503.06134 | link |
2025-03-07 | Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients | Niklas Penzel et.al. | 2503.05424 | null |
2025-03-06 | Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models | Rui Jiang et.al. | 2503.04215 | null |
2025-03-05 | GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors | Yaopei Zeng et.al. | 2503.03944 | null |
2025-03-04 | h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform | Toan Nguyen et.al. | 2503.02187 | link |
2025-03-03 | VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors | Juil Koo et.al. | 2503.01107 | null |
2025-03-01 | GenVDM: Generating Vector Displacement Maps From a Single Image | Yuezhi Yang et.al. | 2503.00605 | null |
2025-02-27 | Tight Inversion: Image-Conditioned Inversion for Real Image Editing | Edo Kadosh et.al. | 2502.20376 | null |
2025-02-27 | Identity-preserving Distillation Sampling by Fixed-Point Iterator | SeonHwa Kim et.al. | 2502.19930 | null |
2025-02-26 | SVGEditBench V2: A Benchmark for Instruction-based SVG Editing | Kunato Nishina et.al. | 2502.19453 | link |
2025-02-26 | Bayesian Optimization for Controlled Image Editing via LLMs | Chengkun Cai et.al. | 2502.18116 | null |
2025-02-25 | KV-Edit: Training-Free Image Editing for Precise Background Preservation | Tianrui Zhu et.al. | 2502.17363 | link |
2025-02-24 | VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing | Xiangpeng Yang et.al. | 2502.17258 | null |
2025-02-23 | PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data | Shijie Huang et.al. | 2502.14397 | link |
2025-02-22 | DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation | Yuxuan Xiong et.al. | 2502.16302 | null |
2025-02-18 | AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks | Ming Xie et.al. | 2502.11158 | null |
2025-02-14 | PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control | Kunal Swami et.al. | 2502.10258 | null |
2025-02-14 | VideoDiff: Human-AI Video Co-Creation with Alternatives | Mina Huh et.al. | 2502.10190 | null |
2025-02-14 | Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training | Rodrigo Santos et.al. | 2502.10064 | null |
2025-02-14 | SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment | Tica Lin et.al. | 2502.08621 | null |
2025-02-10 | Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists | Bojia Zi et.al. | 2502.06734 | null |
2025-02-10 | Predictive Red Teaming: Breaking Policies Without Breaking Robots | Anirudha Majumdar et.al. | 2502.06575 | null |
2025-02-08 | AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection | Shuheng Zhang et.al. | 2502.05433 | null |
2025-02-06 | MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Jinbo Xing et.al. | 2502.04299 | null |
2025-02-06 | PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models | Aleksandar Cvejic et.al. | 2502.04050 | null |
2025-02-06 | DICE: Distilling Classifier-Free Guidance into Text Embeddings | Zhenyu Zhou et.al. | 2502.03726 | null |
2025-02-05 | Lost in Edits? A | Wenhao You et.al. | 2502.04364 | null |
2025-02-05 | REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations | Peter Sushko et.al. | 2502.03629 | null |
2025-02-04 | Exploring the latent space of diffusion models directly through singular value decomposition | Li Wang et.al. | 2502.02225 | null |
2025-02-04 | EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues | Rohit Girmaji et.al. | 2502.02172 | null |
2025-02-04 | Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation | JooHyun Kwon et.al. | 2502.02091 | null |
2025-01-30 | DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models | Ruofan Liang et.al. | 2501.18590 | null |
2025-01-24 | MATCHA: Towards Matching Anything | Fei Xue et.al. | 2501.14945 | null |
2025-01-24 | Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.* | Ludovica Schaerf et.al. | 2501.14524 | null |
2025-01-23 | IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models | Jiayi Lei et.al. | 2501.13920 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2025-06-25 | A Computationally Aware Multi Objective Framework for Camera LiDAR Calibration | Venkat Karramreddy et.al. | 2506.20636 | null |
2025-06-25 | Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings | Ankit Shah et.al. | 2506.20609 | null |
2025-06-25 | Learning-Based Distance Estimation for 360° Single-Sensor Setups | Yitong Quan et.al. | 2506.20586 | null |
2025-06-25 | Communication-Aware Map Compression for Online Path-Planning: A Rate-Distortion Approach | Ali Reza Pedram et.al. | 2506.20579 | null |
2025-06-25 | HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction | Zhonghao Shi et.al. | 2506.20566 | null |
2025-06-25 | Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control | Andrew Mole et.al. | 2506.20554 | null |
2025-06-25 | Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos | Yitong Quan et.al. | 2506.20550 | null |
2025-06-25 | RapidGBM: An Efficient Tool for Fermi-GBM Visibility Checking and Data Analysis with a Case Study of EP240617a | Yun Wang et.al. | 2506.20532 | null |
2025-06-25 | Comparison between Causal and Acausal Diffusion: a Schwinger-Keldysh Effective Field Theory Perspective | Navid Abbasi et.al. | 2506.20500 | null |
2025-06-25 | Learning-based safety lifting monitoring system for cranes on construction sites | Hao Chen et.al. | 2506.20475 | null |
2025-06-25 | Enhanced Robotic Navigation in Deformable Environments using Learning from Demonstration and Dynamic Modulation | Lingyun Chen et.al. | 2506.20376 | null |
2025-06-25 | Producer-Fairness in Sequential Bundle Recommendation | Alexandre Rio et.al. | 2506.20329 | null |
2025-06-25 | Finding the Easy Way Through -- the Probabilistic Gap Planner for Social Robot Navigation | Malte Probst et.al. | 2506.20320 | null |
2025-06-25 | Computed tomography of propagating microwave photons | Qi-Ming Chen et.al. | 2506.20318 | null |
2025-06-25 | Real-Time Obstacle Avoidance Algorithms for Unmanned Aerial and Ground Vehicles | Jingwen Wei et.al. | 2506.20311 | null |
2025-06-25 | Analog OFDM based on Real-Time Fourier Transformation | Xiaolu Yang et.al. | 2506.20287 | null |
2025-06-25 | Dynamic Bandwidth Allocation for Hybrid Event-RGB Transmission | Pujing Yang et.al. | 2506.20222 | null |
2025-06-25 | Personalized Mental State Evaluation in Human-Robot Interaction using Federated Learning | Andrea Bussolan et.al. | 2506.20212 | null |
2025-06-25 | RaRa Clipper: A Clipper for Gaussian Splatting Based on Ray Tracer and Rasterizer | Da Li et.al. | 2506.20202 | null |
2025-06-25 | First experimental demonstration of plasma shape control in a tokamak through Model Predictive Control | Adriano Mele et.al. | 2506.20096 | null |
2025-06-25 | Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees | Shi Chang et.al. | 2506.19677 | null |
2025-06-24 | The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series | Pola Bereta et.al. | 2506.19759 | null |
2025-06-24 | MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control | Shihao Li et.al. | 2506.19744 | null |
2025-06-24 | NEAR | Shenbin Qian et.al. | 2506.19743 | null |
2025-06-24 | Dual-energy extraction for proton therapy and imaging: validation on a clinical synchrotron-based facility | Alexander A. Pryanichnikov et.al. | 2506.19736 | null |
2025-06-24 | SIP-IFVM: An observation-based magnetohydrodynamic model of coronal mass ejection | Haopeng Wang et.al. | 2506.19711 | null |
2025-06-24 | Health Sentinel: An AI Pipeline For Real-time Disease Outbreak Detection | Devesh Pant et.al. | 2506.19548 | null |
2025-06-24 | NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons | Carlo Romeo et.al. | 2506.19530 | null |
2025-06-24 | MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications | Aleksandr Algazinov et.al. | 2506.19502 | null |
2025-06-24 | An analytical model of depth-dose distributions for carbon-ion beams | Fulya Halıcılar et.al. | 2506.19479 | null |
2025-06-24 | Can Movable Antenna-enabled Micro-Mobility Replace UAV-enabled Macro-Mobility? A Physical Layer Security Perspective | Kaixuan Li et.al. | 2506.19456 | null |
2025-06-24 | Enhanced Fault Ride-Through Grid Forming with Transient Synchronisation Stability and Current Saturation | Youcefa Brahim Elkhalil et.al. | 2506.19444 | null |
2025-06-24 | Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System | Lixuan He et.al. | 2506.19433 | null |
2025-06-24 | Virtual Memory for 3D Gaussian Splatting | Jonathan Haberl et.al. | 2506.19415 | null |
2025-06-24 | Can theory-driven learning analytics dashboard enhance human-AI collaboration in writing learning? Insights from an empirical experiment | Angxuan Chen et.al. | 2506.19364 | null |
2025-06-24 | OpticalAging: Real-time Presbyopia Simulation for Inclusive Design via Tunable Lenses | Qing Zhang et.al. | 2506.19307 | null |
2025-06-24 | Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control | Jaehong Oh et.al. | 2506.19277 | null |
2025-06-24 | High-throughput spin-bath characterization of spin-defects in semiconductors | Abigail N. Poteshman et.al. | 2506.19259 | null |
2025-06-24 | Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning | Renzi Meng et.al. | 2506.19246 | null |
2025-06-24 | PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications | Pietro Bonazzi et.al. | 2506.18807 | null |
2025-06-23 | PRISM: Perceptual Recognition for Identifying Standout Moments in Human-Centric Keyframe Extraction | Mert Can Cakmak et.al. | 2506.19168 | null |
2025-06-23 | MinD: Unified Visual Imagination and Control via Hierarchical World Models | Xiaowei Chi et.al. | 2506.18897 | null |
2025-06-23 | OmniGen2: Exploration to Advanced Multimodal Generation | Chenyuan Wu et.al. | 2506.18871 | null |
2025-06-23 | LIGHTHOUSE: Fast and precise distance to shoreline calculations from anywhere on earth | Patrick Beukema et.al. | 2506.18842 | null |
2025-06-23 | STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning | Aryasomayajula Ram Bharadwaj et.al. | 2506.18831 | null |
2025-06-23 | MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation Translation task | Jorge Iranzo-Sánchez et.al. | 2506.18828 | null |
2025-06-23 | ModeliHub: A Web-based, Federated Analytics Platform for Modelica-centric, Model-based Systems Engineering | Mohamad Omar Nachawati et.al. | 2506.18790 | null |
2025-06-23 | Flow-Aware Diffusion for Real-Time VR Restoration: Enhancing Spatiotemporal Coherence and Efficiency | Yitong Zhu et.al. | 2506.18786 | null |
2025-06-23 | NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments | Alessandro Saviolo et.al. | 2506.18689 | null |
2025-06-23 | Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jiangyu Han et.al. | 2506.18623 | null |
2025-06-23 | New Power Decoupling Method for Grid Forming Inverter Based on Adaptive Virtual-Synchronous Machine in Weak Grids | Waleed Breesam et.al. | 2506.18619 | null |
2025-06-23 | Frequency Control in Microgrids: An Adaptive Fuzzy-Neural-Network Virtual Synchronous Generator | Waleed Breesam et.al. | 2506.18611 | null |
2025-06-23 | PG-LIO: Photometric-Geometric fusion for Robust LiDAR-Inertial Odometry | Nikhil Khedekar et.al. | 2506.18583 | null |
2025-06-23 | Multi-Rank Subspace Change-Point Detection for Monitoring Robotic Swarms | Jonghyeok Lee et.al. | 2506.18562 | null |
2025-06-23 | Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning | Jiexin Zhang et.al. | 2506.18560 | null |
2025-06-23 | ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction | Marco Aruta et.al. | 2506.18396 | null |
2025-06-23 | Robots and Children that Learn Together : Improving Knowledge Retention by Teaching Peer-Like Interactive Robots | Imene Tarakli et.al. | 2506.18365 | null |
2025-06-23 | TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations | Kawser Ahmed et.al. | 2506.18343 | null |
2025-06-23 | Programmable electro-optic frequency comb empowers integrated parallel convolution processing | Jinze He et.al. | 2506.18310 | null |
2025-06-23 | LLM-Integrated Digital Twins for Hierarchical Resource Allocation in 6G Networks | Majumder Haider et.al. | 2506.18293 | null |
2025-06-20 | Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition | Jiaqi Li et.al. | 2506.17201 | null |
2025-06-20 | Judo: A User-Friendly Open-Source Package for Sampling-Based Model Predictive Control | Albert H. Li et.al. | 2506.17184 | null |
2025-06-20 | A Set-valued Impact Law Approach for Modeling and Analysis of Rigid Contact Universal Joint with Clearance | Junaid Ali et.al. | 2506.17183 | null |
2025-06-20 | A tutorial overview of model predictive control for continuous crystallization: current possibilities and future perspectives | Collin R. Johnson et.al. | 2506.17146 | null |
2025-06-20 | Real-time Broadband RFI Excision for the Upgraded GMRT | Ruta Kale et.al. | 2506.17131 | null |
2025-06-20 | Rapid and Continuous Trust Evaluation for Effective Task Collaboration Through Siamese Model | Botao Zhu et.al. | 2506.17128 | null |
2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et.al. | 2506.17119 | null |
2025-06-20 | JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Cross-Facility Scientific Workflows | Vladislav Esaulov et.al. | 2506.17084 | null |
2025-06-20 | Opportunities for real-time process control of electrode properties in lithium-ion battery manufacturing | Noël Hallemans et.al. | 2506.17048 | null |
2025-06-20 | Probing dynamical axion quasiparticles with two-photon correlations | Daniel Boyanovsky et.al. | 2506.17013 | null |
2025-06-20 | Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments | Yasir Ali Farrukh et.al. | 2506.16994 | null |
2025-06-20 | Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point | Zisheng Wang et.al. | 2506.16957 | null |
2025-06-20 | Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning | Jiaqi Chen et.al. | 2506.16931 | null |
2025-06-20 | Single-shot thermometry of simulated Bose--Einstein condensates using artificial intelligence | Jack Griffiths et.al. | 2506.16925 | null |
2025-06-20 | Real-Time Black-Box Optimization for Dynamic Discrete Environments Using Embedded Ising Machines | Tomoya Kashimata et.al. | 2506.16924 | null |
2025-06-20 | ROS 2 Agnocast: Supporting Unsized Message Types for True Zero-Copy Publish/Subscribe IPC | Takahiro Ishikawa-Aso et.al. | 2506.16882 | null |
2025-06-20 | Revolutionizing Validation and Verification: Explainable Testing Methodologies for Intelligent Automotive Decision-Making Systems | Halit Eris et.al. | 2506.16876 | null |
2025-06-20 | RS-Coded Adaptive Dynamic Network for Reliable Long-Term Information Transmission in Disturbed Multimode Fiber | Yang Hu et.al. | 2506.16859 | null |
2025-06-20 | Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning | Chengpeng Hu et.al. | 2506.16795 | null |
2025-06-20 | Reinforcement learning for hybrid charging stations planning and operation considering fixed and mobile chargers | Yanchen Zhu et.al. | 2506.16764 | null |
2025-06-18 | Vision in Action: Learning Active Perception from Human Demonstrations | Haoyu Xiong et.al. | 2506.15666 | null |
2025-06-18 | BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion | Yuqing Lan et.al. | 2506.15610 | null |
2025-06-18 | MicroRicci: A Greedy and Local Ricci Flow Solver for Self-Tuning Mesh Smoothing | Le Vu Anh et.al. | 2506.15571 | null |
2025-06-18 | PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction | Shufan Li et.al. | 2506.15556 | null |
2025-06-18 | Real-Time Initialization of Unknown Anchors for UWB-aided Navigation | Giulio Delama et.al. | 2506.15518 | null |
2025-06-18 | Model Predictive Path-Following Control for a Quadrotor | David Leprich et.al. | 2506.15447 | null |
2025-06-18 | A Real-time Endoscopic Image Denoising System | Yu Xing et.al. | 2506.15395 | null |
2025-06-18 | Evaluation Pipeline for systematically searching for Anomaly Detection Systems | Florian Rokohl et.al. | 2506.15388 | null |
2025-06-18 | Efficient Navigation Among Movable Obstacles using a Mobile Manipulator via Hierarchical Policy Learning | Taegeun Yang et.al. | 2506.15380 | null |
2025-06-18 | J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor | Benoit Tain et.al. | 2506.15316 | null |
2025-06-18 | AI-driven visual monitoring of industrial assembly tasks | Mattia Nardon et.al. | 2506.15285 | null |
2025-06-18 | Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study | Mohamad A. Hady et.al. | 2506.15207 | null |
2025-06-18 | In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory | Matteo Zecchin et.al. | 2506.15176 | null |
2025-06-18 | Human Locomotion Implicit Modeling Based Real-Time Gait Phase Estimation | Yuanlong Ji et.al. | 2506.15150 | null |
2025-06-18 | I Know You're Listening: Adaptive Voice for HRI | Paige Tuttösí et.al. | 2506.15107 | null |
2025-06-18 | EmojiVoice: Towards long-term controllable expressivity in robot speech | Paige Tuttösí et.al. | 2506.15085 | null |
2025-06-18 | Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks | Yimian Ding et.al. | 2506.15082 | null |
2025-06-18 | ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies | Jinyan Yuan et.al. | 2506.14315 | null |
2025-06-17 | GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC | Eman Alqudah et.al. | 2506.15011 | null |
2025-06-17 | Mixed Traffic: A Perspective from Long Duration Autonomy | Filippos Tzortzoglou et.al. | 2506.15004 | null |
2025-06-17 | CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC | Eman Alqudah et.al. | 2506.14987 | null |
2025-06-17 | CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion | Jiahua Ma et.al. | 2506.14769 | null |
2025-06-17 | Technosignature Searches with Real-time Alert Brokers | Eleanor M. Gallay et.al. | 2506.14744 | null |
2025-06-17 | Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models | Huihan Liu et.al. | 2506.14727 | null |
2025-06-17 | SkinCells: Sparse Skinning using Voronoi Cells | Egor Larionov et.al. | 2506.14714 | null |
2025-06-17 | Deep Learning-Based Prediction of High Explosive Induced Fluid Dynamics | Francis G. VanGessel et.al. | 2506.14710 | null |
2025-06-17 | Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers | Daniel D'souza et.al. | 2506.14702 | null |
2025-06-17 | Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach | Yingchao Li et.al. | 2506.14677 | null |
2025-06-17 | ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors | Jongin Choi et.al. | 2506.14657 | null |
2025-06-17 | 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting | Yuke Xing et.al. | 2506.14642 | null |
2025-06-17 | Low-code to fight climate change: the Climaborough project | Aaron Conrardy et.al. | 2506.14623 | null |
2025-06-17 | Deep Learning Surrogates for Real-Time Gas Emission Inversion | Thomas Newman et.al. | 2506.14597 | null |
2025-06-17 | Review of Machine Learning for Real-Time Analysis at the Large Hadron Collider experiments ALICE, ATLAS, CMS and LHCb | Laura Boggia et.al. | 2506.14578 | null |
2025-06-17 | GAMORA: A Gesture Articulated Meta Operative Robotic Arm for Hazardous Material Handling in Containment-Level Environments | Farha Abdul Wasay et.al. | 2506.14513 | null |
2025-06-17 | SimSpark: Interactive Simulation of Social Media Behaviors | Ziyue Lin et.al. | 2506.14476 | null |
2025-06-17 | MalGuard: Towards Real-Time, Accurate, and Actionable Detection of Malicious Packages in PyPI Ecosystem | Xingan Gao et.al. | 2506.14466 | null |
2025-06-17 | Active Digital Twins via Active Inference | Matteo Torzoni et.al. | 2506.14453 | null |
2025-06-17 | Socially Aware Robot Crowd Navigation via Online Uncertainty-Driven Risk Adaptation | Zhirui Sun et.al. | 2506.14305 | null |
2025-06-17 | Whole-Body Control Framework for Humanoid Robots with Heavy Limbs: A Model-Based Approach | Tianlin Zhang et.al. | 2506.14278 | null |
2025-06-17 | GHz spiking neuromorphic photonic chip with in-situ training | Jinlong Xiang et.al. | 2506.14272 | null |
2025-06-16 | Compact representation and long-time extrapolation of real-time data for quantum systems | Andre Erpenbeck et.al. | 2506.13760 | null |
2025-06-16 | Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks | Haoqing Li et.al. | 2506.13733 | null |
2025-06-16 | BanditWare: A Contextual Bandit-based Framework for Hardware Prediction | Tainã Coleman et.al. | 2506.13730 | null |
2025-06-16 | How Real is CARLAs Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection | Kaiyuan Tan et.al. | 2506.13722 | null |
2025-06-16 | Direct visualization of visible-light hyperbolic plasmon polaritons in real space and time | Atreyie Ghosh et.al. | 2506.13719 | null |
2025-06-16 | HARMONI: Haptic-Guided Assistance for Unified Robotic Tele-Manipulation and Tele-Navigation | V. Sripada et.al. | 2506.13704 | null |
2025-06-16 | Photomagnetic-Chiral Anisotropy mediated by Chirality-Driven Asymmetric Spin Splitting | Tianwei Ouyang et.al. | 2506.13696 | null |
2025-06-16 | Integrated Pipeline for Monocular 3D Reconstruction and Finite Element Simulation in Industrial Applications | Bowen Zheng et.al. | 2506.13573 | null |
2025-06-16 | Controlled manipulation of solitons in a recirculating fiber loop using external potentials | François Copie et.al. | 2506.13544 | null |
2025-06-16 | UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data | Vasiliki Balaska et.al. | 2506.13505 | null |
2025-06-16 | Leveraging active learning-enhanced machine-learned interatomic potential for efficient infrared spectra prediction | Nitik Bhatia et.al. | 2506.13486 | null |
2025-06-16 | From Flat to Feeling: A Feasibility and Impact Study on Dynamic Facial Emotions in AI-Generated Avatars | Pegah Salehi et.al. | 2506.13477 | null |
2025-06-16 | SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer | Zerui Gong et.al. | 2506.13465 | link |
2025-06-16 | Block-wise Adaptive Caching for Accelerating Diffusion Policy | Kangye Ji et.al. | 2506.13456 | null |
2025-06-16 | Towards real-time additive-free dopamine detection at | Ning Li et.al. | 2506.13447 | null |
2025-06-16 | Training Neural Networks by Optimizing Neuron Positions | Laura Erb et.al. | 2506.13410 | null |
2025-06-16 | HELENA: High-Efficiency Learning-based channel Estimation using dual Neural Attention | Miguel Camelo Botero et.al. | 2506.13408 | link |
2025-06-16 | A Model-Free Detection Method for Internal Short Circuits in Single Lithium-ion Cells Using Pseudo Open-Circuit Voltage Difference | Yangyang Xu et.al. | 2506.13394 | null |
2025-06-16 | Joint Optimization of Multi-UAV Deployment and 3D Positioning in Traffic-Aware Aerial Networks | Kamran Shafafi et.al. | 2506.13287 | null |
2025-06-16 | SONIC: Sound Optimization for Noise In Crowds | Pranav M N et.al. | 2506.13272 | null |
2025-06-13 | Reimagining Dance: Real-time Music Co-creation between Dancers and AI | Olga Vechtomova et.al. | 2506.12008 | null |
2025-06-13 | Robustness of Floquet topological phase at room temperature: a first-principles dynamics study | Ruiyi Zhou et.al. | 2506.12005 | null |
2025-06-13 | Learning Before Filtering: Real-Time Hardware Learning at the Detector Level | Boštjan Maček et.al. | 2506.11981 | null |
2025-06-13 | Secure API-Driven Research Automation to Accelerate Scientific Discovery | Tyler J. Skluzacek et.al. | 2506.11950 | null |
2025-06-13 | Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients | Chapa Sirithunge et.al. | 2506.11906 | null |
2025-06-13 | DMRS-Based Uplink Channel Estimation for MU-MIMO Systems with Location-Specific SCSI Acquisition | Jiawei Zhuang et.al. | 2506.11899 | null |
2025-06-13 | Enter: Graduated Realism: A Pedagogical Framework for AI-Powered Avatars in Virtual Reality Teacher Training | Judson Leroy Dean Haynes IV et.al. | 2506.11890 | null |
2025-06-13 | An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing | Haochen Sun et.al. | 2506.11882 | null |
2025-06-13 | Bistable random momentum transfer in a linear on-chip resonator | Tingyi Gu et.al. | 2506.11859 | null |
2025-06-13 | Framework of a multiscale data-driven digital twin of the muscle-skeletal system | Martina Paccini et.al. | 2506.11821 | null |
2025-06-13 | Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection | Tae-Seong Han et.al. | 2506.11815 | link |
2025-06-13 | SSPINNpose: A Self-Supervised PINN for Inertial Pose and Dynamics Estimation | Markus Gambietz et.al. | 2506.11786 | null |
2025-06-13 | Real-Time Feedback and Benchmark Dataset for Isometric Pose Evaluation | Abhishek Jaiswal et.al. | 2506.11774 | null |
2025-06-13 | Dynamic Collaborative Material Distribution System for Intelligent Robots In Smart Manufacturing | Ziren Xiao et.al. | 2506.11723 | null |
2025-06-13 | Modeling Urban Air Quality Using Taxis as Sensors | Anastasios Noulas et.al. | 2506.11720 | null |
2025-06-13 | Generalised Rate Control Approach For Stream Processing Applications | Ziren Xiao et.al. | 2506.11710 | null |
2025-06-13 | DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation | Libin Lan et.al. | 2506.11691 | null |
2025-06-13 | GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news | Abdul Haque et.al. | 2506.11600 | null |
2025-06-13 | Camera-based method for the detection of lifted truck axles using convolutional neural networks | Bachir Tchana Tankeu et.al. | 2506.11574 | null |
2025-06-13 | Scheduling Agile Earth Observation Satellites with Onboard Processing and Real-Time Monitoring | Antonio M. Mercado-Martínez et.al. | 2506.11556 | null |
2025-06-12 | InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model | Junqi You et.al. | 2506.10980 | null |
2025-06-12 | Discovery and Localization of the Swift-Observed FRB 20241228A in a Star-forming Host Galaxy | Alice P. Curtin et.al. | 2506.10961 | null |
2025-06-12 | Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors | Chen Yueh-Han et.al. | 2506.10949 | link |
2025-06-12 | Execution Guided Line-by-Line Code Generation | Boaz Lavon et.al. | 2506.10948 | link |
2025-06-12 | Non-Abelian dynamics on a cube: improving quantum compilation through qudit-based simulations | Jacky Jiang et.al. | 2506.10945 | null |
2025-06-12 | Building a Media Ecosystem Observatory from Scratch: Infrastructure, Methodology, and Insights | Zeynep Pehlivan et.al. | 2506.10942 | null |
2025-06-12 | MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem | Melina Soysal et.al. | 2506.10931 | null |
2025-06-12 | Agentic Semantic Control for Autonomous Wireless Space Networks: Extending Space-O-RAN with MCP-Driven Distributed Intelligence | Eduardo Baena et.al. | 2506.10925 | null |
2025-06-12 | Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning | Waylon Luo et.al. | 2506.10889 | null |
2025-06-12 | S3 Mirror: S3Mirror: Making Genomic Data Transfers Fast, Reliable, and Observable with DBOS | Steven Vasquez-Grinnell et.al. | 2506.10886 | null |
2025-06-12 | Modeling Trust Dynamics in Robot-Assisted Delivery: Impact of Trust Repair Strategies | Dong Hae Mangalindan et.al. | 2506.10884 | null |
2025-06-12 | Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment | Hongda Sun et.al. | 2506.10877 | link |
2025-06-12 | General Reference Frame Identification and Transformation in Unbalanced Power Systems | Francisco G. Montoya et.al. | 2506.10835 | null |
2025-06-12 | A novel visual data-based diagnostic approach for estimation of regime transition in pool boiling | Pranay Nirapure et.al. | 2506.10832 | null |
2025-06-12 | Efficiency Robustness of Dynamic Deep Learning Systems | Ravishka Rathnasuriya et.al. | 2506.10831 | link |
2025-06-12 | Grasp Prediction based on Local Finger Motion Dynamics | Dimitar Valkov et.al. | 2506.10818 | null |
2025-06-12 | Human-Robot Navigation using Event-based Cameras and Reinforcement Learning | Ignacio Bugueno-Cordova et.al. | 2506.10790 | null |
2025-06-12 | Hazel Deriver: A Live Editor for Constructing Rule-Based Derivations | Zhiyao Zhong et.al. | 2506.10781 | null |
2025-06-12 | Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction | Bao Zhang et.al. | 2506.10762 | null |
2025-06-12 | Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding | Yuhang Zhang et.al. | 2506.10756 | null |
2025-06-11 | DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos | Chieh Hubert Lin et.al. | 2506.09997 | null |
2025-06-11 | Locomotion on Constrained Footholds via Layered Architectures and Model Predictive Control | Zachary Olkin et.al. | 2506.09979 | null |
2025-06-11 | SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance | Wentao Ge et.al. | 2506.09968 | null |
2025-06-11 | Mechanism of Conductivity Enhancement of Polymers Employing Microbubble Lithography | Anand Dev Ranjan et.al. | 2506.09957 | null |
2025-06-11 | Microservices and Real-Time Processing in Retail IT: A Review of Open-Source Toolchains and Deployment Strategies | Aaditaa Vashisht et.al. | 2506.09938 | null |
2025-06-11 | Repeated ancilla reuse for logical computation on a neutral atom quantum computer | J. A. Muniz et.al. | 2506.09936 | null |
2025-06-11 | TransGI: Real-Time Dynamic Global Illumination With Object-Centric Neural Transfer Model | Yijie Deng et.al. | 2506.09909 | null |
2025-06-11 | Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing | Amit Baran Dey et.al. | 2506.09867 | null |
2025-06-11 | Multi-FPGA Synchronization and Data Communication for Quantum Control and Measurement | Yilun Xu et.al. | 2506.09856 | null |
2025-06-11 | Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets | Md. Yeasin Rahat et.al. | 2506.09851 | null |
2025-06-11 | Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment | Amritha Premkumar et.al. | 2506.09795 | null |
2025-06-11 | Human-robot collaborative transport personalization via Dynamic Movement Primitives and velocity scaling | Paolo Franceschi et.al. | 2506.09697 | null |
2025-06-11 | Searching for sub-TeV IceCube neutrinos correlated to sub-threshold GW events | Tista Mukherjee et.al. | 2506.09694 | null |
2025-06-11 | Early and Accurate Recession Detection Using Classifiers on the Anticipation-Precision Frontier | Pascal Michaillat et.al. | 2506.09664 | null |
2025-06-11 | Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach | Lei Deng et.al. | 2506.09647 | null |
2025-06-11 | VAULT: A Mobile Mapping System for ROS 2-based Autonomous Robots | Miguel Á. González-Santamarta et.al. | 2506.09583 | null |
2025-06-11 | Real-time adaptive tracking of fluctuating relaxation rates in superconducting qubits | Fabrizio Berritta et.al. | 2506.09576 | null |
2025-06-11 | HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene | Jianing Chen et.al. | 2506.09518 | null |
2025-06-11 | A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications | Donglin Wang et.al. | 2506.09512 | null |
2025-06-11 | ArcNeural: A Multi-Modal Database for the Gen-AI Era | Wu Min et.al. | 2506.09467 | null |
2025-06-10 | Rapid cardiac activation prediction for cardiac resynchronization therapy planning using geometric deep learning | Ehsan Naghavi et.al. | 2506.08987 | link |
2025-06-10 | Online Learning Control Strategies for Industrial Processes with Application for Loosening and Conditioning | Yue Wu et.al. | 2506.08983 | null |
2025-06-10 | Rethinking Range-View LiDAR Segmentation in Adverse Weather | Longyu Yang et.al. | 2506.08979 | null |
2025-06-10 | Yau-YauAL: A computer tool for solving nonlinear filtering problems | Yu Wang et.al. | 2506.08976 | null |
2025-06-10 | WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis | Liangliang Chen et.al. | 2506.08962 | null |
2025-06-10 | CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks | Yixuan Li et.al. | 2506.08931 | null |
2025-06-10 | Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU | Petar Jakuš et.al. | 2506.08911 | null |
2025-06-10 | HabSim: Architecture for modelling disruptions, propagation, detection and repair in deep space habitats | Luca Vaccino et.al. | 2506.08903 | null |
2025-06-10 | Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning | Kai Zhou et.al. | 2506.08893 | null |
2025-06-10 | Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams | Tauhid Tanjim et.al. | 2506.08892 | null |
2025-06-10 | SmartAttack: Air-Gap Attack via Smartwatches | Mordechai Guri et.al. | 2506.08866 | null |
2025-06-10 | StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams | Zike Wu et.al. | 2506.08862 | link |
2025-06-10 | Fast Estimation of Globally Optimal Independent Contact Regions for Robust Grasping and Manipulation | Jonathan P. King et.al. | 2506.08856 | null |
2025-06-10 | Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing | Amin Avan et.al. | 2506.08850 | link |
2025-06-10 | FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency | Yifei Su et.al. | 2506.08822 | null |
2025-06-10 | Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment | Maximilian Tschuchnig et.al. | 2506.08716 | null |
2025-06-10 | Balancing Fixed Number of Nodes Among Multiple Fixed Clusters | Paritosh Ranjan et.al. | 2506.08715 | null |
2025-06-10 | Industrial Flexibility Investment Under Uncertainty: A Multi-Stage Stochastic Framework Considering Energy and Reserve Market Participation | Amund Norland et.al. | 2506.08638 | null |
2025-06-10 | Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models | Srinivasan Kidambi et.al. | 2506.08520 | link |
2025-06-10 | One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World | Xingshuo Han et.al. | 2506.08482 | null |
2025-06-10 | Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch | Prarabdh Shukla et.al. | 2506.07667 | link |
2025-06-09 | Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion | Xun Huang et.al. | 2506.08009 | null |
2025-06-09 | MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation | Junhao Chen et.al. | 2506.07999 | null |
2025-06-09 | Unraveling Ethereum's Mempool: The Impact of Fee Fairness, Transaction Prioritization, and Consensus Efficiency | S M Mostaq Hossain et.al. | 2506.07988 | null |
2025-06-09 | Real-time Localization of a Soccer Ball from a Single Camera | Dmitrii Vorobev et.al. | 2506.07981 | null |
2025-06-09 | Low-Complexity Super-Resolution Signature Estimation of XL-MIMO FMCW Radar | Chandrashekhar Rai et.al. | 2506.07979 | null |
2025-06-09 | Predicting Situation Awareness from Physiological Signals | Kieran J. Smith et.al. | 2506.07930 | null |
2025-06-09 | LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Dimitris Panagopoulos et.al. | 2506.07915 | null |
2025-06-09 | GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution | Shuja Khalid et.al. | 2506.07897 | null |
2025-06-09 | CrosswalkNet: An Optimized Deep Learning Framework for Pedestrian Crosswalk Detection in Aerial Images with High-Performance Computing | Zubin Bhuyan et.al. | 2506.07885 | null |
2025-06-09 | Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow | Muhammad Ahmed Humais et.al. | 2506.07878 | link |
2025-06-09 | Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction | Ivan Alberico et.al. | 2506.07860 | link |
2025-06-09 | SAM2Auto: Auto Annotation Using FLASH | Arash Rocky et.al. | 2506.07850 | null |
2025-06-09 | R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation | William Ljungbergh et.al. | 2506.07826 | null |
2025-06-09 | On-The-Fly Symbolic Algorithm for Timed ATL with Abstractions | Nicolaj Ø. Jensen et.al. | 2506.07802 | null |
2025-06-09 | Novel software for continuous wavelet analysis enable EEG real-time analysis on portable computers | Shoichiro Nakanishi et.al. | 2506.07793 | null |
2025-06-09 | Language-Vision Planner and Executor for Text-to-Visual Reasoning | Yichang Xu et.al. | 2506.07778 | null |
2025-06-09 | ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models | Shadi Hamdan et.al. | 2506.07725 | null |
2025-06-09 | CommSense: A Rapid and Accurate ISAC Paradigm | Sandip Jana et.al. | 2506.07685 | null |
2025-06-09 | QUITE: A Query Rewrite System Beyond Rules with LLM Agents | Yuyang Song et.al. | 2506.07675 | null |
2025-06-06 | RecGPT: A Foundation Model for Sequential Recommendation | Yangqin Jiang et.al. | 2506.06270 | link |
2025-06-06 | Integrating Complexity and Biological Realism: High-Performance Spiking Neural Networks for Breast Cancer Detection | Zofia Rudnicka et.al. | 2506.06265 | null |
2025-06-06 | Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens | Jihwan Jeong et.al. | 2506.06261 | null |
2025-06-06 | From NLVO to NAO: Reactive Robot Navigation using Velocity and Acceleration Obstacles | Asher Stern et.al. | 2506.06255 | null |
2025-06-06 | PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time | Weizhi Zhang et.al. | 2506.06254 | null |
2025-06-06 | Correlated Structural and Optical Characterization during Van der Waals Epitaxy of PbI2 on Graphene | C. P. Sonny Tsotezem et.al. | 2506.06241 | null |
2025-06-06 | Initial stage jet momentum broadening in tBLFQ formalism | Dana Avramescu et.al. | 2506.06206 | null |
2025-06-06 | NAT: Neural Acoustic Transfer for Interactive Scenes in Real Time | Xutong Jin et.al. | 2506.06190 | null |
2025-06-06 | Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations | Luis Kin Miyatake et.al. | 2506.06188 | null |
2025-06-06 | Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge | Constantin Patsch et.al. | 2506.06174 | null |
2025-06-06 | Stream DaQ: Stream-First Data Quality Monitoring | Vasileios Papastergios et.al. | 2506.06147 | link |
2025-06-06 | On the Suitability of Wi-Fi for Interconnecting Moving Equipment in Industrial Environments | Pietro Chiavassa et.al. | 2506.06074 | null |
2025-06-06 | Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction | Ruochen Ji et.al. | 2506.06066 | null |
2025-06-06 | Direct Integration of Recursive Gaussian Process Regression Into Extended Kalman Filters With Application to Vapor Compression Cycle Control | Ricus Husmann et.al. | 2506.06065 | null |
2025-06-06 | Enhanced Trust Region Sequential Convex Optimization for Multi-Drone Thermal Screening Trajectory Planning in Urban Environments | Kaiyuan Chen et.al. | 2506.06012 | link |
2025-06-06 | MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation | Dongjie Fu et.al. | 2506.05952 | null |
2025-06-06 | Neural Visibility Cache for Real-Time Light Sampling | Jakub Bokšanský et.al. | 2506.05930 | null |
2025-06-06 | Proactive Assistant Dialogue Generation from Streaming Egocentric Videos | Yichi Zhang et.al. | 2506.05904 | null |
2025-06-06 | The Online Data Filter for the KM3NeT Neutrino Telescopes | O. Adriani et.al. | 2506.05881 | null |
2025-06-06 | Towards Next-Generation Intelligent Maintenance: Collaborative Fusion of Large and Small Models | Xiaoyi Yuan et.al. | 2506.05854 | null |
2025-06-06 | FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction | Yifan Wang et.al. | 2506.05348 | null |
2025-06-05 | SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs | Jiahui Wang et.al. | 2506.05344 | link |
2025-06-05 | Generalizable, real-time neural decoding with hybrid state-space models | Avery Hee-Woon Ryoo et.al. | 2506.05320 | null |
2025-06-05 | Fast-DataShapley: Neural Modeling for Training Data Valuation | Haifeng Sun et.al. | 2506.05281 | null |
2025-06-05 | Vision-Based Autonomous MM-Wave Reflector Using ArUco-Driven Angle-of-Arrival Estimation | Josue Marroquin et.al. | 2506.05195 | null |
2025-06-05 | Noise-Driven AI Sensors: Secure Healthcare Monitoring with PUFs | Christiana Chamon et.al. | 2506.05135 | null |
2025-06-05 | EDEN: Efficient Dual-Layer Exploration Planning for Fast UAV Autonomous Exploration in Large 3-D Environments | Qianli Dong et.al. | 2506.05106 | link |
2025-06-05 | Cloud-Based Interoperability in Residential Energy Systems | Darren Leniston et.al. | 2506.05076 | null |
2025-06-05 | PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning | Azizul Zahid et.al. | 2506.05056 | null |
2025-06-05 | Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Mehdi Azarafza et.al. | 2506.04998 | null |
2025-06-05 | En Route Path-planning for Partially Occupied Vehicles in Ride-pooling Systems | Pengbo Zhu et.al. | 2506.04968 | null |
2025-06-05 | Organic Crystal Active Waveguide as an All-Angle Signal Receiver and Transmission Platform for Visible Light Communication | Ankur Khapre et.al. | 2506.04874 | null |
2025-06-05 | Beyond the Desktop: XR-Driven Segmentation with Meta Quest 3 and MX Ink | Lisle Faray de Paiva et.al. | 2506.04858 | null |
2025-06-05 | Deep learning image burst stacking to reconstruct high-resolution ground-based solar observations | Christoph Schirninger et.al. | 2506.04781 | null |
2025-06-05 | A high-sensitivity frequency counter for free-induction-decay signals | Tong Gong et.al. | 2506.04780 | null |
2025-06-05 | Tire Wear Aware Trajectory Tracking Control for Multi-axle Swerve-drive Autonomous Mobile Robots | Tianxin Hu et.al. | 2506.04752 | null |
2025-06-05 | SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs | Shuhan Xu et.al. | 2506.04743 | null |
2025-06-05 | Real-Time LPV-Based Non-Linear Model Predictive Control for Robust Trajectory Tracking in Autonomous Vehicles | Nitish Kumar et.al. | 2506.04684 | null |
2025-06-05 | Application of SDRE to Achieve Gait Control in a Bipedal Robot for Knee-Type Exoskeleton Testing | Ping-Kong Huang et.al. | 2506.04680 | null |
2025-06-05 | Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation | Yuyang Wanyan et.al. | 2506.04614 | null |
2025-06-05 | Construction of Urban Greenland Resources Collaborative Management Platform | Dongyang Lyu et.al. | 2506.03830 | null |
2025-06-05 | MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection | Xiaochun Lei et.al. | 2506.03654 | null |
2025-06-04 | MS-YOLO: A Multi-Scale Model for Accurate and Efficient Blood Cell Detection | Guohua Wu et.al. | 2506.03972 | null |
2025-06-04 | FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review | Cédric Léonard et.al. | 2506.03938 | link |
2025-06-04 | Forecasting Seasonal Influenza Epidemics with Physics-Informed Neural Networks | Martina Rama et.al. | 2506.03897 | null |
2025-06-04 | JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting | Yang Xiao et.al. | 2506.03872 | null |
2025-06-04 | Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models | Gonçalo Mesquita et.al. | 2506.03752 | null |
2025-06-04 | Probabilistic measures afford fair comparisons of AIWP and NWP model output | Tilmann Gneiting et.al. | 2506.03744 | link |
2025-06-04 | Accelerating SfM-based Pose Estimation with Dominating Set | Joji Joseph et.al. | 2506.03667 | null |
2025-06-04 | Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI | Wing Man Casca Kwok et.al. | 2506.03607 | null |
2025-06-04 | SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting | Shengjie Lin et.al. | 2506.03594 | link |
2025-06-04 | SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models | Meng Li et.al. | 2506.03574 | null |
2025-06-04 | Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments | Reo Yoneyama et.al. | 2506.03554 | null |
2025-06-04 | A Threat Intelligence Event Extraction Conceptual Model for Cyber Threat Intelligence Feeds | Jamal H. Al-Yasiri et.al. | 2506.03551 | null |
2025-06-04 | Topology-Aware Graph Neural Network-based State Estimation for PMU-Unobservable Power Systems | Shiva Moshtagh et.al. | 2506.03493 | null |
2025-06-04 | Adaptive Configuration Selection for Multi-Model Inference Pipelines in Edge Computing | Jinhao Sheng et.al. | 2506.02814 | null |
2025-06-04 | Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone | Zheng Liu et.al. | 2506.02774 | null |
2025-06-03 | StARS DCM: A Sleep Stage-Decoding Forehead EEG Patch for Real-time Modulation of Sleep Physiology | William G. Coon et.al. | 2506.03442 | null |
2025-06-03 | Online multi-layer FDR control | Runqiu Wang et.al. | 2506.03406 | null |
2025-06-03 | A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation | Zihui Ma et.al. | 2506.03360 | link |
2025-06-03 | Spatial Association Between Near-Misses and Accident Blackspots in Sydney, Australia: A Getis-Ord | Artur Grigorev et.al. | 2506.03356 | null |
2025-06-03 | Structural Vibration Monitoring with Diffractive Optical Processors | Yuntian Wang et.al. | 2506.03317 | null |
2025-06-03 | TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models | Chetwin Low et.al. | 2506.03099 | null |
2025-06-03 | InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba | Zizhao Wu et.al. | 2506.03084 | null |
2025-06-03 | LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM | Roman Titkov et.al. | 2506.03073 | null |
2025-06-03 | Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency | Bunlong Lay et.al. | 2506.02908 | null |
2025-06-03 | When Blockchain Meets Crawlers: Real-time Market Analytics in Solana NFT Markets | Chengxin Shen et.al. | 2506.02892 | null |
2025-06-03 | OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis | Jiewen Hu et.al. | 2506.02891 | null |
2025-06-03 | CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge | Chunlin Tian et.al. | 2506.02847 | null |
2025-06-03 | Process Mining on Distributed Data Sources | Maximilian Weisenseel et.al. | 2506.02830 | null |
2025-06-03 | Target Sensing Performance in Disaster-Specific ISAC Networks | Ahmet Burak Ozyurt et.al. | 2506.02828 | null |
2025-06-03 | AI-Driven Vehicle Condition Monitoring with Cell-Aware Edge Service Migration | Charalampos Kalalas et.al. | 2506.02785 | null |
2025-06-03 | SAMJ: Fast Image Annotation on ImageJ/Fiji via Segment Anything Model | Carlos Garcia-Lopez-de-Haro et.al. | 2506.02783 | null |
2025-06-03 | RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS | Chuanyu Fu et.al. | 2506.02751 | null |
2025-06-03 | Collective Intelligence Outperforms Individual Talent: A Case Study in League of Legends | Angelo Josey Caldeira et.al. | 2506.02706 | null |
2025-06-03 | A Pretrained Probabilistic Transformer for City-Scale Traffic Volume Prediction | Shiyu Shen et.al. | 2506.02654 | null |
2025-06-03 | From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV | Yousef Emami et.al. | 2506.02649 | null |
2025-06-03 | Phase Topology Stability of an Optical Vortex via an Electrically Controlled Twist-Planar Oriented Liquid Crystal Fresnel Lens | Elena Melnikova et.al. | 2506.02632 | null |
2025-06-03 | HORUS: A Mixed Reality Interface for Managing Teams of Mobile Robots | Omotoye Shamsudeen Adekoya et.al. | 2506.02622 | null |
2025-06-03 | Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models | Safaa Abdullahi Moallim Mohamud et.al. | 2506.02615 | null |
2025-05-30 | PB&J: Peanut Butter and Joints for Damped Articulation | Avery S. Williamson et.al. | 2505.24860 | link |
2025-05-30 | Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | Yingchaojie Feng et.al. | 2505.24754 | link |
2025-05-30 | Neural Network-based Universal Formulas for Control | Pol Mestres et.al. | 2505.24744 | null |
2025-05-30 | Efficient Text Encoders for Labor Market Analysis | Jens-Joris Decorte et.al. | 2505.24640 | null |
2025-05-30 | Co-designed Quantum Discrete Adiabatic Linear System Solver Via Dynamic Circuits | Boxuan Ai et.al. | 2505.24626 | null |
2025-05-30 | Frequency-Domain Joint Monitoring of Differential Group Delay and Dependent Loss of Optical Single- and Few-Mode Fiber Channels Based on CAZAC Sequences | Linsheng Fan et.al. | 2505.24589 | null |
2025-05-30 | Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning | Jinbao Wang et.al. | 2505.24572 | null |
2025-05-30 | Airborne Neural Network | Paritosh Ranjan et.al. | 2505.24513 | null |
2025-05-30 | How can AI reduce fall injuries in the workplace? | Nicholas Cartocci et.al. | 2505.24507 | null |
2025-05-30 | Enhancing the Accuracy of Spatio-Temporal Models for Wind Speed Prediction by Incorporating Bias-Corrected Crowdsourced Data | Eamonn Organ et.al. | 2505.24506 | link |
2025-05-30 | Real-time Fall Prevention system for the Next-generation of Workers | Nicholas Cartocci et.al. | 2505.24487 | null |
2025-05-30 | Boosting Automatic Exercise Evaluation Through Musculoskeletal Simulation-Based IMU Data Augmentation | Andreas Spilz et.al. | 2505.24415 | null |
2025-05-30 | SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation | Yuqi Fan et.al. | 2505.24390 | link |
2025-05-30 | Spatiotemporal Analysis of Forest Machine Operations Using 3D Video Classification | Maciej Wielgosz et.al. | 2505.24375 | null |
2025-05-30 | A Novel Coronary Artery Registration Method Based on Super-pixel Particle Swarm Optimization | Peng Qi et.al. | 2505.24351 | null |
2025-05-30 | A 3D Mobile Crowdsensing Framework for Sustainable Urban Digital Twins | Taku Yamazaki et.al. | 2505.24348 | null |
2025-05-30 | DTR: Delaunay Triangulation-based Racing for Scaled Autonomous Racing | Luca Tognoni et.al. | 2505.24320 | null |
2025-05-30 | A Novel Discrete Memristor-Coupled Heterogeneous Dual-Neuron Model and Its Application in Multi-Scenario Image Encryption | Yi Zou et.al. | 2505.24294 | null |
2025-05-30 | Proactive Guidance of Multi-Turn Conversation in Industrial Search | Xiaoyu Li et.al. | 2505.24251 | null |
2025-05-30 | MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition | Chengxi Deng et.al. | 2505.24224 | null |
2025-05-29 | AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views | Lihan Jiang et.al. | 2505.23716 | null |
2025-05-29 | From Connectivity to Autonomy: The Dawn of Self-Evolving Communication Systems | Zeinab Nezami et.al. | 2505.23710 | null |
2025-05-29 | Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better | Danny Driess et.al. | 2505.23705 | null |
2025-05-29 | DiCoFlex: Model-agnostic diverse counterfactuals with flexible control | Oleksii Furman et.al. | 2505.23700 | null |
2025-05-29 | Errors in Stereo Geometry Induce Distance Misperception | Raffles Xingqi Zhu et.al. | 2505.23685 | null |
2025-05-29 | Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model | Rachel Cummings et.al. | 2505.23682 | null |
2025-05-29 | Performance Analysis of Wireless Communication Systems Assisted by Fluid Reconfigurable Intelligent Surfaces | Farshad Rostami Ghadi et.al. | 2505.23680 | null |
2025-05-29 | Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model | Qingyu Shi et.al. | 2505.23606 | link |
2025-05-29 | MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning | Linqiang Guo et.al. | 2505.23596 | null |
2025-05-29 | Scalable decoding protocols for fast transversal logic in the surface code | Mark L. Turner et.al. | 2505.23567 | null |
2025-05-29 | The CASE Framework -- A New Architecture for Participatory Research and Digital Health Surveillance | Marco Hirsch et.al. | 2505.23516 | null |
2025-05-29 | DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration | Sanberk Serbest et.al. | 2505.23515 | null |
2025-05-29 | Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents | Zhejian Yang et.al. | 2505.23450 | null |
2025-05-29 | Enhanced DACER Algorithm with High Diffusion Efficiency | Yinuo Wang et.al. | 2505.23426 | null |
2025-05-29 | When water phase matters: its effect on the stopping cross section for proton therapy and astrophysics | F. Matias et.al. | 2505.23396 | null |
2025-05-29 | CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection | Woojin Shin et.al. | 2505.23317 | null |
2025-05-29 | Investigating A Geometrical Solution to the Vergence-Accommodation Conflict for Targeted Movements in Virtual Reality | Xiaoye Michael Wang et.al. | 2505.23310 | null |
2025-05-29 | MathArena: Evaluating LLMs on Uncontaminated Math Competitions | Mislav Balunović et.al. | 2505.23281 | link |
2025-05-29 | Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception | Guangyuan Liu et.al. | 2505.23275 | null |
2025-05-29 | Context-Aware Semantic Communication for the Wireless Networks | Guangyuan Liu et.al. | 2505.23249 | null |
2025-05-29 | The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector | Aixuan Li et.al. | 2505.22499 | null |
2025-05-29 | YH-MINER: Multimodal Intelligent System for Natural Ecological Reef Metric Extraction | Mingzhuang Wang et.al. | 2505.22250 | null |
2025-05-28 | VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models | Ce Zhang et.al. | 2505.22654 | null |
2025-05-28 | VR-Based Control of Multi-Copter Operation | Jack T. Hughes et.al. | 2505.22599 | null |
2025-05-28 | A Graph-Based Laser Path Solver Algorithm for Virtual Reality Laboratory Simulations | Andreas Müller et.al. | 2505.22540 | null |
2025-05-28 | AI instructional agent improves student's perceived learner control and learning outcome: empirical evidence from a randomized controlled trial | Fei Qin et.al. | 2505.22526 | null |
2025-05-28 | CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs | Mohamed R. Elshamy et.al. | 2505.22469 | null |
2025-05-28 | STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering | Zehao Li et.al. | 2505.22400 | null |
2025-05-28 | Learning to Pursue AC Optimal Power Flow Solutions with Feasibility Guarantees | Damola Ajeyemi et.al. | 2505.22399 | null |
2025-05-28 | UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments | Wancai Zheng et.al. | 2505.22335 | null |
2025-05-28 | Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer | Zehua Chen et.al. | 2505.22306 | null |
2025-05-28 | Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device | Zixuan Li et.al. | 2505.22229 | null |
2025-05-28 | ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | Jiawen Yu et.al. | 2505.22159 | null |
2025-05-28 | Streaming Remote rendering services: Comparison of QUIC-based and WebRTC Protocols | Daniel Mejías et.al. | 2505.22132 | null |
2025-05-28 | Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach | Alejandro D. Mousist et.al. | 2505.22128 | null |
2025-05-28 | Leveraging 5G Physical Layer Monitoring for Adaptive Remote Rendering in XR Applications | Inhar Yeregui et.al. | 2505.22123 | null |
2025-05-28 | A simulation framework for autonomous lunar construction work | Mattias Linde et.al. | 2505.22091 | null |
2025-05-28 | High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models | Tristan S. W. Stevens et.al. | 2505.22090 | null |
2025-05-28 | Cognitively-Inspired Emergent Communication via Knowledge Graphs for Assisting the Visually Impaired | Ruxiao Chen et.al. | 2505.22087 | null |
2025-05-28 | On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition | Shujie HU et.al. | 2505.22072 | null |
2025-05-27 | Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations | Yue Li Du et.al. | 2505.21454 | null |
2025-05-27 | Hume: Introducing System-2 Thinking in Visual-Language-Action Model | Haoming Song et.al. | 2505.21432 | null |
2025-05-27 | Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery | Lina Zhao et.al. | 2505.21418 | null |
2025-05-27 | AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Xuanwen Ding et.al. | 2505.21389 | link |
2025-05-27 | A first look at ROS 2 applications written in asynchronous Rust | Martin Škoudlil et.al. | 2505.21323 | null |
2025-05-27 | Assured Autonomy with Neuro-Symbolic Perception | R. Spencer Hallyburton et.al. | 2505.21322 | null |
2025-05-27 | Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning | Mohamed Benzaghta et.al. | 2505.21249 | null |
2025-05-27 | Towards Quantum Simulation of Meson Scattering in a Z2 Lattice Gauge Theory | Yahui Chai et.al. | 2505.21240 | null |
2025-05-27 | 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics-Based Appearance-Medium Decoupling | Jieyu Yuan et.al. | 2505.21238 | null |
2025-05-27 | Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models | Xudong Tan et.al. | 2505.21200 | null |
2025-05-27 | Constructive community race: full-density spiking neural network model drives neuromorphic computing | Johanna Senk et.al. | 2505.21185 | null |
2025-05-27 | Hybrid Machine Learning and Mathematical Modeling for Tumor Dynamics Prediction: Comparing SPIONs against mNP-FDG | Amit K Chattopadhyay et.al. | 2505.21094 | null |
2025-05-27 | All-optical discrete illumination-based compressed ultrafast photography | Long Cheng et.al. | 2505.21086 | null |
2025-05-27 | Modeling of Water Evaporation in Hydrogels from Aspect of Mechanical Analytics | Zehua Yu et.al. | 2505.21075 | null |
2025-05-27 | Nonreciprocal and long-range three-body interactions in Bose-Einstein condensates induced by optical feedback | Yi-Qing Zhang et.al. | 2505.21044 | null |
2025-05-27 | CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians | Weihang Liu et.al. | 2505.21041 | null |
2025-05-27 | ClearSphere: Multi-Earphone Synergy for Enhanced Conversational Clarity | Lixing He et.al. | 2505.21004 | null |
2025-05-27 | CNN-Based Channel Map Estimation for Movable Antenna Systems | Yitai Huang et.al. | 2505.21001 | null |
2025-05-27 | SCALOFT: An Initial Approach for Situation Coverage-Based Safety Analysis of an Autonomous Aerial Drone in a Mine Environment | Nawshin Mannan Proma et.al. | 2505.20969 | null |
2025-05-27 | YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation | Weichao Pan et.al. | 2505.20884 | null |
2025-05-26 | Understanding and Supporting Co-viewing Comedy in VR with Embodied Expressive Avatars | Ryo Ohara et.al. | 2505.20082 | null |
2025-05-26 | M3DHMR: Monocular 3D Hand Mesh Recovery | Yihong Lin et.al. | 2505.20058 | null |
2025-05-26 | Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion | Zheqi Lv et.al. | 2505.20053 | link |
2025-05-26 | Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs | Artem Vazhentsev et.al. | 2505.20045 | null |
2025-05-26 | Optimizing edge AI models on HPC systems with the edge in the loop | Marcel Aach et.al. | 2505.19995 | link |
2025-05-26 | A Cooperative Aerial System of A Payload Drone Equipped with Dexterous Rappelling End Droid for Cluttered Space Pickup | Wenjing Ren et.al. | 2505.19980 | null |
2025-05-26 | Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees | Herbert Woisetschläger et.al. | 2505.19947 | null |
2025-05-26 | Weather-Magician: Reconstruction and Rendering Framework for 4D Weather Synthesis In Real Time | Chen Sang et.al. | 2505.19919 | null |
2025-05-26 | EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM | Shuang Ao et.al. | 2505.19905 | null |
2025-05-26 | Adaptive Indexing for Approximate Query Processing in Exploratory Data Analysis | Stavros Maroulis et.al. | 2505.19872 | null |
2025-05-26 | PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints | Shuo Wang et.al. | 2505.19842 | link |
2025-05-26 | A Cost-efficient Credit-Based Shaper Deployment Framework for Time-Sensitive Networks | Santiago Torres-Borda et.al. | 2505.19771 | null |
2025-05-26 | GeoPF: Infusing Geometry into Potential Fields for Reactive Planning in Non-trivial Environments | Yuhe Gong et.al. | 2505.19688 | null |
2025-05-26 | A Fluorescent Material Model for Non-Spectral Editing & Rendering | Belcour Laurent et.al. | 2505.19672 | null |
2025-05-26 | Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | Haiyang Sun et.al. | 2505.19669 | null |
2025-05-26 | Autonomous Flights inside Narrow Tunnels | Luqi Wang et.al. | 2505.19657 | link |
2025-05-26 | Software Engineering for Self-Adaptive Robotics: A Research Agenda | Shaukat Ali et.al. | 2505.19629 | null |
2025-05-26 | Indoor Air Quality Detection Robot Model Based on the Internet of Things (IoT) | Anggiat Mora Simamora et.al. | 2505.19600 | link |
2025-05-26 | Situationally-Aware Dynamics Learning | Alejandro Murillo-Gonzalez et.al. | 2505.19574 | null |
2025-05-26 | LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer | Rasoul Zahedifar et.al. | 2505.19567 | null |
2025-05-23 | VideoGameBench: Can Vision-Language Models complete popular video games? | Alex L. Zhang et.al. | 2505.18134 | null |
2025-05-23 | ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework | Lisheng Huang et.al. | 2505.18105 | link |
2025-05-23 | SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios | Simon Malzard et.al. | 2505.18048 | null |
2025-05-23 | Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | Li Zhong et.al. | 2505.18039 | null |
2025-05-23 | Clinical Validation of Deep Learning for Real-Time Tissue Oxygenation Estimation Using Spectral Imaging | Jens De Winne et.al. | 2505.18010 | null |
2025-05-23 | Re-evaluation of Logical Specification in Behavioural Verification | Radoslaw Klimek et.al. | 2505.17979 | null |
2025-05-23 | Evolving Machine Learning: A Survey | Ignacio Cabrera Martin et.al. | 2505.17902 | null |
2025-05-23 | Geometric Shape Modelling and Volume Estimation of Dry Bulk Cargo Piles using a Single Image | Debanshu Ratha et.al. | 2505.17896 | null |
2025-05-23 | Toward Optimal ANC: Establishing Mutual Information Lower Bound | François Derrida et.al. | 2505.17877 | null |
2025-05-23 | Light-Driven Bound State of Interacting Impurities in a Dirac-Like Bath | Vinayak M. Kulkarni et.al. | 2505.17811 | null |
2025-05-23 | DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors | Tazeek Bin Abdur Rakib et.al. | 2505.17795 | null |
2025-05-23 | Real-time calibrations for future detectors at FAIR | Valentin Kladov et.al. | 2505.17781 | null |
2025-05-23 | Sec5GLoc: Securing 5G Indoor Localization via Adversary-Resilient Deep Learning Architecture | Ildi Alla et.al. | 2505.17776 | link |
2025-05-23 | HRSim: An agent-based simulation platform for high-capacity ride-sharing services | Wang Chen et.al. | 2505.17758 | link |
2025-05-23 | Instruct2See: Learning to Remove Any Obstructions Across Distributions | Junhang Li et.al. | 2505.17649 | null |
2025-05-23 | MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | Jihan Yao et.al. | 2505.17613 | null |
2025-05-23 | A Unified Multi-Scale Attention-Based Network for Automatic 3D Segmentation of Lung Parenchyma & Nodules In Thoracic CT Images | Muhammad Abdullah et.al. | 2505.17602 | link |
2025-05-23 | JELAI: Integrating AI and Learning Analytics in Jupyter Notebooks | Manuel Valle Torre et.al. | 2505.17593 | null |
2025-05-23 | Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras | Masataka Kobayashi et.al. | 2505.17582 | null |
2025-05-23 | Direct Feature Access -- Scaling Network Traffic Feature Collection to Terabit Speed | Lukas Froschauer et.al. | 2505.17573 | null |
2025-05-22 | Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models | Junjie Xiong et.al. | 2505.16957 | null |
2025-05-22 | From Reality to Virtual Worlds: The Role of Photogrammetry in Game Development | Santiago Berrezueta-Guzman et.al. | 2505.16951 | null |
2025-05-22 | Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | Nikola Tankovic et.al. | 2505.16918 | null |
2025-05-22 | Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships | Kerem Oktar et.al. | 2505.16899 | null |
2025-05-22 | FlashBack: Consistency Model-Accelerated Shared Autonomy | Luzhe Sun et.al. | 2505.16892 | null |
2025-05-22 | Arbor-TVB: A Novel Multi-Scale Co-Simulation Framework with a Case Study on Neural-Level Seizure Generation and Whole-Brain Propagation | Thorsten Hater et.al. | 2505.16861 | null |
2025-05-22 | Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate | Hanglei Zhang et.al. | 2505.16845 | null |
2025-05-22 | SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving | Xuesong Chen et.al. | 2505.16805 | null |
2025-05-22 | Detecting Fake News Belief via Skin and Blood Flow Signals | Gennie Nguyen et.al. | 2505.16730 | null |
2025-05-22 | SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding | Sushant Gautam et.al. | 2505.16630 | null |
2025-05-22 | Recursive Offloading for LLM Serving in Multi-tier Networks | Zhiyuan Wu et.al. | 2505.16502 | link |
2025-05-22 | Human-like Semantic Navigation for Autonomous Driving using Knowledge Representation and Large Language Models | Augusto Luis Ballardini et.al. | 2505.16498 | null |
2025-05-22 | InspectionV3: Enhancing Tobacco Quality Assessment with Deep Convolutional Neural Networks for Automated Workshop Management | Yao Wei et.al. | 2505.16485 | null |
2025-05-22 | Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems | Song Jin et.al. | 2505.16429 | null |
2025-05-22 | Dynamic Caustics by Ultrasonically Modulated Liquid Surface | Koki Nagakura et.al. | 2505.16397 | null |
2025-05-22 | Quantum-Driven Multihead Inland Waterbody Detection With Transformer-Encoded CYGNSS Delay-Doppler Map Data | Chia-Hsiang Lin et.al. | 2505.16391 | null |
2025-05-22 | Observing dynamics of distinct structural transitions in trapped-ion clusters | Akhil Ayyadevara et.al. | 2505.16378 | null |
2025-05-22 | Multimodal Generative AI for Story Point Estimation in Software Development | Mohammad Rubyet Islam et.al. | 2505.16290 | null |
2025-05-22 | Energy Spectra of Secondary Particles Induced by Solar Energetic Proton Events and Magnetospheric Effects | A. Chilingarian et.al. | 2505.16269 | null |
2025-05-22 | Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models | Kalindi Singh et.al. | 2505.16261 | null |
2025-05-21 | Direct Detection of Cosmic Walls with Paleo Detectors | Wen Yin et.al. | 2505.15764 | null |
2025-05-21 | Majorana Zero Modes in a Heterogenous Structure of Topological and Trivial Domains in FeSe$_{1-x}$Te$_x$ | Prashant Gupta et.al. | 2505.15745 | null |
2025-05-21 | iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification | Sarfraz Ahmad et.al. | 2505.15730 | link |
2025-05-21 | Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model | Ke Hu et.al. | 2505.15670 | null |
2025-05-21 | Lithium Intercalation in the Anisotropic van der Waals Magnetic Semiconductor CrSBr | Kseniia Mosina et.al. | 2505.15663 | null |
2025-05-21 | Self-powered smart contact lenses: a multidisciplinary approach to micro-scale energy and 900 MHz - 1.1 GHz bandwidth microfabricated loop antennas communication systems | Patrice Salzenstein et.al. | 2505.15593 | null |
2025-05-21 | VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation | Niccolo Avogaro et.al. | 2505.15592 | null |
2025-05-21 | Decreasing Utilization of Systems with Multi-Rate Cause-Effect Chains While Reducing End-to-End Latencies | Luiz Maia et.al. | 2505.15546 | null |
2025-05-21 | Exploiting Age of Information in Network Digital Twins for AI-driven Real-Time Link Blockage Detection | Michele Zhu et.al. | 2505.15519 | null |
2025-05-21 | AI-empowered Real-Time Line-of-Sight Identification via Network Digital Twins | Michele Zhu et.al. | 2505.15478 | null |
2025-05-21 | FAV-NSS: An HIL Framework for Accelerating Validation of Automotive Network Security Strategies | Changhong Li et.al. | 2505.15393 | null |
2025-05-21 | EVA: Expressive Virtual Avatars from Multi-view Videos | Hendrik Junkawitsch et.al. | 2505.15385 | null |
2025-05-21 | Real-Time Detection of Insider Threats Using Behavioral Analytics and Deep Evidential Clustering | Anas Ali et.al. | 2505.15383 | null |
2025-05-21 | RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | Naman Patel et.al. | 2505.15373 | null |
2025-05-21 | AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals | Stefan Pasch et.al. | 2505.15365 | null |
2025-05-21 | Human in the Loop Adaptive Optimization for Improved Time Series Forecasting | Malik Tiomoko et.al. | 2505.15354 | link |
2025-05-21 | Subgap pumping of antiferromagnetic Mott insulators: photoexcitation mechanisms and applications | Radu Andrei et.al. | 2505.15343 | null |
2025-05-21 | High-Throughput Mechanical Characterization of Giant Unilamellar Vesicles by Real-Time Deformability Cytometry | Maximilian Kloppe et.al. | 2505.15341 | null |
2025-05-21 | LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models | Qianyue Hao et.al. | 2505.15293 | null |
2025-05-21 | LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | Zhenyu Ning et.al. | 2505.15269 | null |
2025-05-20 | Emerging Properties in Unified Multimodal Pretraining | Chaorui Deng et.al. | 2505.14683 | null |
2025-05-20 | NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search | Sunhao Dai et.al. | 2505.14680 | null |
2025-05-20 | Beyond Words: Multimodal LLM Knows When to Speak | Zikai Liao et.al. | 2505.14654 | null |
2025-05-20 | Separatrix configurations in holomorphic flows | Nicolas Kainz et.al. | 2505.14594 | null |
2025-05-20 | Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities | Parthasaarathy Sudarsanam et.al. | 2505.14562 | null |
2025-05-20 | Automated, Cross-Layer Root Cause Analysis of 5G Video-Conferencing Quality Degradation | Fan Yi et.al. | 2505.14540 | null |
2025-05-20 | PAST: Phonetic-Acoustic Speech Tokenizer | Nadav Har-Tuv et.al. | 2505.14470 | null |
2025-05-20 | Efficient Configuration-Constrained Tube MPC via Variables Restriction and Template Selection | Filippo Badalamenti et.al. | 2505.14440 | link |
2025-05-20 | Two Empirical Studies on Audiovisual Semiotics of Uncertainty | Sita Vriend et.al. | 2505.14379 | null |
2025-05-20 | Information-optimal measurement: From fixed sampling protocols to adaptive spectroscopy | J. Schroeder et.al. | 2505.14364 | null |
2025-05-20 | Local Minima Prediction using Dynamic Bayesian Filtering for UGV Navigation in Unstructured Environments | Seung Hun Lee et.al. | 2505.14337 | null |
2025-05-20 | Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | Umberto Cappellazzo et.al. | 2505.14336 | null |
2025-05-20 | Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion | Tiehan Cui et.al. | 2505.14316 | null |
2025-05-20 | Timely CPU Scheduling for Computation-intensive Status Updates | Mengqiu Zhou et.al. | 2505.14307 | null |
2025-05-20 | SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors | Maheep Chaudhary et.al. | 2505.14300 | null |
2025-05-20 | AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis | Eirini Panteli et.al. | 2505.14285 | null |
2025-05-20 | Hybrid Adaptive Modeling in Process Monitoring: Leveraging Sequence Encoders and Physics-Informed Neural Networks | Mouad Elaarabi et.al. | 2505.14252 | null |
2025-05-20 | Visual Agentic Reinforcement Fine-Tuning | Ziyu Liu et.al. | 2505.14246 | link |
2025-05-20 | Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks | Sizhe Yuen et.al. | 2505.14212 | null |
2025-05-20 | Dynamic Replanning for Improved Public Transport Routing | Abdallah Abuaisha et.al. | 2505.14193 | null |
2025-05-20 | QSVM-QNN: Quantum Support Vector Machine Based Quantum Neural Network Learning Algorithm for Brain-Computer Interfacing Systems | Bikash K. Behera et.al. | 2505.14192 | null |
2025-05-20 | Gaming Strategies in European Imbalance Settlement Mechanisms | Seyed Soroush Karimi Madahi et.al. | 2505.14133 | null |
2025-05-20 | Place Recognition: A Comprehensive Review, Current Challenges and Future Directions | Zhenyu Li et.al. | 2505.14068 | link |
2025-05-19 | Quantum Hardware-in-the-Loop for Optimal Power Flow in Renewable-Integrated Power Systems | Zeynab Kaseb et.al. | 2505.13356 | null |
2025-05-19 | Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity | Sharanya Venkatesh et.al. | 2505.13350 | null |
2025-05-19 | Level Generation with Quantum Reservoir Computing | João S. Ferreira et.al. | 2505.13287 | null |
2025-05-19 | MAGI-1: Autoregressive Video Generation at Scale | Sand. ai et.al. | 2505.13211 | link |
2025-05-19 | Combinatorial Sample-and Back-Focal-Plane (BFP) Imaging. Pt. I: Instrument and acquisition parameters affecting BFP images and their analysis | Omer Shavit et.al. | 2505.13190 | null |
2025-05-19 | ToolSpectrum : Towards Personalized Tool Utilization for Large Language Models | Zihao Cheng et.al. | 2505.13176 | null |
2025-05-19 | A conformally mapped numerical wave tank supporting piston and flap wavemakers | Andreas Holm Akselsen et.al. | 2505.13154 | null |
2025-05-19 | Ocean wave spectrum reconstruction from HF radar data and its application to wave height estimation | Kaede Watanabe et.al. | 2505.13132 | null |
2025-05-19 | Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing | Hao Ma et.al. | 2505.13131 | null |
2025-05-19 | Adaptive Image Restoration for Video Surveillance: A Real-Time Approach | Muhammad Awais Amin et.al. | 2505.13130 | null |
2025-05-19 | Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation | Guo Chen et.al. | 2505.13094 | null |
2025-05-19 | PPTNet: A Hybrid Periodic Pattern-Transformer Architecture for Traffic Flow Prediction and Congestion Identification | Hongrui Kou et.al. | 2505.13047 | link |
2025-05-19 | Ultrafast Laser Induces Macroscopic Symmetry-Breaking of Diamond Color Centers | Yang Gao et.al. | 2505.12989 | null |
2025-05-19 | Regularized Model Predictive Control | Komeil Nosrati et.al. | 2505.12977 | null |
2025-05-19 | Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models | Mahta Fetrat Qharabagh et.al. | 2505.12973 | link |
2025-05-19 | Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection | Zihan Xiong et.al. | 2505.12966 | null |
2025-05-19 | Effects of the Auto-Correlation of Delays on the Age of Information: A Gaussian Process Framework | Atsushi Inoie et.al. | 2505.12885 | null |
2025-05-19 | Optimization of Hybrid Quantum-Classical Algorithms | Lian Remme et.al. | 2505.12853 | null |
2025-05-19 | Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs | Zhuo Yang et.al. | 2505.12833 | null |
2025-05-19 | Rethinking Features-Fused-Pyramid-Neck for Object Detection | Hulin Li et.al. | 2505.12820 | link |
2025-05-16 | msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML | Zhaolan Huang et.al. | 2505.11483 | link |
2025-05-16 | REACT: Runtime-Enabled Active Collision-avoidance Technique for Autonomous Driving | Heye Huang et.al. | 2505.11474 | null |
2025-05-16 | Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space | Ali Rabiee et.al. | 2505.11366 | link |
2025-05-16 | Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models | Keunwoo Peter Yu et.al. | 2505.11326 | link |
2025-05-16 | Time-dependent Hole States in Multiconfigurational Time-Dependent Hartree-Fock Approaches: Applications in Photoionization of Water Molecule | Zhao-Han Zhang et.al. | 2505.11319 | null |
2025-05-16 | Diffusion Learning with Partial Agent Participation and Local Updates | Elsa Rizk et.al. | 2505.11307 | null |
2025-05-16 | MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection | Shrutarv Awasthi et.al. | 2505.11282 | link |
2025-05-16 | Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models | Camille Couturier et.al. | 2505.11271 | null |
2025-05-16 | Learning traffic flows: Graph Neural Networks for Metamodelling Traffic Assignment | Oskar Bohn Lassen et.al. | 2505.11230 | null |
2025-05-16 | Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition | Bo Yue et.al. | 2505.11175 | null |
2025-05-16 | Maximizing Asynchronicity in Event-based Neural Networks | Haiqing Hao et.al. | 2505.11165 | null |
2025-05-16 | Sonification of entanglement dynamics in many-qubit systems | Juliette Tudoce et.al. | 2505.11159 | null |
2025-05-16 | Open-Source Multi-Viewpoint Surgical Telerobotics | Guido Caccianiga et.al. | 2505.11142 | null |
2025-05-16 | A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware | Rui Wang et.al. | 2505.11066 | null |
2025-05-16 | Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking | Changlun Li et.al. | 2505.11065 | link |
2025-05-16 | Leveraging Real-Time Data Analysis and Multiple Kernel Learning for Manufacturing of Innovative Steels | Wolfgang Rannetbauer et.al. | 2505.11024 | null |
2025-05-16 | DRL-Based Injection Molding Process Parameter Optimization for Adaptive and Profitable Production | Joon-Young Kim et.al. | 2505.10988 | null |
2025-05-16 | GROQLoco: Generalist and RObot-agnostic Quadruped Locomotion Control using Offline Datasets | Narayanan PP et.al. | 2505.10973 | null |
2025-05-16 | Vaiage: A Multi-Agent Solution to Personalized Travel Planning | Binwen Liu et.al. | 2505.10922 | null |
2025-05-16 | Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions | Muntasir Hoq et.al. | 2505.10913 | null |
2025-05-15 | An AI-driven framework for the prediction of personalised health response to air pollution | Nazanin Zounemat Kermani et.al. | 2505.10556 | null |
2025-05-15 | Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning | Milan Ganai et.al. | 2505.10547 | null |
2025-05-15 | LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps | Filippo Olimpieri et.al. | 2505.10537 | link |
2025-05-15 | Internal State Estimation in Groups via Active Information Gathering | Xuebo Ji et.al. | 2505.10415 | null |
2025-05-15 | Two-Stage Generative Model for Intracranial Aneurysm Meshes with Morphological Marker Conditioning | Wenhao Ding et.al. | 2505.10407 | link |
2025-05-15 | Schreier-Coset Graph Propagation | Aryan Mishra et.al. | 2505.10392 | null |
2025-05-15 | Arbitrarily Small Execution-Time Certificate: What was Missed in Analog Optimization | Liang Wu et.al. | 2505.10366 | link |
2025-05-15 | FactsR: A Safer Method for Producing High Quality Healthcare Documentation | Victor Petrén Bach Hansen et.al. | 2505.10360 | null |
2025-05-15 | Optimizing Electric Bus Charging Scheduling with Uncertainties Using Hierarchical Deep Reinforcement Learning | Jiaju Qi et.al. | 2505.10296 | null |
2025-05-15 | From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making | Dubai Li et.al. | 2505.10282 | link |
2025-05-15 | AttentionGuard: Transformer-based Misbehavior Detection for Secure Vehicular Platoons | Hexu Li et.al. | 2505.10273 | null |
2025-05-15 | Defect Detection in Photolithographic Patterns Using Deep Learning Models Trained on Synthetic Data | Prashant P. Shinde et.al. | 2505.10192 | null |
2025-05-15 | KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems | Jieke Lin et.al. | 2505.10183 | null |
2025-05-15 | Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence | Xiang He et.al. | 2505.10176 | link |
2025-05-15 | High-performance local automaton decoders for defect matching in 1D | Louis Paletta et.al. | 2505.10162 | null |
2025-05-15 | CFARNet: Learning-Based High-Resolution Multi-Target Detection for Rainbow Beam Radar | Qiushi Liang et.al. | 2505.10150 | null |
2025-05-15 | VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality | Xuechang Tu et.al. | 2505.10144 | link |
2025-05-15 | IMITATE: Image Registration with Context for unknown time frame recovery | Ziad Kheil et.al. | 2505.10124 | link |
2025-05-15 | Learning Virtual Machine Scheduling in Cloud Computing through Language Agents | JieHao Wu et.al. | 2505.10117 | null |
2025-05-15 | LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2 | Jongmin Jung et.al. | 2505.10101 | null |
2025-05-14 | UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing | Yung-Hsuan Lai et.al. | 2505.09615 | link |
2025-05-14 | Quantum simulation of bubble nucleation across a quantum phase transition | De Luo et.al. | 2505.09607 | null |
2025-05-14 | Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net | Dongyi He et.al. | 2505.09521 | link |
2025-05-14 | Wearable Tracking of Eye and Body Movements During Breaching Training: Towards Real-Time Blast Injury Monitoring | Jeremy P. Kemmerer et.al. | 2505.09508 | null |
2025-05-14 | Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput | Bo Zhang et.al. | 2505.09498 | null |
2025-05-14 | Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments | Nuthasith Gerdpratoom et.al. | 2505.09434 | null |
2025-05-14 | UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units | Huakun Liu et.al. | 2505.09393 | link |
2025-05-14 | Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform | Qinghui Liu et.al. | 2505.09380 | link |
2025-05-14 | ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections | H. T. Rüdisser et.al. | 2505.09365 | link |
2025-05-14 | APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression | Srinivas Ravuri et.al. | 2505.09356 | link |
2025-05-14 | Neural Video Compression using 2D Gaussian Splatting | Lakshya Gupta et.al. | 2505.09324 | null |
2025-05-14 | Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging | Hongjin Qian et.al. | 2505.09316 | null |
2025-05-14 | Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control | Yimou Wu et.al. | 2505.09145 | null |
2025-05-14 | Quantum Error-Corrected Computation of Molecular Energies | Kentaro Yamamoto et.al. | 2505.09133 | null |
2025-05-14 | Non-equilibrium scalar fields at finite temperature and density | Sebastian Mendizabal et.al. | 2505.09104 | null |
2025-05-14 | OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions | Yuhang Wang et.al. | 2505.09092 | link |
2025-05-14 | Modeling Interdependent Cybersecurity Threats Using Bayesian Networks: A Case Study on In-Vehicle Infotainment Systems | Sangita Sridar et.al. | 2505.09048 | null |
2025-05-14 | RT-cache: Efficient Robot Trajectory Retrieval System | Owen Kwon et.al. | 2505.09040 | null |
2025-05-14 | Multiparty Selective Disclosure using Attribute-Based Encryption | Shigenori Ohashi et.al. | 2505.09034 | null |
2025-05-13 | Enhancing Aerial Combat Tactics through Hierarchical Multi-Agent Reinforcement Learning | Ardian Selmonaj et.al. | 2505.08995 | null |
2025-05-13 | Aya Vision: Advancing the Frontier of Multilingual Multimodality | Saurabh Dash et.al. | 2505.08751 | null |
2025-05-13 | A Study of Data-driven Methods for Inventory Optimization | Lee Yeung Ping et.al. | 2505.08673 | null |
2025-05-13 | Claycode: Stylable and Deformable 2D Scannable Codes | Marco Maida et.al. | 2505.08666 | null |
2025-05-13 | ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking | Haofeng Liu et.al. | 2505.08581 | link |
2025-05-13 | End-to-End Multi-Task Policy Learning from NMPC for Quadruped Locomotion | Anudeep Sajja et.al. | 2505.08574 | null |
2025-05-13 | Extract the Best, Discard the Rest: CSI Feedback with Offline Large AI Models | Jialin Zhuang et.al. | 2505.08566 | null |
2025-05-13 | Towards Digital Twin in Flood Forecasting with Data Assimilation Satellite Earth Observations -- A Proof-of-Concept in the Alzette Catchment | Thanh Huy Nguyen et.al. | 2505.08553 | null |
2025-05-13 | The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning | Mohamed Lamine Mekhalfi et.al. | 2505.08537 | null |
2025-05-13 | Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation | Linna Xu et.al. | 2505.08535 | null |
2025-05-13 | Towards Resilient SDA: Graph Theory and Cooperative Control in Distributed Network Architectures | Nesrine Benchoubane et.al. | 2505.08520 | null |
2025-05-13 | Isolation Forest in Novelty Detection Scenario | Adam Ulrich et.al. | 2505.08489 | null |
2025-05-13 | BAT: Benchmark for Auto-bidding Task | Alexandra Khirianova et.al. | 2505.08485 | link |
2025-05-13 | Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions | Lata Pangtey et.al. | 2505.08464 | null |
2025-05-13 | Measurements of molecular size and shape on a chip | Xin Zhu et.al. | 2505.08452 | null |
2025-05-13 | Anisotropic fluctuations of momentum and angular momentum of heavy quarks in the pre-equilibrium stage of pA collisions at the LHC | Gabriele Parisi et.al. | 2505.08441 | null |
2025-05-13 | MDF: Multi-Modal Data Fusion with CNN-Based Object Detection for Enhanced Indoor Localization Using LiDAR-SLAM | Saqi Hussain Kalan et.al. | 2505.08388 | null |
2025-05-13 | FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units | Jian Wang et.al. | 2505.08294 | null |
2025-05-13 | Ground-based Observations of Temporal Variation of Cosmic Ray Spectrum during Forbush Decreases | W. Mitthumsiri et.al. | 2505.08248 | null |
2025-05-13 | Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning | Yunyue Wei et.al. | 2505.08238 | null |
2025-05-13 | Online differentially private inference in stochastic gradient descent | Jinhan Xie et.al. | 2505.08227 | null |
2025-05-13 | VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections | Eason Chen et.al. | 2505.07736 | null |
2025-05-12 | MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering | Rushi Qiang et.al. | 2505.07782 | link |
2025-05-12 | Robo-Taxi Fleet Coordination with Accelerated High-Capacity Ridepooling | Xinling Li et.al. | 2505.07776 | null |
2025-05-12 | Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems | Tomasz Szydlo et.al. | 2505.07755 | null |
2025-05-12 | Gameplay Highlights Generation | Vignesh Edithal et.al. | 2505.07721 | null |
2025-05-12 | Hybrid Control Strategies for Safe and Adaptive Robot-Assisted Dressing | Yasmin Rafiq et.al. | 2505.07710 | null |
2025-05-12 | Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications | Biel Tura Vecino et.al. | 2505.07701 | null |
2025-05-12 | Verified Purely Functional Catenable Real-Time Deques | Jules Viennot et.al. | 2505.07681 | null |
2025-05-12 | SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models | Hang Wu et.al. | 2505.07680 | null |
2025-05-12 | Intuitive Human-Robot Interfaces Leveraging on Autonomy Features for the Control of Highly-redundant Robots | Davide Torielli et.al. | 2505.07668 | link |
2025-05-12 | Neural Brain: A Neuroscience-inspired Framework for Embodied Agents | Jian Liu et.al. | 2505.07634 | link |
2025-05-12 | Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods,Datasets,and Future Directions | Yi Zhang et.al. | 2505.07611 | null |
2025-05-12 | AgentFlow: Resilient Adaptive Cloud-Edge Framework for Multi-Agent Coordination | Ching Han Chen et.al. | 2505.07603 | null |
2025-05-12 | Decoding Chess Puzzle Play and Standard Cognitive Tasks for BCI: A Low-Cost EEG Study | Matthew Russell et.al. | 2505.07592 | null |
2025-05-12 | Privacy-Preserving Real-Time Vietnamese-English Translation on iOS using Edge AI | Cong Le et.al. | 2505.07583 | null |
2025-05-12 | Superstring entanglement at finite temperature and its Hagedorn behavior | Daniel Luiz Nedel et.al. | 2505.07567 | null |
2025-05-12 | Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs | Kamil Jeziorek et.al. | 2505.07556 | null |
2025-05-12 | GIFStream: 4D Gaussian-based Immersive Video with Feature Stream | Hao Li et.al. | 2505.07539 | null |
2025-05-12 | Convex Trajectory Optimization via Monomial Coordinates Transcription for Cislunar Rendezvous | Omar Regantini et.al. | 2505.07521 | null |
2025-05-12 | Lightweight Multispectral Crop-Weed Segmentation for Precision Agriculture | Zeynep Galymzhankyzy et.al. | 2505.07444 | null |
2025-05-09 | A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows | Linjiang Cao et.al. | 2505.06178 | null |
2025-05-09 | Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework | Alice Rueda et.al. | 2505.06151 | null |
2025-05-09 | S2MNet: Speckle-To-Mesh Net for Three-Dimensional Cardiac Morphology Reconstruction via Echocardiogram | Xilin Gong et.al. | 2505.06105 | null |
2025-05-09 | HashKitty: Distributed Password Analysis | Pedro Antunes et.al. | 2505.06084 | link |
2025-05-09 | Centralized Decision-Making for Platooning By Using SPaT-Driven Reference Speeds | Melih Yazgan et.al. | 2505.06071 | null |
2025-05-09 | Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks | Gabriel Gagné et.al. | 2505.06064 | link |
2025-05-09 | Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates | Rodrigo Diaz et.al. | 2505.05940 | link |
2025-05-09 | Priority-Driven Safe Model Predictive Control Approach to Autonomous Driving Applications | Francesco Prignoli et.al. | 2505.05933 | null |
2025-05-09 | Multi-armed Bandit for Stochastic Shortest Path in Mixed Autonomy | Yu Bai et.al. | 2505.05878 | null |
2025-05-09 | DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning under Two-sided Incomplete Information | Yun Xin et.al. | 2505.05842 | null |
2025-05-09 | Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency | Xinyu Liang et.al. | 2505.05796 | null |
2025-05-09 | Quantitative Hardness Assessment with Vision-based Tactile Sensing for Fruit Classification and Grasping | Zhongyuan Liao et.al. | 2505.05725 | null |
2025-05-08 | An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions | Akram Aldroubi et.al. | 2505.05676 | null |
2025-05-08 | Adaptive Stress Testing Black-Box LLM Planners | Neeloy Chakraborty et.al. | 2505.05665 | null |
2025-05-08 | UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes | Mark C. Eid et.al. | 2505.05643 | null |
2025-05-08 | LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities | Kalyan Nakka et.al. | 2505.05619 | link |
2025-05-08 | Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer | Wenhao Guo et.al. | 2505.05595 | null |
2025-05-08 | Flight Validation of Learning-Based Trajectory Optimization for the Astrobee Free-Flyer | Somrita Banerjee et.al. | 2505.05588 | null |
2025-05-08 | Steepest Descent Density Control for Compact 3D Gaussian Splatting | Peihao Wang et.al. | 2505.05587 | null |
2025-05-08 | Quantum-network nodes with real-time noise mitigation using spectator qubits | S. J. H. Loenen et.al. | 2505.05582 | null |
2025-05-08 | SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation | Yonwoo Choi et.al. | 2505.05475 | link |
2025-05-08 | StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant | Haibo Wang et.al. | 2505.05467 | null |
2025-05-08 | EDmamba: A Simple yet Effective Event Denoising Method with State Space Model | Ciyu Ruan et.al. | 2505.05391 | null |
2025-05-08 | OcularAge: A Comparative Study of Iris and Periocular Images for Pediatric Age Estimation | Naveenkumar G Venkataswamy et.al. | 2505.05374 | null |
2025-05-08 | Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Sooyoung Park et.al. | 2505.05343 | link |
2025-05-08 | Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors | Zunjie Zhu et.al. | 2505.05336 | null |
2025-05-08 | Advanced Stock Market Prediction Using Long Short-Term Memory Networks: A Comprehensive Deep Learning Framework | Rajneesh Chaudhary et.al. | 2505.05325 | null |
2025-05-08 | SmartTrap: Automated Precision Experiments with Optical Tweezers | Martin Selin et.al. | 2505.05290 | null |
2025-05-08 | CV-MP: Max-Pressure Control in Heterogeneously Distributed and Partially Connected Vehicle Environments | Chaopeng Tan et.al. | 2505.05258 | null |
2025-05-08 | Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network | Changxiang Wu et.al. | 2505.05231 | null |
2025-05-08 | PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting | Elad Feldman et.al. | 2505.05183 | null |
2025-05-08 | Multi-agent Embodied AI: Advances and Future Directions | Zhaohan Feng et.al. | 2505.05108 | null |
2025-05-08 | Pairing Real-Time Piano Transcription with Symbol-level Tracking for Precise and Robust Score Following | Silvan Peter et.al. | 2505.05078 | null |
2025-05-08 | xTrace: A Facial Expressive Behaviour Analysis Tool for Continuous Affect Recognition | Mani Kumar Tellamekala et.al. | 2505.05043 | null |
2025-05-08 | Reality-infused Deep Learning Framework via Angle-resolved Metasurfaces | Wei Chen et.al. | 2505.05011 | null |
2025-05-08 | StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps | Lang Nie et.al. | 2505.05001 | link |
2025-05-08 | Robust Model-Based In-Hand Manipulation with Integrated Real-Time Motion-Contact Planning and Tracking | Yongpeng Jiang et.al. | 2505.04978 | null |
2025-05-08 | The candidates of 2 | Y. M. Htet et.al. | 2505.04975 | null |
2025-05-08 | AI and Vision based Autonomous Navigation of Nano-Drones in Partially-Known Environments | Mattia Sartori et.al. | 2505.04972 | null |
2025-05-08 | Real-Time Model Predictive Control of Vehicles with Convex-Polygon-Aware Collision Avoidance in Tight Spaces | Haruki Kojima et.al. | 2505.04935 | null |
2025-05-07 | EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning | Zhenghao Xing et.al. | 2505.04623 | link |
2025-05-07 | Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems | Mohammad Merati et.al. | 2505.04596 | null |
2025-05-07 | Runtime Advocates: A Persona-Driven Framework for Requirements@Runtime Decision Support | Demetrius Hernandez et.al. | 2505.04551 | null |
2025-05-07 | Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration | Asma Baobaid et.al. | 2505.04524 | null |
2025-05-07 | Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition | Asma Baobaid et.al. | 2505.04502 | null |
2025-05-07 | Estimating Dynamic Soft Continuum Robot States From Boundaries | Tongjia Zheng et.al. | 2505.04491 | null |
2025-05-07 | "I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments | Ziyi Zhang et.al. | 2505.04488 | null |
2025-05-07 | Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration | Shigeki Karita et.al. | 2505.04457 | link |
2025-05-07 | Meta-Learning Driven Lightweight Phase Shift Compression for IRS-Assisted Wireless Systems | Xianhua Yu et.al. | 2505.04453 | null |
2025-05-07 | Phase Shift Information Compression in IRS-aided Wireless Systems: Challenges and Opportunities | Xianhua Yu et.al. | 2505.04449 | null |
2025-05-07 | SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | Young-Hu Park et.al. | 2505.04394 | null |
2025-05-07 | Predicting Road Surface Anomalies by Visual Tracking of a Preceding Vehicle | Petr Jahoda et.al. | 2505.04392 | null |
2025-05-07 | Design and Evaluation of an NDN-Based Network for Distributed Digital Twins | Chen Chen et.al. | 2505.04326 | null |
2025-05-07 | Massive MIMO: Instantaneous versus Statistical CSI-Based Power Allocation | Zahra Mobini et.al. | 2505.04294 | null |
2025-05-07 | Integrated Airline Fleet and Crew Recovery through Local Search | Philip de Bruin et.al. | 2505.04274 | null |
2025-05-07 | RGB-Event Fusion with Self-Attention for Collision Prediction | Pietro Bonazzi et.al. | 2505.04258 | link |
2025-05-07 | Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections | Taoyuan Yu et.al. | 2505.04231 | null |
2025-05-07 | An Enhanced YOLOv8 Model for Real-Time and Accurate Pothole Detection and Measurement | Mustafa Yurdakul et.al. | 2505.04207 | null |
2025-05-07 | Spatial-Wavelength Multiplexing Reliable Photonic Integrated General-Purpose Analog Computing System | Tao Zhu et.al. | 2505.04197 | null |
2025-05-07 | Beyond Task Performance: Human Experience in Human-Robot Collaboration | Sean Kille et.al. | 2505.04182 | null |
2025-05-06 | VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | Zuwei Long et.al. | 2505.03739 | link |
2025-05-06 | AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control | Jialong Li et.al. | 2505.03738 | null |
2025-05-06 | Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving | Faizan M. Tariq et.al. | 2505.03695 | null |
2025-05-06 | RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration | Huajie Tan et.al. | 2505.03673 | link |
2025-05-06 | PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model | Y. B. Wang et.al. | 2505.03603 | null |
2025-05-06 | LlamaFirewall: An open source guardrail system for building secure AI agents | Sahana Chennabasappa et.al. | 2505.03574 | null |
2025-05-06 | Real-Time Person Image Synthesis Using a Flow Matching Model | Jiwoo Jeong et.al. | 2505.03562 | link |
2025-05-06 | Rapid AI-based generation of coverage paths for dispensing applications | Simon Baeuerle et.al. | 2505.03560 | null |
2025-05-06 | Real-time small area estimation of food security in Zimbabwe: integrating mobile-phone and face-to-face surveys using joint multilevel regression and poststratification | Sahoko Ishida et.al. | 2505.03517 | link |
2025-05-06 | Learning-based Homothetic Tube MPC | Yulong Gao et.al. | 2505.03482 | link |
2025-05-06 | A generalised non-linear reconstructor for all Fourier-type wavefront sensors | Victoria Laidlaw et.al. | 2505.03477 | null |
2025-05-06 | Simulation to Reality: Testbeds and Architectures for Connected and Automated Vehicles | David Klüner et.al. | 2505.03472 | null |
2025-05-06 | Mitigating Backdoor Triggered and Targeted Data Poisoning Attacks in Voice Authentication Systems | Alireza Mohammadi et.al. | 2505.03455 | null |
2025-05-06 | Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework | Sanam Nayab et.al. | 2505.03409 | null |
2025-05-06 | Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath | Chris Wise et.al. | 2505.03397 | link |
2025-05-06 | DroidRetriever: An Autonomous Navigation and Information Integration System Facilitating Mobile Sensemaking | Yiheng Bian et.al. | 2505.03364 | null |
2025-05-06 | GUAVA: Generalizable Upper Body 3D Gaussian Avatar | Dongbin Zhang et.al. | 2505.03351 | null |
2025-05-06 | Artificial Behavior Intelligence: Technology, Challenges, and Future Directions | Kanghyun Jo et.al. | 2505.03315 | null |
2025-05-06 | An Active Inference perspective on Neurofeedback Training | Côme Annicchiarico et.al. | 2505.03308 | null |
2025-05-06 | Model Predictive Fuzzy Control: A Hierarchical Multi-Agent Control Architecture for Outdoor Search-and-Rescue Robots | Craig Maxwell et.al. | 2505.03257 | null |
2025-05-05 | Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow | Jai Prakash Veerla et.al. | 2505.02780 | link |
2025-05-05 | Teaching the social media generation: rethinking learning without sacrificing quality | Sepinoud Azimi et.al. | 2505.02770 | null |
2025-05-05 | Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Yemin Shi et.al. | 2505.02707 | link |
2025-05-05 | Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation | Haotian Chen et.al. | 2505.02690 | null |
2025-05-05 | Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints | Shubham Vaishnav et.al. | 2505.02640 | null |
2025-05-05 | LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Qingkai Fang et.al. | 2505.02625 | link |
2025-05-05 | Wise Goose Chase: A Predictive Path Planning Algorithm for Dynamic Rebalancing in Ride-Hailing Systems | Avalpreet Singh Brar et.al. | 2505.02603 | null |
2025-05-05 | Maximal Compatibility Matching for Preference-Aware Ride-Hailing Systems | Avalpreet Singh Brar et.al. | 2505.02599 | null |
2025-05-05 | LiDAR-Inertial SLAM-Based Navigation and Safety-Oriented AI-Driven Control System for Skid-Steer Robots | Mehdi Heydari Shahna et.al. | 2505.02598 | null |
2025-05-05 | Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things | Yang Li et.al. | 2505.02597 | null |
2025-05-05 | HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction | Muhammad Haris Khan et.al. | 2505.02569 | null |
2025-05-05 | Machine-Learning-Powered Neural Interfaces for Smart Prosthetics and Diagnostics | MohammadAli Shaeri et.al. | 2505.02516 | null |
2025-05-05 | ReeM: Ensemble Building Thermodynamics Model for Efficient HVAC Control via Hierarchical Reinforcement Learning | Yang Deng et.al. | 2505.02439 | null |
2025-05-05 | Towards Effective Issue Assignment using Online Machine Learning | Athanasios Michailoudis et.al. | 2505.02437 | link |
2025-05-05 | Encrypted Federated Search Using Homomorphic Encryption | Om Rathod et.al. | 2505.02409 | null |
2025-05-05 | A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints | Jianye Xu et.al. | 2505.02395 | link |
2025-05-05 | Sloshing suppression with a controlled elastic baffle via deep reinforcement learning and SPH simulation | Mai Ye et.al. | 2505.02354 | null |
2025-05-05 | Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering | Jihao Zhao et.al. | 2505.02311 | link |
2025-05-04 | RNBF: Real-Time RGB-D Based Neural Barrier Functions for Safe Robotic Navigation | Satyajeet Das et.al. | 2505.02294 | null |
2025-05-04 | Real-time Spatial Retrieval Augmented Generation for Urban Environments | David Nazareno Campo et.al. | 2505.02271 | null |
2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et.al. | 2505.01383 | null |
2025-05-02 | An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube | Pengda Mao et.al. | 2505.01380 | null |
2025-05-02 | Closing the Loop: A Systematic Review of Experience-Driven Game Adaptation | Phil Lopes et.al. | 2505.01351 | null |
2025-05-02 | How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios | Satvik Venkatesh et.al. | 2505.01338 | null |
2025-05-02 | Early Detection of Patient Deterioration from Real-Time Wearable Monitoring System | Lo Pang-Yun Ting et.al. | 2505.01305 | null |
2025-05-02 | Contactless pulse rate assessment: Results and insights for application in driving simulator | Đorđe D. Nešković et.al. | 2505.01299 | null |
2025-05-02 | FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing | Gaoxiang Cong et.al. | 2505.01263 | null |
2025-05-02 | CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Edson Araujo et.al. | 2505.01237 | link |
2025-05-02 | Efficient Vision-based Vehicle Speed Estimation | Andrej Macko et.al. | 2505.01203 | null |
2025-05-02 | Machine learning-based prediction of species mass fraction and flame characteristics in partially premixed turbulent jet flame | Amirali Shateri et.al. | 2505.01201 | null |
2025-05-02 | AGRO: An Autonomous AI Rover for Precision Agriculture | Simar Ghumman et.al. | 2505.01200 | null |
2025-05-02 | A Secured Triad of IoT, Machine Learning, and Blockchain for Crop Forecasting in Agriculture | Najmus Sakib Sizan et.al. | 2505.01196 | null |
2025-05-02 | Fast Flow-based Visuomotor Policies via Conditional Optimal Transport Couplings | Andreas Sochopoulos et.al. | 2505.01179 | null |
2025-05-02 | Empirical Comparison of Lightweight Forecasting Models for Seasonal and Non-Seasonal Time Series | Thanh Son Nguyen et.al. | 2505.01163 | null |
2025-05-02 | Machine Learning for Physical Simulation Challenge Results and Retrospective Analysis: Power Grid Use Case | Milad Leyli-Abadi et.al. | 2505.01156 | null |
2025-05-02 | In-Situ Growth and Ionic Switching Behavior of Single-Crystalline Silver Iodide Nanoflakes | Amir Parsi et.al. | 2505.01062 | null |
2025-05-02 | Model Tensor Planning | An T. Le et.al. | 2505.01059 | link |
2025-05-02 | Kinetic roughening transition of ice crystals and its implications during recrystallization | Jorge H. Melillo et.al. | 2505.01055 | null |
2025-05-02 | Tightly Coupled Range Inertial Odometry and Mapping with Exact Point Cloud Downsampling | Kenji Koide et.al. | 2505.01017 | null |
2025-05-02 | Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks | Jiaqi Zhang et.al. | 2505.00990 | link |
2025-05-01 | A Practical Framework for Simulating Time-Resolved Spectroscopy Based on a Real-time Dyson Expansion | Cian Reeves et.al. | 2505.00667 | null |
2025-05-01 | Open-Source LLM-Driven Federated Transformer for Predictive IoV Management | Yazan Otoum et.al. | 2505.00651 | null |
2025-05-01 | Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI | Merve Gülle et.al. | 2505.00643 | null |
2025-05-01 | Fully passive quantum random number generation with untrusted light | KaiWei Qiu et.al. | 2505.00636 | null |
2025-05-01 | A Novel Feature-Aware Chaotic Image Encryption Scheme For Data Security and Privacy in IoT and Edge Networks | Muhammad Shahbaz Khan et.al. | 2505.00593 | null |
2025-05-01 | Bridging Cultural and Digital Divides: A Low-Latency JackTrip Framework for Equitable Music Education in the Global South | Tiange Zhou et.al. | 2505.00550 | null |
2025-05-01 | Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks | Xinyu Wang et.al. | 2505.00530 | null |
2025-05-01 | UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces | Alaa Saleh et.al. | 2505.00472 | null |
2025-05-01 | HoneyWin: High-Interaction Windows Honeypot in Enterprise Environment | Yan Lin Aung et.al. | 2505.00465 | null |
2025-05-01 | Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos | Xia Yuan et.al. | 2505.00421 | null |
2025-05-01 | Multi-dimensional optical imaging on a chip | Liheng Bian et.al. | 2505.00408 | link |
2025-05-01 | Stealth Signals: Multi-Discriminator GANs for Covert Communications Against Diverse Wardens | Afan Ali et.al. | 2505.00399 | null |
2025-05-01 | Urban Air Mobility as a System of Systems: An LLM-Enhanced Holonic Approach | Ahmed R. Sadik et.al. | 2505.00368 | null |
2025-05-01 | Edge Large AI Models: Revolutionizing 6G Networks | Zixin Wang et.al. | 2505.00321 | null |
2025-05-01 | Avatar Communication Provides More Efficient Online Social Support Than Text Communication | Masanori Takano et.al. | 2505.00287 | null |
2025-05-01 | Empowering Agentic Video Analytics Systems with Video Language Models | Yuxuan Yan et.al. | 2505.00254 | null |
2025-05-01 | LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems | Yazan Otoum et.al. | 2505.00240 | null |
2025-04-30 | Real-Time Brain-Computer Interface Control of Walking Exoskeleton with Bilateral Sensory Feedback | Jeffrey Lim et.al. | 2505.00219 | null |
2025-04-30 | PSN Game: Game-theoretic Planning via a Player Selection Network | Tianyu Qiu et.al. | 2505.00213 | null |
2025-04-30 | Generative Machine Learning in Adaptive Control of Dynamic Manufacturing Processes: A Review | Suk Ki Lee et.al. | 2505.00210 | null |
2025-04-30 | A Survey of Interactive Generative Video | Jiwen Yu et.al. | 2504.21853 | null |
2025-04-30 | Differentiable Room Acoustic Rendering with Multi-View Vision Priors | Derong Jin et.al. | 2504.21847 | null |
2025-04-30 | Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields | Yixin Gao et.al. | 2504.21814 | null |
2025-04-30 | WebThinker: Empowering Large Reasoning Models with Deep Research Capability | Xiaoxi Li et.al. | 2504.21776 | link |
2025-04-30 | Smart Environmental Monitoring of Marine Pollution using Edge AI | Mohamed Moursi et.al. | 2504.21759 | null |
2025-04-30 | TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training | Shengqian Wang et.al. | 2504.21735 | null |
2025-04-30 | MovementVR: An open-source tool for the study of motor control and learning in virtual reality | Cristina Rossi et.al. | 2504.21696 | null |
2025-04-30 | Enhancing Health Mention Classification Performance: A Study on Advancements in Parameter Efficient Tuning | Reem Abdel-Salam et.al. | 2504.21685 | null |
2025-04-30 | Effect of eccentric mixing parameters on chaotic characteristics and mixing time for viscous liquid based on sound decibels | Ronfgang Wang et.al. | 2504.21621 | null |
2025-04-30 | Real Time Semantic Segmentation of High Resolution Automotive LiDAR Scans | Hannes Reichert et.al. | 2504.21602 | link |
2025-04-30 | Real-time Program Evaluation using Anytime-valid Rank Tests | Sam van Meer et.al. | 2504.21595 | null |
2025-04-30 | Toward Realization of Low-Altitude Economy Networks: Core Architecture, Integrated Technologies, and Future Directions | Yixian Wang et.al. | 2504.21583 | null |
2025-04-30 | Scientific Workflow Scheduling in Cloud Considering Cold Start and Variable Pricing Model | Suvarthi Sarkar et.al. | 2504.21536 | null |
2025-04-30 | Efficient Conversational Search via Topical Locality in Dense Retrieval | Cristina Ioana Muntean et.al. | 2504.21507 | link |
2025-04-30 | Turning a Disposable Bronchoscope into a Dynamic Speckle Imaging Tool: Yes, It Works | Aurélien Plyer et.al. | 2504.21469 | null |
2025-04-30 | Integration of a Synthetic Molecular Motor Into a Rotary DNA Nanostructure: A Framework for Single-Molecule Actuation | Seham Helmi et.al. | 2504.21434 | null |
2025-04-30 | Enhanced Semi-Supervised Stamping Process Monitoring with Physically-Informed Feature Extraction | Jianyu Zhang et.al. | 2504.21389 | null |
2025-04-30 | DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion | Yinfeng Yu et.al. | 2504.21366 | null |
2025-04-30 | ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality | Jaewook Lee et.al. | 2504.21360 | null |
2025-04-30 | Generative QoE Modeling: A Lightweight Approach for Telecom Networks | Vinti Nayar et.al. | 2504.21353 | null |
2025-04-29 | Real-Time Wayfinding Assistant for Blind and Low-Vision Users | Dabbrata Das et.al. | 2504.20976 | null |
2025-04-29 | SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features | Mete Erdogan et.al. | 2504.20970 | null |
2025-04-29 | AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security | Zikui Cai et.al. | 2504.20965 | link |
2025-04-29 | Mìmir: A real-time interactive visualization library for CUDA programs | Francisco Carter et.al. | 2504.20937 | null |
2025-04-29 | SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings | Florian Vahl et.al. | 2504.20808 | null |
2025-04-29 | Integrating Human Feedback into a Reinforcement Learning-Based Framework for Adaptive User Interfaces | Daniel Gaspar-Figueiredo et.al. | 2504.20782 | null |
2025-04-29 | An Online Cross-layered Defense Strategy with Bandwidth Allocation for Multi-channel Systems under DoS Attacks | Liheng Wan et.al. | 2504.20762 | null |
2025-04-29 | Confidence-based Intent Prediction for Teleoperation in Bimanual Robotic Suturing | Zhaoyang Jacopo Hu et.al. | 2504.20761 | null |
2025-04-29 | Graph-Based Fault Diagnosis for Rotating Machinery: Adaptive Segmentation and Structural Feature Integration | Moirangthem Tiken Singh et.al. | 2504.20756 | null |
2025-04-29 | Formal and Empirical Study of Metadata-Based Profiling for Resource Management in the Computing Continuum | Andrea Morichetta et.al. | 2504.20740 | link |
2025-04-29 | Intelligent Task Offloading in VANETs: A Hybrid AI-Driven Approach for Low-Latency and Energy Efficiency | Tariq Qayyum et.al. | 2504.20735 | null |
2025-04-29 | A High-Granularity Proton CT Enhanced by Track Discrimination | Huang-Chao Shi et.al. | 2504.20698 | null |
2025-04-29 | Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion | Zesheng Wang et.al. | 2504.20685 | null |
2025-04-29 | Quantum Computation for Jets in Heavy Ion Collisions | Wenyang Qian et.al. | 2504.20683 | null |
2025-04-29 | FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection | Yao Xiao et.al. | 2504.20670 | null |
2025-04-29 | Quantum-Enhanced Hybrid Reinforcement Learning Framework for Dynamic Path Planning in Autonomous Systems | Sahil Tomar et.al. | 2504.20660 | null |
2025-04-29 | PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval | Zihan Niu et.al. | 2504.20624 | null |
2025-04-29 | Information Retrieval in the Age of Generative AI: The RGB Model | Michele Garetto et.al. | 2504.20610 | link |
2025-04-29 | WakeLoc: An Ultra-Low Power, Accurate and Scalable On-Demand RTLS using Wake-Up Radios | Silvano Cortesi et.al. | 2504.20545 | null |
2025-04-29 | Digital Twin-Empowered Cooperative Autonomous Car-sharing Services: Proof-of-Concept | Kazuma Nonomura et.al. | 2504.20542 | null |
2025-04-29 | Feelbert: A Feedback Linearization-based Embedded Real-Time Quadrupedal Locomotion Framework | Aristide Emanuele Casucci et.al. | 2504.19965 | null |
2025-04-28 | Learning Streaming Video Representation via Multitask Training | Yibin Yan et.al. | 2504.20041 | null |
2025-04-28 | HJRNO: Hamilton-Jacobi Reachability with Neural Operators | Yankai Li et.al. | 2504.19989 | null |
2025-04-28 | Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach | Keyhan Rayati et.al. | 2504.19985 | null |
2025-04-28 | Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose | Narges Rashvand et.al. | 2504.19970 | null |
2025-04-28 | Automated decision-making for dynamic task assignment at scale | Riccardo Lo Bianco et.al. | 2504.19933 | link |
2025-04-28 | NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | Chia-Yu Hung et.al. | 2504.19854 | null |
2025-04-28 | Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning | Shadab Zakavati et.al. | 2504.19840 | null |
2025-04-28 | Optimal real-time dynamic treatment regimes with application to oxytocin use in preventing postpartum hemorrhage | Haiyan Zhu et.al. | 2504.19831 | null |
2025-04-28 | Digital Twin-based Out-of-Distribution Detection in Autonomous Vessels | Erblin Isaku et.al. | 2504.19816 | null |
2025-04-28 | Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model | Muzammil Behzad et.al. | 2504.19739 | null |
2025-04-28 | The ATLAS of Traffic Lights: A Reliable Perception Framework for Autonomous Driving | Rupert Polley et.al. | 2504.19722 | null |
2025-04-28 | Advances in Approximate Bayesian Inference for Models in Epidemiology | Xiahui Li et.al. | 2504.19698 | null |
2025-04-28 | GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning | Juyi Sheng et.al. | 2504.19683 | null |
2025-04-28 | Neuronal correlations shape the scaling behavior of memory capacity and nonlinear computational capability of recurrent neural networks | Shotaro Takasu et.al. | 2504.19657 | null |
2025-04-28 | Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM | Leon Davies et.al. | 2504.19654 | null |
2025-04-28 | GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM | Leon Davies et.al. | 2504.19653 | null |
2025-04-28 | Robot Motion Planning using One-Step Diffusion with Noise-Optimized Approximate Motions | Tomoharu Aizu et.al. | 2504.19652 | null |
2025-04-28 | QFDNN: A Resource-Efficient Variational Quantum Feature Deep Neural Networks for Fraud Detection and Loan Prediction | Subham Das et.al. | 2504.19632 | null |
2025-04-28 | ARMOR: Adaptive Meshing with Reinforcement Optimization for Real-time 3D Monitoring in Unexposed Scenes | Yizhe Zhang et.al. | 2504.19624 | null |
2025-04-25 | Automating Nanoindentation: Optimizing Workflows for Precision and Accuracy | Vivek Chawla et.al. | 2504.18525 | null |
2025-04-25 | Online Distributed Queue Length Estimation | Aditya Bhaskara et.al. | 2504.18503 | null |
2025-04-25 | A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression | Muzaffar Qureshi et.al. | 2504.18463 | null |
2025-04-25 | Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning | Tewodros Alemu Ayall et.al. | 2504.18451 | null |
2025-04-25 | Online learning to accelerate nonlinear PDE solvers: applied to multiphase porous media flow | Vinicius L S Silva et.al. | 2504.18414 | null |
2025-04-25 | Virial theorem for rigidly rotating matter | Sourav Dey et.al. | 2504.18388 | null |
2025-04-25 | Renewable-Colocated Green Hydrogen Production: Optimal Scheduling and Profitability | Siying Li et.al. | 2504.18368 | null |
2025-04-25 | SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations | Shuting Zhao et.al. | 2504.18332 | null |
2025-04-25 | STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting | Yunze Deng et.al. | 2504.18318 | null |
2025-04-25 | Design and Evaluation of a UGV-Based Robotic Platform for Precision Soil Moisture Remote Sensing | Ilektra Tsimpidi et.al. | 2504.18284 | null |
2025-04-25 | Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator | Minjae Kang et.al. | 2504.18283 | null |
2025-04-25 | SecCityVR: Visualization and Collaborative Exploration of Software Vulnerabilities in Virtual Reality | Dennis Wüppelman et.al. | 2504.18238 | null |
2025-04-25 | Time and Frequency Domain-based Anomaly Detection in Smart Meter Data for Distribution Network Studies | Petar Labura et.al. | 2504.18231 | null |
2025-04-25 | Sampling-Based Grasp and Collision Prediction for Assisted Teleoperation | Simon Manschitz et.al. | 2504.18186 | null |
2025-04-25 | PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models | Michel Gokan Khan et.al. | 2504.18165 | link |
2025-04-25 | Evaluation of Distimation's Real-world Performance on a Superconducting Quantum Computer | Hikaru Yokomori et.al. | 2504.18141 | null |
2025-04-25 | Study on Real-Time Road Surface Reconstruction Using Stereo Vision | Deepak Ghimire et.al. | 2504.18112 | null |
2025-04-25 | Teleportation-based Speed Meter for Precision Measurement | Yohei Nishino et.al. | 2504.18111 | null |
2025-04-25 | Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation | Weipeng Tan et.al. | 2504.18087 | null |
2025-04-25 | Phonon-Assisted Radiative Lifetimes and Exciton Dynamics from First Principles | Chunhao Guo et.al. | 2504.18071 | null |
2025-04-24 | Replay to Remember: Retaining Domain Knowledge in Streaming Language Models | Sneh Pillai et.al. | 2504.17780 | null |
2025-04-24 | Disaggregated Deep Learning via In-Physics Computing at Radio Frequency | Zhihui Gao et.al. | 2504.17752 | null |
2025-04-24 | BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring | Asier Bikandi et.al. | 2504.17693 | null |
2025-04-24 | Optimized Cloud Resource Allocation Using Genetic Algorithms for Energy Efficiency and QoS Assurance | Caroline Panggabean et.al. | 2504.17675 | null |
2025-04-24 | Unifying Complementarity Constraints and Control Barrier Functions for Safe Whole-Body Robot Control | Rafael I. Cabral Muchacho et.al. | 2504.17647 | null |
2025-04-24 | Beyond Labels: Zero-Shot Diabetic Foot Ulcer Wound Segmentation with Self-attention Diffusion Models and the Potential for Text-Guided Customization | Abderrachid Hamrani et.al. | 2504.17628 | null |
2025-04-24 | TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System | Zheng Wei et.al. | 2504.17598 | null |
2025-04-24 | RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network | Boyue Xu et.al. | 2504.17595 | null |
2025-04-24 | A Multi-Agent, Laxity-Based Aggregation Strategy for Cost-Effective Electric Vehicle Charging and Local Transformer Overload Prevention | Kristoffer Christensen et.al. | 2504.17575 | null |
2025-04-24 | Flying through cluttered and dynamic environments with LiDAR | Huajie Wu et.al. | 2504.17569 | null |
2025-04-24 | IRA: Adaptive Interest-aware Representation and Alignment for Personalized Multi-interest Retrieval | Youngjune Lee et.al. | 2504.17529 | null |
2025-04-24 | Adaptive Orchestration of Modular Generative Information Access Systems | Mohanna Hoveyda et.al. | 2504.17454 | link |
2025-04-24 | Storing and Querying Evolving Graphs in NoSQL Storage Models | Alexandros Spitalas et.al. | 2504.17438 | null |
2025-04-24 | StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies | Xu Wang et.al. | 2504.17401 | null |
2025-04-24 | Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems | Jaegang Jo et.al. | 2504.17368 | null |
2025-04-24 | TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos | Linli Yao et.al. | 2504.17343 | link |
2025-04-24 | Bridging Optical Sensing and Wearable Health Monitoring: A Functionalized Plasmonic Nanopillar for Non-Invasive Sweat Glucose Detection | Ling Liu et.al. | 2504.17339 | null |
2025-04-24 | EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy | Haodi Yao et.al. | 2504.17280 | null |
2025-04-24 | MV-Crafter: An Intelligent System for Music-guided Video Generation | Chuer Chen et.al. | 2504.17267 | null |
2025-04-24 | Symbolic Representation for Any-to-Any Generative Tasks | Jiaqi Chen et.al. | 2504.17261 | null |
2025-04-24 | Fast Online Adaptive Neural MPC via Meta-Learning | Yu Mei et.al. | 2504.16369 | link |
2025-04-23 | Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving | Jacob Levy et.al. | 2504.16923 | null |
2025-04-23 | An Accelerated Camera 3DMA Framework for Efficient Urban GNSS Multipath Estimation | Shiyao Lv et.al. | 2504.16906 | null |
2025-04-23 | Reconfigurable Intelligent Surface Control for a Moving Receiver | Hamed Radpour et.al. | 2504.16874 | null |
2025-04-23 | Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation | Tixiao Shan et.al. | 2504.16782 | null |
2025-04-23 | Evaluation Framework for AI Systems in "the Wild" | Sarah Jabbour et.al. | 2504.16778 | null |
2025-04-23 | Deep photonic reservoir computer for nonlinear equalization of 16-level quadrature amplitude modulation signals | Rui-Qian Li et.al. | 2504.16769 | null |
2025-04-23 | Beating the break-even point with autonomous quantum error correction | Yi Li et.al. | 2504.16746 | null |
2025-04-23 | PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands | Pei Lin et.al. | 2504.16649 | null |
2025-04-23 | Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models | Fredy Pokou et.al. | 2504.16635 | null |
2025-04-23 | Data-Assimilated Model-Based Reinforcement Learning for Partially Observed Chaotic Flows | Defne E. Ozan et.al. | 2504.16588 | null |
2025-04-23 | PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System | Xianghe Liu et.al. | 2504.16573 | null |
2025-04-23 | A Collaborative Intrusion Detection System Using Snort IDS Nodes | Tom Davies et.al. | 2504.16550 | null |
2025-04-23 | 6G EdgeAI: Performance Evaluation and Analysis | Chien-Sheng Yang et.al. | 2504.16529 | null |
2025-04-23 | Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition | Zhenguang Zhong et.al. | 2504.16504 | null |
2025-04-23 | FeedQUAC: Quick Unobtrusive AI-Generated Commentary | Tao Long et.al. | 2504.16416 | null |
2025-04-23 | Circinus: Efficient Query Planner for Compound ML Serving | Banruo Liu et.al. | 2504.16397 | null |
2025-04-23 | Fast and Modular Whole-Body Lagrangian Dynamics of Legged Robots with Changing Morphology | Sahand Farghdani et.al. | 2504.16383 | null |
2025-04-23 | SILM: A Subjective Intent Based Low-Latency Framework for Multiple Traffic Participants Joint Trajectory Prediction | Qu Weiming et.al. | 2504.16377 | null |
2025-04-23 | Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection | Linhua Kong et.al. | 2504.16368 | null |
2025-04-22 | PRIME: Fast Primal-Dual Feedback Optimization for Markets with Application to Optimal Power Flow | Nicholas Julian Behr et.al. | 2504.16048 | link |
2025-04-22 | A Comparative and Measurement-Based Study on Real-Time Network KPI Extraction Methods for 5G and Beyond Applications | Batuhan Kaplan et.al. | 2504.16039 | null |
2025-04-22 | LLMs meet Federated Learning for Scalable and Secure IoT Management | Yazan Otoum et.al. | 2504.16032 | null |
2025-04-22 | LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale | Joya Chen et.al. | 2504.16030 | null |
2025-04-22 | A UAV-Aided Digital Twin Framework for IoT Networks with High Accuracy and Synchronization | Ghofran Khalaf et.al. | 2504.15967 | null |
2025-04-22 | FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation | Zebin Yao et.al. | 2504.15958 | link |
2025-04-22 | Monocular inspection of spacecraft under illumination constraints and avoidance regions | Tochukwu Elijah Ogri et.al. | 2504.15954 | null |
2025-04-22 | Real-time raw signal genomic analysis using fully integrated memristor hardware | Peiyi He et.al. | 2504.15934 | link |
2025-04-22 | Learning the Spoofability of Limit Order Books With Interpretable Probabilistic Neural Networks | Timothée Fabre et.al. | 2504.15908 | null |
2025-04-22 | RaSCL: Radar to Satellite Crossview Localization | Blerim Abdullai et.al. | 2504.15899 | null |
2025-04-22 | An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search | Karim Essalmi et.al. | 2504.15869 | null |
2025-04-22 | Adaptive PCA-Based Outlier Detection for Multi-Feature Time Series in Space Missions | Jonah Ekelund et.al. | 2504.15846 | null |
2025-04-22 | Characterization and ex vivo application of flexible 2D scintillating coatings in ultra-high dose rate electron beams for FLASH radiotherapy | Verdi Vanreusel et.al. | 2504.15824 | null |
2025-04-22 | Microstructure and Manipulation: Quantifying Pump-and-Dump Dynamics in Cryptocurrency Markets | Mahya Karbalaii et.al. | 2504.15790 | null |
2025-04-22 | Enhancing Tennis Training with Real-Time Swing Data Visualisation in Immersive Virtual Reality | Ryan Najami et.al. | 2504.15746 | null |
2025-04-22 | You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection | Jun Dong et.al. | 2504.15694 | null |
2025-04-22 | Comparative Analysis of Evolutionary Algorithms for Energy-Aware Production Scheduling | Sascha C Burmeister et.al. | 2504.15672 | null |
2025-04-22 | Symbolic Runtime Verification and Adaptive Decision-Making for Robot-Assisted Dressing | Yasmin Rafiq et.al. | 2504.15666 | null |
2025-04-22 | Neural Kinematic Bases for Fluids | Yibo Liu et.al. | 2504.15657 | null |
2025-04-22 | A Vision-Enabled Prosthetic Hand for Children with Upper Limb Disabilities | Md Abdul Baset Sarker et.al. | 2504.15654 | null |
2025-04-21 | StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians | Cailin Zhuang et.al. | 2504.15281 | null |
2025-04-21 | DRAWER: Digital Reconstruction and Articulation With Environment Realism | Hongchi Xia et.al. | 2504.15278 | null |
2025-04-21 | Impulsive pattern recognition of a myoelectric hand via Dynamic Time Warping | Mustafa Can Kadilar et.al. | 2504.15256 | null |
2025-04-21 | Scalable Discrete Event Simulation Tool for Large-Scale Cyber-Physical Energy Systems: Advancing System Efficiency and Scalability | Khandaker Akramul Haque et.al. | 2504.15198 | null |
2025-04-21 | Time-Series Analysis on Edge-AI Hardware for Healthcare Monitoring | Jinhai Hu et.al. | 2504.15178 | null |
2025-04-21 | Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture | Meng Cui et.al. | 2504.15171 | null |
2025-04-21 | Neural ATTF: A Scalable Solution to Lifelong Multi-Agent Path Planning | Kushal Shah et.al. | 2504.15130 | null |
2025-04-21 | A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment | Kangyao Huang et.al. | 2504.15129 | null |
2025-04-21 | Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations | Csongor Csanad Kariko et.al. | 2504.15121 | null |
2025-04-21 | Muon Imaging of Hydrotreatment Towers | Rafael Armando Martínez-Rivero et.al. | 2504.15103 | null |
2025-04-21 | NeuGaze: Reshaping the future BCI | Yiqian Yang et.al. | 2504.15101 | link |
2025-04-21 | VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation | Mingxia Zhan et.al. | 2504.15095 | null |
2025-04-21 | Reconfiguration and Real-Time Operation of Networked Microgrids Under Load Uncertainty | Hannah Moring et.al. | 2504.15084 | null |
2025-04-21 | Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Jinghua Zhao et.al. | 2504.15066 | null |
2025-04-21 | Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects | Benshan Wang et.al. | 2504.15044 | null |
2025-04-21 | Dual Utilization of Perturbation for Stream Data Publication under Local Differential Privacy | Rong Du et.al. | 2504.14993 | null |
2025-04-21 | 3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations | Yating Wang et.al. | 2504.14967 | null |
2025-04-21 | Dynamic Graph-Like Learning with Contrastive Clustering on Temporally-Factored Ship Motion Data for Imbalanced Sea State Estimation in Autonomous Vessel | Kexin Wang et.al. | 2504.14907 | null |
2025-04-21 | Distributed Time-Varying Gaussian Regression via Kalman Filtering | Nicola Taddei et.al. | 2504.14900 | link |
2025-04-21 | Physics-Aware Compression of Plasma Distribution Functions with GPU-Accelerated Gaussian Mixture Models | Andong Hu et.al. | 2504.14897 | null |
2025-04-18 | ChatNekoHacker: Real-Time Fan Engagement with Conversational Agents | Takuya Sera et.al. | 2504.13793 | null |
2025-04-18 | Equi-Euler GraphNet: An Equivariant, Temporal-Dynamics Informed Graph Neural Network for Dual Force and Trajectory Prediction in Multi-Body Systems | Vinay Sharma et.al. | 2504.13768 | null |
2025-04-18 | Realizing string breaking dynamics in a | Constantia Alexandrou et.al. | 2504.13760 | null |
2025-04-18 | Intelligent Interaction Strategies for Context-Aware Cognitive Augmentation | Xiangrong et.al. | 2504.13684 | null |
2025-04-18 | Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction | Yushen He et.al. | 2504.13647 | link |
2025-04-18 | SupResDiffGAN a new approach for the Super-Resolution task | Dawid Kopeć et.al. | 2504.13622 | null |
2025-04-18 | FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking | Ying Wang et.al. | 2504.13604 | link |
2025-04-18 | Memristive chaotic circuit for information processing through time | Manuel Escudero et.al. | 2504.13600 | null |
2025-04-18 | RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines | Quentin Romero Lauro et.al. | 2504.13587 | null |
2025-04-18 | Estimating constraints on cosmological parameters via the canonical and the differential redshift drift with SKA HI 21-cm observations | Jiangang Kang et.al. | 2504.13583 | null |
2025-04-18 | MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework | Zhenkai Qin et.al. | 2504.13574 | null |
2025-04-18 | Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content | Azmarah Rizvi et.al. | 2504.13545 | null |
2025-04-18 | Can Local Representation Alignment RNNs Solve Temporal Tasks? | Nikolay Manchev et.al. | 2504.13531 | null |
2025-04-18 | Neural Ganglion Sensors: Learning Task-specific Event Cameras Inspired by the Neural Circuit of the Human Retina | Haley M. So et.al. | 2504.13457 | null |
2025-04-18 | RT-HDIST: Ray-Tracing Core-based Hausdorff Distance Computation | YoungWoo Kim et.al. | 2504.13436 | null |
2025-04-18 | POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation | Evans Xu Han et.al. | 2504.13392 | null |
2025-04-17 | Multi-Sensor Fusion-Based Mobile Manipulator Remote Control for Intelligent Smart Home Assistance | Xiao Jin et.al. | 2504.13370 | null |
2025-04-17 | AI-Empowered Integrated Sensing and Communications | Mojtaba Vaezi et.al. | 2504.13363 | null |
2025-04-17 | Physical Reservoir Computing in Hook-Shaped Rover Wheel Spokes for Real-Time Terrain Identification | Xiao Jin et.al. | 2504.13348 | null |
2025-04-17 | Adaptive AI decision interface for autonomous electronic material discovery | Yahao Dai et.al. | 2504.13344 | null |
2025-04-17 | Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems | Ivica Kostric et.al. | 2504.13095 | link |
2025-04-17 | EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance | Yang Yue et.al. | 2504.13065 | link |
2025-04-17 | Pose and Facial Expression Transfer by using StyleGAN | Petr Jahoda et.al. | 2504.13021 | null |
2025-04-17 | GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration | Rendong Zhang et.al. | 2504.12999 | link |
2025-04-17 | New Frontiers in Muon-Spin Spectroscopy Using Si-Pixel Detectors | Heiko Augustin et.al. | 2504.12993 | null |
2025-04-17 | Efficient Chebyshev Reconstruction for the Anisotropic Equilibrium Model in Magnetic Particle Imaging | Christine Droigk et.al. | 2504.12981 | null |
2025-04-17 | Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs | Youyi Zhan et.al. | 2504.12909 | link |
2025-04-17 | Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Yuyang Li et.al. | 2504.12908 | link |
2025-04-17 | Market-Driven Flexibility Provision: A Tri-Level Optimization Approach for Carbon Reduction | Shijie Pan et.al. | 2504.12877 | null |
2025-04-17 | AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering | Michael Steiner et.al. | 2504.12811 | null |
2025-04-17 | Distributed Intelligent Sensing and Communications for 6G: Architecture and Use Cases | Kyriakos Stylianopoulos et.al. | 2504.12765 | null |
2025-04-17 | Biasing the Driving Style of an Artificial Race Driver for Online Time-Optimal Maneuver Planning | Sebastiano Taddei et.al. | 2504.12744 | null |
2025-04-17 | Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Chenghao Fan et.al. | 2504.12737 | null |
2025-04-17 | Incorporating a Deep Neural Network into Moving Horizon Estimation for Embedded Thermal Torque Derating of an Electric Machine | Alexander Winkler et.al. | 2504.12736 | null |
2025-04-17 | Embodied Neuromorphic Control Applied on a 7-DOF Robotic Manipulator | Ziqi Wang et.al. | 2504.12702 | link |
2025-04-17 | Predicting Driver's Perceived Risk: a Model Based on Semi-Supervised Learning Strategy | Siwei Huang et.al. | 2504.12665 | null |
2025-04-17 | Autonomous Drone for Dynamic Smoke Plume Tracking | Srijan Kumar Pal et.al. | 2504.12664 | null |
2025-04-17 | AdaptoVision: A Multi-Resolution Image Recognition Model for Robust and Scalable Classification | Md. Sanaullah Chowdhury Lameya Sabrin et.al. | 2504.12652 | null |
2025-04-17 | Observation of the Axion quasiparticle in 2D MnBi$_2$Te$_4$ | Jian-Xiang Qiu et.al. | 2504.12572 | null |
2025-04-17 | Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions | Yifei Dong et.al. | 2504.11967 | null |
2025-04-17 | Real-Time Reconstruction of Ground Motion During Small Magnitude Earthquakes: A Pilot Study | Youngkyu Kim et.al. | 2504.11752 | null |
2025-04-16 | Decision-based AI Visual Navigation for Cardiac Ultrasounds | Andy Dimnaku et.al. | 2504.12535 | null |
2025-04-16 | SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians | Liam Schoneveld et.al. | 2504.12292 | null |
2025-04-16 | An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks | Ling Zhang et.al. | 2504.12251 | link |
2025-04-16 | Data Assimilation for Robust UQ Within Agent-Based Simulation on HPC Systems | Adam Spannaus et.al. | 2504.12228 | null |
2025-04-16 | Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging | Tristan S. W. Stevens et.al. | 2504.12154 | null |
2025-04-16 | GripMap: An Efficient, Spatially Resolved Constraint Framework for Offline and Online Trajectory Planning in Autonomous Racing | Frederik Werner et.al. | 2504.12115 | null |
2025-04-16 | Self-Supervised Traversability Learning with Online Prototype Adaptation for Off-Road Autonomous Driving | Yafeng Bu et.al. | 2504.12109 | null |
2025-04-16 | A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions | Rahima Khanam et.al. | 2504.11995 | null |
2025-04-16 | The Evolution of Zero Trust Architecture (ZTA) from Concept to Implementation | Md Nasiruzzaman et.al. | 2504.11984 | null |
2025-04-16 | Flow Intelligence: Robust Feature Matching via Temporal Signature Correlation | Jie Wang et.al. | 2504.11949 | null |
2025-04-16 | Mind2Matter: Creating 3D Models from EEG Signals | Xia Deng et.al. | 2504.11936 | link |
2025-04-16 | Broadening Participation through Physical Computing: Replicating Sensor-Based Programming Workshops for Rural Students in Sri Lanka | Poornima Meegammana et.al. | 2504.11913 | null |
2025-04-16 | Trajectory Dispersion Control for Precision Landing Guidance of Reusable Rockets | Xinglun Chen et.al. | 2504.11894 | null |
2025-04-16 | Real-Time Shape Estimation of Tensegrity Structures Using Strut Inclination Angles | Tufail Ahmad Bhat et.al. | 2504.11868 | null |
2025-04-16 | Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery | Namitha Liyanage et.al. | 2504.11805 | null |
2025-04-16 | TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion | Yiran Wang et.al. | 2504.11773 | null |
2025-04-16 | Polarisation-Inclusive Spiking Neural Networks for Real-Time RFI Detection in Modern Radio Telescopes | Nicholas J. Pritchard et.al. | 2504.11720 | link |
2025-04-16 | A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models | Kuiyuan Ding et.al. | 2504.11696 | null |
2025-04-16 | 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians | Zeming Wei et.al. | 2504.11218 | link |
2025-04-16 | Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance | Shangyu Liu et.al. | 2504.11197 | null |
2025-04-16 | A Real-time Anomaly Detection Method for Robots based on a Flexible and Sparse Latent Space | Taewook Kang et.al. | 2504.11170 | null |
2025-04-15 | Real-time Object and Event Detection Service through Computer Vision and Edge Computing | Marcos Mendes et.al. | 2504.11662 | null |
2025-04-15 | TextArena | Leon Guertler et.al. | 2504.11442 | link |
2025-04-15 | Predicting Wave Dynamics using Deep Learning with Multistep Integration Inspired Attention and Physics-Based Loss Decomposition | Indu Kant Deo et.al. | 2504.11433 | null |
2025-04-15 | HeatSense: Intelligent Thermal Anomaly Detection for Securing NoC-Enabled MPSoCs | Mahdi Hasanzadeh et.al. | 2504.11421 | null |
2025-04-15 | Sensitivity Analysis of State Space Models for Scrap Composition Estimation in EAF and BOF | Yiqing Zhou et.al. | 2504.11319 | null |
2025-04-15 | Hybrid Compton-PET Imaging for ion-range verification: A Preclinical Study for Proton-, Helium-, and Carbon-Therapy at HIT | Javier Balibrea-Correa et.al. | 2504.11273 | null |
2025-04-15 | Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach | Xiaoxiao Ma et.al. | 2504.11262 | null |
2025-04-15 | Focal Split: Untethered Snapshot Depth from Differential Defocus | Junjie Luo et.al. | 2504.11202 | null |
2025-04-15 | QAMA: Quantum annealing multi-head attention operator with classical deep learning framework | Peng Du et.al. | 2504.11083 | null |
2025-04-15 | Intraoperative perfusion assessment by continuous, low-latency hyperspectral light-field imaging: development, methodology, and clinical application | Stefan Kray et.al. | 2504.10953 | null |
2025-04-15 | A Signal Matrix-Based Local Flaw Detection Framework for Steel Wire Ropes Using Convolutional Neural Networks | Siyu You et.al. | 2504.10952 | null |
2025-04-15 | Design and Verification of a Synchronus First In First Out (FIFO) | Yatheeswar Penta et.al. | 2504.10901 | null |
2025-04-15 | ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping | Shun Iwase et.al. | 2504.10857 | null |
2025-04-15 | Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition | Naoto Nishida et.al. | 2504.10849 | null |
2025-04-15 | LightFormer: A lightweight and efficient decoder for remote sensing image segmentation | Sihang Chen et.al. | 2504.10834 | null |
2025-04-15 | Hallucination-Aware Generative Pretrained Transformer for Cooperative Aerial Mobility Control | Hyojun Ahn et.al. | 2504.10831 | null |
2025-04-15 | SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures | Kuang Yuan et.al. | 2504.10793 | null |
2025-04-15 | ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge | Mikolaj Walczak et.al. | 2504.10784 | null |
2025-04-15 | Diversity-Fair Online Selection | Ming Hu et.al. | 2504.10389 | null |
2025-04-15 | LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis | Hao Sun et.al. | 2504.10331 | null |
2025-04-15 | WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs | Nguyen Ngoc Dat et.al. | 2504.10165 | null |
2025-04-15 | TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models | Jaewoo Lee et.al. | 2504.09897 | link |
2025-04-14 | DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting | Zeren Jiang et.al. | 2504.10486 | link |
2025-04-14 | HybridCollab: Unifying In-Person and Remote Collaboration for Cardiovascular Surgical Planning in Mobile Augmented Reality | Pratham Darrpan Mehta et.al. | 2504.10440 | null |
2025-04-14 | Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone | Pietro Bonazzi et.al. | 2504.10400 | link |
2025-04-14 | Patch and Shuffle: A Preprocessing Technique for Texture Classification in Autonomous Cementitious Fabrication | Jeremiah Giordani et.al. | 2504.10353 | null |
2025-04-14 | SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model | Zongcan Ding et.al. | 2504.10320 | null |
2025-04-14 | CAT: A Conditional Adaptation Tailor for Efficient and Effective Instance-Specific Pansharpening on Real-World Data | Tianyu Xin et.al. | 2504.10242 | null |
2025-04-14 | ROSFD: Robust Online Streaming Fraud Detection with Resilience to Concept Drift in Data Streams | Vivek Yelleti et.al. | 2504.10229 | null |
2025-04-14 | Unleashing Expert Opinion from Social Media for Stock Prediction | Wanyun Zhou et.al. | 2504.10078 | link |
2025-04-14 | DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction | Kiana Hoshanfar et.al. | 2504.10070 | null |
2025-04-14 | Time for Timed Monitorability | Thomas M. Grosen et.al. | 2504.10008 | null |
2025-04-14 | VR MRI Training for Adolescents: A Comparative Study of Gamified VR, Passive VR, 360 Video, and Traditional Educational Video | Yue Yang et.al. | 2504.09955 | null |
2025-04-14 | Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization | Haiyong Yu et.al. | 2504.09927 | null |
2025-04-14 | Fusing Bluetooth with Pedestrian Dead Reckoning: A Floor Plan-Assisted Positioning Approach | Wenxuan Pan et.al. | 2504.09905 | null |
2025-04-14 | LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking | Mert Asim Karaoglu et.al. | 2504.09904 | null |
2025-04-14 | MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling | Yunpeng Tan et.al. | 2504.09878 | null |
2025-04-14 | CKMImageNet: A Dataset for AI-Based Channel Knowledge Map Towards Environment-Aware Communication and Sensing | Zijian Wu et.al. | 2504.09849 | null |
2025-04-14 | RINGO: Real-time Navigation with a Guiding Trajectory for Aerial Manipulators in Unknown Environments | Zhaopeng Zhang et.al. | 2504.08338 | null |
2025-04-11 | TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | Hang Ni et.al. | 2504.08694 | null |
2025-04-11 | Safe Flow Matching: Robot Motion Planning with Control Barrier Functions | Xiaobing Dai et.al. | 2504.08661 | null |
2025-04-11 | TinyCenterSpeed: Efficient Center-Based Object Detection for Autonomous Racing | Neil Reichlin et.al. | 2504.08655 | link |
2025-04-11 | Enhancing Neutrino Reconstruction in Water-Cherenkov Air Shower Arrays Using Multi-Photosensors | J. Alvarez-Muñiz et.al. | 2504.08652 | null |
2025-04-11 | TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Matteo Spanio et.al. | 2504.08624 | link |
2025-04-11 | Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies | Vineeth Sai Narajala et.al. | 2504.08623 | null |
2025-04-11 | FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment | Sebastián Barbas Laina et.al. | 2504.08603 | null |
2025-04-11 | POD-Based Sparse Stochastic Estimation of Wind Turbine Blade Vibrations | Lorenzo Schena et.al. | 2504.08505 | null |
2025-04-11 | AI-Driven Smart Sportswear for Real-Time Fitness Monitoring Using Textile Strain Sensors | Chenyu Tang et.al. | 2504.08500 | null |
2025-04-11 | A Comparative Study of Recommender Systems under Big Data Constraints | Arimondo Scrivano et.al. | 2504.08457 | null |
2025-04-11 | Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion | Weiye Chen et.al. | 2504.08451 | link |
2025-04-11 | The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments | Jiafan Lu et.al. | 2504.08431 | null |
2025-04-11 | Light-YOLOv8-Flame: A Lightweight High-Performance Flame Detection Algorithm | Jiawei Lan et.al. | 2504.08389 | null |
2025-04-11 | MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft | Junliang Guo et.al. | 2504.08388 | null |
2025-04-11 | PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation | Arman Khaledian et.al. | 2504.08386 | null |
2025-04-11 | DRIP: DRop unImportant data Points -- Enhancing Machine Learning Efficiency with Grad-CAM-Based Real-Time Data Prioritization for On-Device Training | Marcus Rüb et.al. | 2504.08364 | null |
2025-04-11 | Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing | Tobias Pfandzelter et.al. | 2504.08337 | link |
2025-04-11 | Towards a Digital Twin of Noisy Quantum Computers: Calibration-Driven Emulation of Transmon Qubits | Ronny Müller et.al. | 2504.08313 | null |
2025-04-11 | Gigabit-rate Quantum Key Distribution on Integrated Photonic Chips | Si Qi Ng et.al. | 2504.08298 | null |
2025-04-10 | A Review of HPC-Accelerated CFD in National Security and Defense | James Afful et.al. | 2504.07837 | null |
2025-04-10 | A Hybrid Semantic RAN Protocol Stack Design for 6G System and Its Implementation | Luhan Wang et.al. | 2504.07829 | null |
2025-04-10 | MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset | Jenna Kline et.al. | 2504.07744 | null |
2025-04-10 | A Novel Deep Learning Approach for Emulating Computationally Expensive Postfire Debris Flows | Palak Patel et.al. | 2504.07736 | null |
2025-04-10 | Finite-temperature real-time properties of magnetic polarons in two-dimensional quantum antiferromagnets | Toni Guthardt et.al. | 2504.07715 | null |
2025-04-10 | Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases | Andrés Bell-Navas et.al. | 2504.07606 | link |
2025-04-10 | Tuning chirality amplitude at ultrafast timescales | Hiroki Ueda et.al. | 2504.07599 | null |
2025-04-10 | MUFFLER: Secure Tor Traffic Obfuscation with Dynamic Connection Shuffling and Splitting | Minjae Seo et.al. | 2504.07543 | null |
2025-04-10 | Intelligent DoS and DDoS Detection: A Hybrid GRU-NTM Approach to Network Security | Caroline Panggabean et.al. | 2504.07478 | null |
2025-04-10 | Nonlinear Optimal Guidance for Intercepting Moving Targets | Han Wang et.al. | 2504.07430 | null |
2025-04-10 | ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement | Anning Hu et.al. | 2504.07418 | null |
2025-04-10 | WK-Pnet: FM-Based Positioning via Wavelet Packet Decomposition and Knowledge Distillation | Shilian Zheng et.al. | 2504.07399 | null |
2025-04-10 | MicroNAS: An Automated Framework for Developing a Fall Detection System | Seyed Mojtaba Mohasel et.al. | 2504.07397 | null |
2025-04-09 | CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory | William Andrew Simon et.al. | 2504.07298 | null |
2025-04-09 | Data-Enabled Neighboring Extremal: Case Study on Model-Free Trajectory Tracking for Robotic Arm | Amin Vahidi-Moghaddam et.al. | 2504.07292 | null |
2025-04-09 | Enabling Continuous 5G Connectivity in Aircraft through Low Earth Orbit Satellites | Raúl Parada et.al. | 2504.07262 | null |
2025-04-09 | Visual-Aware Speech Recognition for Noisy Scenarios | Lakshmipathi Balaji et.al. | 2504.07229 | null |
2025-04-09 | Discovery of extreme Quasi-Periodic Eruptions in a newly accreting massive black hole | Lorena Hernández-García et.al. | 2504.07169 | null |
2025-04-09 | OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens | Jiacheng Liu et.al. | 2504.07096 | null |
2025-04-09 | FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution | Gene Chou et.al. | 2504.07093 | link |
2025-04-09 | Cerebral blood flow monitoring using a deep learning implementation of the two-layer DCS analytical model with a 512 $\times$ 512 SPAD array | Mingliang Pan et.al. | 2504.06997 | null |
2025-04-09 | Audio-visual Event Localization on Portrait Mode Short Videos | Wuyang Liu et.al. | 2504.06884 | null |
2025-04-09 | Determining Fetal Orientations From Blind Sweep Ultrasound Video | Jakub Maciej Wiśniewski et.al. | 2504.06836 | null |
2025-04-09 | Integrated Sensing and Communications Over the Years: An Evolution Perspective | Di Zhang et.al. | 2504.06830 | null |
2025-04-09 | SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering | Hanxiao Sun et.al. | 2504.06815 | link |
2025-04-09 | Modeling and analysis methods for early detection of leakage points in gas transmission systems | Ilgar Aliyev et.al. | 2504.06809 | null |
2025-04-09 | How do Copilot Suggestions Impact Developers' Frustration and Productivity? | Emanuela Guglielmi et.al. | 2504.06808 | null |
2025-04-09 | Controllable Automatic Foley Artist | Roi Benita et.al. | 2504.06778 | link |
2025-04-09 | Bridging Research and Standardization: Innovations and Methodology for 6G Standard Contributions | Francesca Conserva et.al. | 2504.06682 | null |
2025-04-09 | Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making | Kaifeng Wang et.al. | 2504.06670 | null |
2025-04-09 | Robust and Noise-resilient Long-Term Prediction of Spatiotemporal Data Using Variational Mode Graph Neural Networks with 3D Attention | Osama Ahmad et.al. | 2504.06660 | null |
2025-04-09 | InstantSticker: Realistic Decal Blending via Disentangled Object Reconstruction | Yi Zhang et.al. | 2504.06620 | null |
2025-04-09 | InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features | Sujay Khandagale et.al. | 2504.06609 | link |
2025-04-09 | Overcoming Dynamic Environments: A Hybrid Approach to Motion Planning for Manipulators | Ho Minh Quang Ngo et.al. | 2504.06596 | null |
2025-04-09 | NAPER: Fault Protection for Real-Time Resource-Constrained Deep Neural Networks | Rian Adam Rajagede et.al. | 2504.06591 | null |
2025-04-09 | A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication | Xiao-Hang Jiang et.al. | 2504.06561 | link |
2025-04-09 | ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity | Long Chen et.al. | 2504.06512 | null |
2025-04-09 | Equivalent Circuit Modeling of a Lumped-element Loaded Metasurface under Arbitrary Incidence and Polarization | Athanasios Nousiou et.al. | 2504.06501 | null |
2025-04-08 | A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response | Mark Timmons et.al. | 2504.06241 | null |
2025-04-08 | Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces | Francisco J. Rodríguez Lera et.al. | 2504.06189 | link |
2025-04-08 | Efficient algorithms to solve atom reconfiguration problems. III. The bird and batching algorithms and other parallel implementations on GPUs | Fouad Afiouni et.al. | 2504.06182 | null |
2025-04-08 | Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks | Xufang Zhao et.al. | 2504.06165 | null |
2025-04-08 | Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms | Ido Greenberg et.al. | 2504.06126 | null |
2025-04-08 | Safe Interaction via Monte Carlo Linear-Quadratic Games | Benjamin A. Christie et.al. | 2504.06124 | link |
2025-04-08 | A Robust Real-Time Lane Detection Method with Fog-Enhanced Feature Fusion for Foggy Conditions | Ronghui Zhang et.al. | 2504.06121 | null |
2025-04-08 | Real-Time LaCAM | Runzhe Liang et.al. | 2504.06091 | null |
2025-04-08 | | Paul Kinsler et.al. | 2504.06056 | null |
2025-04-08 | Modular Soft Wearable Glove for Real-Time Gesture Recognition and Dynamic 3D Shape Reconstruction | Huazhi Dong et.al. | 2504.05983 | null |
2025-04-08 | An Empirical Study of GPT-4o Image Generation Capabilities | Sixiang Chen et.al. | 2504.05979 | link |
2025-04-08 | Context-aware Rate Adaptation for Predictive Flying Networks using Contextual Bandits | Ruben Queiros et.al. | 2504.05964 | null |
2025-04-08 | Hybrid Control as a Proxy for Detection and Mitigation of Sensor Attacks in Cooperative Driving | Mischa Huisman et.al. | 2504.05958 | link |
2025-04-08 | InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Control | Ruixiang Wu et.al. | 2504.05946 | null |
2025-04-08 | Réduire le bruit grâce à la réalité augmentée sonore -- Auditory Concealer | Clara Boukhemia et.al. | 2504.05847 | null |
2025-04-08 | Negotiating Strict Latency Limits for Dynamic Real-Time Services in Vehicular Time-Sensitive Networks | Timo Häckel et.al. | 2504.05793 | null |
2025-04-08 | Residual U-Net for accurate and efficient prediction of hemodynamics in two-dimensional asymmetric stenosis | Xintong Zou et.al. | 2504.05778 | null |
2025-04-08 | A Lightweight Multi-Module Fusion Approach for Korean Character Recognition | Inho Jake Park et.al. | 2504.05770 | null |
2025-04-08 | Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation | Zhihua Xu et.al. | 2504.05746 | null |
2025-04-08 | Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting | Jee Won Lee et.al. | 2504.05740 | null |
2025-04-08 | REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning | Jihyun Lee et.al. | 2504.04956 | null |
2025-04-07 | Using Physiological Measures, Gaze, and Facial Expressions to Model Human Trust in a Robot Partner | Haley N. Green et.al. | 2504.05291 | null |
2025-04-07 | RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception | Hui Zhang et.al. | 2504.05287 | null |
2025-04-07 | A Telecentric Offset Reflective Imaging System (TORIS) for Terahertz Imaging and Spectroscopy | Pouyan Rezapoor et.al. | 2504.05267 | null |
2025-04-07 | From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models | German Barquero et.al. | 2504.05265 | null |
2025-04-07 | Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation | Jiaming Chen et.al. | 2504.05225 | link |
2025-04-07 | LLM-Alignment Live-Streaming Recommendation | Yueyang Liu et.al. | 2504.05217 | null |
2025-04-07 | Post-Training Language Models for Continual Relation Extraction | Sefika Efeoglu et.al. | 2504.05214 | null |
2025-04-07 | Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification | Yasuhiro Yao et.al. | 2504.05148 | link |
2025-04-07 | Decentralized Semantic Federated Learning for Real-Time Public Safety Tasks: Challenges, Methods, and Directions | Baosheng Li et.al. | 2504.05107 | null |
2025-04-07 | SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation | Stephen Brade et.al. | 2504.05106 | null |
2025-04-07 | Speech-to-Trajectory: Learning Human-Like Verbal Guidance for Robot Motion | Eran Beeri Bamani et.al. | 2504.05084 | null |
2025-04-07 | AI-Driven Tactical Communications and Networking for Defense: A Survey and Emerging Trends | Victor Monzon Baeza et.al. | 2504.05071 | null |
2025-04-07 | SILVIA: Ultra-precision formation flying demonstration for space-based interferometry | Takahiro Ito et.al. | 2504.05001 | null |
2025-04-07 | Transforming Future Data Center Operations and Management via Physical AI | Zhiwei Cao et.al. | 2504.04982 | null |
2025-04-07 | Boosting Relational Deep Learning with Pretrained Tabular Models | Veronica Lachi et.al. | 2504.04934 | link |
2025-04-07 | Real-time tuneable bright bonding plasmonic modes in Ga nanostructures | Renu Raman Sahu et.al. | 2504.04922 | null |
2025-04-07 | Parallelization is All System Identification Needs: End-to-end Vibration Diagnostics on a multi-core RISC-V edge device | Amirhossein Kiamarzi et.al. | 2504.04884 | null |
2025-04-07 | Closed-Loop Neural Operator-Based Observer of Traffic Density | Alice Harting et.al. | 2504.04873 | null |
2025-04-07 | Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM | Zhicong Sun et.al. | 2504.04844 | link |
2025-04-04 | CAMINO: Cloud-native Autonomous Management and Intent-based Orchestrator | Konstantinos Antonakoglou et.al. | 2504.03586 | null |
2025-04-04 | The building blocks of software work explain coding careers and language popularity | Xiangnan Feng et.al. | 2504.03581 | null |
2025-04-04 | Online Traffic Density Estimation using Physics-Informed Neural Networks | Dennis Wilkman et.al. | 2504.03483 | null |
2025-04-04 | DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models | Sathish Kumar et.al. | 2504.03423 | null |
2025-04-04 | NeRFlex: Resource-aware Real-time High-quality Rendering of Complex Scenes on Mobile Devices | Zhe Wang et.al. | 2504.03415 | null |
2025-04-04 | An Efficient GPU-based Implementation for Noise Robust Sound Source Localization | Zirui Lin et.al. | 2504.03373 | null |
2025-04-04 | Stance-Driven Multimodal Controlled Statement Generation: New Dataset and Task | Bingqian Wang et.al. | 2504.03295 | null |
2025-04-04 | Mitigating the Impact of Electrode Shift on Classification Performance in Electromyography-Based Motion Prediction Using Sliding-Window Normalization | Taichi Tanaka et.al. | 2504.03196 | null |
2025-04-04 | Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion | Zeyang Zheng et.al. | 2504.03171 | link |
2025-04-04 | Water Mapping and Change Detection Using Time Series Derived from the Continuous Monitoring of Land Disturbance Algorithm | Huong Pham et.al. | 2504.03170 | null |
2025-04-04 | Performance-Aware Control of Modular Batteries For Fast Frequency Response | Yutong He et.al. | 2504.03150 | null |
2025-04-04 | A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations | Abdul Mannan Mohammed et.al. | 2504.03147 | null |
2025-04-04 | Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation | Fa-Ting Hong et.al. | 2504.02542 | link |
2025-04-03 | Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization | Haishan Wang et.al. | 2504.03059 | link |
2025-04-03 | Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks | Hyun-Ho Choi et.al. | 2504.03052 | link |
2025-04-03 | Emotion Recognition Using Convolutional Neural Networks | Shaoyuan Xu et.al. | 2504.03010 | null |
2025-04-03 | Generating Diverse Audio-Visual 360 Soundscapes for Sound Event Localization and Detection | Adrian S. Roman et.al. | 2504.02988 | link |
2025-04-03 | Level Up Peer Review in Education: Investigating genAI-driven Gamification system and its influence on Peer Feedback Effectiveness | Rafal Wlodarski et.al. | 2504.02962 | null |
2025-04-03 | LiDAR-based Object Detection with Real-time Voice Specifications | Anurag Kulkarni et.al. | 2504.02920 | link |
2025-04-03 | Bubbles in a box: Eliminating edge nucleation in cold-atom simulators of vacuum decay | Alexander C. Jenkins et.al. | 2504.02829 | null |
2025-04-03 | Dynamic Directional Routing of Freight in the Physical Internet | Sahrish Jaleel Shaikh et.al. | 2504.02722 | null |
2025-04-03 | UAV-Assisted 5G Networks: Mobility-Aware 3D Trajectory Optimization and Resource Allocation for Dynamic Environments | Asad Mahmood et.al. | 2504.02613 | null |
2025-04-03 | Human-Centered Development of an Explainable AI Framework for Real-Time Surgical Risk Surveillance | Andrea E Davidson et.al. | 2504.02551 | null |
2025-04-03 | Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting | Simon Hirsch et.al. | 2504.02518 | link |
2025-04-03 | Industrial Internet Robot Collaboration System and Edge Computing Optimization | Qian Zuo et.al. | 2504.02492 | null |
2025-04-03 | Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision | Xiaofeng Han et.al. | 2504.02477 | null |
2025-04-03 | MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM | Renwu Li et.al. | 2504.02437 | null |
2025-04-03 | OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication | Zhongjian Wang et.al. | 2504.02433 | null |
2025-04-03 | **Life |