HumanAIGC Research Papers

Updated on 2025.06.26

Table of Contents

Talking Face
Image Animation
Video Generation
TryOn
Visual Edit
Others
Music2Dance and Co-speech
Speech and Interaction

Talking Face

Publish Date	Title	Authors	PDF	Code
2025-06-24	Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router	Yubo Huang et.al.	2506.19833	null
2025-06-23	OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation	Qijun Gan et.al.	2506.18866	null
2025-06-17	SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting	Ziqiao Peng et.al.	2506.14742	null
2025-06-17	Compressed Video Super-Resolution based on Hierarchical Encoding	Yuxuan Jiang et.al.	2506.14381	null
2025-06-16	Audio-Visual Driven Compression for Low-Bitrate Talking Head Videos	Riku Takahashi et.al.	2506.13419	null
2025-06-15	iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer	Zhelun Shen et.al.	2506.12847	null
2025-06-13	ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing	Babak Naderi et.al.	2506.12269	link
2025-06-10	HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation	Ziyao Huang et.al.	2506.08797	null
2025-06-03	NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results	Xiaohong Liu et.al.	2506.02875	null
2025-06-02	Cocktail-Party Audio-Visual Speech Recognition	Thai-Binh Nguyen et.al.	2506.02178	null
2025-06-02	Low-Rank Head Avatar Personalization with Registers	Sai Tanmay Reddy Chakkera et.al.	2506.01935	null
2025-06-02	Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation	Yuan Gan et.al.	2506.01591	link
2025-06-01	SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers	Zhengcong Fei et.al.	2506.00830	null
2025-05-30	TalkingHeadBench: A Multi-Modal Benchmark & Analysis of Talking-Head DeepFake Detection	Xinqi Xiong et.al.	2505.24866	null
2025-05-29	Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation	Jiahao Cui et.al.	2505.23525	link
2025-05-29	Video Editing for Audio-Visual Dubbing	Binyamin Manela et.al.	2505.23406	link
2025-05-29	Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation	Hao Li et.al.	2505.23290	link
2025-05-29	MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation	Siyuan Wang et.al.	2505.23120	link
2025-05-28	Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation	Zhe Kong et.al.	2505.22647	link
2025-05-28	Tell me Habibi, is it Real or Fake?	Kartik Kuckreja et.al.	2505.22581	null
2025-05-28	Neural Face Skinning for Mesh-agnostic Facial Expression Cloning	Sihun Cha et.al.	2505.22416	null
2025-05-28	FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing	Guanwen Feng et.al.	2505.22141	null
2025-05-28	RESOUND: Speech Reconstruction from Silent Videos via Acoustic-Semantic Decomposed Modeling	Long-Khanh Pham et.al.	2505.22024	null
2025-05-27	OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers	Ziqiao Peng et.al.	2505.21448	null
2025-05-26	Total-Editing: Head Avatar with Editable Appearance, Motion, and Lighting	Yizhou Zhao et.al.	2505.20582	null
2025-05-26	DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations	Ziqiao Peng et.al.	2505.18096	null
2025-05-22	Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis	Radek Daněček et.al.	2504.13386	null
2025-05-14	Test-Time Augmentation for Pose-invariant Face Recognition	Jaemin Jung et.al.	2505.09256	null
2025-05-10	VTutor: An Animated Pedagogical Agent SDK that Provide Real Time Multi-Model Feedback	Eason Chen et.al.	2505.06676	null
2025-05-10	OT-Talk: Animating 3D Talking Head with Optimal Transportation	Xinmu Wang et.al.	2505.01932	null
2025-05-10	MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance	Mengting Wei et.al.	2504.21497	link
2025-05-08	OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours	Hanie Moghaddasi et.al.	2505.05531	null
2025-05-03	GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting	Anushka Agarwal et.al.	2505.01928	null
2025-05-02	Model See Model Do: Speech-Driven Facial Animation with Style Control	Yifang Pan et.al.	2505.01319	null
2025-05-02	FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing	Gaoxiang Cong et.al.	2505.01263	null
2025-05-01	KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution	Antoni Bigata et.al.	2505.00497	null
2025-04-29	IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos	Yuan Li et.al.	2504.19165	null
2025-04-27	Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions	Mohammad Mahdi Abootorabi et.al.	2504.19056	link
2025-04-26	Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning	Yifan Xie et.al.	2504.18810	null
2025-04-25	Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation	Weipeng Tan et.al.	2504.18087	null
2025-04-14	SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models	Stathis Galanakis et.al.	2504.10716	null
2025-04-10	ChildlikeSHAPES: Semantic Hierarchical Region Parsing for Animating Figure Drawings	Astitva Srivastava et.al.	2504.08022	null
2025-04-08	VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing	Juan Luis Gonzalez Bello et.al.	2504.07146	null
2025-04-08	SE4Lip: Speech-Lip Encoder for Talking Head Synthesis to Solve Phoneme-Viseme Alignment Ambiguity	Yihuan Huang et.al.	2504.05803	null
2025-04-08	Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation	Zhihua Xu et.al.	2504.05746	null
2025-04-08	Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation	Tianshui Chen et.al.	2504.05672	null
2025-04-07	Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation	Fa-Ting Hong et.al.	2504.02542	link
2025-04-06	FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency	Shiyan Liu et.al.	2504.04427	null
2025-04-04	A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations	Abdul Mannan Mohammed et.al.	2504.03147	null
2025-04-03	OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication	Zhongjian Wang et.al.	2504.02433	null
2025-04-03	VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models	Kim Sung-Bin et.al.	2504.02386	null
2025-04-02	Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies	Soumyya Kanti Datta et.al.	2504.01470	link
2025-04-02	EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters	Xuli Shen et.al.	2503.19416	null
2025-04-01	Monocular and Generalizable Gaussian Talking Head Animation	Shengjie Gong et.al.	2504.00665	null
2025-03-31	Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics	Lee Chae-Yeon et.al.	2503.20308	null
2025-03-30	MoCha: Towards Movie-Grade Talking Character Synthesis	Cong Wei et.al.	2503.23307	null
2025-03-29	STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing	Zijun Ding et.al.	2503.23039	link
2025-03-28	Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis	Shuai Shen et.al.	2503.22605	null
2025-03-28	Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance	Haijie Yang et.al.	2503.22225	null
2025-03-27	ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model	Jinwei Qi et.al.	2503.21144	null
2025-03-26	Dual Audio-Centric Modality Coupling for Talking Head Generation	Ao Fu et.al.	2503.22728	null
2025-03-25	AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers	Jiazhi Guan et.al.	2503.19824	null
2025-03-25	MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation	Yukang Lin et.al.	2503.19383	null
2025-03-25	HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation	Zunnan Xu et.al.	2503.18860	null
2025-03-25	Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model	Yingying Fan et.al.	2503.16942	null
2025-03-24	DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model	Kangwei Liu et.al.	2503.19001	null
2025-03-24	Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation	Dingcheng Zhen et.al.	2503.18429	null
2025-03-23	DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation	Peng Chen et.al.	2503.18159	link
2025-03-21	TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting	Jianchuan Chen et.al.	2503.17032	null
2025-03-21	From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech	Ji-Hoon Kim et.al.	2503.16956	null
2025-03-20	UniSync: A Unified Framework for Audio-Visual Synchronization	Tao Feng et.al.	2503.16357	null
2025-03-20	PC-Talk: Precise Facial Animation Control for Audio-Driven Talking Face Generation	Baiqin Wang et.al.	2503.14295	null
2025-03-19	DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis	Yuming Gu et.al.	2503.15667	link
2025-03-19	KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation	Antoni Bigata et.al.	2503.01715	null
2025-03-17	SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization	Xulin Fan et.al.	2503.13371	null
2025-03-17	Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait	Chaolong Yang et.al.	2503.12963	link
2025-03-16	Versatile Multimodal Controls for Whole-Body Talking Human Animation	Zheng Qin et.al.	2503.08714	null
2025-03-14	Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control	Hejia Chen et.al.	2503.14517	null
2025-03-14	EmoDiffusion: Enhancing Emotional 3D Facial Animation with Latent Diffusion Models	Yixuan Zhang et.al.	2503.11028	null
2025-03-12	StyleSpeaker: Audio-Enhanced Fine-Grained Style Modeling for Speech-Driven 3D Facial Animation	An Yang et.al.	2503.09852	null
2025-03-12	Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos	Riku Takahashi et.al.	2503.09787	null
2025-03-09	Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter	Yanyu Zhu et.al.	2503.06397	null
2025-03-07	MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice	Hongwei Yi et.al.	2503.05978	null
2025-03-06	FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis	Ziqi Ni et.al.	2503.04067	null
2025-03-02	FaceShot: Bring Any Character into Life	Junyao Gao et.al.	2503.00740	null
2025-03-01	Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture	Xuanchen Li et.al.	2503.00495	null
2025-02-28	Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints	Masoumeh Chapariniya et.al.	2502.20803	null
2025-02-28	ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model	Xuangeng Chu et.al.	2502.20323	null
2025-02-27	InsTaG: Learning Personalized 3D Talking Head from Few-Second Video	Jiahe Li et.al.	2502.20387	link
2025-02-27	High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model	Mingtao Guo et.al.	2502.19894	link
2025-02-26	FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion mode	Lingzhou Mu et.al.	2502.19455	null
2025-02-24	Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation	Baptiste Chopin et.al.	2502.17198	null
2025-02-20	NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis	Xiaoxing Liu et.al.	2502.14178	null
2025-02-18	AV-Flow: Transforming Text to Audio-Visual Human-like Interactions	Aggelina Chatziagapi et.al.	2502.13133	null
2025-02-17	SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion	Junxian Ma et.al.	2502.11515	null
2025-02-15	SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers	Di Qiu et.al.	2502.10841	link
2025-02-13	Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model	Fei Shen et.al.	2502.09533	null
2025-02-13	VTutor: An Open-Source SDK for Generative AI-Powered Animated Pedagogical Agents with Multi-Media Output	Eason Chen et.al.	2502.04103	null
2025-02-11	Playmate: Flexible Control of Portrait Animation via 3D-Implicit Space Guided Diffusion	Xingpei Ma et.al.	2502.07203	null
2025-02-07	Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark	Han Zhang et.al.	2502.04976	null
2025-02-02	EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis	Junuk Cha et.al.	2502.00654	null
2025-01-24	SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation	Yujian Liu et.al.	2501.14646	null
2025-01-21	A Lightweight and Interpretable Deepfakes Detection Framework	Muhammad Umar Farooq et.al.	2501.11927	null
2025-01-18	EMO2: End-Effector Guided Audio-Driven Avatar Video Generation	Linrui Tian et.al.	2501.10687	null
2025-01-17	TalkingEyes: Pluralistic Speech-Driven 3D Eye Gaze Animation	Yixiang Zhuang et.al.	2501.09921	null
2025-01-15	Joint Learning of Depth and Appearance for Portrait Image Animation	Xinya Ji et.al.	2501.08649	null
2025-01-15	Make-A-Character 2: Animatable 3D Character Generation From a Single Image	Lin Liu et.al.	2501.07870	null
2025-01-09	Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding	Ji-Ha Park et.al.	2501.14790	null
2025-01-09	Identity-Preserving Video Dubbing Using Motion Warping	Runzhen Liu et.al.	2501.04586	null
2025-01-09	MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation	Huaize Liu et.al.	2501.01808	null
2025-01-07	Generating and Detecting Various Types of Fake Image and Audio Content: A Review of Modern Deep Learning Technologies and Tools	Arash Dehghani et.al.	2501.06227	null
2025-01-07	VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control	Yuanpeng Tu et.al.	2501.01427	null
2025-01-06	RDD4D: 4D Attention-Guided Road Damage Detection And Classification	Asma Alkalbani et.al.	2501.02822	link
2025-01-06	Takeaways from Applying LLM Capabilities to Multiple Conversational Avatars in a VR Pilot Study	Mykola Maslych et.al.	2501.00168	null
2025-01-03	JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing	Qili Wang et.al.	2501.01798	link
2024-12-28	DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis	Kaijun Deng et.al.	2412.20148	link
2024-12-26	UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control	Wenzhang Sun et.al.	2412.19860	null
2024-12-26	Generating Editable Head Avatars with 3D Gaussian GANs	Guohao Li et.al.	2412.19149	link
2024-12-23	FaceLift: Single Image to 3D Head with View Generation and GS-LRM	Weijie Lyu et.al.	2412.17812	null
2024-12-22	FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation	Tianyun Zhong et.al.	2412.16915	null
2024-12-18	Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters	Steven Hogue et.al.	2412.14333	link
2024-12-18	GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection	Xiaocan Chen et.al.	2412.13656	null
2024-12-18	Learning to Control an Android Robot Head for Facial Animation	Marcel Heisler et.al.	2412.13641	null
2024-12-18	Real-time One-Step Diffusion-based Expressive Portrait Videos Generation	Hanzhong Guo et.al.	2412.13479	link
2024-12-18	VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization	Tao Liu et.al.	2412.09892	null
2024-12-16	Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content	Rohit Kundu et.al.	2412.12278	null
2024-12-13	GoHD: Gaze-oriented and Highly Disentangled Portrait Animation with Rhythmic Poses and Realistic Expression	Ziqi Zhou et.al.	2412.09296	link
2024-12-12	LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync	Chunyu Li et.al.	2412.09262	link
2024-12-12	EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing	Gaoxiang Cong et.al.	2412.08988	null
2024-12-11	PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis	Yifan Xie et.al.	2412.08504	null
2024-12-10	PortraitTalk: Towards Customizable One-Shot Audio-to-Talking Face Generation	Fatemeh Nazarieh et.al.	2412.07754	null
2024-12-10	IF-MDM: Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation	Sejong Yang et.al.	2412.04000	null
2024-12-05	MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation	Longtao Zheng et.al.	2412.04448	null
2024-12-05	Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks	Jiahao Cui et.al.	2412.00733	link
2024-12-04	SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model	Yan Li et.al.	2412.03430	null
2024-12-02	One Shot, One Talk: Whole-body Talking Avatar from a Single Image	Jun Xiang et.al.	2412.01106	null
2024-12-01	Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation	Shuling Zhao et.al.	2412.00719	null
2024-11-29	LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis	Tianqi Li et.al.	2411.19525	null
2024-11-29	Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis	Tianqi Li et.al.	2411.19509	link
2024-11-29	V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow	Jeongsoo Choi et.al.	2411.19486	link
2024-11-26	Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey	Hong-Hanh Nguyen-Le et.al.	2411.17911	null
2024-11-25	Sonic: Shifting Focus to Global Audio Perception in Portrait Animation	Xiaozhong Ji et.al.	2411.16331	null
2024-11-25	ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations	Xulong Zhang et.al.	2411.13089	null
2024-11-24	LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis	Haojie Zhang et.al.	2411.16748	null
2024-11-23	EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion	Haotian Wang et.al.	2411.16726	null
2024-11-23	ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance	Haijie Yang et.al.	2411.15436	null
2024-11-20	Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis	Pegah Salehi et.al.	2411.13209	link
2024-11-20	JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation	Xuyang Cao et.al.	2411.09209	link
2024-11-14	LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space	Guanwen Feng et.al.	2411.09268	null
2024-11-06	Large Generative Model-assisted Talking-face Semantic Communication System	Feibo Jiang et.al.	2411.03876	null
2024-10-31	Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts	Xiang Deng et.al.	2410.23836	null
2024-10-29	Multimodal Semantic Communication for Generative Audio-Driven Video Conferencing	Haonan Tong et.al.	2410.22112	null
2024-10-24	Real-time 3D-aware Portrait Video Relighting	Ziqi Cai et.al.	2410.18355	link
2024-10-21	Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions	Malte Prinzler et.al.	2410.16395	null
2024-10-18	Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization	Bin Lin et.al.	2410.14283	null
2024-10-18	DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation	Hanbo Cheng et.al.	2410.13726	link
2024-10-16	MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting	Yue Zhang et.al.	2410.10122	link
2024-10-15	Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck	Fevziye Irem Eyiokur et.al.	2410.11434	null
2024-10-15	MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes	Zhenhui Ye et.al.	2410.06734	null
2024-10-14	Character-aware audio-visual subtitling in context	Jaesung Huh et.al.	2410.11068	null
2024-10-14	Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads	Federico Nocentini et.al.	2410.11041	null
2024-10-14	TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model	Jiazhi Guan et.al.	2410.10696	null
2024-10-14	Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization	Shanzhi Yin et.al.	2410.10171	null
2024-10-10	MMHead: Towards Fine-grained Multi-modal 3D Facial Animation	Sijing Wu et.al.	2410.07757	null
2024-10-09	FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model	Feng Qiu et.al.	2409.13180	null
2024-10-01	LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details	Jian Yang et.al.	2410.00990	null
2024-09-29	Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation	Jingyi Xu et.al.	2409.19501	null
2024-09-27	Diverse Code Query Learning for Speech-Driven Facial Animation	Chunzhi Gu et.al.	2409.19143	null
2024-09-26	Stable Video Portraits	Mirela Ostrek et.al.	2409.18083	null
2024-09-25	ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE	Sichun Wu et.al.	2409.07966	link
2024-09-24	FastTalker: Jointly Generating Speech and Conversational Gestures from Text	Zixin Guo et.al.	2409.16404	null
2024-09-23	FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset	Donglin Di et.al.	2410.07151	null
2024-09-23	MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning	Yue Han et.al.	2409.15179	null
2024-09-18	JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation	Sai Tanmay Reddy Chakkera et.al.	2409.12156	null
2024-09-18	GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations	Kartik Teotia et.al.	2409.11951	null
2024-09-17	3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy	Xuanmeng Sha et.al.	2409.10848	null
2024-09-16	DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis	Fa-Ting Hong et.al.	2409.10281	null
2024-09-14	StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads	Suzhen Wang et.al.	2409.09292	null
2024-09-11	DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures	Steven Hogue et.al.	2409.07649	null
2024-09-11	EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion	Jian Zhang et.al.	2409.07255	link
2024-09-09	PersonaTalk: Bring Attention to Your Persona in Visual Dubbing	Longhao Zhang et.al.	2409.05379	null
2024-09-09	KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation	Hoang-Son Vo-Thanh et.al.	2409.05330	link
2024-09-05	SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing	Lingyu Xiong et.al.	2409.03605	null
2024-09-05	SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model	Weipeng Tan et.al.	2409.03270	null
2024-09-04	PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation	Jun Ling et.al.	2409.02657	null
2024-09-02	KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding	Zhihao Xu et.al.	2409.01113	link
2024-08-28	Micro and macro facial expressions by driven animations in realistic Virtual Humans	Rubens Halbig Montanha et.al.	2408.16110	null
2024-08-27	MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer	Shurong Yang et.al.	2408.14975	null
2024-08-25	TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation	Jack Saunders et.al.	2408.13714	null
2024-08-23	G3FA: Geometry-guided GAN for Face Animation	Alireza Javanmardi et.al.	2408.13049	null
2024-08-21	AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition	Minheng Ni et.al.	2408.11564	null
2024-08-21	EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention	Yihong Lin et.al.	2408.11518	null
2024-08-20	DEGAS: Detailed Expressions on Full-Body Gaussian Avatars	Zhijing Shao et.al.	2408.10588	link
2024-08-18	FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model	Ziyu Yao et.al.	2408.09384	null
2024-08-18	Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation	Xukun Zhou et.al.	2408.09357	null
2024-08-18	S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis	Dongze Li et.al.	2408.09347	null
2024-08-16	GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer	Yihong Lin et.al.	2408.01826	null
2024-08-14	Content and Style Aware Audio-Driven Facial Animation	Qingju Liu et.al.	2408.07005	null
2024-08-12	DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation	Jisoo Kim et.al.	2408.06010	null
2024-08-10	High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model	Weizhi Zhong et.al.	2408.05416	null
2024-08-10	Style-Preserving Lip Sync via Audio-Aware Style Reference	Weizhi Zhong et.al.	2408.05412	null
2024-08-09	DeepSpeak Dataset v1.0	Sarah Barrington et.al.	2408.05366	null
2024-08-06	ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer	Jiazhi Guan et.al.	2408.03284	null
2024-08-03	Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation	Jintao Tan et.al.	2408.01732	null
2024-08-03	JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model	Farzaneh Jafari et.al.	2408.01627	null
2024-08-01	UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model	Xiangyu Fan et.al.	2408.00762	null
2024-08-01	Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion	Manuel Kansy et.al.	2408.00458	null
2024-08-01	EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head	Qianyun He et.al.	2408.00297	null
2024-07-31	Deformable 3D Shape Diffusion Model	Dengsheng Chen et.al.	2407.21428	null
2024-07-26	LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement	Rui Zhang et.al.	2407.18595	null
2024-07-24	A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation	Jose Geraldo Fernandes et.al.	2407.17430	null
2024-07-24	The impact of differences in facial features between real speakers and 3D face models on synthesized lip motions	Rabab Algadhy et.al.	2407.17253	null
2024-07-22	PAV: Personalized Head Avatar from Unstructured Video Collection	Akin Caliskan et.al.	2407.21047	null
2024-07-21	Anchored Diffusion for Video Face Reenactment	Idan Kligvasser et.al.	2407.15153	null
2024-07-20	Text-based Talking Video Editing with Cascaded Conditional Diffusion	Bo Han et.al.	2407.14841	null
2024-07-17	Universal Facial Encoding of Codec Avatars from VR Headsets	Shaojie Bai et.al.	2407.13038	null
2024-07-17	EmoFace: Audio-driven Emotional 3D Face Animation	Chang Liu et.al.	2407.12501	link
2024-07-13	Learning Online Scale Transformation for Talking Head Video Generation	Fa-Ting Hong et.al.	2407.09965	null
2024-07-12	Real Face Video Animation Platform	Xiaokai Chen et.al.	2407.18955	null
2024-07-12	One-Shot Pose-Driving Face Animation Platform	He Feng et.al.	2407.08949	null
2024-07-12	EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions	Zhiyuan Chen et.al.	2407.08136	link
2024-07-08	MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices	Jianwen Jiang et.al.	2407.05712	null
2024-07-08	Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN	Jiacheng Su et.al.	2407.05577	null
2024-07-04	Compressed Skinning for Facial Blendshapes	Ladislav Kavan et.al.	2406.11597	null
2024-07-03	LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control	Jianzhu Guo et.al.	2407.03168	link
2024-07-01	Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert	Han EunGi et.al.	2407.01034	null
2024-06-26	RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network	Xiaozhong Ji et.al.	2406.18284	null
2024-06-24	The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents	Sinan Sonlu et.al.	2407.10993	null
2024-06-21	EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot	Hao Fei et.al.	2406.15177	link
2024-06-20	MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset	Kim Sung-Bin et.al.	2406.14272	null
2024-06-19	DF40: Toward Next-Generation Deepfake Detection	Zhiyuan Yan et.al.	2406.13495	link
2024-06-19	AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models	Ken Chen et.al.	2406.13272	null
2024-06-18	RITA: A Real-time Interactive Talking Avatars Framework	Wuxinlin Cheng et.al.	2406.13093	null
2024-06-18	A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing	Ming Meng et.al.	2406.10553	null
2024-06-17	NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation	Niu Guanchen et.al.	2406.11259	null
2024-06-17	Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement	Runyi Yu et.al.	2406.08096	null
2024-06-16	Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation	Mingwang Xu et.al.	2406.08801	null
2024-06-14	DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details	Haitao Cao et.al.	2405.19688	null
2024-06-13	Talking Heads: Understanding Inter-layer Communication in Transformer Language Models	Jack Merullo et.al.	2406.09519	null
2024-06-13	DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing	Neha Sahipjohn et.al.	2406.08802	null
2024-06-12	Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation	Jiadong Liang et.al.	2406.07895	null
2024-06-07	Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation	Yue Ma et.al.	2406.01900	null
2024-06-05	Controllable Talking Face Generation by Implicit Facial Keypoints Editing	Dong Zhao et.al.	2406.02880	link
2024-05-31	MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses	Saif Mahmud et.al.	2405.21004	null
2024-05-31	MegActor: Harness the Power of Raw Video for Vivid Portrait Animation	Shurong Yang et.al.	2405.20851	link
2024-05-30	Audio2Rig: Artist-oriented deep learning tool for facial animation	Bastien Arcelin et.al.	2405.20412	null
2024-05-28	OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance	Shuheng Ge et.al.	2405.14709	null
2024-05-24	InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation	Yuchi Wang et.al.	2405.15758	link
2024-05-22	Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling	Yibo Wang et.al.	2405.13701	null
2024-05-21	Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control	Yue Han et.al.	2405.12970	null
2024-05-16	Faces that Speak: Jointly Synthesising Talking Face and Speech from Text	Youngjoon Jang et.al.	2405.10272	null
2024-05-14	PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset	Yang Hou et.al.	2405.08838	link
2024-05-12	Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation	Changpeng Cai et.al.	2405.07257	null
2024-05-10	NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior	Gihoon Kim et.al.	2405.05749	null
2024-05-09	SwapTalk: Audio-Driven Talking Face Generation with One-Shot Customization in Latent Space	Zeren Zhang et.al.	2405.05636	null
2024-05-08	Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention	Ruijie Tao et.al.	2404.18501	link
2024-05-07	Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation	Dogucan Yaman et.al.	2405.04327	null
2024-05-06	AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding	Tao Liu et.al.	2405.03121	link
2024-04-29	EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars	Nikita Drobyshev et.al.	2404.19110	null
2024-04-29	GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting	Bo Chen et.al.	2404.19040	null
2024-04-29	Embedded Representation Learning Network for Animating Styled Video Portrait	Tianyong Wang et.al.	2404.19038	null
2024-04-29	CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation	Xiangyu Liang et.al.	2404.18604	null
2024-04-28	GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting	Hongyun Yu et.al.	2404.14037	null
2024-04-25	GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting	Kyusun Cho et.al.	2404.16012	link
2024-04-23	TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting	Jiahe Li et.al.	2404.15264	link
2024-04-19	Learn2Talk: 3D Talking Face Learns from 2D Talking Face	Yixiang Zhuang et.al.	2404.12888	null
2024-04-16	VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time	Sicheng Xu et.al.	2404.10667	null
2024-04-15	FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features	Andre Rochow et.al.	2404.09736	null
2024-04-13	THQA: A Perceptual Quality Assessment Database for Talking Heads	Yingjie Zhou et.al.	2404.09003	link
2024-04-11	EFHQ: Multi-purpose ExtremePose-Face-HQ dataset	Trung Tuan Dao et.al.	2312.17205	null
2024-04-09	Deepfake Generation and Detection: A Benchmark and Survey	Gan Pei et.al.	2403.17881	link
2024-04-08	SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation	Heyuan Li et.al.	2404.05680	null
2024-04-07	GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets	Dongjing Shan et.al.	2404.04924	null
2024-04-07	Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation	Renshuai Liu et.al.	2401.01207	null
2024-04-03	MI-NeRF: Learning a Single Face NeRF from Multiple Identities	Aggelina Chatziagapi et.al.	2403.19920	null
2024-04-02	EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis	Shuai Tan et.al.	2404.01647	null
2024-04-02	Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation	Taekyung Ki et.al.	2404.00636	null
2024-04-01	FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio	Chao Xu et.al.	2403.01901	link
2024-04-01	Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation	Se Jin Park et.al.	2305.19556	null
2024-03-29	Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior	Jaehoon Ko et.al.	2403.20153	link
2024-03-28	MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation	Seyeon Kim et.al.	2403.19144	link
2024-03-28	GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response	Govind Mittal et.al.	2210.06186	link
2024-03-27	X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention	You Xie et.al.	2403.15931	null
2024-03-26	Superior and Pragmatic Talking Face Generation with Teacher-Student Framework	Chao Liang et.al.	2403.17883	null
2024-03-26	AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation	Huawei Wei et.al.	2403.17694	link
2024-03-25	DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment	Stella Bounareli et.al.	2403.17217	null
2024-03-25	AnimateMe: 4D Facial Expressions via Diffusion Models	Dimitrios Gerogiannis et.al.	2403.17213	null
2024-03-25	Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework	Ziyao Huang et.al.	2403.16510	link
2024-03-23	Adaptive Super Resolution For One-Shot Talking-Head Generation	Luchuan Song et.al.	2403.15944	link
2024-03-23	Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis	Zhenhui Ye et.al.	2401.08503	link
2024-03-22	LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example	Soyeon Yoon et.al.	2403.15227	link
2024-03-22	Virbo: Multimodal Multilingual Avatar Video Generation in Digital Marketing	Juan Zhang et.al.	2403.11700	null
2024-03-19	EmoVOCA: Speech-Driven Emotional 3D Talking Heads	Federico Nocentini et.al.	2403.12886	link
2024-03-19	ScanTalk: 3D Talking Heads from Unregistered Scans	Federico Nocentini et.al.	2403.10942	link
2024-03-15	StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation	Dongchan Min et.al.	2208.10922	null
2024-03-14	GAIA: Zero-shot Talking Avatar Generation	Tianyu He et.al.	2311.15230	null
2024-03-13	Say Anything with Any Style	Shuai Tan et.al.	2403.06363	null
2024-03-12	FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization	Shuai Tan et.al.	2403.06375	null
2024-03-12	Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style	Shuai Tan et.al.	2403.06365	null
2024-03-11	A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos	Weixia Zhang et.al.	2403.06421	link
2024-03-05	Memories are One-to-Many Mapping Alleviators in Talking Face Generation	Anni Tang et.al.	2212.05005	null
2024-03-02	G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment	Juan Zhang et.al.	2402.18122	null
2024-03-01	DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder	Chenpeng Du et.al.	2303.17550	null
2024-02-29	Learning a Generalized Physical Face Model From Data	Lingchen Yang et.al.	2402.19477	null
2024-02-28	Context-aware Talking Face Video Generation	Meidai Xuanyuan et.al.	2402.18092	null
2024-02-27	EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions	Linrui Tian et.al.	2402.17485	null
2024-02-27	Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis	Zicheng Zhang et.al.	2402.17364	link
2024-02-26	Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields	Yifei Li et.al.	2402.16599	null
2024-02-25	AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation	Yasheng Sun et.al.	2402.16124	null
2024-02-21	Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters	Zechen Bai et.al.	2402.13724	link
2024-02-21	StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing	Gaoxiang Cong et.al.	2402.12636	link
2024-02-12	StyleLipSync: Style-based Personalized Lip-sync Video Generation	Taekyung Ki et.al.	2305.00521	null
2024-02-08	DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer	Zhiyuan Ma et.al.	2402.05712	link
2024-02-05	One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space	Stella Bounareli et.al.	2402.03553	null
2024-02-02	EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation	Guanwen Feng et.al.	2402.01422	null
2024-01-31	MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis	Wenhao Guan et.al.	2312.10687	null
2024-01-30	Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance	Qingcheng Zhao et.al.	2401.15687	null
2024-01-28	Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes	Weifeng Liu et.al.	2401.15668	link
2024-01-27	An Implicit Physical Face Model Driven by Expression and Style	Lingchen Yang et.al.	2401.15414	null
2024-01-26	Implicit Neural Representation for Physics-driven Actuated Soft Bodies	Lingchen Yang et.al.	2401.14861	null
2024-01-25	SAiD: Speech-driven Blendshape Facial Animation with Diffusion	Inkyu Park et.al.	2401.08655	link
2024-01-23	NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis	Chongke Bi et.al.	2401.12568	null
2024-01-19	Fast Registration of Photorealistic Avatars for VR Facial Animation	Chaitanya Patel et.al.	2401.11002	null
2024-01-18	Exposing Lip-syncing Deepfakes from Mouth Inconsistencies	Soumyya Kanti Datta et.al.	2401.10113	link
2024-01-18	Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models	Jeongsoo Choi et.al.	2306.16003	null
2024-01-16	EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model	Bingyuan Zhang et.al.	2401.08049	null
2024-01-12	DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder	Tao Liu et.al.	2311.01811	link
2024-01-11	Dubbing for Everyone: Data-Efficient Visual Dubbing using Neural Rendering Priors	Jack Saunders et.al.	2401.06126	null
2024-01-11	Jump Cut Smoothing for Talking Heads	Xiaojuan Wang et.al.	2401.04718	null
2024-01-08	AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation	Liyang Chen et.al.	2310.07236	null
2024-01-07	Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness	Sicheng Yang et.al.	2401.03476	null
2024-01-04	Expressive Speech-driven Facial Animation with controllable emotions	Yutong Chen et.al.	2301.02008	link
2023-12-23	TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation	Xize Cheng et.al.	2312.15197	null
2023-12-21	DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation	Chenxu Zhang et.al.	2312.13578	null
2023-12-20	FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability	Linze Li et.al.	2312.03775	null
2023-12-19	Learning Dense Correspondence for NeRF-Based Face Reenactment	Songlin Yang et.al.	2312.10422	null
2023-12-19	Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing	Yushi Lan et.al.	2312.03763	null
2023-12-18	VectorTalker: SVG Talking Face Generation with Progressive Vectorisation	Hao Hu et.al.	2312.11568	null
2023-12-18	AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis	Dongze Li et.al.	2312.10921	null
2023-12-18	Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation	Hui Fu et.al.	2312.10877	null
2023-12-15	DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models	Yifeng Ma et.al.	2312.09767	link
2023-12-15	Attention-Based VR Facial Animation with Visual Mouth Camera Guidance for Immersive Telepresence Avatars	Andre Rochow et.al.	2312.09750	null
2023-12-13	uTalk: Bridging the Gap Between Humans and AI	Hussam Azzuni et.al.	2310.02739	null
2023-12-13	MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation	Haozhe Wu et.al.	2303.09797	null
2023-12-12	GMTalker: Gaussian Mixture based Emotional talking video Portraits	Yibo Xia et.al.	2312.07669	null
2023-12-12	GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance	Haiming Zhang et.al.	2312.07385	null
2023-12-11	Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism	Georgios Milis et.al.	2312.06613	link
2023-12-11	Study of Non-Verbal Behavior in Conversational Agents	Camila Vicari Maccari et.al.	2312.06530	null
2023-12-11	DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers	Aaron Mir et.al.	2312.06400	null
2023-12-11	Audio-driven Talking Face Generation by Overcoming Unintended Information Flow	Dogucan Yaman et.al.	2307.09368	null
2023-12-10	DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation	Fa-Ting Hong et.al.	2305.06225	link
2023-12-09	R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning	Zhiling Ye et.al.	2312.05572	null
2023-12-09	FT2TF: First-Person Statement Text-To-Talking Face Generation	Xingjian Diao et.al.	2312.05430	null
2023-12-08	SingingHead: A Large-scale 4D Dataset for Singing Head Animation	Sijing Wu et.al.	2312.04369	null
2023-12-07	VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior	Xusen Sun et.al.	2312.01841	null
2023-12-05	PMMTalk: Speech-Driven 3D Facial Animation from Complementary Pseudo Multi-modal Features	Tianshun Han et.al.	2312.02781	null
2023-12-05	MyPortrait: Morphable Prior-Guided Personalized Portrait Generation	Bo Ding et.al.	2312.02703	null
2023-12-02	DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser	Peng Chen et.al.	2311.16565	null
2023-12-01	3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing	Balamurugan Thambiraja et.al.	2312.00870	null
2023-11-30	Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data	Yu Deng et.al.	2311.18729	null
2023-11-30	Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation	Pramook Khungurn et.al.	2311.17409	null
2023-11-29	SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis	Ziqiao Peng et.al.	2311.17590	link
2023-11-28	THInImg: Cross-modal Steganography for Presenting Talking Heads in Images	Lin Zhao et.al.	2311.17177	null
2023-11-28	BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis	Hao-Bin Duan et.al.	2311.05521	link
2023-11-28	Continuously Controllable Facial Expression Editing in Talking Face Videos	Zhiyao Sun et.al.	2209.08289	null
2023-11-20	MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI	Lifei Zheng et.al.	2311.14730	null
2023-11-15	CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding	Jianzong Wang et.al.	2311.08673	null
2023-11-13	DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation	Guinan Su et.al.	2311.04766	null
2023-11-12	ChatAnything: Facetime Chat with LLM-Enhanced Personas	Yilin Zhao et.al.	2311.06772	null
2023-11-08	Synthetic Speaking Children -- Why We Need Them and How to Make Them	Muhammad Ali Farooq et.al.	2311.06307	null
2023-11-06	RADIO: Reference-Agnostic Dubbing Video Synthesis	Dongyeun Lee et.al.	2309.01950	null
2023-11-05	3D-Aware Talking-Head Video Motion Transfer	Haomiao Ni et.al.	2311.02549	null
2023-11-03	Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading	Songtao Luo et.al.	2310.05058	link
2023-11-02	LaughTalk: Expressive 3D Talking Head Generation with Laughter	Kim Sung-Bin et.al.	2311.00994	null
2023-11-02	High-Fidelity and Freely Controllable Talking Head Video Generation	Yue Gao et.al.	2304.10168	null
2023-10-31	Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape	Wei Zhao et.al.	2310.20240	null
2023-10-29	On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models	Marija Ivanovska et.al.	2307.05397	null
2023-10-25	Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control	Elif Bozkurt et.al.	2310.17011	null
2023-10-23	The Self 2.0: How AI-Enhanced Self-Clones Transform Self-Perception and Improve Presentation Skills	Qingxiao Zheng et.al.	2310.15112	null
2023-10-19	Gemino: Practical and Robust Neural Compression for Video Conferencing	Vibhaalakshmi Sivaraman et.al.	2209.10507	null
2023-10-17	CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation	Zhaojie Chu et.al.	2310.11295	null
2023-10-15	HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation	Yaosen Chen et.al.	2310.05720	link
2023-10-12	CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity	Abdullah Hayajneh et.al.	2310.07969	link
2023-10-12	Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation	Yuan Gan et.al.	2309.04946	link
2023-10-08	GestSync: Determining who is speaking without a talking head	Sindhu B Hegde et.al.	2310.05304	link
2023-09-30	DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models	Zhiyao Sun et.al.	2310.00434	null
2023-09-28	OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions	Jin Liu et.al.	2309.16148	null
2023-09-26	Emotional Speech-Driven Animation with Content-Emotion Disentanglement	Radek Daněček et.al.	2306.08990	null
2023-09-20	FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion	Stefan Stan et.al.	2309.11306	link
2023-09-20	Context-Aware Talking-Head Video Editing	Songlin Yang et.al.	2308.00462	null
2023-09-18	That's What I Said: Fully-Controllable Talking Face Generation	Youngjoon Jang et.al.	2304.03275	null
2023-09-15	Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech	Junjie Li et.al.	2309.08408	link
2023-09-14	DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis	Yaoyu Su et.al.	2309.07752	null
2023-09-14	DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks	Zipeng Qi et.al.	2309.07509	null
2023-09-14	HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods	Yongyuan Li et.al.	2309.07495	link
2023-09-13	PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network	Qinghua Liu et.al.	2309.06723	null
2023-09-12	DF-TransFusion: Multimodal Deepfake Detection via Lip-Audio Cross-Attention and Facial Self-Attention	Aaditya Kharel et.al.	2309.06511	null
2023-09-12	Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos	Ekta Prashnani et.al.	2305.03713	null
2023-09-11	ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment	Yicheng Zhong et.al.	2308.14448	null
2023-09-10	MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment	Tina Behrouzi et.al.	2309.05095	null
2023-09-09	Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video	Xiuzhe Wu et.al.	2309.04814	link
2023-09-01	Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances	Wolfgang Paier et.al.	2306.10006	null
2023-08-30	From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications	Shreyank N Gowda et.al.	2308.16041	null
2023-08-30	SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces	Ziqiao Peng et.al.	2306.10799	link
2023-08-30	Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models	Antoni Bigata Casademunt et.al.	2305.08854	link
2023-08-29	Papeos: Augmenting Research Papers with Talk Videos	Tae Soo Kim et.al.	2308.15224	null
2023-08-25	EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation	Ziqiao Peng et.al.	2303.11089	link
2023-08-24	ToonTalker: Cross-Domain Face Reenactment	Yuan Gong et.al.	2308.12866	null
2023-08-24	Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis	Jiahe Li et.al.	2307.09323	link
2023-08-23	DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion	Se Jin Park et.al.	2310.05934	null
2023-08-21	Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis	Tong Sha et.al.	2109.02081	null
2023-08-18	Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization	Soumik Mukhopadhyay et.al.	2308.09716	link
2023-08-18	Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation	Fa-Ting Hong et.al.	2307.09906	link
2023-08-17	A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation	Li Liu et.al.	2308.08849	link
2023-08-16	Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions	Yuqi Sun et.al.	2306.10813	null
2023-08-12	Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation	Zhichao Wang et.al.	2308.06457	link
2023-08-12	DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation	Yichao Yan et.al.	2203.07931	null
2023-08-11	Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space	Haoyu Wang et.al.	2308.06076	link
2023-08-11	VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer	Liyang Chen et.al.	2308.04830	null
2023-08-10	Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution	Hyojoon Park et.al.	2305.03216	null
2023-08-02	Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis	Zhenhui Ye et.al.	2306.03504	null
2023-07-29	Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation	Michał Stypułkowski et.al.	2301.03396	null
2023-07-26	Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation	Federico Nocentini et.al.	2306.01415	link
2023-07-20	HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces	Stella Bounareli et.al.	2307.10797	link
2023-07-19	MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions	Yunfei Liu et.al.	2307.10008	null
2023-07-19	Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline	Zhigang Chang et.al.	2307.09821	null
2023-07-19	OPHAvatars: One-shot Photo-realistic Head Avatars	Shaoxu Li et.al.	2307.09153	link
2023-07-18	FACTS: Facial Animation Creation using the Transfer of Styles	Jack Saunders et.al.	2307.09480	null
2023-07-09	Predictive Coding For Animation-Based Video Compression	Goluck Konuko et.al.	2307.04187	null
2023-07-08	FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction	Ganglai Wang et.al.	2307.03990	null
2023-07-05	Interactive Conversational Head Generation	Mohan Zhou et.al.	2307.02090	null
2023-07-04	A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation	Louis Airale et.al.	2307.03270	link
2023-07-04	Generating Animatable 3D Cartoon Faces from Single Portraits	Chuanyu Pan et.al.	2307.01468	null
2023-07-03	RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations	Neha Sahipjohn et.al.	2307.01233	null
2023-06-20	Audio-Driven 3D Facial Animation from In-the-Wild Videos	Liying Lu et.al.	2306.11541	null
2023-06-13	Parametric Implicit Face Representation for Audio-Driven Facial Reenactment	Ricong Huang et.al.	2306.07579	null
2023-06-13	AniFaceDrawing: Anime Portrait Exploration during Your Sketching	Zhengyu Huang et.al.	2306.07476	null
2023-06-12	NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection	Yu Chen et.al.	2306.06885	null
2023-06-10	StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles	Yifeng Ma et.al.	2301.01081	link
2023-06-08	ReliableSwap: Boosting General Face Swapping Via Reliable Supervision	Ge Yuan et.al.	2306.05356	link
2023-06-06	Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks	Jianrong Wang et.al.	2306.03594	null
2023-06-05	Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions	Shaoxu Li et.al.	2306.02903	link
2023-05-31	High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning	Chao Xu et.al.	2305.02572	null
2023-05-23	CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation	Jingning Xu et.al.	2305.13962	null
2023-05-22	RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars	Dongwei Pan et.al.	2305.13353	link
2023-05-19	UniFLG: Unified Facial Landmark Generator from Text or Speech	Kentaro Mitsui et.al.	2302.14337	null
2023-05-18	An Android Robot Head as Embodied Conversational Agent	Marcel Heisler et.al.	2305.10945	null
2023-05-18	Audio-Visual Person-of-Interest DeepFake Detection	Davide Cozzolino et.al.	2204.03083	link
2023-05-17	INCLG: Inpainting for Non-Cleft Lip Generation with a Multi-Task Image Processing Network	Shuang Chen et.al.	2305.10589	null
2023-05-17	LPMM: Intuitive Pose Control for Neural Talking-Head Model via Landmark-Parameter Morphable Model	Kwangho Lee et.al.	2305.10456	null
2023-05-15	Identity-Preserving Talking Face Generation with Landmark and Appearance Priors	Weizhi Zhong et.al.	2305.08293	link
2023-05-09	Zero-shot personalized lip-to-speech synthesis with face image based voice control	Zheng-Yan Sheng et.al.	2305.14359	null
2023-05-09	StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator	Jiazhi Guan et.al.	2305.05445	null
2023-05-09	Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator	Chao Xu et.al.	2305.02594	null
2023-05-01	StyleAvatar: Real-time Photo-realistic Portrait Avatar from a Single Video	Lizhen Wang et.al.	2305.00942	link
2023-05-01	GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation	Zhenhui Ye et.al.	2305.00787	null
2023-04-28	A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation	Bo-Kyeong Kim et.al.	2304.00471	null
2023-04-27	Controllable One-Shot Face Video Synthesis With Semantic Aware Prior	Kangning Liu et.al.	2304.14471	null
2023-04-25	AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head	Rongjie Huang et.al.	2304.12995	link
2023-04-24	VR Facial Animation for Immersive Telepresence Avatars	Andre Rochow et.al.	2304.12051	null
2023-04-21	Implicit Neural Head Synthesis via Controllable Local Deformation Fields	Chuhan Chen et.al.	2304.11113	null
2023-04-20	DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation	Shuai Shen et.al.	2301.03786	link
2023-04-18	Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations	Rongliang Wu et.al.	2304.08945	null
2023-04-17	Autoregressive GAN for Semantic Unconditional Head Motion Generation	Louis Airale et.al.	2211.00987	link
2023-04-11	One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field	Weichuang Li et.al.	2304.05097	null
2023-04-06	Face Animation with an Attribute-Guided Diffusion Model	Bohan Zeng et.al.	2304.03199	link
2023-04-06	4D Agnostic Real-Time Facial Animation Pipeline for Desktop Scenarios	Wei Chen et.al.	2304.02814	null
2023-04-03	CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior	Jinbo Xing et.al.	2301.02379	link
2023-04-01	DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance	Longwen Zhang et.al.	2304.03117	null
2023-04-01	TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles	Yifeng Ma et.al.	2304.00334	null
2023-03-31	FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions	Jin Liu et.al.	2303.17789	null
2023-03-29	Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert	Jiadong Wang et.al.	2303.17480	link
2023-03-27	OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis	Hongyi Xu et.al.	2303.15539	null
2023-03-27	Accurate and Interpretable Solution of the Inverse Rig for Realistic Blendshape Models with Quadratic Corrective Terms	Stevo Racković et.al.	2302.04843	null
2023-03-27	MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation	Bowen Zhang et.al.	2212.08062	link
2023-03-27	A Majorization-Minimization Based Method for Nonconvex Inverse Rig Problems in Facial Animation: Algorithm Derivation	Stevo Racković et.al.	2205.04289	null
2023-03-26	OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering	Zhiyuan Ma et.al.	2303.14662	link
2023-03-26	Emotionally Enhanced Talking Face Generation	Sahil Goyal et.al.	2303.11548	link
2023-03-26	Distributed Solution of the Inverse Rig Problem in Blendshape Facial Animation	Stevo Racković et.al.	2303.06370	null
2023-03-24	Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement	Siddarth Ravichandran et.al.	2209.01320	null
2023-03-23	PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°	Sizhe An et.al.	2303.13071	null
2023-03-22	Style Transfer for 2D Talking Head Animation	Trong-Thang Pham et.al.	2303.09799	link
2023-03-22	MARLIN: Masked Autoencoder for facial video Representation LearnINg	Zhixi Cai et.al.	2211.06627	link
2023-03-14	DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions	Geumbyeol Hwang et.al.	2303.07697	link
2023-03-13	SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation	Wenxuan Zhang et.al.	2211.12194	link
2023-03-09	FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning	Kazi Injamamul Haque et.al.	2303.05416	link
2023-03-09	Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation	Qi Chen et.al.	2303.05322	link
2023-03-07	DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video	Zhimeng Zhang et.al.	2303.03988	link
2023-03-05	Cyber Vaccine for Deepfake Immunity	Ching-Chun Chang et.al.	2303.02659	null
2023-03-04	High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors	Yunpeng Bai et.al.	2211.15064	null
2023-03-01	DPE: Disentanglement of Pose and Expression for General Video Portrait Editing	Youxin Pang et.al.	2301.06281	link
2023-02-27	Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video	Minsu Kim et.al.	2303.08670	null
2023-02-27	Memory-augmented Contrastive Learning for Talking Head Generation	Jianrong Wang et.al.	2302.13469	link
2023-02-24	Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention	Bin Liu et.al.	2302.12532	null
2023-02-16	OPT: One-shot Pose-Controllable Talking Head Generation	Jin Liu et.al.	2302.08197	null
2023-02-14	Expressive Talking Head Video Encoding in StyleGAN2 Latent-Space	Trevine Oorloff et.al.	2203.14512	link
2023-01-31	GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis	Zhenhui Ye et.al.	2301.13430	null
2023-01-23	Data standardization for robust lip sync	Chun Wang et.al.	2202.06198	null
2023-01-20	Neural Volumetric Blendshapes: Computationally Efficient Physics-Based Facial Blendshapes	Nicolas Wagner et.al.	2212.14784	null
2023-01-15	Learning Audio-Driven Viseme Dynamics for 3D Face Animation	Linchao Bao et.al.	2301.06059	null
2022-12-30	Imitator: Personalized Speech-driven 3D Facial Animation	Balamurugan Thambiraja et.al.	2301.00023	null
2022-12-28	All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks	Carina Geldhauser et.al.	2212.13810	null
2022-12-23	Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing	William Brannon et.al.	2212.12137	null
2022-12-09	Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers	Yasheng Sun et.al.	2212.04970	null
2022-12-07	Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors	Zhentao Yu et.al.	2212.04248	null
2022-12-07	SPACE: Speech-driven Portrait Animation with Controllable Expression	Siddharth Gururani et.al.	2211.09809	null
2022-11-30	Extracting Semantic Knowledge from GANs with Unsupervised Learning	Jianjin Xu et.al.	2211.16710	null
2022-11-27	VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild	Kun Cheng et.al.	2211.14758	null
2022-11-26	Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis	Duomin Wang et.al.	2211.14506	link
2022-11-22	Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition	Jiaxiang Tang et.al.	2211.12368	null
2022-11-10	On the role of Lip Articulation in Visual Speech Perception	Zakaria Aldeneh et.al.	2203.10117	null
2022-11-03	SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory	Se Jin Park et.al.	2211.00924	null
2022-10-21	Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection	Alexandros Haliassos et.al.	2201.07131	link
2022-10-13	Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors	Vladimir Iashin et.al.	2210.07055	link
2022-10-13	Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar	Aolan Sun et.al.	2210.06877	null
2022-10-07	Compressing Video Calls using Synthetic Talking Heads	Madhav Agarwal et.al.	2210.03692	null
2022-10-07	A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis	Yichen Han et.al.	2210.03335	null
2022-10-06	Audio-Visual Face Reenactment	Madhav Agarwal et.al.	2210.02755	link
2022-10-06	Finding Directions in GAN's Latent Space for Neural Face Reenactment	Stella Bounareli et.al.	2202.00046	link
2022-10-04	Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale	Aditya Agarwal et.al.	2208.09796	null
2022-09-29	Facial Landmark Predictions with Applications to Metaverse	Qiao Han et.al.	2209.14698	link
2022-09-27	StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment	Stella Bounareli et.al.	2209.13375	link
2022-09-23	EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model	Xinya Ji et.al.	2205.15278	null
2022-09-21	FNeVR: Neural Volume Rendering for Face Animation	Bohan Zeng et.al.	2209.10340	link
2022-09-19	AutoLV: Automatic Lecture Video Generator	Wenbin Wang et.al.	2209.08795	null
2022-09-09	Talking Head from Speech Audio using a Pre-trained Image Generator	Mohammed M. Alghamdi et.al.	2209.04252	null
2022-09-07	Restructurable Activation Networks	Kartikeya Bhardwaj et.al.	2208.08562	link
2022-08-29	StableFace: Analyzing and Improving Motion Stability for Talking Face Generation	Jun Ling et.al.	2208.13717	null
2022-08-17	Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors	Sindhu B Hegde et.al.	2208.08118	link
2022-08-03	Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control	Michail Christos Doukas et.al.	2208.02210	null
2022-08-02	Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer	Ailin Huang et.al.	2206.12837	link
2022-08-01	A Feasibility Study on Image Inpainting for Non-cleft Lip Generation from Patients with Cleft Lip	Shuang Chen et.al.	2208.01149	link
2022-07-27	A Hybrid Deep Animation Codec for Low-bitrate Video Conferencing	Goluck Konuko et.al.	2207.13530	null
2022-07-24	Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis	Shuai Shen et.al.	2207.11770	link
2022-07-22	Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos	Panagiotis P. Filntisis et.al.	2207.11094	link
2022-07-20	NARRATE: A Normal Assisted Free-View Portrait Stylizer	Youjia Wang et.al.	2207.00974	null
2022-07-20	VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection	Joanna Hong et.al.	2206.07458	null
2022-07-20	Responsive Listening Head Generation: A Benchmark Dataset and Baseline	Mohan Zhou et.al.	2112.13548	null
2022-07-13	FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis	Yongqi Wang et.al.	2207.03800	link
2022-06-29	Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs	Bo-Kyeong Kim et.al.	2206.14658	null
2022-06-09	Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos	Alexander Waibel et.al.	2206.04523	null
2022-05-31	Text/Speech-Driven Full-Body Animation	Wenlin Zhuang et.al.	2205.15573	null
2022-05-27	Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast	Boqing Zhu et.al.	2204.14057	link
2022-05-26	One-Shot Face Reenactment on Megapixels	Wonjun Kang et.al.	2205.13368	null
2022-05-24	Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts	Debjoy Saha et.al.	2205.12194	link
2022-05-20	MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement	Alexander Richard et.al.	2104.08223	link
2022-05-13	Talking Face Generation with Multilingual TTS	Hyoung-Kyu Song et.al.	2205.06421	null
2022-05-02	Emotion-Controllable Generalized Talking Face Generation	Sanjana Sinha et.al.	2205.01155	null
2022-05-02	A Novel Speech-Driven Lip-Sync Model with CNN and LSTM	Xiaohong Li et.al.	2205.00916	null
2022-04-27	Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion	Sen Chen et.al.	2204.12756	null
2022-04-25	Fast Facial Landmark Detection and Applications: A Survey	Kostiantyn Khabarlak et.al.	2101.10808	null
2022-04-13	Dynamic Neural Textures: Generating Talking-Face Videos with Continuously Controllable Expressions	Zipeng Ye et.al.	2204.06180	null
2022-04-06	Transformer-S2A: Robust and Efficient Speech-to-Animation	Liyang Chen et.al.	2111.09771	null
2022-04-03	Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text	Pulkit Tandon et.al.	2106.14014	link
2022-03-30	End to End Lip Synchronization with a Temporal AutoEncoder	Yoav Shalev et.al.	2203.16224	link
2022-03-29	Thin-Plate Spline Motion Model for Image Animation	Jian Zhao et.al.	2203.14367	link
2022-03-17	StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN	Fei Yin et.al.	2203.04036	link
2022-03-17	FaceFormer: Speech-Driven 3D Facial Animation with Transformers	Yingruo Fan et.al.	2112.05329	link
2022-03-16	Efficient conditioned face animation using frontally-viewed embedding	Maxime Oquab et.al.	2203.08765	null
2022-03-15	Depth-Aware Generative Adversarial Network for Talking Head Video Generation	Fa-Ting Hong et.al.	2203.06605	link
2022-03-10	An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection	Ganglai Wang et.al.	2203.05178	null
2022-03-08	Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild	Ganglai Wang et.al.	2203.03984	null
2022-03-04	Multi-modality Deep Restoration of Extremely Compressed Face Videos	Xi Zhang et.al.	2107.05548	null
2022-03-01	FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset	Hasam Khalid et.al.	2108.05080	link
2022-02-25	FSGANv2: Improved Subject Agnostic Face Swapping and Reenactment	Yuval Nirkin et.al.	2202.12972	null
2022-02-22	Thinking the Fusion Strategy of Multi-reference Face Reenactment	Takuya Yashima et.al.	2202.10758	null
2022-01-24	Selective Listening by Synchronizing Speech with Lips	Zexu Pan et.al.	2106.07150	link
2022-01-22	Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary	Sibo Zhang et.al.	2104.14631	null
2022-01-21	Stitch it in Time: GAN-Based Facial Editing of Real Videos	Rotem Tzaban et.al.	2201.08361	link
2022-01-17	Towards Realistic Visual Dubbing with Heterogeneous Sources	Tianyi Xie et.al.	2201.06260	null
2022-01-16	Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels	Zipeng Ye et.al.	2201.05986	null
2022-01-03	DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering	Shunyu Yao et.al.	2201.00791	null
2021-12-20	Parallel and High-Fidelity Text-to-Lip Generation	Jinglin Liu et.al.	2107.06831	link
2021-12-19	Initiative Defense against Facial Manipulation	Qidong Huang et.al.	2112.10098	link
2021-12-07	Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation	Yingruo Fan et.al.	2112.02214	null
2021-12-06	One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning	Suzhen Wang et.al.	2112.02749	null
2021-11-29	Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates	Shenhan Qian et.al.	2108.08020	link
2021-11-04	FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation	Wei Gan et.al.	2111.02751	null
2021-11-02	BiosecurID: a multimodal biometric database	Julian Fierrez et.al.	2111.03472	null
2021-10-30	Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis	Haozhe Wu et.al.	2111.00203	link
2021-10-26	Emotion recognition in talking-face videos using persistent entropy and neural networks	Eduardo Paluzo-Hidalgo et.al.	2110.13571	link
2021-10-26	ViDA-MAN: Visual Dialog with Digital Humans	Tong Shen et.al.	2110.13384	null
2021-10-22	Invertible Frowns: Video-to-Video Facial Emotion Translation	Ian Magnusson et.al.	2109.08061	null
2021-10-19	Talking Head Generation with Audio and Speech Related Facial Action Units	Sen Chen et.al.	2110.09951	null
2021-10-16	Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor	Anchit Gupta et.al.	2110.08580	null
2021-10-12	Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment	Haichao Zhang et.al.	2110.04708	null
2021-10-07	Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution	Yangyang Shi et.al.	2110.05241	null
2021-09-24	Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation	Yuanxun Lu et.al.	2109.10595	null
2021-09-20	Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach	Stevo Rackovic et.al.	2109.08356	null
2021-09-17	Detection of GAN-synthesized street videos	Omran Alamayreh et.al.	2109.04991	null
2021-08-30	Audiovisual Speech Synthesis using Tacotron2	Ahmed Hussen Abdelaziz et.al.	2008.00620	null
2021-08-23	KoDF: A Large-scale Korean DeepFake Detection Dataset	Patrick Kwon et.al.	2103.10094	null
2021-08-23	HeadGAN: One-shot Neural Head Synthesis and Editing	Michail Christos Doukas et.al.	2012.08261	null
2021-08-19	AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis	Yudong Guo et.al.	2103.11078	link
2021-08-18	DeepFake MNIST+: A DeepFake Facial Animation Dataset	Jiajun Huang et.al.	2108.07949	link
2021-08-18	FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning	Chenxu Zhang et.al.	2108.07938	link
2021-08-12	UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing	Meng Cao et.al.	2108.05650	null
2021-08-11	AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person	Xinsheng Wang et.al.	2108.04325	null
2021-08-06	SofGAN: A Portrait Image Generator with Dynamic Styling	Anpei Chen et.al.	2007.03780	link
2021-07-27	Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations	Laurent Benaroya et.al.	2107.12346	null
2021-07-21	Speech Driven Talking Face Generation from a Single Image and an Emotion Condition	Sefik Emre Eskimez et.al.	2008.03592	link
2021-07-20	Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion	Suzhen Wang et.al.	2107.09293	link
2021-07-10	Speech2Video: Cross-Modal Distillation for Speech to Video Generation	Shijing Si et.al.	2107.04806	null
2021-07-07	Egocentric Videoconferencing	Mohamed Elgharib et.al.	2107.03109	null
2021-06-08	LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization	Avisek Lahiri et.al.	2106.04185	null
2021-05-20	Audio-Driven Emotional Video Portraits	Xinya Ji et.al.	2104.07452	null
2021-05-07	Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation	Lincheng Li et.al.	2104.07995	link
2021-05-05	A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors	Ruobing Zheng et.al.	2002.08700	null
2021-04-29	Learned Spatial Representations for Few-shot Talking-Head Synthesis	Moustafa Meshry et.al.	2104.14557	null
2021-04-26	One-shot Face Reenactment Using Appearance Adaptive Normalization	Guangming Yao et.al.	2102.03984	null
2021-04-25	3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head	Qianyun Wang et.al.	2104.12051	null
2021-04-22	Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation	Hang Zhou et.al.	2104.11116	link
2021-04-07	Single Source One Shot Reenactment using Weighted motion From Paired Feature Points	Soumya Tripathy et.al.	2104.03117	null
2021-04-07	Everything's Talkin': Pareidolia Face Reenactment	Linsen Song et.al.	2104.03061	link
2021-04-07	LI-Net: Large-Pose Identity-Preserving Face Reenactment Network	Jin Liu et.al.	2104.02850	null
2021-04-02	One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing	Ting-Chun Wang et.al.	2011.15126	null
2021-03-20	Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization	Komal Chugh et.al.	2005.14405	link
2021-03-19	End-to-End Lip Synchronisation Based on Pattern Classification	You Jin Kim et.al.	2005.08606	null
2021-03-05	Real-time RGBD-based Extended Body Pose Estimation	Renat Bashirov et.al.	2103.03663	link
2021-03-03	Estimating Uniqueness of I-Vector Representation of Human Voice	Erkam Sinan Tandogan et.al.	2008.11985	null
2021-02-25	MakeItTalk: Speaker-Aware Talking-Head Animation	Yang Zhou et.al.	2004.12992	null
2021-02-19	One Shot Audio to Animated Video Generation	Neeraj Kumar et.al.	2102.09737	null
2021-02-18	AudioVisual Speech Synthesis: A brief literature review	Efthymios Georgiou et.al.	2103.03927	null
2020-12-14	Robust One Shot Audio to Video Generation	Neeraj Kumar et.al.	2012.07842	null
2020-12-14	Multi Modal Adaptive Normalization for Audio to Video Generation	Neeraj Kumar et.al.	2012.07304	null
2020-11-30	Adaptive Compact Attention For Few-shot Video-to-video Translation	Risheng Huang et.al.	2011.14695	null
2020-11-21	Stochastic Talking Face Generation Using Latent Distribution Matching	Ravindra Yadav et.al.	2011.10727	link
2020-11-21	Iterative Text-based Editing of Talking-heads Using Neural Retargeting	Xinwei Yao et.al.	2011.10688	null
2020-11-09	FACEGAN: Facial Attribute Controllable rEenactment GAN	Soumya Tripathy et.al.	2011.04439	null
2020-11-06	Large-scale multilingual audio visual dubbing	Yi Yang et.al.	2011.03530	null
2020-11-02	Facial Keypoint Sequence Generation from Audio	Prateek Manocha et.al.	2011.01114	null
2020-10-25	APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment	Jiangning Zhang et.al.	2010.13017	link
2020-10-12	Intuitive Facial Animation Editing Based On A Generative RNN Framework	Eloïse Berson et.al.	2010.05655	null
2020-10-05	SMILE: Semantically-guided Multi-attribute Image and Layout Editing	Andrés Romero et.al.	2010.02315	link
2020-10-05	Dynamic Facial Asset and Rig Generation from a Single Scan	Jiaman Li et.al.	2010.00560	null
2020-09-20	An Improved Approach of Intention Discovery with Machine Learning for POMDP-based Dialogue Management	Ruturaj Raval et.al.	2009.09354	null
2020-09-18	Mesh Guided One-shot Face Reenactment using Graph Convolutional Networks	Guangming Yao et.al.	2008.07783	null
2020-09-12	DualLip: A System for Joint Lip Reading and Generation	Weicong Chen et.al.	2009.05784	null
2020-09-02	Seeing wake words: Audio-visual Keyword Spotting	Liliane Momeni et.al.	2009.01225	null
2020-08-29	"It took me almost 30 minutes to practice this". Performance and Production Practices in Dance Challenge Videos on TikTok	Daniel Klug et.al.	2008.13040	null
2020-08-23	A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild	K R Prajwal et.al.	2008.10010	link
2020-08-11	Audio- and Gaze-driven Facial Animation of Codec Avatars	Alexander Richard et.al.	2008.05023	null
2020-08-04	Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract	Tamás Gábor Csapó et.al.	2008.02098	link
2020-08-04	Real-Time Cleaning and Refinement of Facial Animation Signals	Eloïse Berson et.al.	2008.01332	null
2020-08-02	Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos	Yanhui Guo et.al.	2008.01652	null
2020-07-29	Neural Voice Puppetry: Audio-driven Facial Reenactment	Justus Thies et.al.	1912.05566	link
2020-07-20	Deformable Style Transfer	Sunnie S. Y. Kim et.al.	2003.11038	link
2020-07-18	A Robust Interactive Facial Animation Editing System	Eloïse Berson et.al.	2007.09367	null
2020-07-16	Talking-head Generation with Rhythmic Head Motion	Lele Chen et.al.	2007.08547	link
2020-07-08	Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision	Abhinav Shukla et.al.	2007.04134	null
2020-06-20	Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams	Huirong Huang et.al.	2006.11610	null
2020-05-27	Modality Dropout for Improved Performance-driven Talking Faces	Ahmed Hussen Abdelaziz et.al.	2005.13616	null
2020-05-25	Identity-Preserving Realistic Talking Face Generation	Sanjana Sinha et.al.	2005.12318	null
2020-05-22	Head2Head: Video-based Neural Head Synthesis	Mohammad Rami Koujan et.al.	2005.10954	null
2020-05-16	FReeNet: Multi-Identity Face Reenactment	Jiangning Zhang et.al.	1905.11805	null
2020-05-13	FaR-GAN for One-Shot Face Reenactment	Hanxiang Hao et.al.	2005.06402	null
2020-05-13	Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning	Hao Zhu et.al.	1812.06589	null
2020-05-11	Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok	Juan Carlos Medina Serrano et.al.	2004.05478	link
2020-05-07	What comprises a good talking-head video generation?: A Survey and Benchmark	Lele Chen et.al.	2005.03201	link
2020-05-04	Disentangled Speech Embeddings using Cross-modal Self-supervision	Arsha Nagrani et.al.	2002.08742	null
2020-04-30	APB2Face: Audio-guided face reenactment with auxiliary pose and blink signals	Jiangning Zhang et.al.	2004.14569	null
2020-03-30	ActGAN: Flexible and Efficient One-shot Face Reenactment	Ivan Kosarevych et.al.	2003.13840	null
2020-03-29	Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose	Xianfang Zeng et.al.	2003.12957	null
2020-03-26	High-Accuracy Facial Depth Models derived from 3D Synthetic Data	Faisal Khan et.al.	2003.06211	null
2020-03-05	Talking-Heads Attention	Noam Shazeer et.al.	2003.02436	link
2020-03-05	Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose	Ran Yi et.al.	2002.10137	link
2020-03-01	Towards Automatic Face-to-Face Translation	Prajwal K R et.al.	2003.00418	link
2020-02-19	Speech-driven facial animation using polynomial fusion of features	Triantafyllos Kefalas et.al.	1912.05833	null
2020-01-17	ICface: Interpretable and Controllable Face Reenactment Using GANs	Soumya Tripathy et.al.	1904.01909	null
2019-12-20	Disentangling Style and Content in Anime Illustrations	Sitao Xiang et.al.	1905.10742	null
2019-11-21	FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis	Kuangxiao Gu et.al.	1911.09224	null
2019-11-19	MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets	Sungjoo Ha et.al.	1911.08139	null
2019-10-28	Few-shot Video-to-Video Synthesis	Ting-Chun Wang et.al.	1910.12713	null
2019-10-19	Real-Time Lip Sync for Live 2D Animation	Deepali Aneja et.al.	1910.08685	link
2019-10-16	Designing Style Matching Conversational Agents	Deepali Aneja et.al.	1910.07514	null
2019-10-15	A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities	Deepali Aneja et.al.	1909.08766	link
2019-10-09	EmoCo: Visual Analysis of Emotion Coherence in Presentation Videos	Haipeng Zeng et.al.	1907.12918	null
2019-10-02	Animating Face using Disentangled Audio Representations	Gaurav Mittal et.al.	1910.00726	null
2019-09-25	Few-Shot Adversarial Learning of Realistic Neural Talking Head Models	Egor Zakharov et.al.	1905.08233	null
2019-09-06	Neural Style-Preserving Visual Dubbing	Hyeongwoo Kim et.al.	1909.02518	null
2019-08-29	3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach	Ngoc-Trung Tran et.al.	1908.11039	null
2019-08-20	Prosodic Phrase Alignment for Machine Dubbing	Alp Öktem et.al.	1908.07226	link
2019-08-16	FSGAN: Subject Agnostic Face Swapping and Reenactment	Yuval Nirkin et.al.	1908.05932	link
2019-08-11	Emotion Dependent Facial Animation from Affective Speech	Rizwan Sadiq et.al.	1908.03904	null
2019-08-05	One-shot Face Reenactment	Yunxuan Zhang et.al.	1908.03251	link
2019-07-25	Talking Face Generation by Conditional Recurrent Adversarial Network	Yang Song et.al.	1804.04786	link
2019-07-24	Data-Driven Physical Face Inversion	Yeara Kozlov et.al.	1907.10402	null
2019-07-23	A system for efficient 3D printed stop-motion face animation	Rinat Abdrashitov et.al.	1907.10163	null
2019-06-14	Realistic Speech-Driven Facial Animation with GANs	Konstantinos Vougioukas et.al.	1906.06337	null
2019-06-04	Text-based Editing of Talking-head Video	Ohad Fried et.al.	1906.01524	null
2019-05-27	Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks	Guanzhong Tian et.al.	1905.11142	null
2019-05-09	Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss	Lele Chen et.al.	1905.03820	link
2019-05-08	Capture, Learning, and Synthesis of 3D Speaking Styles	Daniel Cudeiro et.al.	1905.03079	link
2019-04-23	Talking Face Generation by Adversarially Disentangled Audio-Visual Representation	Hang Zhou et.al.	1807.07860	null
2019-04-02	FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation	Yanfu Yan et.al.	1904.01509	null
2019-03-13	Animating an Autonomous 3D Talking Avatar	Dominik Borer et.al.	1903.05448	null
2018-12-22	Deep Audio-Visual Speech Recognition	Triantafyllos Afouras et.al.	1809.02108	null
2018-12-20	DeepFakes: a New Threat to Face Recognition? Assessment and Detection	Pavel Korshunov et.al.	1812.08685	null
2018-11-22	Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos	Ying Tai et.al.	1811.00342	link
2018-11-16	Influence of visual cues on head and eye movements during listening tasks in multi-talker audiovisual environments with animated characters	Maartje M. E. Hendrikse et.al.	1812.02088	null
2018-08-28	GANimation: Anatomically-aware Facial Animation from a Single Image	Albert Pumarola et.al.	1807.09251	link
2018-08-19	Dynamic Temporal Alignment of Speech to Lips	Tavi Halperin et.al.	1808.06250	link
2018-07-29	ReenactGAN: Learning to Reenact Faces via Boundary Transfer	Wayne Wu et.al.	1807.11079	link
2018-07-26	Learnable PINs: Cross-Modal Embeddings for Person Identity	Arsha Nagrani et.al.	1805.00833	null
2018-07-19	End-to-End Speech-Driven Facial Animation with Temporal GANs	Konstantinos Vougioukas et.al.	1805.09313	null
2018-05-29	Deep Video Portraits	Hyeongwoo Kim et.al.	1805.11714	null
2018-05-24	VisemeNet: Audio-Driven Animator-Centric Speech Animation	Yang Zhou et.al.	1805.09488	null
2018-05-21	Anime Style Space Exploration Using Metric Learning and Generative Adversarial Networks	Sitao Xiang et.al.	1805.07997	null
2018-04-23	Generating Talking Face Landmarks from Speech	Sefik Emre Eskimez et.al.	1803.09803	null
2018-03-28	Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network	Hai X. Pham et.al.	1803.07716	null
2018-03-20	Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks	Seyed Ali Jalalifar et.al.	1803.07461	null
2017-12-07	End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech	Hai X. Pham et.al.	1710.00920	null
2017-12-06	ObamaNet: Photo-realistic lip-sync from text	Rithesh Kumar et.al.	1801.01442	null
2017-07-30	Kernel Projection of Latent Structures Regression for Facial Animation Retargeting	Christos Ouzounis et.al.	1707.09629	null
2017-07-26	Fast Deep Matting for Portrait Animation on Mobile Phone	Bingke Zhu et.al.	1707.08289	null
2017-07-21	Multichannel Attention Network for Analyzing Visual Behavior in Public Speaking	Rahul Sharma et.al.	1707.06830	null
2017-07-18	You said that?	Joon Son Chung et.al.	1705.02966	null
2017-01-30	Lip Reading Sentences in the Wild	Joon Son Chung et.al.	1611.05358	link
2016-10-28	Galaxy gas as obscurer: II. Separating the galaxy-scale and nuclear obscurers of Active Galactic Nuclei	Johannes Buchner et.al.	1610.09380	link
2016-07-11	Large-Scale MIMO is Capable of Eliminating Power-Thirsty Channel Coding for Wireless Transmission of HEVC/H.265 Video	Shaoshi Yang et.al.	1601.06684	null
2016-05-22	Improving Facial Analysis and Performance Driven Animation through Disentangling Identity and Expression	David Rim et.al.	1512.08212	null
2016-02-08	Automatic Face Reenactment	Pablo Garrido et.al.	1602.02651	null
2015-11-20	ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication	Ali Mollahosseini et.al.	1511.06502	null
2014-09-03	Visual Speech Recognition	Ahmad B. A. Hassanat et.al.	1409.1411	null
2012-09-22	Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis	Ingmar Steiner et.al.	1209.4982	null
2012-03-30	Face Expression Recognition and Analysis: The State of the Art	Vinay Bettadapura et.al.	1203.6722	null
2012-01-19	Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis	Ingmar Steiner et.al.	1201.4080	null
2010-03-01	Re-verification of a Lip Synchronization Protocol using Robust Reachability	Piotr Kordy et.al.	1003.0431	null

Image Animation

Publish Date	Title	Authors	PDF	Code
2025-05-30	MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation	Yanbo Ding et.al.	2505.10238	link
2025-05-29	HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions	Shuolin Xu et.al.	2505.22977	link
2025-05-24	EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation	Qiang Qu et.al.	2503.18552	null
2025-05-18	DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation	Haoyu Zhao et.al.	2503.21246	link
2025-05-13	TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection	Wenkui Yang et.al.	2505.08437	link
2025-04-28	AnimateAnywhere: Rouse the Background in Human Image Animation	Xiaoyu Liu et.al.	2504.19834	null
2025-04-20	DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance	Yuxuan Luo et.al.	2504.01724	null
2025-04-15	UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer	Xiang Wang et.al.	2504.11289	link
2025-04-15	Taming Consistency Distillation for Accelerated Human Image Animation	Xiang Wang et.al.	2504.11143	null
2025-04-05	Multi-identity Human Image Animation with Structural Video Diffusion	Zhenzhi Wang et.al.	2504.04126	null
2025-04-04	Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images	In-Hwan Jin et.al.	2504.05458	link
2025-04-01	VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer	Xinyu Liu et.al.	2502.05979	null
2025-03-23	MotiF: Making Text Count in Image Animation with Motion Focal Loss	Shijie Wang et.al.	2412.16153	null
2025-03-13	Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer	Jiahao Cui et.al.	2412.00733	link
2025-03-10	Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation	Yingjie Chen et.al.	2501.05020	null
2025-02-25	DisPose: Disentangling Pose Guidance for Controllable Human Image Animation	Hongxiang Li et.al.	2412.09349	link
2025-02-24	X-Dancer: Expressive Music to Human Dance Video Generation	Zeyuan Chen et.al.	2502.17414	null
2025-02-15	SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers	Di Qiu et.al.	2502.10841	link
2025-02-10	Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance	Li Hu et.al.	2502.06145	null
2025-02-06	MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation	Jinbo Xing et.al.	2502.04299	null
2025-01-30	Every Image Listens, Every Image Dances: Music-Driven Image Animation	Zhikang Dong et.al.	2501.18801	null
2025-01-20	X-Dyna: Expressive Dynamic Human Image Animation	Di Chang et.al.	2501.10021	link
2025-01-15	Joint Learning of Depth and Appearance for Portrait Image Animation	Xinya Ji et.al.	2501.08649	null
2024-12-11	Animate-X: Universal Character Image Animation with Enhanced Motion Representation	Shuai Tan et.al.	2410.10306	null
2024-12-04	FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait	Taekyung Ki et.al.	2412.01064	null
2024-11-30	DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses	Yatian Pang et.al.	2412.00397	null
2024-11-28	JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation	Xuyang Cao et.al.	2411.09209	link
2024-11-27	StableAnimator: High-Quality Identity-Preserving Human Image Animation	Shuyuan Tu et.al.	2411.17697	link
2024-11-24	LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis	Haojie Zhang et.al.	2411.16748	null
2024-11-21	HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation	Zhenzhi Wang et.al.	2407.17438	link
2024-10-31	TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation	Sunjae Yoon et.al.	2410.24037	null
2024-10-20	FrameBridge: Improving Image-to-Video Generation with Bridge Models	Yuji Wang et.al.	2410.15371	null
2024-10-14	Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation	Jiahao Cui et.al.	2410.07718	link
2024-09-30	Illustrious: an Open Advanced Illustration Model	Sang Hyun Park et.al.	2409.19946	null
2024-09-29	High Quality Human Image Animation using Regional Supervision and Motion Blur Condition	Zhongcong Xu et.al.	2409.19580	null
2024-09-22	Dormant: Defending against Pose-driven Human Image Animation	Jiachen Zhou et.al.	2409.14424	link
2024-07-23	Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models	Xin Ma et.al.	2407.15642	link
2024-07-12	TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models	Jeongho Kim et.al.	2407.09012	null
2024-07-12	EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions	Zhiyuan Chen et.al.	2407.08136	link
2024-07-11	MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model	Muyao Niu et.al.	2405.20222	link
2024-06-16	Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation	Mingwang Xu et.al.	2406.08801	null
2024-06-13	Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control	Jingyun Xue et.al.	2406.03035	null
2024-06-03	UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation	Xiang Wang et.al.	2406.01188	null
2024-06-01	Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance	Shenhao Zhu et.al.	2403.14781	link
2024-05-29	Evaluating the effectiveness of sonification in science education using Edukoi	Lucrezia Guiotto Nai Fovino et.al.	2405.18908	null
2024-05-28	VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation	Qilin Wang et.al.	2405.18156	null
2024-05-28	Controllable Longer Image Animation with Diffusion Models	Qiang Wang et.al.	2405.17306	null
2024-03-25	PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models	Yiming Zhang et.al.	2312.13964	link
2024-03-13	Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts	Yue Ma et.al.	2403.08268	link
2024-03-08	Audio-Synchronized Visual Animation	Lin Zhang et.al.	2403.05659	link
2024-03-05	Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation	Weijie Li et.al.	2403.02827	null
2024-01-17	Continuous Piecewise-Affine Based Motion Model for Image Animation	Hexiang Wang et.al.	2401.09146	link
2024-01-03	Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions	David Junhao Zhang et.al.	2401.01827	link
2023-12-06	AnimateZero: Video Diffusion Models are Zero-Shot Image Animators	Jiwen Yu et.al.	2312.03793	link
2023-12-05	LivePhoto: Real Image Animation with Text-guided Motion Control	Xi Chen et.al.	2312.02928	null
2023-12-04	AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance	Zuozhuo Dai et.al.	2311.12886	link
2023-11-30	Motion-Conditioned Image Animation for Video Editing	Wilson Yan et.al.	2311.18827	null
2023-11-27	MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model	Zhongcong Xu et.al.	2311.16498	null
2023-11-27	DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors	Jinbo Xing et.al.	2310.12190	link
2023-11-19	Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation	Peirong Liu et.al.	2110.04658	null
2023-10-16	LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation	Ruiqi Wu et.al.	2310.10769	link
2023-10-11	LEO: Generative Latent Image Animator for Human Video Synthesis	Yaohui Wang et.al.	2305.03989	link
2023-09-26	Text-Guided Synthesis of Eulerian Cinemagraphs	Aniruddha Mahapatra et.al.	2307.03190	link
2023-09-25	Automatic Animation of Hair Blowing in Still Portrait Photos	Wenpeng Xiao et.al.	2309.14207	null
2023-07-10	AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning	Yuwei Guo et.al.	2307.04725	link
2023-07-09	Predictive Coding For Animation-Based Video Compression	Goluck Konuko et.al.	2307.04187	null
2023-04-12	VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs	Moayed Haji Ali et.al.	2304.06020	null
2023-03-10	3D Cinemagraphy from a Single Image	Xingyi Li et.al.	2303.05724	null
2023-02-02	Dreamix: Video Diffusion Models are General Video Editors	Eyal Molad et.al.	2302.01329	null
2023-01-14	Continuous odor profile monitoring to study olfactory navigation in small animals	Kevin S. Chen et.al.	2301.05905	null
2022-11-30	NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation	Yu Yin et.al.	2211.17235	null
2022-10-04	Implicit Warping for Animation with Image Sets	Arun Mallya et.al.	2210.01794	null
2022-09-28	Motion Transformer for Unsupervised Image Animation	Jiale Tao et.al.	2209.14024	link
2022-07-19	Single Stage Virtual Try-on via Deformable Attention Flows	Shuai Bai et.al.	2207.09161	link
2022-07-08	Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation	Yucheng Suo et.al.	2207.03714	null
2022-06-11	Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification	Mengdi Gao et.al.	2106.12284	link
2022-04-05	Neural Fields in Visual Computing and Beyond	Yiheng Xie et.al.	2111.11426	null
2022-03-29	Thin-Plate Spline Motion Model for Image Animation	Jian Zhao et.al.	2203.14367	link
2022-03-29	Image Animation with Perturbed Masks	Yoav Shalev et.al.	2011.06922	link
2022-03-25	3D GAN Inversion for Controllable Portrait Image Animation	Connor Z. Lin et.al.	2203.13441	null
2022-03-17	Latent Image Animator: Learning to Animate Images via Latent Space Navigation	Yaohui Wang et.al.	2203.09043	null
2021-12-21	Image Animation with Keypoint Mask	Or Toledano et.al.	2112.10457	link
2021-12-19	Move As You Like: Image Animation in E-Commerce Scenario	Borun Xu et.al.	2112.13647	null
2021-12-17	AI-Empowered Persuasive Video Generation: A Survey	Chang Liu et.al.	2112.09401	null
2021-10-26	Incremental Learning for Animal Pose Estimation using RBF k-DPP	Gaurav Kumar Nayak et.al.	2110.13598	null
2021-09-03	Sparse to Dense Motion Transfer for Face Image Animation	Ruiqi Zhao et.al.	2109.00471	null
2021-08-18	DeepFake MNIST+: A DeepFake Facial Animation Dataset	Jiajun Huang et.al.	2108.07949	link
2021-06-23	Analisis Kualitas Layanan Website E-Commerce Bukalapak Terhadap Kepuasan Pengguna Mahasiswa Universitas Bina Darma Menggunakan Metode Webqual 4.0	Adellia et.al.	2106.15342	null
2021-04-07	Single Source One Shot Reenactment using Weighted motion From Paired Feature Points	Soumya Tripathy et.al.	2104.03117	null
2021-03-22	PriorityCut: Occlusion-guided Regularization for Warp-based Image Animation	Wai Ting Cheung et.al.	2103.11600	null
2020-12-01	Ultra-low bitrate video conferencing using deep image animation	Goluck Konuko et.al.	2012.00346	null
2020-10-01	First Order Motion Model for Image Animation	Aliaksandr Siarohin et.al.	2003.00196	link
2020-08-27	Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation	Yurui Ren et.al.	2008.12606	link
2019-08-30	Animating Arbitrary Objects via Deep Motion Transfer	Aliaksandr Siarohin et.al.	1812.08861	link
2018-10-09	3D model silhouette-based tracking in depth images for puppet suit dynamic video-mapping	Guillaume Caron et.al.	1810.03956	null
2018-06-24	A Design of FPGA Based Small Animal PET Real Time Digital Signal Processing and Correction Logic	Jiaming Lu et.al.	1806.09117	null
2018-01-31	RAPTOR I: Time-dependent radiative transfer in arbitrary spacetimes	Thomas Bronzwaer et.al.	1801.10452	null
2016-06-23	Gender and Interest Targeting for Sponsored Post Advertising at Tumblr	Mihajlo Grbovic et.al.	1606.07189	null
2015-03-16	Use of Effective Audio in E-learning Courseware	Kisor Ray et.al.	1503.04837	null
2015-02-04	Multimedia-Video for Learning	Kah Hean Chua et.al.	1502.01090	null
2013-01-25	Measurements of Martian Dust Devil Winds with HiRISE	David S. Choi et.al.	1301.6130	null
2010-01-04	Tutoring System for Dance Learning	Rajkumar Kannan et.al.	1001.0440	null

Video Generation

Publish Date	Title	Authors	PDF	Code
2025-06-25	Video Perception Models for 3D Scene Synthesis	Rui Huang et.al.	2506.20601	null
2025-06-25	BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos	Jiahao Lin et.al.	2506.20103	null
2025-06-24	Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation	Xingyang Li et.al.	2506.19852	null
2025-06-24	GenHSI: Controllable Generation of Human-Scene Interaction Videos	Zekun Li et.al.	2506.19840	null
2025-06-24	SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution	Liangbin Xie et.al.	2506.19838	null
2025-06-24	Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Router	Yubo Huang et.al.	2506.19833	null
2025-06-24	Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation	Jintao Rong et.al.	2506.19348	null
2025-06-23	VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory	Runjia Li et.al.	2506.18903	null
2025-06-23	From Virtual Games to Real-World Play	Wenqiang Sun et.al.	2506.18901	null
2025-06-23	FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation	Kaiyi Huang et.al.	2506.18899	null
2025-06-23	MinD: Unified Visual Imagination and Control via Hierarchical World Models	Xiaowei Chi et.al.	2506.18897	null
2025-06-23	OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation	Qijun Gan et.al.	2506.18866	null
2025-06-23	Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset	Zhuowei Chen et.al.	2506.18851	null
2025-06-23	Matrix-Game: Interactive World Foundation Model	Yifan Zhang et.al.	2506.18701	null
2025-06-23	RDPO: Real Data Preference Optimization for Physics Consistency Video Generation	Wenxu Qian et.al.	2506.18655	null
2025-06-23	BulletGen: Improving 4D Reconstruction with Bullet-Time Generation	Denys Rozumnyi et.al.	2506.18601	null
2025-06-23	VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning	Xuanyu Zhang et.al.	2506.18564	null
2025-06-23	Emergent Temporal Correspondences from Video Diffusion Transformers	Jisu Nam et.al.	2506.17220	link
2025-06-21	STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation	Jiamin Wang et.al.	2506.13138	null
2025-06-20	Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition	Jiaqi Li et.al.	2506.17201	null
2025-06-20	Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation	Riccardo Corvi et.al.	2506.16802	null
2025-06-20	Sekai: A Video Dataset towards World Exploration	Zhen Li et.al.	2506.15675	null
2025-06-20	Show-o2: Improved Native Unified Multimodal Models	Jinheng Xie et.al.	2506.15564	link
2025-06-19	VideoGAN-based Trajectory Proposal for Automated Vehicles	Annajoyce Mariani et.al.	2506.16209	link
2025-06-19	FastInit: Fast Noise Initialization for Temporally Consistent Video Generation	Chengyu Bai et.al.	2506.16119	null
2025-06-19	PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models	Tianchen Zhao et.al.	2506.16054	null
2025-06-19	Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization	Cong Wang et.al.	2506.15980	link
2025-06-18	VideoMAR: Autoregressive Video Generation with Continuous Tokens	Hu Yu et.al.	2506.14168	null
2025-06-18	Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models	Xuanchi Ren et.al.	2506.09042	link
2025-06-17	Causally Steered Diffusion for Automated Video Counterfactual Generation	Nikos Spyrou et.al.	2506.14404	link
2025-06-17	CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation	Jia-Chen Zhang et.al.	2506.14206	null
2025-06-16	EchoShot: Multi-Shot Portrait Video Generation	Jiahao Wang et.al.	2506.15838	null
2025-06-16	UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions	Zhucun Xue et.al.	2506.13691	null
2025-06-15	iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer	Zhelun Shen et.al.	2506.12847	null
2025-06-13	SignAligner: Harmonizing Complementary Pose Modalities for Coherent Sign Language Generation	Xu Wang et.al.	2506.11621	null
2025-06-12	GenWorld: Towards Detecting AI-generated Real-world Simulation Videos	Weiliang Chen et.al.	2506.10975	null
2025-06-12	M4V: Multi-Modal Mamba for Text-to-Video Generation	Jiancheng Huang et.al.	2506.10915	null
2025-06-12	GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning	Xiaoyi Bao et.al.	2506.10639	null
2025-06-12	DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers	Lizhen Wang et.al.	2506.10568	null
2025-06-12	AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation	Haoyuan Shi et.al.	2506.10540	null
2025-06-11	AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation	Chao Liang et.al.	2506.11144	null
2025-06-11	PlayerOne: Egocentric World Simulator	Yuanpeng Tu et.al.	2506.09995	null
2025-06-11	InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions	Zhenzhi Wang et.al.	2506.09984	null
2025-06-11	ReSim: Reliable World Simulation for Autonomous Driving	Jiazhi Yang et.al.	2506.09981	null
2025-06-11	DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning	Dongxu Liu et.al.	2506.09644	null
2025-06-11	Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation	Shanchuan Lin et.al.	2506.09350	null
2025-06-10	Seedance 1.0: Exploring the Boundaries of Video Generation Models	Yu Gao et.al.	2506.09113	null
2025-06-10	FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation	Zheqi He et.al.	2506.09081	null
2025-06-10	VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Xinlong Chen et.al.	2506.09079	null
2025-06-10	MagCache: Fast Video Generation with Magnitude-Aware Cache	Zehong Ma et.al.	2506.09045	link
2025-06-10	Product of Experts for Visual Generation	Yunzhi Zhang et.al.	2506.08894	null
2025-06-10	HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation	Ziyao Huang et.al.	2506.08797	null
2025-06-10	RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping	Yang Bai et.al.	2506.08632	null
2025-06-10	How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models	Huixuan Zhang et.al.	2506.08351	null
2025-06-10	From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models	Pablo Acuaviva et.al.	2506.07280	null
2025-06-09	Seeing Voices: Generating A-Roll Video from Audio with Mirage	Aditi Sundararaman et.al.	2506.08279	null
2025-06-09	Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion	Xun Huang et.al.	2506.08009	null
2025-06-09	Dreamland: Controllable World Creation with Simulator and Generative Models	Sicheng Mo et.al.	2506.08006	null
2025-06-09	Audio-Sync Video Generation with Multi-Stream Temporal Control	Shuchen Weng et.al.	2506.08003	null
2025-06-09	Generative Modeling of Weights: Generalization or Memorization?	Boya Zeng et.al.	2506.07998	link
2025-06-09	Video Unlearning via Low-Rank Refusal Vector	Simone Facchiano et.al.	2506.07891	null
2025-06-09	EgoM2P: Egocentric Multimodal Multitask Pretraining	Gen Li et.al.	2506.07886	null
2025-06-09	PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement	Teng Hu et.al.	2506.07848	null
2025-06-09	Consistent Video Editing as Flow-Driven Image-to-Video Generation	Ge Wang et.al.	2506.07713	null
2025-06-09	Evaluating Robustness in Latent Diffusion Models via Embedding Level Augmentation	Boris Martirosyan et.al.	2506.07706	null
2025-06-09	Astraea: A GPU-Oriented Token-wise Acceleration Framework for Video Diffusion Transformers	Haosong Liu et.al.	2506.05096	null
2025-06-08	TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation	Min-Jung Kim et.al.	2506.07205	null
2025-06-08	Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models	Sangwon Jang et.al.	2506.07177	null
2025-06-08	Hi-VAE: Efficient Video Autoencoding with Global and Detailed Motion	Huaize Liu et.al.	2506.07136	null
2025-06-07	Self-Adapting Improvement Loops for Robotic Learning	Calvin Luo et.al.	2506.06658	null
2025-06-06	Restereo: Diffusion stereo video generation and restoration	Xingchang Huang et.al.	2506.06023	null
2025-06-06	LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models	Haojie Yu et.al.	2506.05806	null
2025-06-06	FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion	Akide Liu et.al.	2506.04648	null
2025-06-05	EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh	Tao Hu et.al.	2506.05554	null
2025-06-05	ContentV: Efficient Training of Video Generation Models with Limited Compute	Wenfeng Lin et.al.	2506.05343	null
2025-06-05	FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation	Huihan Wang et.al.	2506.04956	null
2025-06-05	DualX-VSR: Dual Axial Spatial $\times$ Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation	Shuo Cao et.al.	2506.04830	null
2025-06-05	Follow-Your-Creation: Empowering 4D Creation through Video Inpainting	Yue Ma et.al.	2506.04590	null
2025-06-05	FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers	Xuanhua He et.al.	2506.04213	null
2025-06-05	SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios	Lingwei Dang et.al.	2506.02444	link
2025-06-04	LayerFlow: A Unified Model for Layer-aware Video Generation	Sihui Ji et.al.	2506.04228	null
2025-06-04	UNIC: Unified In-Context Video Editing	Zixuan Ye et.al.	2506.04216	null
2025-06-04	DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models	Ziyi Wu et.al.	2506.03517	null
2025-06-03	Chipmunk: Training-Free Acceleration of Diffusion Transformers with Dynamic Column-Sparse Deltas	Austin Silveria et.al.	2506.03275	null
2025-06-03	IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation	Yuanze Lin et.al.	2506.03150	null
2025-06-03	Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval	Jiwen Yu et.al.	2506.03141	null
2025-06-03	CamCloneMaster: Enabling Reference-based Camera Control for Video Generation	Yawen Luo et.al.	2506.03140	null
2025-06-03	AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation	Lu Qiu et.al.	2506.03126	null
2025-06-03	DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation	Zhengyao Lv et.al.	2506.03123	null
2025-06-03	TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models	Chetwin Low et.al.	2506.03099	null
2025-06-03	SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis	Ssharvien Kumar Sivakumar et.al.	2506.03082	null
2025-06-03	ORV: 4D Occupancy-centric Robot Video Generation	Xiuyu Yang et.al.	2506.03079	link
2025-06-03	Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers	Pengtao Chen et.al.	2506.03065	null
2025-06-03	LinkTo-Anime: A 2D Animation Optical Flow Dataset from 3D Model Rendering	Xiaoyi Feng et.al.	2506.02733	null
2025-06-03	LumosFlow: Motion-Guided Long Video Generation	Jiahao Chen et.al.	2506.02497	null
2025-06-02	Motion aware video generative model	Bowen Xue et.al.	2506.02244	null
2025-06-02	Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control	Xiao Fu et.al.	2506.01943	null
2025-06-02	OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Sen Liang et.al.	2506.01801	null
2025-06-02	Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks	Tao Yang et.al.	2506.01758	null
2025-06-02	Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents	Shuting Wang et.al.	2506.01689	null
2025-06-02	LongDWM: Cross-Granularity Distillation for Building a Long-Term Driving World Model	Xiaodong Wang et.al.	2506.01546	null
2025-06-02	Towards Scalable Video Anomaly Retrieval: A Synthetic Video-Text Benchmark	Shuyu Yang et.al.	2506.01466	null
2025-06-02	DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion	Geunmin Hwang et.al.	2506.01454	null
2025-05-30	MiniMax-Remover: Taming Bad Noise Helps Video Object Removal	Bojia Zi et.al.	2505.24873	null
2025-05-30	DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds	Jiaxu Zhang et.al.	2505.24733	null
2025-05-30	UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation	Yang-Tian Sun et.al.	2505.24521	null
2025-05-30	Interactive Video Generation via Domain Adaptation	Ishaan Rawal et.al.	2505.24253	null
2025-05-30	STORK: Improving the Fidelity of Mid-NFE Sampling for Diffusion and Flow Matching Models	Zheng Tan et.al.	2505.24210	link
2025-05-29	MAGREF: Masked Guidance for Any-Reference Video Generation	Yufan Deng et.al.	2505.23742	link
2025-05-29	VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos	Tingyu Song et.al.	2505.23693	link
2025-05-29	VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models	Xiangdong Zhang et.al.	2505.23656	link
2025-05-29	VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation	Shi-Xue Zhang et.al.	2505.23484	link
2025-05-29	Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis	Hengyuan Cao et.al.	2505.23325	null
2025-05-29	RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer	Liu Liu et.al.	2505.23171	null
2025-05-29	Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing	Tongtong Su et.al.	2505.23134	link
2025-05-29	MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation	Siyuan Wang et.al.	2505.23120	link
2025-05-29	GeoMan: Temporally Consistent Human Geometry Estimation using Image-to-Video Diffusion	Gwanghyun Kim et.al.	2505.23085	null
2025-05-29	MOVi: Training-free Text-conditioned Multi-Object Video Generation	Aimon Rahman et.al.	2505.22980	null
2025-05-29	HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions	Shuolin Xu et.al.	2505.22977	link
2025-05-29	Minute-Long Videos with Dual Parallelisms	Zeqing Wang et.al.	2505.21070	link
2025-05-28	ATI: Any Trajectory Instruction for Controllable Video Generation	Angtian Wang et.al.	2505.22944	null
2025-05-28	Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation	Zhe Kong et.al.	2505.22647	link
2025-05-28	Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers	Weilun Feng et.al.	2505.22167	null
2025-05-28	FaceEditTalker: Interactive Talking Head Generation with Facial Attribute Editing	Guanwen Feng et.al.	2505.22141	null
2025-05-28	LatentMove: Towards Complex Human Movement Video Generation	Ashkan Taghipour et.al.	2505.22046	null
2025-05-28	PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms	Yifei Xia et.al.	2505.22016	null
2025-05-28	Learning World Models for Interactive Video Generation	Taiye Chen et.al.	2505.21996	null
2025-05-28	SageAttention2++: A More Efficient Implementation of SageAttention2	Jintao Zhang et.al.	2505.21136	link
2025-05-28	OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation	Shenghai Yuan et.al.	2505.20292	link
2025-05-27	HDRSDR-VQA: A Subjective Video Quality Dataset for HDR and SDR Comparative Evaluation	Bowen Chen et.al.	2505.21831	null
2025-05-27	Think Before You Diffuse: LLMs-Guided Physics-Aware Video Generation	Ke Zhang et.al.	2505.21653	null
2025-05-27	VideoMarkBench: Benchmarking Robustness of Video Watermarking	Zhengyuan Jiang et.al.	2505.21620	link
2025-05-27	Frame In-N-Out: Unbounded Controllable Image-to-Video Generation	Boyang Wang et.al.	2505.21491	null
2025-05-27	Dynamic Vision from EEG Brain Recordings: How much does EEG know?	Prajwal Singh et.al.	2505.21385	null
2025-05-27	RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy	Aiyue Chen et.al.	2505.21036	null
2025-05-27	Frame-Level Captions for Long Video Generation with Complex Multi Scenes	Guangcong Zheng et.al.	2505.20827	null
2025-05-27	Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt	Xiang Zhu et.al.	2505.20795	null
2025-05-27	Photography Perspective Composition: Towards Aesthetic Perspective Recommendation	Lujian Yao et.al.	2505.20655	null
2025-05-27	Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training	Bolin Lai et.al.	2505.20629	null
2025-05-27	Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM	Peng Liu et.al.	2505.19901	null
2025-05-26	MotionPro: A Precise Motion Controller for Image-to-Video Generation	Zhongwei Zhang et.al.	2505.20287	null
2025-05-26	DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving	Wenchao Sun et.al.	2505.19692	link
2025-05-26	TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs	Juntong Wang et.al.	2505.19535	null
2025-05-26	The Role of Video Generation in Enhancing Data-Limited Action Understanding	Wei Li et.al.	2505.19495	null
2025-05-26	Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals	Nate Gillman et.al.	2505.19386	null
2025-05-25	From Single Images to Motion Policies via Video-Generation Environment Representations	Weiming Zhi et.al.	2505.19306	null
2025-05-25	SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation	Shenggan Cheng et.al.	2505.19151	null
2025-05-25	WorldEval: World Model as Real-World Robot Policies Evaluator	Yaxuan Li et.al.	2505.19017	null
2025-05-25	Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency	Hyunho Ha et.al.	2505.18932	null
2025-05-25	Interspatial Attention for Efficient 4D Human Video Generation	Ruizhi Shao et.al.	2505.15800	null
2025-05-24	Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation	Shuo Yang et.al.	2505.18875	null
2025-05-24	VORTA: Efficient Video Diffusion via Routing Sparse Attention	Wenhao Sun et.al.	2505.18809	link
2025-05-24	DVD-Quant: Data-free Video Diffusion Transformers Quantization	Zhiteng Li et.al.	2505.18663	link
2025-05-24	ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos	Xiaodong Wang et.al.	2505.18650	null
2025-05-23	WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions	Zizhang Li et.al.	2505.18151	null
2025-05-23	DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation	Junhao Chen et.al.	2505.18078	null
2025-05-23	SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain	Jiawei Zhou et.al.	2505.17727	null
2025-05-23	Scaling Image and Video Generation via Test-Time Evolutionary Search	Haoran He et.al.	2505.17618	null
2025-05-23	InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO	Xueji Fang et.al.	2505.17574	link
2025-05-23	Challenger: Affordable Adversarial Driving Video Generation	Zhiyuan Xu et.al.	2505.15880	null
2025-05-22	Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis	Xin You et.al.	2505.17333	null
2025-05-22	Training-Free Efficient Video Generation via Dynamic Token Carving	Yuechen Zhang et.al.	2505.16864	link
2025-05-22	Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts	Taewon Kang et.al.	2505.16819	null
2025-05-22	MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM	Siwei Meng et.al.	2505.16456	null
2025-05-21	Generative AI for Autonomous Driving: A Review	Katharina Winter et.al.	2505.15863	null
2025-05-21	AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection	Zhipei Xu et.al.	2505.15173	null
2025-05-21	CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation	Xinran Wang et.al.	2505.15145	link
2025-05-21	BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation	Haiquan Wen et.al.	2505.12620	link
2025-05-21	Video-GPT via Next Clip Diffusion	Shaobin Zhuang et.al.	2505.12489	null
2025-05-20	Programmatic Video Prediction Using Large Language Models	Hao Tang et.al.	2505.14948	link
2025-05-20	Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers	Sucheng Ren et.al.	2505.14687	link
2025-05-20	LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer	Changgu Chen et.al.	2505.14167	null
2025-05-20	Hunyuan-Game: Industrial-grade Intelligent Game Creation Model	Ruihuang Li et.al.	2505.14135	null
2025-05-20	MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation	Yanbo Ding et.al.	2505.10238	link
2025-05-19	FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance	Dian Shao et.al.	2505.13437	null
2025-05-19	MAGI-1: Autoregressive Video Generation at Scale	Sand.ai et.al.	2505.13211	link
2025-05-19	DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories	Joel Jang et.al.	2505.12705	link
2025-05-19	Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking	Zihan Su et.al.	2505.12667	null
2025-05-18	EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models	Hu Yue et.al.	2505.09694	link
2025-05-17	FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge	Xuan Shen et.al.	2505.14709	link
2025-05-17	DraftAttention: Fast Video Diffusion via Low-Resolution Attention Guidance	Xuan Shen et.al.	2505.14708	link
2025-05-17	LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation	Jiarui Wang et.al.	2505.12098	link
2025-05-17	VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption	Tianxiong Zhong et.al.	2505.12053	null
2025-05-17	STORYANCHORS: Generating Consistent Multi-Scene Story Frames for Long-Form Narratives	Bo Wang et.al.	2505.08350	null
2025-05-16	QVGen: Pushing the Limit of Quantized Video Generative Models	Yushi Huang et.al.	2505.11497	null
2025-05-16	Face Consistency Benchmark for GenAI Video	Michal Podstawski et.al.	2505.11425	null
2025-05-16	Ophora: A Large-Scale Data-Driven Text-Guided Ophthalmic Surgical Video Generation Model	Wei Li et.al.	2505.07449	link
2025-05-15	ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars	Rui-Yang Ju et.al.	2505.10072	null
2025-05-15	Generating time-consistent dynamics with discriminator-guided image diffusion models	Philipp Hess et.al.	2505.09089	null
2025-05-15	Generative Pre-trained Autoregressive Diffusion Transformer	Yuan Zhang et.al.	2505.07344	null
2025-05-14	Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios	Huafeng Shi et.al.	2505.10584	null
2025-05-13	Generative AI for Autonomous Driving: Frontiers and Opportunities	Yuping Wang et.al.	2505.08854	link
2025-05-13	Symbolically-Guided Visual Plan Inference from Uncurated Video Data	Wenyan Yang et.al.	2505.08444	null
2025-05-12	DanceGRPO: Unleashing GRPO on Visual Generation	Zeyue Xue et.al.	2505.07818	null
2025-05-12	ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models	Ozgur Kara et.al.	2505.07652	null
2025-05-11	DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models	Junhao Xia et.al.	2505.07057	null
2025-05-11	BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation	Panwen Hu et.al.	2505.06985	null
2025-05-10	Jailbreaking the Text-to-Video Generative Models	Jiayang Liu et.al.	2505.06679	null
2025-05-10	ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images	Xianghao Kong et.al.	2505.06537	null
2025-05-08	3D Scene Generation: A Survey	Beichen Wen et.al.	2505.05474	link
2025-05-08	T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models	Xuyang Guo et.al.	2505.04946	null
2025-05-08	HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation	Teng Hu et.al.	2505.04512	null
2025-05-06	Real-Time Person Image Synthesis Using a Flow Matching Model	Jiwoo Jeong et.al.	2505.03562	link
2025-05-06	Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights	Zhaiming Shen et.al.	2505.03205	null
2025-05-04	DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization	Wenchuan Wang et.al.	2505.02192	null
2025-05-03	GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting	Anushka Agarwal et.al.	2505.01928	null
2025-05-03	PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth	Bu Jin et.al.	2505.01729	null
2025-05-02	VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos	Zongxia Li et.al.	2505.01481	link
2025-05-02	FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis	Jiangtong Tan et.al.	2505.01172	link
2025-05-01	Controllable Weather Synthesis and Removal with Video Diffusion Models	Chih-Hao Lin et.al.	2505.00704	null
2025-05-01	T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation	Xuyang Guo et.al.	2505.00337	null
2025-04-30	Direct Motion Models for Assessing Generated Videos	Kelsey Allen et.al.	2505.00209	null
2025-04-30	Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis	Michal Geyer et.al.	2505.00135	null
2025-04-30	ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction	Qihao Liu et.al.	2504.21855	null
2025-04-30	HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation	Haiyang Zhou et.al.	2504.21650	link
2025-04-30	Simple Visual Artifact Detection in Sora-Generated Videos	Misora Sugiyama et.al.	2504.21334	null
2025-04-30	Capturing Conditional Dependence via Auto-regressive Diffusion Models	Xunpeng Huang et.al.	2504.21314	null
2025-04-29	TesserAct: Learning 4D Embodied World Models	Haoyu Zhen et.al.	2504.20995	null
2025-04-29	DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs	Hao Luan et.al.	2504.20754	null
2025-04-29	Advance Fake Video Detection via Vision Transformers	Joy Battocchio et.al.	2504.20669	null
2025-04-28	CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition	Quynh Phung et.al.	2504.19894	null
2025-04-28	DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer	Junpeng Jiang et.al.	2504.19614	null
2025-04-26	Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning	Yifan Xie et.al.	2504.18810	null
2025-04-26	Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation	Jong Inn Park et.al.	2504.18805	null
2025-04-25	NoiseController: Towards Consistent Multi-view Video Generation via Noise Decomposition and Collaboration	Haotian Dong et.al.	2504.18448	null
2025-04-25	We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback	Minkyu Choi et.al.	2504.17180	null
2025-04-24	Dynamic Camera Poses and Where to Find Them	Chris Rockwell et.al.	2504.17788	null
2025-04-24	MV-Crafter: An Intelligent System for Music-guided Video Generation	Chuer Chen et.al.	2504.17267	null
2025-04-24	DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks	Yinqi Li et.al.	2504.17253	link
2025-04-23	Subject-driven Video Generation via Disentangled Identity and Motion	Daneul Kim et.al.	2504.17816	null
2025-04-23	BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation	Ruotong Wang et.al.	2504.16907	null
2025-04-23	ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance	Ying Li et.al.	2504.16464	null
2025-04-23	VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models	Xuming Hu et.al.	2504.16359	null
2025-04-22	DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment	Xiaofan Li et.al.	2504.18576	link
2025-04-22	Survey of Video Diffusion Models: Foundations, Implementations, and Applications	Yimu Wang et.al.	2504.16081	link
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016	null
2025-04-22	Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning	Wang Lin et.al.	2504.15932	null
2025-04-22	Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views	Ningli Xu et.al.	2504.15786	null
2025-04-22	DiTPainter: Efficient Video Inpainting with Diffusion Transformers	Xian Wu et.al.	2504.15661	null
2025-04-21	Solving New Tasks by Adapting Internet Video Knowledge	Calvin Luo et.al.	2504.15369	null
2025-04-21	Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform	Xianpan Zhou et.al.	2504.15182	null
2025-04-21	DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation	Weijie He et.al.	2504.15032	null
2025-04-21	Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation	Chenjie Cao et.al.	2504.14899	link
2025-04-21	SkyReels-V2: Infinite-length Film Generative Model	Guibin Chen et.al.	2504.13074	link
2025-04-21	Packing Input Frame Context in Next-Frame Prediction Models for Video Generation	Lvmin Zhang et.al.	2504.12626	link
2025-04-20	Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis	Jingjing Ren et.al.	2504.14470	null
2025-04-19	SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation	Minho Park et.al.	2504.14396	link
2025-04-18	Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting	Jiaxin Huang et.al.	2504.11092	null
2025-04-17	Understanding Attention Mechanism in Video Diffusion Models	Bingyan Liu et.al.	2504.12027	null
2025-04-17	VideoPanda: Video Panoramic Diffusion with Multi-view Attention	Kevin Xie et.al.	2504.11389	null
2025-04-16	VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate	Zhihang Yuan et.al.	2504.12259	link
2025-04-16	Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM	Zirui Pan et.al.	2504.12048	null
2025-04-16	The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation	Bingjie Gao et.al.	2504.11739	null
2025-04-15	InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation	Yukang Lin et.al.	2504.10905	null
2025-04-15	OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Dianbing Xi et.al.	2504.10825	null
2025-04-14	H-MoRe: Learning Human-centric Motion Representation for Action Analysis	Zhanbo Huang et.al.	2504.10676	link
2025-04-14	H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models	Yushu Wu et.al.	2504.10567	null
2025-04-14	FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos	Rui Chen et.al.	2504.10358	null
2025-04-14	Aligning Anime Video Generation with Human Feedback	Bingwen Zhu et.al.	2504.10044	null
2025-04-14	EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise	Chao Liu et.al.	2504.09789	null
2025-04-13	CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models	Pooja Guhan et.al.	2504.09472	null
2025-04-11	Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model	Team Seawead et.al.	2504.08685	null
2025-04-11	Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization	Jialu Li et.al.	2504.08641	null
2025-04-11	Diffusion Models for Robotic Manipulation: A Survey	Rosa Wolf et.al.	2504.08438	null
2025-04-11	EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model	Renda Li et.al.	2504.08344	null
2025-04-11	RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements	Guangcong Zheng et.al.	2504.08212	link
2025-04-11	TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation	Ruineng Li et.al.	2504.08181	null
2025-04-10	Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction	Zeren Jiang et.al.	2504.07961	link
2025-04-10	Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos	Rundong Luo et.al.	2504.07940	null
2025-04-10	Diffusion Transformers for Tabular Data Time Series Generation	Fabrizio Garuti et.al.	2504.07566	link
2025-04-09	EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation	Diljeet Jagpal et.al.	2504.06861	null
2025-04-09	DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation	Wangbo Zhao et.al.	2504.06803	link
2025-04-09	RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism	Elia Peruzzo et.al.	2504.06672	null
2025-04-09	Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception	Ruotian Peng et.al.	2504.06666	null
2025-04-08	CamContextI2V: Context-aware Controllable Video Generation	Luis Denninger et.al.	2504.06022	link
2025-04-08	Physics-aware generative models for turbulent fluid flows through energy-consistent stochastic interpolants	Nikolaj T. Mücke et.al.	2504.05852	link
2025-04-07	One-Minute Video Generation with Test-Time Training	Karan Dalal et.al.	2504.05298	null
2025-04-07	Video-Bench: Human-Aligned Video Generation Benchmark	Hui Han et.al.	2504.04907	null
2025-04-07	Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation	Fa-Ting Hong et.al.	2504.02542	link
2025-04-05	Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization	Yikai Wang et.al.	2504.04153	link
2025-04-05	Multi-identity Human Image Animation with Structural Video Diffusion	Zhenzhi Wang et.al.	2504.04126	null
2025-04-05	Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models	Xuyang Guo et.al.	2504.04051	null
2025-04-05	DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion	Maksim Siniukov et.al.	2504.04010	null
2025-04-04	Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models	Xuran Ma et.al.	2504.03140	link
2025-04-04	MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition	Takahiro Shirakawa et.al.	2504.02361	null
2025-04-03	How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models	Pascal Chang et.al.	2504.03072	null
2025-04-03	Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments	Chenyu Zhang et.al.	2504.02918	null
2025-04-03	Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets	Chuning Zhu et.al.	2504.02792	null
2025-04-03	Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model	Shengjun Zhang et.al.	2504.02764	null
2025-04-03	ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer	Jiayi Gao et.al.	2504.02451	link
2025-04-03	SkyReels-A2: Compose Anything in Video Diffusion Transformers	Zhengcong Fei et.al.	2504.02436	link
2025-04-03	OmniCam: Unified Multimodal Video Generation via Camera Control	Xiaoda Yang et.al.	2504.02312	null
2025-04-03	VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step	Hanyang Wang et.al.	2504.01956	null
2025-04-02	Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet	Sebastian Barros et.al.	2504.03752	null
2025-04-02	WorldPrompter: Traversable Text-to-Scene Generation	Zhaoyang Zhang et.al.	2504.02045	null
2025-04-02	Towards Physically Plausible Video Generation via VLM Planning	Xindi Yang et.al.	2503.23368	null
2025-04-01	AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction	Junhao Cheng et.al.	2504.01014	link
2025-04-01	WorldScore: A Unified Evaluation Benchmark for World Generation	Haoyi Duan et.al.	2504.00983	null
2025-04-01	DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding	Chong Li et.al.	2504.00432	null
2025-04-01	HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation	Boyuan Wang et.al.	2503.24026	null
2025-04-01	On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices	Bosung Kim et.al.	2503.23796	link
2025-03-31	GazeLLM: Multimodal LLMs incorporating Human Visual Attention	Jun Rekimoto et.al.	2504.00221	null
2025-03-31	Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation	Shengqiong Wu et.al.	2503.24379	null
2025-03-31	JointTuner: Appearance-Motion Adaptive Joint Training for Customized Video Generation	Fangda Chen et.al.	2503.23951	null
2025-03-31	HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation	Kun Liu et.al.	2503.23715	null
2025-03-30	VideoGen-Eval: Agent-based System for Video Generation Evaluation	Yuhang Yang et.al.	2503.23452	link
2025-03-30	JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization	Kai Liu et.al.	2503.23377	null
2025-03-30	MoCha: Towards Movie-Grade Talking Character Synthesis	Cong Wei et.al.	2503.23307	null
2025-03-30	SketchVideo: Sketch-based Video Generation and Editing	Feng-Lin Liu et.al.	2503.23284	null
2025-03-29	Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models	Prin Phunyaphibarn et.al.	2503.20240	null
2025-03-28	Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model	Jangho Park et.al.	2503.22622	null
2025-03-28	EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation	Hadrien Reynaud et.al.	2503.22357	null
2025-03-28	CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving	Yishen Ji et.al.	2503.22231	null
2025-03-27	VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models	Chi-Pin Huang et.al.	2503.21781	null
2025-03-27	Exploring the Evolution of Physics Cognition in Video Generation: A Survey	Minghui Lin et.al.	2503.21765	link
2025-03-27	VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness	Dian Zheng et.al.	2503.21755	link
2025-03-27	Audio-driven Gesture Generation via Deviation Feature in the Latent Space	Jiahui Chen et.al.	2503.21616	null
2025-03-27	ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model	Jinwei Qi et.al.	2503.21144	null
2025-03-26	Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations	Haitong Liu et.al.	2503.21824	link
2025-03-26	Synthetic Video Enhances Physical Fidelity in Video Synthesis	Qi Zhao et.al.	2503.20822	null
2025-03-26	RecTable: Fast Modeling Tabular Data with Rectified Flow	Masane Fuchi et.al.	2503.20731	link
2025-03-26	AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports	Xiangwen Zhang et.al.	2503.20654	null
2025-03-26	GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving	Lloyd Russell et.al.	2503.20523	null
2025-03-26	VPO: Aligning Text-to-Video Generation Models with Prompt Optimization	Jiale Cheng et.al.	2503.20491	link
2025-03-26	Wan: Open and Advanced Large-Scale Video Generative Models	WanTeam et.al.	2503.20314	link
2025-03-26	Video Motion Graphs	Haiyang Liu et.al.	2503.20218	null
2025-03-26	Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing	Jaihoon Kim et.al.	2503.19385	null
2025-03-26	EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models	Yufei Cai et.al.	2503.19369	link
2025-03-25	Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors	Yuke Lou et.al.	2503.20118	null
2025-03-25	Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals	Stefan Stojanov et.al.	2503.19953	null
2025-03-25	FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling	Qiusheng Huang et.al.	2503.19940	null
2025-03-25	FullDiT: Multi-Task Video Generative Foundation Model with Full Attention	Xuan Ju et.al.	2503.19907	null
2025-03-25	Mask$^2$DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation	Tianhao Qi et.al.	2503.19881	null
2025-03-25	AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers	Jiazhi Guan et.al.	2503.19824	null
2025-03-25	AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset	Haiyu Zhang et.al.	2503.19462	null
2025-03-25	MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation	Yukang Lin et.al.	2503.19383	null
2025-03-25	Long-Context Autoregressive Video Modeling with Next-Frame Prediction	Yuchao Gu et.al.	2503.19325	link
2025-03-25	Aether: Geometric-Aware Unified World Modeling	Aether Team et.al.	2503.18945	null
2025-03-25	AMD-Hummingbird: Towards an Efficient Text-to-Video Model	Takashi Isobe et.al.	2503.18559	link
2025-03-25	Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model	Yingying Fan et.al.	2503.16942	null
2025-03-24	Video-T1: Test-Time Scaling for Video Generation	Fangfu Liu et.al.	2503.18942	null
2025-03-24	Training-free Diffusion Acceleration with Bottleneck Sampling	Ye Tian et.al.	2503.18940	null
2025-03-24	EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation	Qiang Qu et.al.	2503.18552	null
2025-03-24	Can Text-to-Video Generation help Video-Language Alignment?	Luca Zanella et.al.	2503.18507	null
2025-03-24	Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation	Dingcheng Zhen et.al.	2503.18429	null
2025-03-24	Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance	Sicong Feng et.al.	2503.18386	null
2025-03-23	LongDiff: Training-Free Long Video Generation in One Go	Zhuoling Li et.al.	2503.18150	null
2025-03-23	TransAnimate: Taming Layer Diffusion to Generate RGBA Video	Xuewei Chen et.al.	2503.17934	null
2025-03-22	RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation	Zhiqiang Yuan et.al.	2503.17735	null
2025-03-21	Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks	Bhishma Dedhia et.al.	2503.17539	null
2025-03-21	Position: Interactive Generative Video as Next-Generation Game Engine	Jiwen Yu et.al.	2503.17359	null
2025-03-21	AnimatePainter: A Self-Supervised Rendering Framework for Reconstructing Painting Process	Junjie Hu et.al.	2503.17029	null
2025-03-21	Enabling Versatile Controls for Video Diffusion Models	Xu Zhang et.al.	2503.16983	link
2025-03-21	SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation	Chun-Han Yao et.al.	2503.16396	null
2025-03-20	A Recipe for Generating 3D Worlds From a Single Image	Katja Schwarz et.al.	2503.16611	null
2025-03-20	XAttention: Block Sparse Attention with Antidiagonal Scoring	Ruyi Xu et.al.	2503.16428	link
2025-03-20	MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance	Quanhao Li et.al.	2503.16421	null
2025-03-20	ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos	Haolin Yang et.al.	2503.16400	null
2025-03-20	PoseTraj: Pose-Aware Trajectory Control in Video Diffusion	Longbin Ji et.al.	2503.16068	null
2025-03-20	Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models	Marc Benedí San Millán et.al.	2503.15996	null
2025-03-20	MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving	Haiguang Wang et.al.	2503.15875	link
2025-03-20	VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling	Hyojun Go et.al.	2503.15855	null
2025-03-20	VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention	Mingzhe Zheng et.al.	2503.15138	null
2025-03-19	Temporal Regularization Makes Your Video Generator Stronger	Harold Haodong Chen et.al.	2503.15417	null
2025-03-19	Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models	Tingxiu Chen et.al.	2503.14966	link
2025-03-18	MusicInfuser: Making Video Diffusion Listen and Dance	Susung Hong et.al.	2503.14505	null
2025-03-18	MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation	Hongyu Zhang et.al.	2503.14428	null
2025-03-18	Impossible Videos	Zechen Bai et.al.	2503.14378	null
2025-03-18	LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models	Yu Cheng et.al.	2503.14325	link
2025-03-18	Concat-ID: Towards Universal Identity-Preserving Video Synthesis	Yong Zhong et.al.	2503.14151	null
2025-03-18	Fast Autoregressive Video Generation with Diagonal Decoding	Yang Ye et.al.	2503.14070	null
2025-03-18	AIGVE-Tool: AI-Generated Video Evaluation Toolkit with Multifaceted Benchmark	Xinhao Xiang et.al.	2503.14064	link
2025-03-17	MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Portrait Few-Step Synthesis	Shitong Shao et.al.	2503.13319	null
2025-03-17	Language-guided Open-world Video Anomaly Detection	Zihao Liu et.al.	2503.13160	null
2025-03-17	Frame-wise Conditioning Adaptation for Fine-Tuning Diffusion Models in Text-to-Video Prediction	Zheyuan Liu et.al.	2503.12953	null
2025-03-17	AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations	Quang Trung Truong et.al.	2503.12828	null
2025-03-17	Long-Video Audio Synthesis with Multi-Agent Collaboration	Yehang Zhang et.al.	2503.10719	null
2025-03-16	SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs	Guibiao Liao et.al.	2503.12535	null
2025-03-16	VMBench: A Benchmark for Perception-Aligned Video Motion Generation	Xinran Ling et.al.	2503.10076	link
2025-03-15	ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis	Yu Fang et.al.	2503.14526	null
2025-03-15	A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI	Paula Andrea Pérez-Toro et.al.	2503.12102	null
2025-03-15	SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering	Byeongjun Park et.al.	2503.12024	link
2025-03-14	ReCamMaster: Camera-Controlled Generative Rendering from A Single Video	Jianhong Bai et.al.	2503.11647	null
2025-03-14	HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models	Ziqin Zhou et.al.	2503.11513	null
2025-03-14	TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation	Hongxiang Zhao et.al.	2503.11423	null
2025-03-14	Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model	Haoyang Huang et.al.	2503.11251	link
2025-03-14	Cross-Modal Learning for Music-to-Music-Video Description Generation	Zhuoyuan Mao et.al.	2503.11190	null
2025-03-14	On the Limitations of Vision-Language Models in Understanding Image Transforms	Ahmad Mustafa Anis et.al.	2503.09837	null
2025-03-13	CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models	Hao He et.al.	2503.10592	null
2025-03-13	Long Context Tuning for Video Generation	Yuwei Guo et.al.	2503.10589	null
2025-03-13	CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance	Yufan Deng et.al.	2503.10391	null
2025-03-13	Semantic Latent Motion for Portrait Video Generation	Qiyuan Zhang et.al.	2503.10096	null
2025-03-13	UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?	Yuanxin Liu et.al.	2503.09949	link
2025-03-13	Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers	Yasheng Sun et.al.	2503.09942	null
2025-03-13	VideoMerge: Towards Training-free Long Video Generation	Siyang Zhang et.al.	2503.09926	null
2025-03-13	WonderVerse: Extendable 3D Scene Generation with Video Generative Models	Hao Feng et.al.	2503.09160	null
2025-03-12	Error Analyses of Auto-Regressive Video Diffusion Models: A Unified Framework	Jing Wang et.al.	2503.10704	null
2025-03-12	LuciBot: Automated Robot Policy Learning from Generated Videos	Xiaowen Qiu et.al.	2503.09871	null
2025-03-12	I2V3D: Controllable image-to-video generation with 3D guidance	Zhiyuan Zhang et.al.	2503.09733	null
2025-03-12	Accelerating Diffusion Sampling via Exploiting Local Transition Coherence	Shangwen Zhu et.al.	2503.09675	null
2025-03-12	Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k	Xiangyu Peng et.al.	2503.09642	link
2025-03-12	PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop	Chenyu Li et.al.	2503.09595	link
2025-03-12	Unified Dense Prediction of Video Diffusion	Lehan Yang et.al.	2503.09344	null
2025-03-12	Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space	Jian Zhu et.al.	2503.09215	null
2025-03-12	SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video	Chengshu Zhao et.al.	2503.09154	link
2025-03-12	Reangle-A-Video: 4D Video Generation as Video-to-Video Translation	Hyeonho Jeong et.al.	2503.09151	null
2025-03-12	$^R$FLAV: Rolling Flow matching for infinite Audio Video generation	Alex Ergasti et.al.	2503.08307	link
2025-03-12	Object-Centric World Model for Language-Guided Manipulation	Youngjoon Jeong et.al.	2503.06170	null
2025-03-11	V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video	Jianqi Chen et.al.	2503.09631	null
2025-03-11	REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder	Yitian Zhang et.al.	2503.08665	null
2025-03-11	Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling	Subin Kim et.al.	2503.08605	null
2025-03-11	WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation	Jing Wang et.al.	2503.08153	null
2025-03-11	ObjectMover: Generative Object Movement with Video Prior	Xin Yu et.al.	2503.08037	null
2025-03-11	How Can Video Generative AI Transform K-12 Education? Examining Teachers' Perspectives through TPACK and TAM	Unggi Lee et.al.	2503.08003	null
2025-03-11	VACE: All-in-One Video Creation and Editing	Zeyinzi Jiang et.al.	2503.07598	null
2025-03-11	LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation	Quanjian Song et.al.	2503.06508	link
2025-03-10	DreamRelation: Relation-Centric Video Customization	Yujie Wei et.al.	2503.07602	null
2025-03-10	AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion	Mingzhen Sun et.al.	2503.07418	null
2025-03-10	Automated Movie Generation via Multi-Agent CoT Planning	Weijia Wu et.al.	2503.07314	link
2025-03-10	From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers	Jiacheng Liu et.al.	2503.06923	link
2025-03-09	VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation	Hritik Bansal et.al.	2503.06800	null
2025-03-09	TR-DQ: Time-Rotation Diffusion Quantization	Yihua Shao et.al.	2503.06564	null
2025-03-09	QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation	Junyi Wu et.al.	2503.06545	link
2025-03-09	Generative Video Bi-flow	Chen Liu et.al.	2503.06364	null
2025-03-08	Text2Story: Advancing Video Storytelling with Text Guidance	Taewon Kang et.al.	2503.06310	null
2025-03-08	ROCM: RLHF on consistency models	Shivanshu Shekhar et.al.	2503.06171	null
2025-03-08	VACT: A Video Automatic Causal Testing System and a Benchmark	Haotong Yang et.al.	2503.06163	null
2025-03-08	GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation	Ye Tao et.al.	2503.06136	null
2025-03-08	DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation	Runze Zhang et.al.	2503.06053	null
2025-03-08	The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation	Aoxiong Yin et.al.	2503.04606	link
2025-03-08	Rethinking Video Tokenization: A Conditioned Diffusion-based Approach	Nianzu Yang et.al.	2503.03708	link
2025-03-07	MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice	Hongwei Yi et.al.	2503.05978	null
2025-03-07	MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio	Xuenan Xu et.al.	2503.05242	link
2025-03-07	Unified Reward Model for Multimodal Understanding and Generation	Yibin Wang et.al.	2503.05236	null
2025-03-07	Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos	Zhiyu Tan et.al.	2502.21314	null
2025-03-06	Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation	Alexey Buzovkin et.al.	2503.04871	link
2025-03-06	FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video	Yue Gao et.al.	2503.04720	null
2025-03-06	What Are You Doing? A Closer Look at Controllable Human Video Generation	Emanuele Bugliarello et.al.	2503.04666	null
2025-03-05	ProReflow: Progressive Reflow with Decomposed Velocity	Lei Ke et.al.	2503.04824	null
2025-03-05	GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control	Xuanchi Ren et.al.	2503.03751	link
2025-03-05	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	Zhao Yang et.al.	2503.03689	link
2025-03-05	High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights	Yuna Kato et.al.	2503.03558	link
2025-03-05	Video Super-Resolution: All You Need is a Video Diffusion Model	Zhihao Zhan et.al.	2503.03355	null
2025-03-04	GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning	Zhun Mou et.al.	2503.02341	null
2025-03-04	Unified Video Action Model	Shuang Li et.al.	2503.00200	null
2025-03-03	VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation	Wenhao Wang et.al.	2503.01739	link
2025-03-03	VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors	Juil Koo et.al.	2503.01107	null
2025-03-03	TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis	Menghao Li et.al.	2502.19454	null
2025-03-02	Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think	Jie Tian et.al.	2503.00948	link
2025-03-01	Learning to Animate Images from A Few Videos to Portray Delicate Human Actions	Haoxin Li et.al.	2503.00276	null
2025-02-28	Training-free and Adaptive Sparse Attention for Efficient Long Video Generation	Yifei Xia et.al.	2502.21079	null
2025-02-28	HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models	Xiao Wang et.al.	2502.20811	null
2025-02-28	WorldModelBench: Judging Video Generation Models As World Models	Dacheng Li et.al.	2502.20694	null
2025-02-28	RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers	Ke Cao et.al.	2502.14377	null
2025-02-27	Mobius: Text to Seamless Looping Video Generation via Latent Shift	Xiuli Bi et.al.	2502.20307	link
2025-02-27	FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute	Sotiris Anagnostidis et.al.	2502.20126	null
2025-02-27	C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation	Yuhao Li et.al.	2502.19868	link
2025-02-26	Online Pseudo-average Shifting Attention (PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis	Long Cheng et.al.	2503.01873	null
2025-02-26	Glad: A Streaming Scene Generator for Autonomous Driving	Bin Xie et.al.	2503.00045	null
2025-02-26	FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model	Lingzhou Mu et.al.	2502.19455	null
2025-02-25	SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference	Jintao Zhang et.al.	2502.18137	link
2025-02-25	A Survey: Spatiotemporal Consistency in Video Generation	Zhiyu Yin et.al.	2502.17863	null
2025-02-24	X-Dancer: Expressive Music to Human Dance Video Generation	Zeyuan Chen et.al.	2502.17414	null
2025-02-24	VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing	Xiangpeng Yang et.al.	2502.17258	null
2025-02-24	Diffusion Models for Tabular Data: Challenges, Current Progress, and Future Directions	Zhong Li et.al.	2502.17119	link
2025-02-21	RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers	Min Zhao et.al.	2502.15894	null
2025-02-21	VaViM and VaVAM: Autonomous Driving through Video Generative Modeling	Florent Bartoccioni et.al.	2502.15672	link
2025-02-21	LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities	Florian Sestak et.al.	2502.12128	link
2025-02-20	Hardware-Friendly Static Quantization Method for Video Diffusion Transformers	Sanghyun Yi et.al.	2502.15077	null
2025-02-20	LAVID: An Agentic LVLM Framework for Diffusion-Generated Video Detection	Qingyuan Liu et.al.	2502.14994	null
2025-02-20	Improving the Diffusability of Autoencoders	Ivan Skorokhodov et.al.	2502.14831	null
2025-02-20	Designing Parameter and Compute Efficient Diffusion Transformers using Distillation	Vignesh Sundaresha et.al.	2502.14226	null
2025-02-19	FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation	Yunpeng Zhang et.al.	2502.13995	link
2025-02-19	LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation	Junchen Fu et.al.	2502.12945	null
2025-02-18	VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation	Xinlong Chen et.al.	2502.12782	link
2025-02-18	MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation	Sihyun Yu et.al.	2502.12632	null
2025-02-17	DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation	Zhihang Yuan et.al.	2502.11897	link
2025-02-17	Object-Centric Image to Video Generation with Language Guidance	Angel Villar-Corrales et.al.	2502.11655	null
2025-02-17	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Guoqing Ma et.al.	2502.10248	link
2025-02-17	Magic 1-For-1: Generating One Minute Video Clips within One Minute	Hongwei Yi et.al.	2502.07701	link
2025-02-16	MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation	Michael Fuest et.al.	2502.11234	null
2025-02-16	Phantom: Subject-consistent video generation via cross-modal alignment	Lijie Liu et.al.	2502.11079	null
2025-02-15	SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers	Di Qiu et.al.	2502.10841	link
2025-02-14	RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control	Teng Li et.al.	2502.10059	null
2025-02-14	GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation	Hongyin Zhang et.al.	2502.09268	null
2025-02-13	Enhance-A-Video: Better Generated Video for Free	Yang Luo et.al.	2502.07508	link
2025-02-12	CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation	Qinghe Wang et.al.	2502.08639	null
2025-02-12	FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis	Wonjoon Jin et.al.	2502.08244	null
2025-02-12	Learning Human Skill Generators at Key-Step Levels	Yilu Wu et.al.	2502.08234	null
2025-02-12	AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance	Zhao Wang et.al.	2502.08189	null
2025-02-12	Next Block Prediction: Video Generation via Semi-Autoregressive Modeling	Shuhuai Ren et.al.	2502.07737	null
2025-02-12	VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation	Sixiao Zheng et.al.	2502.07531	null

(back to top)

TryOn

Publish Date	Title	Authors	PDF	Code
2025-06-23	InstructAttribute: Fine-grained Object Attributes editing with Instruction	Xingxi Yin et.al.	2505.00751	null
2025-06-14	Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments	Zaiqiang Wu et.al.	2506.12348	link
2025-06-13	HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment	Ming Meng et.al.	2505.19638	null
2025-06-12	Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On	Zaiqiang Wu et.al.	2506.10468	link
2025-06-06	ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On	Jinjuan Wang et.al.	2506.05858	null
2025-06-02	OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Sen Liang et.al.	2506.01801	null
2025-06-01	DS-VTON: High-Quality Virtual Try-on via Disentangled Dual-Scale Generation	Xianbing Sun et.al.	2506.00908	null
2025-05-29	VITON-DRR: Details Retention Virtual Try-on via Non-rigid Registration	Ben Li et.al.	2505.23439	link
2025-05-28	MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on	Guangyuan Li et.al.	2505.21325	null
2025-05-27	Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals	Davide Lobba et.al.	2505.21062	link
2025-05-26	VTBench: Comprehensive Benchmark Suite Towards Real-World Virtual Try-on Models	Hu Xiaobin et.al.	2505.19571	link
2025-05-22	Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction	Dong Li et.al.	2505.16980	null
2025-05-22	Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On	Siqi Wan et.al.	2505.16977	link
2025-05-15	Single View Garment Reconstruction Using Diffusion Mapping Via Pattern Coordinates	Ren Li et.al.	2504.08353	link
2025-04-29	Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting	Hanxi Liu et.al.	2504.20403	null
2025-04-24	FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model	Kaicheng Pang et.al.	2504.17826	null
2025-04-24	3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models	Min Wei et.al.	2504.17414	null
2025-04-21	Shape-Guided Clothing Warping for Virtual Try-On	Xiaoyu Han et.al.	2504.15232	link
2025-04-21	Insert Anything: Image Insertion via In-Context Editing in DiT	Wensong Song et.al.	2504.15009	null
2025-04-19	Flux Already Knows -- Activating Subject-Driven Image Generation without Training	Hao Kang et.al.	2504.11478	link
2025-04-19	Concat-ID: Towards Universal Identity-Preserving Video Synthesis	Yong Zhong et.al.	2503.14151	null
2025-04-18	Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation	Fulvio Sanguigni et.al.	2504.14011	null
2025-04-17	Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off	Riza Velioglu et.al.	2504.13078	link
2025-04-15	ReZero: Enhancing LLM search ability by trying one-more-time	Alan Dao et.al.	2504.11001	null
2025-04-11	VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction	Zijian He et.al.	2503.12165	null
2025-04-04	From Keypoints to Realism: A Realistic and Accurate Virtual Try-on Network from 2D Images	Maliheh Toozandehjani et.al.	2504.03807	null
2025-04-03	MAD: Makeup All-in-One with Cross-Domain Diffusion Model	Bo-Kai Ruan et.al.	2504.02545	null
2025-04-01	Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method	Shufang Zhang et.al.	2504.00562	null
2025-03-26	ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On	Ji Woo Hong et.al.	2503.20418	null
2025-03-26	Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks	Hailong Guo et.al.	2501.15891	null
2025-03-25	Exploring Disentangled and Controllable Human Image Synthesis: From End-to-End to Stage-by-Stage	Zhengwentai Sun et.al.	2503.19486	null
2025-03-20	Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model	Yingmao Miao et.al.	2503.16065	null
2025-03-18	Limb-Aware Virtual Try-On Network with Progressive Clothing Warping	Shengping Zhang et.al.	2503.14074	link
2025-03-16	Progressive Limb-Aware Virtual Try-On	Xiaoyu Han et.al.	2503.12588	link
2025-03-15	ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text	Haifeng Ni et.al.	2501.16757	null
2025-03-11	MF-VITON: High-Fidelity Mask-Free Virtual Try-On with Minimal Input	Zhenchen Wan et.al.	2503.08650	null
2025-03-11	RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency	Siqi Li et.al.	2501.08682	null
2025-02-20	CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors	Donghao Luo et.al.	2502.14373	null
2025-02-05	Dress-1-to-3: Single Image to Simulation-Ready 3D Outfit with Diffusion Prior and Differentiable Physics	Xuan Li et.al.	2502.03449	null
2025-02-03	MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer	Le Shen et.al.	2502.01626	null
2025-01-26	IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter	Xiaojing Zhong et.al.	2501.15616	null
2025-01-26	Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models	Spencer Ramsey et.al.	2501.15571	null
2025-01-20	EfficientVITON: An Efficient Virtual Try-On Model using Optimized Diffusion Process	Mostafa Atef et.al.	2501.11776	null
2025-01-20	CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation	Zheng Chong et.al.	2501.11325	link
2025-01-17	Disharmony: Forensics using Reverse Lighting Harmonization	Philip Wootaek Shin et.al.	2501.10212	null
2025-01-12	ODPG: Outfitting Diffusion with Pose Guided Condition	Seohyun Lee et.al.	2501.06769	null
2025-01-10	MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer	Junsheng Luan et.al.	2501.03630	null
2025-01-09	1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On	Shuliang Ning et.al.	2501.05369	null
2025-01-08	Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling	Nannan Li et.al.	2501.04666	null
2025-01-07	HYB-VITON: A Hybrid Approach to Virtual Try-On Combining Explicit and Implicit Warping	Kosuke Takemoto et.al.	2501.03910	link
2025-01-07	VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control	Yuanpeng Tu et.al.	2501.01427	null
2024-12-25	DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images	Enbo Huang et.al.	2412.18797	null
2024-12-22	PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask	Jeongho Kim et.al.	2412.16978	link
2024-12-19	DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On	Wengyi Zhan et.al.	2412.14465	null
2024-12-19	FashionComposer: Compositional Fashion Image Generation	Sihui Ji et.al.	2412.14168	null

(back to top)

Visual Edit

Publish Date	Title	Authors	PDF	Code
2025-06-25	EditP23: 3D Editing via Propagation of Image Prompts to Multi-View	Roi Bar-On et.al.	2506.20652	null
2025-06-25	Towards Efficient Exemplar Based Image Editing with Multimodal VLMs	Avadhoot Jadhav et.al.	2506.20155	null
2025-06-25	OmniGen2: Exploration to Advanced Multimodal Generation	Chenyuan Wu et.al.	2506.18871	null
2025-06-24	SceneCrafter: Controllable Multi-View Driving Scene Editing	Zehao Zhu et.al.	2506.19488	null
2025-06-24	LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning	Chenjian Gao et.al.	2506.10082	null
2025-06-23	Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency Models	Ilia Beletskii et.al.	2506.19103	null
2025-06-23	Let Your Video Listen to Your Music!	Xinyu Zhang et.al.	2506.18881	null
2025-06-23	CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing	Dinh-Khoi Vo et.al.	2506.18438	null
2025-06-23	Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction	Han Zhang et.al.	2506.18290	null
2025-06-20	BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing	Jiacheng Chen et.al.	2506.17450	null
2025-06-20	FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation	Fan Yang et.al.	2506.16806	null
2025-06-19	Arch-Router: Aligning LLM Routing with Human Preferences	Co Tran et.al.	2506.16655	null
2025-06-18	VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics	Josef Kuchař et.al.	2506.15903	null
2025-06-17	Causally Steered Diffusion for Automated Video Counterfactual Generation	Nikos Spyrou et.al.	2506.14404	link
2025-06-16	AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing	Biao Yang et.al.	2506.13301	null
2025-06-15	Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing	Zhuoying Li et.al.	2506.13827	null
2025-06-15	ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies	Chenglin Wang et.al.	2506.12830	null
2025-06-14	Good Noise Makes Good Edits: A Training-Free Diffusion-Based Video Editing with Image and Text Prompts	Saemee Choi et.al.	2506.12520	null
2025-06-13	SphereDrag: Spherical Geometry-Aware Panoramic Image Editing	Zhiao Feng et.al.	2506.11863	null
2025-06-13	Consistent Video Editing as Flow-Driven Image-to-Video Generation	Ge Wang et.al.	2506.07713	null
2025-06-12	VINCIE: Unlocking In-context Image Editing from Video	Leigang Qu et.al.	2506.10941	null
2025-06-12	Edit360: 2D Image Edits to 3D Assets from Any Angle	Junchao Huang et.al.	2506.10507	null
2025-06-12	Towards Reliable Identification of Diffusion-based Image Manipulations	Alex Costanzino et.al.	2506.05466	null
2025-06-11	EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits	Ron Yosef et.al.	2506.09988	null
2025-06-11	ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models	Qin Zhou et.al.	2506.09740	null
2025-06-11	Ming-Omni: A Unified Multimodal Model for Perception and Generation	Inclusion AI et.al.	2506.09344	link
2025-06-11	Fine-Grained Spatially Varying Material Selection in Images	Julia Guerrero-Viu et.al.	2506.09023	null
2025-06-10	Do Concept Replacement Techniques Really Erase Unacceptable Concepts?	Anudeep Das et.al.	2506.08991	null
2025-06-10	RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping	Yang Bai et.al.	2506.08632	null
2025-06-09	Highly Compressed Tokenizer Can Generate Without Training	L. Lao Beyer et.al.	2506.08257	link
2025-06-09	PairEdit: Learning Semantic Variations for Exemplar-based Image Editing	Haoguang Lu et.al.	2506.07992	link
2025-06-09	Diffusion Counterfactual Generation with Semantic Abduction	Rajat Rasal et.al.	2506.07883	link
2025-06-09	DragNeXt: Rethinking Drag-Based Image Editing	Yuan Zhou et.al.	2506.07611	null
2025-06-09	Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding	Boyu Chen et.al.	2506.07576	null
2025-06-08	Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning	Tianyi Bai et.al.	2506.07227	null
2025-06-08	TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation	Min-Jung Kim et.al.	2506.07205	null
2025-06-06	Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models	Yifu Qiu et.al.	2506.06006	link
2025-06-06	FADE: Frequency-Aware Diffusion Model Factorization for Video Editing	Yixuan Zhu et.al.	2506.05934	link
2025-06-06	SeedEdit 3.0: Fast and High-Quality Generative Image Editing	Peng Wang et.al.	2506.05083	null
2025-06-05	FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing	Guangzhao Li et.al.	2506.05046	null
2025-06-05	Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking	Yu-Feng Chen et.al.	2506.04879	null
2025-06-05	FullDiT2: Efficient In-Context Conditioning for Video Diffusion Transformers	Xuanhua He et.al.	2506.04213	null
2025-06-04	HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation	Hermann Kumbong et.al.	2506.04421	null
2025-06-04	Is Perturbation-Based Image Protection Disruptive to Image Editing?	Qiuyu Tang et.al.	2506.04394	null
2025-06-04	UNIC: Unified In-Context Video Editing	Zixuan Ye et.al.	2506.04216	null
2025-06-04	Image Editing As Programs with Diffusion Models	Yujia Hu et.al.	2506.04158	null
2025-06-04	UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	Bin Lin et.al.	2506.03147	null
2025-06-04	MedEBench: Revisiting Text-instructed Image Editing on Medical Domain	Minghao Liu et.al.	2506.01921	null
2025-06-03	RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions	Bimsara Pathiraja et.al.	2506.03448	null
2025-06-03	ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions	Di Chang et.al.	2506.03107	null
2025-06-03	DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing	Zixiang Li et.al.	2506.02560	null
2025-06-03	RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers	Yan Gong et.al.	2506.02528	null
2025-06-02	IMAGHarmony: Controllable Image Editing with Consistent Object Quantity and Layout	Fei Shen et.al.	2506.01949	null
2025-06-02	OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation	Sen Liang et.al.	2506.01801	null
2025-06-02	Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Comprehension and Generation	Kaihang Pan et.al.	2506.01480	null
2025-06-02	DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing	Chenxi Xie et.al.	2506.01430	null
2025-06-01	Motion-Aware Concept Alignment for Consistent Video Editing	Tong Zhang et.al.	2506.01004	null
2025-05-31	Concept-Centric Token Interpretation for Vector-Quantized Generative Models	Tianze Yang et.al.	2506.00698	null
2025-05-30	MiniMax-Remover: Taming Bad Noise Helps Video Object Removal	Bojia Zi et.al.	2505.24873	null
2025-05-29	Cora: Correspondence-aware image editing using few step diffusion	Amirhossein Almohammadi et.al.	2505.23907	null
2025-05-29	LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers	Yusuf Dalva et.al.	2505.23758	null
2025-05-29	Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features	Ziyong Wang et.al.	2505.23586	null
2025-05-29	Video Editing for Audio-Visual Dubbing	Binyamin Manela et.al.	2505.23406	link
2025-05-29	FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing	Jeongsol Kim et.al.	2505.23145	link
2025-05-29	Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing	Tongtong Su et.al.	2505.23134	link
2025-05-28	HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer	Qi Cai et.al.	2505.22705	link
2025-05-28	VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use	Mingyuan Wu et.al.	2505.19255	null
2025-05-27	Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion	Yang Yang et.al.	2505.21593	null
2025-05-27	Imago Obscura: An Image Privacy AI Co-pilot to Enable Identification and Mitigation of Risks	Kyzyl Monteiro et.al.	2505.20916	null
2025-05-27	InstGenIE: Generative Image Editing Made Efficient with Mask-aware Caching and Scheduling	Xiaoxiao Jiang et.al.	2505.20600	null
2025-05-26	What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models	Lorenzo Baraldi et.al.	2505.20405	null
2025-05-26	ImgEdit: A Unified Image Editing Dataset and Benchmark	Yang Ye et.al.	2505.20275	link
2025-05-26	StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation	Yi Wu et.al.	2505.19874	null
2025-05-26	TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs	Juntong Wang et.al.	2505.19535	null
2025-05-26	Understanding Generative AI Capabilities in Everyday Image Editing Tasks	Mohammad Reza Taesiri et.al.	2505.16181	null
2025-05-25	Beyond Editing Pairs: Fine-Grained Instructional Image Editing via Multi-Scale Learnable Regions	Chenrui Ma et.al.	2505.19352	null
2025-05-25	SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation	Shenggan Cheng et.al.	2505.19151	null
2025-05-25	MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection	Shuyu Wang et.al.	2505.19149	null
2025-05-24	REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing	Weihan Xu et.al.	2505.18880	null
2025-05-24	Affective Image Editing: Shaping Emotional Factors via Text Descriptions	Peixuan Zhang et.al.	2505.18699	null
2025-05-24	Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility	Yiheng Li et.al.	2505.18521	link
2025-05-23	DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval	Yuxin Yang et.al.	2505.17796	null
2025-05-23	R-Genie: Reasoning-Guided Generative Image Editing	Dong Zhang et.al.	2505.17768	null
2025-05-22	KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models	Yongliang Wu et.al.	2505.16707	null
2025-05-21	FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models	Zhen Sun et.al.	2505.15644	link
2025-05-20	DragLoRA: Online Optimization of LoRA Adapters for Drag-based Image Editing in Diffusion Model	Siwei Xia et.al.	2505.12427	link
2025-05-20	CompBench: Benchmarking Complex Instruction-guided Image Editing	Bohan Jia et.al.	2505.12200	null
2025-05-18	From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations	Yuzhi Li et.al.	2505.12237	null
2025-05-16	X-Edit: Detecting and Localizing Edits in Images Altered by Text-Guided Diffusion Models	Valentina Bazyleva et.al.	2505.11753	null
2025-05-16	GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing	Yusu Qian et.al.	2505.11493	null
2025-05-15	3D-Fixup: Advancing Photo Editing with 3D Priors	Yen-Chi Cheng et.al.	2505.10566	null
2025-05-15	IntrinsicEdit: Precise generative image manipulation in intrinsic space	Linjie Lyu et.al.	2505.08889	null
2025-05-14	Don't Forget your Inverse DDIM for Image Editing	Guillermo Gomez-Trenado et.al.	2505.09571	null
2025-05-12	MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models	Hongyang Zhu et.al.	2505.05101	null
2025-05-11	DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models	Junhao Xia et.al.	2505.07057	null
2025-05-11	Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation	Chao Liao et.al.	2505.05472	null
2025-05-08	GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing	Tong Wang et.al.	2505.04915	null
2025-05-07	Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers	Divyansh Srivastava et.al.	2505.04718	null
2025-05-07	Multi-turn Consistent Image Editing	Zijun Zhou et.al.	2505.04320	null
2025-05-07	Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction	Inclusion AI et.al.	2505.02471	link
2025-05-06	MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models	Jhon Lopez et.al.	2505.15822	null
2025-05-06	Step1X-Edit: A Practical Framework for General Image Editing	Shiyu Liu et.al.	2504.17761	link
2025-05-05	SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing	Ming Li et.al.	2505.02370	link
2025-05-04	Video Forgery Detection for Surveillance Cameras: A Review	Noor B. Tayfor et.al.	2505.03832	null
2025-05-02	Improving Editability in Image Generation with Layer-wise Memory	Daneul Kim et.al.	2505.01079	null
2025-05-02	A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories	Ziqi Ding et.al.	2505.01067	null
2025-05-02	Photoshop Batch Rendering Using Actions for Stylistic Video Editing	Tessa De La Fuente et.al.	2505.01001	null
2025-05-01	InstructAttribute: Fine-grained Object Attributes editing with Instruction	Xingxi Yin et.al.	2505.00751	null
2025-05-01	Controllable Weather Synthesis and Removal with Video Diffusion Models	Chih-Hao Lin et.al.	2505.00704	null
2025-05-01	Towards Scalable Human-aligned Benchmark for Text-guided Image Editing	Suho Ryu et.al.	2505.00502	link
2025-04-30	PixelHacker: Image Inpainting with Structural and Semantic Consistency	Ziyang Xu et.al.	2504.20438	null
2025-04-29	In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer	Zechuan Zhang et.al.	2504.20690	null
2025-04-27	CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes	Tuan Nguyen et.al.	2504.19212	null
2025-04-26	REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models	Gal Almog et.al.	2504.18989	link
2025-04-24	DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing	Aniruddha Bala et.al.	2504.17894	null
2025-04-24	VEU-Bench: Towards Comprehensive Understanding of Video Editing	Bozheng Li et.al.	2504.17828	null
2025-04-24	Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields	Zhuo He et.al.	2504.17712	null
2025-04-24	Enhancing Variational Autoencoders with Smooth Robust Latent Encoding	Hyomin Lee et.al.	2504.17219	null
2025-04-24	Vidi: Large Multimodal Models for Video Understanding and Editing	Vidi Team et.al.	2504.15681	null
2025-04-22	Efficient Temporal Consistency in Diffusion-Based Video Editing with Adaptor Modules: A Theoretical Framework	Xinyuan Song et.al.	2504.16016	null
2025-04-22	Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models	Dasol Jeong et.al.	2504.15723	null
2025-04-21	MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World	Ankit Dhiman et.al.	2504.15397	null
2025-04-21	Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach	Lvpan Cai et.al.	2504.11922	link
2025-04-20	MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation	Siyi Jiao et.al.	2504.14606	null
2025-04-19	Visual Prompting for One-shot Controllable Video Editing without Inversion	Zhengbo Zhang et.al.	2504.14335	null
2025-04-19	PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling	Alara Dirik et.al.	2504.14219	null
2025-04-18	Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation	Fulvio Sanguigni et.al.	2504.14011	null
2025-04-18	Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing	Joowon Kim et.al.	2504.13490	null
2025-04-17	Image Editing with Diffusion Models: A Survey	Jia Wang et.al.	2504.13226	null
2025-04-17	Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark	Siwei Yang et.al.	2504.13143	null
2025-04-17	UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models	Guanlong Jiao et.al.	2504.13109	null
2025-04-17	Image-Editing Specialists: An RLAIF Approach for Diffusion Models	Elior Benarous et.al.	2504.12833	link
2025-04-17	SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding	Qianqian Sun et.al.	2504.12704	null
2025-04-17	DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency	Mengshi Qi et.al.	2504.12080	link
2025-04-17	Understanding Attention Mechanism in Video Diffusion Models	Bingyan Liu et.al.	2504.12027	null
2025-04-14	Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing	Taihang Hu et.al.	2504.10434	link
2025-04-14	Analysis of Attention in Video Diffusion Transformers	Yuxin Wen et.al.	2504.10317	null
2025-04-14	TAPNext: Tracking Any Point (TAP) as Next Token Prediction	Artem Zholus et.al.	2504.05579	null
2025-04-13	SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow	Kenan Tang et.al.	2504.09697	link
2025-04-13	CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models	Pooja Guhan et.al.	2504.09472	null
2025-04-11	CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model	Ruohao Zhan et.al.	2504.08259	null
2025-04-10	POEM: Precise Object-level Editing via MLLM control	Marco Schouten et.al.	2504.08111	null
2025-04-10	Learning Universal Features for Generalizable Image Forgery Localization	Hengrun Zhao et.al.	2504.07462	link
2025-04-10	Routing to the Right Expertise: A Trustworthy Judge for Instruction-based Image Editing	Chenxi Sun et.al.	2504.07424	null
2025-04-09	FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution	Gene Chou et.al.	2504.07093	link
2025-04-08	VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing	Juan Luis Gonzalez Bello et.al.	2504.07146	null
2025-04-08	Transfer between Modalities with MetaQueries	Xichen Pan et.al.	2504.06256	null
2025-04-08	Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model	Qi Mao et.al.	2504.05594	null
2025-04-08	Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing	Xiangyu Zhao et.al.	2504.02826	link
2025-04-07	CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models	Kavana Venkatesh et.al.	2504.05306	null
2025-04-07	Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing	Hui Liu et.al.	2504.04784	null
2025-04-07	MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models	Wulin Xie et.al.	2504.03641	null
2025-04-04	Synthesizing Optimal Object Selection Predicates for Image Editing using Lattices	Yang He et.al.	2504.03155	null
2025-04-03	How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models	Pascal Chang et.al.	2504.03072	null
2025-04-03	VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning	Xianwei Zhuang et.al.	2504.02949	link
2025-04-03	Concept Lancet: Image Editing with Compositional Representation Transplant	Jinqi Luo et.al.	2504.02828	null
2025-04-03	GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation	Zhiyuan Yan et.al.	2504.02782	link
2025-04-03	ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement	Runhui Huang et.al.	2504.01934	null
2025-04-02	FreSca: Unveiling the Scaling Space in Diffusion Models	Chao Huang et.al.	2504.02154	null
2025-04-02	A Diffusion-Based Framework for Occluded Object Movement	Zheng-Peng Duan et.al.	2504.01873	null
2025-03-31	AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents	Jiaxiang Chen et.al.	2503.23948	link
2025-03-31	Training-Free Text-Guided Image Editing with Visual Autoregressive Model	Yufei Wang et.al.	2503.23897	link
2025-03-30	Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging	Amar Kumar et.al.	2503.23618	null
2025-03-30	ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025	Tianming Liang et.al.	2503.23509	link
2025-03-30	SketchVideo: Sketch-based Video Generation and Editing	Feng-Lin Liu et.al.	2503.23284	null
2025-03-29	FreeInv: Free Lunch for Improving DDIM Inversion	Yuxiang Bao et.al.	2503.23035	null
2025-03-29	FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model	Jun Zhou et.al.	2503.19839	null
2025-03-28	Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance	Haijie Yang et.al.	2503.22225	null
2025-03-28	LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing	Achint Soni et.al.	2503.21541	link
2025-03-26	Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising	Yan-Bo Lin et.al.	2503.20782	null
2025-03-26	EditCLIP: Representation Learning for Image Editing	Qian Wang et.al.	2503.20318	link
2025-03-26	Wan: Open and Advanced Large-Scale Video Generative Models	WanTeam et.al.	2503.20314	link
2025-03-26	InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction	Yuhui Wu et.al.	2503.20287	link
2025-03-25	Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning	Sherry X. Chen et.al.	2503.18406	link
2025-03-25	Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods	Yuzhi Li et.al.	2503.17975	null
2025-03-24	FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing	Yufan Ren et.al.	2503.19191	null
2025-03-24	Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance	Sicong Feng et.al.	2503.18386	null
2025-03-24	MaSS13K: A Matting-level Semantic Segmentation Benchmark	Chenxi Xie et.al.	2503.18364	link
2025-03-23	Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance	Harang Ju et.al.	2503.18238	link
2025-03-23	What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images	Dongheng Lin et.al.	2503.17899	null
2025-03-23	Multi-focal Conditioned Latent Diffusion for Person Image Synthesis	Jiaqi Liu et.al.	2503.15686	link
2025-03-22	InstructVEdit: A Holistic Approach for Instructional Video Editing	Chi Zhang et.al.	2503.17641	null
2025-03-22	Guidance Free Image Editing via Explicit Conditioning	Mehdi Noroozi et.al.	2503.17593	null
2025-03-21	HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks	Maria Pilligua et.al.	2503.17276	null
2025-03-21	DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics	Yihan Hu et.al.	2503.16795	null
2025-03-20	FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing	Tianyi Wei et.al.	2503.16153	null
2025-03-20	Single Image Iterative Subject-driven Generation and Editing	Yair Shpitzer et.al.	2503.16025	link
2025-03-19	VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation	Shoubin Yu et.al.	2503.14350	null
2025-03-18	ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing	Yulin Pan et.al.	2503.14482	null
2025-03-18	TarPro: Targeted Protection against Malicious Image Editing	Kaixin Shen et.al.	2503.13994	null
2025-03-17	FiVE: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models	Minghan Li et.al.	2503.13684	null
2025-03-17	Unified Autoregressive Visual Generation and Understanding with Continuous Tokens	Lijie Fan et.al.	2503.13436	null
2025-03-17	Edit Transfer: Learning Image Editing via Vision In-Context Relations	Lan Chen et.al.	2503.13327	null
2025-03-17	GIFT: Generated Indoor video frames for Texture-less point tracking	Jianzheng Huang et.al.	2503.12944	null
2025-03-17	DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Mode	Junjia Huang et.al.	2503.12838	null
2025-03-16	UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing	Tsu-Jui Fu et.al.	2503.12652	null
2025-03-16	Personalize Anything for Free with Diffusion Transformer	Haoran Feng et.al.	2503.12590	null
2025-03-14	Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities	Ruchika Chavhan et.al.	2503.11905	null
2025-03-14	RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing	Tianrui Pan et.al.	2503.11571	null
2025-03-14	LUSD: Localized Update Score Distillation for Text-Guided Image Editing	Worameth Chinchuthakun et.al.	2503.11054	link
2025-03-14	V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes	Yanming Zhang et.al.	2503.10634	null
2025-03-14	On the Limitations of Vision-Language Models in Understanding Image Transforms	Ahmad Mustafa Anis et.al.	2503.09837	null
2025-03-13	Fine-Tuning Diffusion Generative Models via Rich Preference Optimization	Hanyang Zhao et.al.	2503.11720	null
2025-03-13	CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing	Advait Gupta et.al.	2503.10613	link
2025-03-13	EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing	Zexuan Yan et.al.	2503.10270	link
2025-03-13	MoEdit: On Learning Quantity Perception for Multi-object Image Editing	Yanfeng Li et.al.	2503.10112	link
2025-03-13	Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models	Armando Fortes et.al.	2503.08434	null
2025-03-12	Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space	Yifan Zhou et.al.	2503.09419	link
2025-03-12	InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images	Jiun Tian Hoe et.al.	2503.09130	null
2025-03-12	OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting	Yongsheng Yu et.al.	2503.08677	null
2025-03-11	Aligning Text to Image in Diffusion Models is Easier Than You Think	Jaa-Yeon Lee et.al.	2503.08250	link
2025-03-11	ObjectMover: Generative Object Movement with Video Prior	Xin Yu et.al.	2503.08037	null
2025-03-11	CAD-VAE: Leveraging Correlation-Aware Latents for Comprehensive Fair Disentanglement	Chenrui Ma et.al.	2503.07938	null
2025-03-11	VACE: All-in-One Video Creation and Editing	Zeyinzi Jiang et.al.	2503.07598	null
2025-03-10	Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model	Lixue Gong et.al.	2503.07703	null
2025-03-10	TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation	Victor Shea-Jay Huang et.al.	2503.07050	null
2025-03-10	Interactive Tumor Progression Modeling via Sketch-Based Image Editing	Gexin Huang et.al.	2503.06809	null
2025-03-10	VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control	Yuxuan Bian et.al.	2503.05639	link
2025-03-09	Consistent Image Layout Editing with Diffusion Models	Tao Xia et.al.	2503.06419	null
2025-03-08	Get In Video: Add Anything You Want to the Video	Shaobin Zhuang et.al.	2503.06268	null
2025-03-08	X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation	Jian Ma et.al.	2503.06134	link
2025-03-07	Towards Locally Explaining Prediction Behavior via Gradual Interventions and Measuring Property Gradients	Niklas Penzel et.al.	2503.05424	null
2025-03-06	Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models	Rui Jiang et.al.	2503.04215	null
2025-03-05	GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors	Yaopei Zeng et.al.	2503.03944	null
2025-03-04	h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform	Toan Nguyen et.al.	2503.02187	link
2025-03-03	VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors	Juil Koo et.al.	2503.01107	null
2025-03-01	GenVDM: Generating Vector Displacement Maps From a Single Image	Yuezhi Yang et.al.	2503.00605	null
2025-02-27	Tight Inversion: Image-Conditioned Inversion for Real Image Editing	Edo Kadosh et.al.	2502.20376	null
2025-02-27	Identity-preserving Distillation Sampling by Fixed-Point Iterator	SeonHwa Kim et.al.	2502.19930	null
2025-02-26	SVGEditBench V2: A Benchmark for Instruction-based SVG Editing	Kunato Nishina et.al.	2502.19453	link
2025-02-26	Bayesian Optimization for Controlled Image Editing via LLMs	Chengkun Cai et.al.	2502.18116	null
2025-02-25	KV-Edit: Training-Free Image Editing for Precise Background Preservation	Tianrui Zhu et.al.	2502.17363	link
2025-02-24	VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing	Xiangpeng Yang et.al.	2502.17258	null
2025-02-23	PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data	Shijie Huang et.al.	2502.14397	link
2025-02-22	DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation	Yuxuan Xiong et.al.	2502.16302	null
2025-02-18	AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks	Ming Xie et.al.	2502.11158	null
2025-02-14	PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control	Kunal Swami et.al.	2502.10258	null
2025-02-14	VideoDiff: Human-AI Video Co-Creation with Alternatives	Mina Huh et.al.	2502.10190	null
2025-02-14	Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training	Rodrigo Santos et.al.	2502.10064	null
2025-02-14	SportsBuddy: Designing and Evaluating an AI-Powered Sports Video Storytelling Tool Through Real-World Deployment	Tica Lin et.al.	2502.08621	null
2025-02-10	Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists	Bojia Zi et.al.	2502.06734	null
2025-02-10	Predictive Red Teaming: Breaking Policies Without Breaking Robots	Anirudha Majumdar et.al.	2502.06575	null
2025-02-08	AdaFlow: Efficient Long Video Editing via Adaptive Attention Slimming And Keyframe Selection	Shuheng Zhang et.al.	2502.05433	null
2025-02-06	MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation	Jinbo Xing et.al.	2502.04299	null
2025-02-06	PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models	Aleksandar Cvejic et.al.	2502.04050	null
2025-02-06	DICE: Distilling Classifier-Free Guidance into Text Embeddings	Zhenyu Zhou et.al.	2502.03726	null
2025-02-05	Lost in Edits? A λ-Compass for AIGC Provenance	Wenhao You et.al.	2502.04364	null
2025-02-05	REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations	Peter Sushko et.al.	2502.03629	null
2025-02-04	Exploring the latent space of diffusion models directly through singular value decomposition	Li Wang et.al.	2502.02225	null
2025-02-04	EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues	Rohit Girmaji et.al.	2502.02172	null
2025-02-04	Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation	JooHyun Kwon et.al.	2502.02091	null
2025-01-30	DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models	Ruofan Liang et.al.	2501.18590	null
2025-01-24	MATCHA: Towards Matching Anything	Fei Xue et.al.	2501.14945	null
2025-01-24	Training-Free Style and Content Transfer by Leveraging U-Net Skip Connections in Stable Diffusion 2.*	Ludovica Schaerf et.al.	2501.14524	null
2025-01-23	IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models	Jiayi Lei et.al.	2501.13920	null

Others

Publish Date	Title	Authors	PDF	Code
2025-06-25	A Computationally Aware Multi Objective Framework for Camera LiDAR Calibration	Venkat Karramreddy et.al.	2506.20636	null
2025-06-25	Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings	Ankit Shah et.al.	2506.20609	null
2025-06-25	Learning-Based Distance Estimation for 360° Single-Sensor Setups	Yitong Quan et.al.	2506.20586	null
2025-06-25	Communication-Aware Map Compression for Online Path-Planning: A Rate-Distortion Approach	Ali Reza Pedram et.al.	2506.20579	null
2025-06-25	HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction	Zhonghao Shi et.al.	2506.20566	null
2025-06-25	Reinforcement Learning Increases Wind Farm Power Production by Enabling Closed-Loop Collaborative Control	Andrew Mole et.al.	2506.20554	null
2025-06-25	Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos	Yitong Quan et.al.	2506.20550	null
2025-06-25	RapidGBM: An Efficient Tool for Fermi-GBM Visibility Checking and Data Analysis with a Case Study of EP240617a	Yun Wang et.al.	2506.20532	null
2025-06-25	Comparison between Causal and Acausal Diffusion: a Schwinger-Keldysh Effective Field Theory Perspective	Navid Abbasi et.al.	2506.20500	null
2025-06-25	Learning-based safety lifting monitoring system for cranes on construction sites	Hao Chen et.al.	2506.20475	null
2025-06-25	Enhanced Robotic Navigation in Deformable Environments using Learning from Demonstration and Dynamic Modulation	Lingyun Chen et.al.	2506.20376	null
2025-06-25	Producer-Fairness in Sequential Bundle Recommendation	Alexandre Rio et.al.	2506.20329	null
2025-06-25	Finding the Easy Way Through -- the Probabilistic Gap Planner for Social Robot Navigation	Malte Probst et.al.	2506.20320	null
2025-06-25	Computed tomography of propagating microwave photons	Qi-Ming Chen et.al.	2506.20318	null
2025-06-25	Real-Time Obstacle Avoidance Algorithms for Unmanned Aerial and Ground Vehicles	Jingwen Wei et.al.	2506.20311	null
2025-06-25	Analog OFDM based on Real-Time Fourier Transformation	Xiaolu Yang et.al.	2506.20287	null
2025-06-25	Dynamic Bandwidth Allocation for Hybrid Event-RGB Transmission	Pujing Yang et.al.	2506.20222	null
2025-06-25	Personalized Mental State Evaluation in Human-Robot Interaction using Federated Learning	Andrea Bussolan et.al.	2506.20212	null
2025-06-25	RaRa Clipper: A Clipper for Gaussian Splatting Based on Ray Tracer and Rasterizer	Da Li et.al.	2506.20202	null
2025-06-25	First experimental demonstration of plasma shape control in a tokamak through Model Predictive Control	Adriano Mele et.al.	2506.20096	null
2025-06-25	Adaptive Request Scheduling for CodeLLM Serving with SLA Guarantees	Shi Chang et.al.	2506.19677	null
2025-06-24	The Shape of Consumer Behavior: A Symbolic and Topological Analysis of Time Series	Pola Bereta et.al.	2506.19759	null
2025-06-24	MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control	Shihao Li et.al.	2506.19744	null
2025-06-24	NEAR²: A Nested Embedding Approach to Efficient Product Retrieval and Ranking	Shenbin Qian et.al.	2506.19743	null
2025-06-24	Dual-energy extraction for proton therapy and imaging: validation on a clinical synchrotron-based facility	Alexander A. Pryanichnikov et.al.	2506.19736	null
2025-06-24	SIP-IFVM: An observation-based magnetohydrodynamic model of coronal mass ejection	Haopeng Wang et.al.	2506.19711	null
2025-06-24	Health Sentinel: An AI Pipeline For Real-time Disease Outbreak Detection	Devesh Pant et.al.	2506.19548	null
2025-06-24	NTRL: Encounter Generation via Reinforcement Learning for Dynamic Difficulty Adjustment in Dungeons and Dragons	Carlo Romeo et.al.	2506.19530	null
2025-06-24	MATE: LLM-Powered Multi-Agent Translation Environment for Accessibility Applications	Aleksandr Algazinov et.al.	2506.19502	null
2025-06-24	An analytical model of depth-dose distributions for carbon-ion beams	Fulya Halıcılar et.al.	2506.19479	null
2025-06-24	Can Movable Antenna-enabled Micro-Mobility Replace UAV-enabled Macro-Mobility? A Physical Layer Security Perspective	Kaixuan Li et.al.	2506.19456	null
2025-06-24	Enhanced Fault Ride-Through Grid Forming with Transient Synchronisation Stability and Current Saturation	Youcefa Brahim Elkhalil et.al.	2506.19444	null
2025-06-24	Mem4Nav: Boosting Vision-and-Language Navigation in Urban Environments with a Hierarchical Spatial-Cognition Long-Short Memory System	Lixuan He et.al.	2506.19433	null
2025-06-24	Virtual Memory for 3D Gaussian Splatting	Jonathan Haberl et.al.	2506.19415	null
2025-06-24	Can theory-driven learning analytics dashboard enhance human-AI collaboration in writing learning? Insights from an empirical experiment	Angxuan Chen et.al.	2506.19364	null
2025-06-24	OpticalAging: Real-time Presbyopia Simulation for Inclusive Design via Tunable Lenses	Qing Zhang et.al.	2506.19307	null
2025-06-24	Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control	Jaehong Oh et.al.	2506.19277	null
2025-06-24	High-throughput spin-bath characterization of spin-defects in semiconductors	Abigail N. Poteshman et.al.	2506.19259	null
2025-06-24	Behavioral Anomaly Detection in Distributed Systems via Federated Contrastive Learning	Renzi Meng et.al.	2506.19246	null
2025-06-24	PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications	Pietro Bonazzi et.al.	2506.18807	null
2025-06-23	PRISM: Perceptual Recognition for Identifying Standout Moments in Human-Centric Keyframe Extraction	Mert Can Cakmak et.al.	2506.19168	null
2025-06-23	MinD: Unified Visual Imagination and Control via Hierarchical World Models	Xiaowei Chi et.al.	2506.18897	null
2025-06-23	OmniGen2: Exploration to Advanced Multimodal Generation	Chenyuan Wu et.al.	2506.18871	null
2025-06-23	LIGHTHOUSE: Fast and precise distance to shoreline calculations from anywhere on earth	Patrick Beukema et.al.	2506.18842	null
2025-06-23	STU-PID: Steering Token Usage via PID Controller for Efficient Large Language Model Reasoning	Aryasomayajula Ram Bharadwaj et.al.	2506.18831	null
2025-06-23	MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation task	Jorge Iranzo-Sánchez et.al.	2506.18828	null
2025-06-23	ModeliHub: A Web-based, Federated Analytics Platform for Modelica-centric, Model-based Systems Engineering	Mohamad Omar Nachawati et.al.	2506.18790	null
2025-06-23	Flow-Aware Diffusion for Real-Time VR Restoration: Enhancing Spatiotemporal Coherence and Efficiency	Yitong Zhu et.al.	2506.18786	null
2025-06-23	NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments	Alessandro Saviolo et.al.	2506.18689	null
2025-06-23	Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models	Jiangyu Han et.al.	2506.18623	null
2025-06-23	New Power Decoupling Method for Grid Forming Inverter Based on Adaptive Virtual-Synchronous Machine in Weak Grids	Waleed Breesam et.al.	2506.18619	null
2025-06-23	Frequency Control in Microgrids: An Adaptive Fuzzy-Neural-Network Virtual Synchronous Generator	Waleed Breesam et.al.	2506.18611	null
2025-06-23	PG-LIO: Photometric-Geometric fusion for Robust LiDAR-Inertial Odometry	Nikhil Khedekar et.al.	2506.18583	null
2025-06-23	Multi-Rank Subspace Change-Point Detection for Monitoring Robotic Swarms	Jonghyeok Lee et.al.	2506.18562	null
2025-06-23	Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning	Jiexin Zhang et.al.	2506.18560	null
2025-06-23	ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction	Marco Aruta et.al.	2506.18396	null
2025-06-23	Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots	Imene Tarakli et.al.	2506.18365	null
2025-06-23	TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations	Kawser Ahmed et.al.	2506.18343	null
2025-06-23	Programmable electro-optic frequency comb empowers integrated parallel convolution processing	Jinze He et.al.	2506.18310	null
2025-06-23	LLM-Integrated Digital Twins for Hierarchical Resource Allocation in 6G Networks	Majumder Haider et.al.	2506.18293	null
2025-06-20	Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition	Jiaqi Li et.al.	2506.17201	null
2025-06-20	Judo: A User-Friendly Open-Source Package for Sampling-Based Model Predictive Control	Albert H. Li et.al.	2506.17184	null
2025-06-20	A Set-valued Impact Law Approach for Modeling and Analysis of Rigid Contact Universal Joint with Clearance	Junaid Ali et.al.	2506.17183	null
2025-06-20	A tutorial overview of model predictive control for continuous crystallization: current possibilities and future perspectives	Collin R. Johnson et.al.	2506.17146	null
2025-06-20	Real-time Broadband RFI Excision for the Upgraded GMRT	Ruta Kale et.al.	2506.17131	null
2025-06-20	Rapid and Continuous Trust Evaluation for Effective Task Collaboration Through Siamese Model	Botao Zhu et.al.	2506.17128	null
2025-06-20	RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking	Teng Guo et.al.	2506.17119	null
2025-06-20	JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Cross-Facility Scientific Workflows	Vladislav Esaulov et.al.	2506.17084	null
2025-06-20	Opportunities for real-time process control of electrode properties in lithium-ion battery manufacturing	Noël Hallemans et.al.	2506.17048	null
2025-06-20	Probing dynamical axion quasiparticles with two-photon correlations	Daniel Boyanovsky et.al.	2506.17013	null
2025-06-20	Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments	Yasir Ali Farrukh et.al.	2506.16994	null
2025-06-20	Wi-Fi Sensing Tool Release: Gathering 802.11ax Channel State Information from a Commercial Wi-Fi Access Point	Zisheng Wang et.al.	2506.16957	null
2025-06-20	Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning	Jiaqi Chen et.al.	2506.16931	null
2025-06-20	Single-shot thermometry of simulated Bose-Einstein condensates using artificial intelligence	Jack Griffiths et.al.	2506.16925	null
2025-06-20	Real-Time Black-Box Optimization for Dynamic Discrete Environments Using Embedded Ising Machines	Tomoya Kashimata et.al.	2506.16924	null
2025-06-20	ROS 2 Agnocast: Supporting Unsized Message Types for True Zero-Copy Publish/Subscribe IPC	Takahiro Ishikawa-Aso et.al.	2506.16882	null
2025-06-20	Revolutionizing Validation and Verification: Explainable Testing Methodologies for Intelligent Automotive Decision-Making Systems	Halit Eris et.al.	2506.16876	null
2025-06-20	RS-Coded Adaptive Dynamic Network for Reliable Long-Term Information Transmission in Disturbed Multimode Fiber	Yang Hu et.al.	2506.16859	null
2025-06-20	Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning	Chengpeng Hu et.al.	2506.16795	null
2025-06-20	Reinforcement learning for hybrid charging stations planning and operation considering fixed and mobile chargers	Yanchen Zhu et.al.	2506.16764	null
2025-06-18	Vision in Action: Learning Active Perception from Human Demonstrations	Haoyu Xiong et.al.	2506.15666	null
2025-06-18	BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion	Yuqing Lan et.al.	2506.15610	null
2025-06-18	MicroRicci: A Greedy and Local Ricci Flow Solver for Self-Tuning Mesh Smoothing	Le Vu Anh et.al.	2506.15571	null
2025-06-18	PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction	Shufan Li et.al.	2506.15556	null
2025-06-18	Real-Time Initialization of Unknown Anchors for UWB-aided Navigation	Giulio Delama et.al.	2506.15518	null
2025-06-18	Model Predictive Path-Following Control for a Quadrotor	David Leprich et.al.	2506.15447	null
2025-06-18	A Real-time Endoscopic Image Denoising System	Yu Xing et.al.	2506.15395	null
2025-06-18	Evaluation Pipeline for systematically searching for Anomaly Detection Systems	Florian Rokohl et.al.	2506.15388	null
2025-06-18	Efficient Navigation Among Movable Obstacles using a Mobile Manipulator via Hierarchical Policy Learning	Taegeun Yang et.al.	2506.15380	null
2025-06-18	J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor	Benoit Tain et.al.	2506.15316	null
2025-06-18	AI-driven visual monitoring of industrial assembly tasks	Mattia Nardon et.al.	2506.15285	null
2025-06-18	Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study	Mohamad A. Hady et.al.	2506.15207	null
2025-06-18	In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory	Matteo Zecchin et.al.	2506.15176	null
2025-06-18	Human Locomotion Implicit Modeling Based Real-Time Gait Phase Estimation	Yuanlong Ji et.al.	2506.15150	null
2025-06-18	I Know You're Listening: Adaptive Voice for HRI	Paige Tuttösí et.al.	2506.15107	null
2025-06-18	EmojiVoice: Towards long-term controllable expressivity in robot speech	Paige Tuttösí et.al.	2506.15085	null
2025-06-18	Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks	Yimian Ding et.al.	2506.15082	null
2025-06-18	ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies	Jinyan Yuan et.al.	2506.14315	null
2025-06-17	GCN-Driven Reinforcement Learning for Probabilistic Real-Time Guarantees in Industrial URLLC	Eman Alqudah et.al.	2506.15011	null
2025-06-17	Mixed Traffic: A Perspective from Long Duration Autonomy	Filippos Tzortzoglou et.al.	2506.15004	null
2025-06-17	CNN-Enabled Scheduling for Probabilistic Real-Time Guarantees in Industrial URLLC	Eman Alqudah et.al.	2506.14987	null
2025-06-17	CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion	Jiahua Ma et.al.	2506.14769	null
2025-06-17	Technosignature Searches with Real-time Alert Brokers	Eleanor M. Gallay et.al.	2506.14744	null
2025-06-17	Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models	Huihan Liu et.al.	2506.14727	null
2025-06-17	SkinCells: Sparse Skinning using Voronoi Cells	Egor Larionov et.al.	2506.14714	null
2025-06-17	Deep Learning-Based Prediction of High Explosive Induced Fluid Dynamics	Francis G. VanGessel et.al.	2506.14710	null
2025-06-17	Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers	Daniel D'souza et.al.	2506.14702	null
2025-06-17	Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach	Yingchao Li et.al.	2506.14677	null
2025-06-17	ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors	Jongin Choi et.al.	2506.14657	null
2025-06-17	3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting	Yuke Xing et.al.	2506.14642	null
2025-06-17	Low-code to fight climate change: the Climaborough project	Aaron Conrardy et.al.	2506.14623	null
2025-06-17	Deep Learning Surrogates for Real-Time Gas Emission Inversion	Thomas Newman et.al.	2506.14597	null
2025-06-17	Review of Machine Learning for Real-Time Analysis at the Large Hadron Collider experiments ALICE, ATLAS, CMS and LHCb	Laura Boggia et.al.	2506.14578	null
2025-06-17	GAMORA: A Gesture Articulated Meta Operative Robotic Arm for Hazardous Material Handling in Containment-Level Environments	Farha Abdul Wasay et.al.	2506.14513	null
2025-06-17	SimSpark: Interactive Simulation of Social Media Behaviors	Ziyue Lin et.al.	2506.14476	null
2025-06-17	MalGuard: Towards Real-Time, Accurate, and Actionable Detection of Malicious Packages in PyPI Ecosystem	Xingan Gao et.al.	2506.14466	null
2025-06-17	Active Digital Twins via Active Inference	Matteo Torzoni et.al.	2506.14453	null
2025-06-17	Socially Aware Robot Crowd Navigation via Online Uncertainty-Driven Risk Adaptation	Zhirui Sun et.al.	2506.14305	null
2025-06-17	Whole-Body Control Framework for Humanoid Robots with Heavy Limbs: A Model-Based Approach	Tianlin Zhang et.al.	2506.14278	null
2025-06-17	GHz spiking neuromorphic photonic chip with in-situ training	Jinlong Xiang et.al.	2506.14272	null
2025-06-16	Compact representation and long-time extrapolation of real-time data for quantum systems	Andre Erpenbeck et.al.	2506.13760	null
2025-06-16	Robust Recursive Fusion of Multiresolution Multispectral Images with Location-Aware Neural Networks	Haoqing Li et.al.	2506.13733	null
2025-06-16	BanditWare: A Contextual Bandit-based Framework for Hardware Prediction	Tainã Coleman et.al.	2506.13730	null
2025-06-16	How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection	Kaiyuan Tan et.al.	2506.13722	null
2025-06-16	Direct visualization of visible-light hyperbolic plasmon polaritons in real space and time	Atreyie Ghosh et.al.	2506.13719	null
2025-06-16	HARMONI: Haptic-Guided Assistance for Unified Robotic Tele-Manipulation and Tele-Navigation	V. Sripada et.al.	2506.13704	null
2025-06-16	Photomagnetic-Chiral Anisotropy mediated by Chirality-Driven Asymmetric Spin Splitting	Tianwei Ouyang et.al.	2506.13696	null
2025-06-16	Integrated Pipeline for Monocular 3D Reconstruction and Finite Element Simulation in Industrial Applications	Bowen Zheng et.al.	2506.13573	null
2025-06-16	Controlled manipulation of solitons in a recirculating fiber loop using external potentials	François Copie et.al.	2506.13544	null
2025-06-16	UAV Object Detection and Positioning in a Mining Industrial Metaverse with Custom Geo-Referenced Data	Vasiliki Balaska et.al.	2506.13505	null
2025-06-16	Leveraging active learning-enhanced machine-learned interatomic potential for efficient infrared spectra prediction	Nitik Bhatia et.al.	2506.13486	null
2025-06-16	From Flat to Feeling: A Feasibility and Impact Study on Dynamic Facial Emotions in AI-Generated Avatars	Pegah Salehi et.al.	2506.13477	null
2025-06-16	SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer	Zerui Gong et.al.	2506.13465	link
2025-06-16	Block-wise Adaptive Caching for Accelerating Diffusion Policy	Kangye Ji et.al.	2506.13456	null
2025-06-16	Towards real-time additive-free dopamine detection at 10⁻⁸ mM with hardware accelerated platform integrated on camera	Ning Li et.al.	2506.13447	null
2025-06-16	Training Neural Networks by Optimizing Neuron Positions	Laura Erb et.al.	2506.13410	null
2025-06-16	HELENA: High-Efficiency Learning-based channel Estimation using dual Neural Attention	Miguel Camelo Botero et.al.	2506.13408	link
2025-06-16	A Model-Free Detection Method for Internal Short Circuits in Single Lithium-ion Cells Using Pseudo Open-Circuit Voltage Difference	Yangyang Xu et.al.	2506.13394	null
2025-06-16	Joint Optimization of Multi-UAV Deployment and 3D Positioning in Traffic-Aware Aerial Networks	Kamran Shafafi et.al.	2506.13287	null
2025-06-16	SONIC: Sound Optimization for Noise In Crowds	Pranav M N et.al.	2506.13272	null
2025-06-13	Reimagining Dance: Real-time Music Co-creation between Dancers and AI	Olga Vechtomova et.al.	2506.12008	null
2025-06-13	Robustness of Floquet topological phase at room temperature: a first-principles dynamics study	Ruiyi Zhou et.al.	2506.12005	null
2025-06-13	Learning Before Filtering: Real-Time Hardware Learning at the Detector Level	Boštjan Maček et.al.	2506.11981	null
2025-06-13	Secure API-Driven Research Automation to Accelerate Scientific Discovery	Tyler J. Skluzacek et.al.	2506.11950	null
2025-06-13	Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients	Chapa Sirithunge et.al.	2506.11906	null
2025-06-13	DMRS-Based Uplink Channel Estimation for MU-MIMO Systems with Location-Specific SCSI Acquisition	Jiawei Zhuang et.al.	2506.11899	null
2025-06-13	Graduated Realism: A Pedagogical Framework for AI-Powered Avatars in Virtual Reality Teacher Training	Judson Leroy Dean Haynes IV et.al.	2506.11890	null
2025-06-13	An Explainable AI Framework for Dynamic Resource Management in Vehicular Network Slicing	Haochen Sun et.al.	2506.11882	null
2025-06-13	Bistable random momentum transfer in a linear on-chip resonator	Tingyi Gu et.al.	2506.11859	null
2025-06-13	Framework of a multiscale data-driven digital twin of the muscle-skeletal system	Martina Paccini et.al.	2506.11821	null
2025-06-13	Diffusion-Based Electrocardiography Noise Quantification via Anomaly Detection	Tae-Seong Han et.al.	2506.11815	link
2025-06-13	SSPINNpose: A Self-Supervised PINN for Inertial Pose and Dynamics Estimation	Markus Gambietz et.al.	2506.11786	null
2025-06-13	Real-Time Feedback and Benchmark Dataset for Isometric Pose Evaluation	Abhishek Jaiswal et.al.	2506.11774	null
2025-06-13	Dynamic Collaborative Material Distribution System for Intelligent Robots In Smart Manufacturing	Ziren Xiao et.al.	2506.11723	null
2025-06-13	Modeling Urban Air Quality Using Taxis as Sensors	Anastasios Noulas et.al.	2506.11720	null
2025-06-13	Generalised Rate Control Approach For Stream Processing Applications	Ziren Xiao et.al.	2506.11710	null
2025-06-13	DMAF-Net: An Effective Modality Rebalancing Framework for Incomplete Multi-Modal Medical Image Segmentation	Libin Lan et.al.	2506.11691	null
2025-06-13	GraphRAG-Causal: A novel graph-augmented framework for causal reasoning and annotation in news	Abdul Haque et.al.	2506.11600	null
2025-06-13	Camera-based method for the detection of lifted truck axles using convolutional neural networks	Bachir Tchana Tankeu et.al.	2506.11574	null
2025-06-13	Scheduling Agile Earth Observation Satellites with Onboard Processing and Real-Time Monitoring	Antonio M. Mercado-Martínez et.al.	2506.11556	null
2025-06-12	InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model	Junqi You et.al.	2506.10980	null
2025-06-12	Discovery and Localization of the Swift-Observed FRB 20241228A in a Star-forming Host Galaxy	Alice P. Curtin et.al.	2506.10961	null
2025-06-12	Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors	Chen Yueh-Han et.al.	2506.10949	link
2025-06-12	Execution Guided Line-by-Line Code Generation	Boaz Lavon et.al.	2506.10948	link
2025-06-12	Non-Abelian dynamics on a cube: improving quantum compilation through qudit-based simulations	Jacky Jiang et.al.	2506.10945	null
2025-06-12	Building a Media Ecosystem Observatory from Scratch: Infrastructure, Methodology, and Insights	Zeynep Pehlivan et.al.	2506.10942	null
2025-06-12	MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem	Melina Soysal et.al.	2506.10931	null
2025-06-12	Agentic Semantic Control for Autonomous Wireless Space Networks: Extending Space-O-RAN with MCP-Driven Distributed Intelligence	Eduardo Baena et.al.	2506.10925	null
2025-06-12	Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning	Waylon Luo et.al.	2506.10889	null
2025-06-12	S3Mirror: Making Genomic Data Transfers Fast, Reliable, and Observable with DBOS	Steven Vasquez-Grinnell et.al.	2506.10886	null
2025-06-12	Modeling Trust Dynamics in Robot-Assisted Delivery: Impact of Trust Repair Strategies	Dong Hae Mangalindan et.al.	2506.10884	null
2025-06-12	Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment	Hongda Sun et.al.	2506.10877	link
2025-06-12	General Reference Frame Identification and Transformation in Unbalanced Power Systems	Francisco G. Montoya et.al.	2506.10835	null
2025-06-12	A novel visual data-based diagnostic approach for estimation of regime transition in pool boiling	Pranay Nirapure et.al.	2506.10832	null
2025-06-12	Efficiency Robustness of Dynamic Deep Learning Systems	Ravishka Rathnasuriya et.al.	2506.10831	link
2025-06-12	Grasp Prediction based on Local Finger Motion Dynamics	Dimitar Valkov et.al.	2506.10818	null
2025-06-12	Human-Robot Navigation using Event-based Cameras and Reinforcement Learning	Ignacio Bugueno-Cordova et.al.	2506.10790	null
2025-06-12	Hazel Deriver: A Live Editor for Constructing Rule-Based Derivations	Zhiyao Zhong et.al.	2506.10781	null
2025-06-12	Integrating Large Language Models into Text Animation: An Intelligent Editing System with Inline and Chat Interaction	Bao Zhang et.al.	2506.10762	null
2025-06-12	Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding	Yuhang Zhang et.al.	2506.10756	null
2025-06-11	DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos	Chieh Hubert Lin et.al.	2506.09997	null
2025-06-11	Locomotion on Constrained Footholds via Layered Architectures and Model Predictive Control	Zachary Olkin et.al.	2506.09979	null
2025-06-11	SRLAgent: Enhancing Self-Regulated Learning Skills through Gamification and LLM Assistance	Wentao Ge et.al.	2506.09968	null
2025-06-11	Mechanism of Conductivity Enhancement of Polymers Employing Microbubble Lithography	Anand Dev Ranjan et.al.	2506.09957	null
2025-06-11	Microservices and Real-Time Processing in Retail IT: A Review of Open-Source Toolchains and Deployment Strategies	Aaditaa Vashisht et.al.	2506.09938	null
2025-06-11	Repeated ancilla reuse for logical computation on a neutral atom quantum computer	J. A. Muniz et.al.	2506.09936	null
2025-06-11	TransGI: Real-Time Dynamic Global Illumination With Object-Centric Neural Transfer Model	Yijie Deng et.al.	2506.09909	null
2025-06-11	Machine Learning-Based Classification of Oils Using Dielectric Properties and Microwave Resonant Sensing	Amit Baran Dey et.al.	2506.09867	null
2025-06-11	Multi-FPGA Synchronization and Data Communication for Quantum Control and Measurement	Yilun Xu et.al.	2506.09856	null
2025-06-11	Advancing Exchange Rate Forecasting: Leveraging Machine Learning and AI for Enhanced Accuracy in Global Financial Markets	Md. Yeasin Rahat et.al.	2506.09851	null
2025-06-11	Learning Quality from Complexity and Structure: A Feature-Fused XGBoost Model for Video Quality Assessment	Amritha Premkumar et.al.	2506.09795	null
2025-06-11	Human-robot collaborative transport personalization via Dynamic Movement Primitives and velocity scaling	Paolo Franceschi et.al.	2506.09697	null
2025-06-11	Searching for sub-TeV IceCube neutrinos correlated to sub-threshold GW events	Tista Mukherjee et.al.	2506.09694	null
2025-06-11	Early and Accurate Recession Detection Using Classifiers on the Anticipation-Precision Frontier	Pascal Michaillat et.al.	2506.09664	null
2025-06-11	Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach	Lei Deng et.al.	2506.09647	null
2025-06-11	VAULT: A Mobile Mapping System for ROS 2-based Autonomous Robots	Miguel Á. González-Santamarta et.al.	2506.09583	null
2025-06-11	Real-time adaptive tracking of fluctuating relaxation rates in superconducting qubits	Fabrizio Berritta et.al.	2506.09576	null
2025-06-11	HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene	Jianing Chen et.al.	2506.09518	null
2025-06-11	A Survey on the Role of Artificial Intelligence and Machine Learning in 6G-V2X Applications	Donglin Wang et.al.	2506.09512	null
2025-06-11	ArcNeural: A Multi-Modal Database for the Gen-AI Era	Wu Min et.al.	2506.09467	null
2025-06-10	Rapid cardiac activation prediction for cardiac resynchronization therapy planning using geometric deep learning	Ehsan Naghavi et.al.	2506.08987	link
2025-06-10	Online Learning Control Strategies for Industrial Processes with Application for Loosening and Conditioning	Yue Wu et.al.	2506.08983	null
2025-06-10	Rethinking Range-View LiDAR Segmentation in Adverse Weather	Longyu Yang et.al.	2506.08979	null
2025-06-10	Yau-YauAL: A computer tool for solving nonlinear filtering problems	Yu Wang et.al.	2506.08976	null
2025-06-10	WIP: Large Language Model-Enhanced Smart Tutor for Undergraduate Circuit Analysis	Liangliang Chen et.al.	2506.08962	null
2025-06-10	CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks	Yixuan Li et.al.	2506.08931	null
2025-06-10	Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU	Petar Jakuš et.al.	2506.08911	null
2025-06-10	HabSim: Architecture for modelling disruptions, propagation, detection and repair in deep space habitats	Luca Vaccino et.al.	2506.08903	null
2025-06-10	Real-Time Cascade Mitigation in Power Systems Using Influence Graph Improved by Reinforcement Learning	Kai Zhou et.al.	2506.08893	null
2025-06-10	Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams	Tauhid Tanjim et.al.	2506.08892	null
2025-06-10	SmartAttack: Air-Gap Attack via Smartwatches	Mordechai Guri et.al.	2506.08866	null
2025-06-10	StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams	Zike Wu et.al.	2506.08862	link
2025-06-10	Fast Estimation of Globally Optimal Independent Contact Regions for Robust Grasping and Manipulation	Jonathan P. King et.al.	2506.08856	null
2025-06-10	Agile Reinforcement Learning for Real-Time Task Scheduling in Edge Computing	Amin Avan et.al.	2506.08850	link
2025-06-10	FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency	Yifei Su et.al.	2506.08822	null
2025-06-10	Enhancing Synthetic CT from CBCT via Multimodal Fusion: A Study on the Impact of CBCT Quality and Alignment	Maximilian Tschuchnig et.al.	2506.08716	null
2025-06-10	Balancing Fixed Number of Nodes Among Multiple Fixed Clusters	Paritosh Ranjan et.al.	2506.08715	null
2025-06-10	Industrial Flexibility Investment Under Uncertainty: A Multi-Stage Stochastic Framework Considering Energy and Reserve Market Participation	Amund Norland et.al.	2506.08638	null
2025-06-10	Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models	Srinivasan Kidambi et.al.	2506.08520	link
2025-06-10	One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World	Xingshuo Han et.al.	2506.08482	null
2025-06-10	Silencing Empowerment, Allowing Bigotry: Auditing the Moderation of Hate Speech on Twitch	Prarabdh Shukla et.al.	2506.07667	link
2025-06-09	Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion	Xun Huang et.al.	2506.08009	null
2025-06-09	MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation	Junhao Chen et.al.	2506.07999	null
2025-06-09	Unraveling Ethereum's Mempool: The Impact of Fee Fairness, Transaction Prioritization, and Consensus Efficiency	S M Mostaq Hossain et.al.	2506.07988	null
2025-06-09	Real-time Localization of a Soccer Ball from a Single Camera	Dmitrii Vorobev et.al.	2506.07981	null
2025-06-09	Low-Complexity Super-Resolution Signature Estimation of XL-MIMO FMCW Radar	Chandrashekhar Rai et.al.	2506.07979	null
2025-06-09	Predicting Situation Awareness from Physiological Signals	Kieran J. Smith et.al.	2506.07930	null
2025-06-09	LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement	Dimitris Panagopoulos et.al.	2506.07915	null
2025-06-09	GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution	Shuja Khalid et.al.	2506.07897	null
2025-06-09	CrosswalkNet: An Optimized Deep Learning Framework for Pedestrian Crosswalk Detection in Aerial Images with High-Performance Computing	Zubin Bhuyan et.al.	2506.07885	null
2025-06-09	Spatio-Temporal State Space Model For Efficient Event-Based Optical Flow	Muhammad Ahmed Humais et.al.	2506.07878	link
2025-06-09	Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction	Ivan Alberico et.al.	2506.07860	link
2025-06-09	SAM2Auto: Auto Annotation Using FLASH	Arash Rocky et.al.	2506.07850	null
2025-06-09	R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation	William Ljungbergh et.al.	2506.07826	null
2025-06-09	On-The-Fly Symbolic Algorithm for Timed ATL with Abstractions	Nicolaj Ø. Jensen et.al.	2506.07802	null
2025-06-09	Novel software for continuous wavelet analysis enable EEG real-time analysis on portable computers	Shoichiro Nakanishi et.al.	2506.07793	null
2025-06-09	Language-Vision Planner and Executor for Text-to-Visual Reasoning	Yichang Xu et.al.	2506.07778	null
2025-06-09	ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models	Shadi Hamdan et.al.	2506.07725	null
2025-06-09	CommSense: A Rapid and Accurate ISAC Paradigm	Sandip Jana et.al.	2506.07685	null
2025-06-09	QUITE: A Query Rewrite System Beyond Rules with LLM Agents	Yuyang Song et.al.	2506.07675	null
2025-06-06	RecGPT: A Foundation Model for Sequential Recommendation	Yangqin Jiang et.al.	2506.06270	link
2025-06-06	Integrating Complexity and Biological Realism: High-Performance Spiking Neural Networks for Breast Cancer Detection	Zofia Rudnicka et.al.	2506.06265	null
2025-06-06	Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens	Jihwan Jeong et.al.	2506.06261	null
2025-06-06	From NLVO to NAO: Reactive Robot Navigation using Velocity and Acceleration Obstacles	Asher Stern et.al.	2506.06255	null
2025-06-06	PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time	Weizhi Zhang et.al.	2506.06254	null
2025-06-06	Correlated Structural and Optical Characterization during Van der Waals Epitaxy of PbI2 on Graphene	C. P. Sonny Tsotezem et.al.	2506.06241	null
2025-06-06	Initial stage jet momentum broadening in tBLFQ formalism	Dana Avramescu et.al.	2506.06206	null
2025-06-06	NAT: Neural Acoustic Transfer for Interactive Scenes in Real Time	Xutong Jin et.al.	2506.06190	null
2025-06-06	Physics-Informed Neural Networks for Control of Single-Phase Flow Systems Governed by Partial Differential Equations	Luis Kin Miyatake et.al.	2506.06188	null
2025-06-06	Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge	Constantin Patsch et.al.	2506.06174	null
2025-06-06	Stream DaQ: Stream-First Data Quality Monitoring	Vasileios Papastergios et.al.	2506.06147	link
2025-06-06	On the Suitability of Wi-Fi for Interconnecting Moving Equipment in Industrial Environments	Pietro Chiavassa et.al.	2506.06074	null
2025-06-06	Conversational Interfaces for Parametric Conceptual Architectural Design: Integrating Mixed Reality with LLM-driven Interaction	Ruochen Ji et.al.	2506.06066	null
2025-06-06	Direct Integration of Recursive Gaussian Process Regression Into Extended Kalman Filters With Application to Vapor Compression Cycle Control	Ricus Husmann et.al.	2506.06065	null
2025-06-06	Enhanced Trust Region Sequential Convex Optimization for Multi-Drone Thermal Screening Trajectory Planning in Urban Environments	Kaiyuan Chen et.al.	2506.06012	link
2025-06-06	MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation	Dongjie Fu et.al.	2506.05952	null
2025-06-06	Neural Visibility Cache for Real-Time Light Sampling	Jakub Bokšanský et.al.	2506.05930	null
2025-06-06	Proactive Assistant Dialogue Generation from Streaming Egocentric Videos	Yichi Zhang et.al.	2506.05904	null
2025-06-06	The Online Data Filter for the KM3NeT Neutrino Telescopes	O. Adriani et.al.	2506.05881	null
2025-06-06	Towards Next-Generation Intelligent Maintenance: Collaborative Fusion of Large and Small Models	Xiaoyi Yuan et.al.	2506.05854	null
2025-06-06	FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction	Yifan Wang et.al.	2506.05348	null
2025-06-05	SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs	Jiahui Wang et.al.	2506.05344	link
2025-06-05	Generalizable, real-time neural decoding with hybrid state-space models	Avery Hee-Woon Ryoo et.al.	2506.05320	null
2025-06-05	Fast-DataShapley: Neural Modeling for Training Data Valuation	Haifeng Sun et.al.	2506.05281	null
2025-06-05	Vision-Based Autonomous MM-Wave Reflector Using ArUco-Driven Angle-of-Arrival Estimation	Josue Marroquin et.al.	2506.05195	null
2025-06-05	Noise-Driven AI Sensors: Secure Healthcare Monitoring with PUFs	Christiana Chamon et.al.	2506.05135	null
2025-06-05	EDEN: Efficient Dual-Layer Exploration Planning for Fast UAV Autonomous Exploration in Large 3-D Environments	Qianli Dong et.al.	2506.05106	link
2025-06-05	Cloud-Based Interoperability in Residential Energy Systems	Darren Leniston et.al.	2506.05076	null
2025-06-05	PulseRide: A Robotic Wheelchair for Personalized Exertion Control with Human-in-the-Loop Reinforcement Learning	Azizul Zahid et.al.	2506.05056	null
2025-06-05	Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning	Mehdi Azarafza et.al.	2506.04998	null
2025-06-05	En Route Path-planning for Partially Occupied Vehicles in Ride-pooling Systems	Pengbo Zhu et.al.	2506.04968	null
2025-06-05	Organic Crystal Active Waveguide as an All-Angle Signal Receiver and Transmission Platform for Visible Light Communication	Ankur Khapre et.al.	2506.04874	null
2025-06-05	Beyond the Desktop: XR-Driven Segmentation with Meta Quest 3 and MX Ink	Lisle Faray de Paiva et.al.	2506.04858	null
2025-06-05	Deep learning image burst stacking to reconstruct high-resolution ground-based solar observations	Christoph Schirninger et.al.	2506.04781	null
2025-06-05	A high-sensitivity frequency counter for free-induction-decay signals	Tong Gong et.al.	2506.04780	null
2025-06-05	Tire Wear Aware Trajectory Tracking Control for Multi-axle Swerve-drive Autonomous Mobile Robots	Tianxin Hu et.al.	2506.04752	null
2025-06-05	SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs	Shuhan Xu et.al.	2506.04743	null
2025-06-05	Real-Time LPV-Based Non-Linear Model Predictive Control for Robust Trajectory Tracking in Autonomous Vehicles	Nitish Kumar et.al.	2506.04684	null
2025-06-05	Application of SDRE to Achieve Gait Control in a Bipedal Robot for Knee-Type Exoskeleton Testing	Ping-Kong Huang et.al.	2506.04680	null
2025-06-05	Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation	Yuyang Wanyan et.al.	2506.04614	null
2025-06-05	Construction of Urban Greenland Resources Collaborative Management Platform	Dongyang Lyu et.al.	2506.03830	null
2025-06-05	MambaNeXt-YOLO: A Hybrid State Space Model for Real-time Object Detection	Xiaochun Lei et.al.	2506.03654	null
2025-06-04	MS-YOLO: A Multi-Scale Model for Accurate and Efficient Blood Cell Detection	Guohua Wu et.al.	2506.03972	null
2025-06-04	FPGA-Enabled Machine Learning Applications in Earth Observation: A Systematic Review	Cédric Léonard et.al.	2506.03938	link
2025-06-04	Forecasting Seasonal Influenza Epidemics with Physics-Informed Neural Networks	Martina Rama et.al.	2506.03897	null
2025-06-04	JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting	Yang Xiao et.al.	2506.03872	null
2025-06-04	Frame-Level Real-Time Assessment of Stroke Rehabilitation Exercises from Video-Level Labeled Data: Task-Specific vs. Foundation Models	Gonçalo Mesquita et.al.	2506.03752	null
2025-06-04	Probabilistic measures afford fair comparisons of AIWP and NWP model output	Tilmann Gneiting et.al.	2506.03744	link
2025-06-04	Accelerating SfM-based Pose Estimation with Dominating Set	Joji Joseph et.al.	2506.03667	null
2025-06-04	Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI	Wing Man Casca Kwok et.al.	2506.03607	null
2025-06-04	SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting	Shengjie Lin et.al.	2506.03594	link
2025-06-04	SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models	Meng Li et.al.	2506.03574	null
2025-06-04	Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments	Reo Yoneyama et.al.	2506.03554	null
2025-06-04	A Threat Intelligence Event Extraction Conceptual Model for Cyber Threat Intelligence Feeds	Jamal H. Al-Yasiri et.al.	2506.03551	null
2025-06-04	Topology-Aware Graph Neural Network-based State Estimation for PMU-Unobservable Power Systems	Shiva Moshtagh et.al.	2506.03493	null
2025-06-04	Adaptive Configuration Selection for Multi-Model Inference Pipelines in Edge Computing	Jinhao Sheng et.al.	2506.02814	null
2025-06-04	Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone	Zheng Liu et.al.	2506.02774	null
2025-06-03	StARS DCM: A Sleep Stage-Decoding Forehead EEG Patch for Real-time Modulation of Sleep Physiology	William G. Coon et.al.	2506.03442	null
2025-06-03	Online multi-layer FDR control	Runqiu Wang et.al.	2506.03406	null
2025-06-03	A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation	Zihui Ma et.al.	2506.03360	link
2025-06-03	Spatial Association Between Near-Misses and Accident Blackspots in Sydney, Australia: A Getis-Ord $G_i^*$ Analysis	Artur Grigorev et.al.	2506.03356	null
2025-06-03	Structural Vibration Monitoring with Diffractive Optical Processors	Yuntian Wang et.al.	2506.03317	null
2025-06-03	TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models	Chetwin Low et.al.	2506.03099	null
2025-06-03	InterMamba: Efficient Human-Human Interaction Generation with Adaptive Spatio-Temporal Mamba	Zizhao Wu et.al.	2506.03084	null
2025-06-03	LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM	Roman Titkov et.al.	2506.03073	null
2025-06-03	Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency	Bunlong Lay et.al.	2506.02908	null
2025-06-03	When Blockchain Meets Crawlers: Real-time Market Analytics in Solana NFT Markets	Chengxin Shen et.al.	2506.02892	null
2025-06-03	OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis	Jiewen Hu et.al.	2506.02891	null
2025-06-03	CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge	Chunlin Tian et.al.	2506.02847	null
2025-06-03	Process Mining on Distributed Data Sources	Maximilian Weisenseel et.al.	2506.02830	null
2025-06-03	Target Sensing Performance in Disaster-Specific ISAC Networks	Ahmet Burak Ozyurt et.al.	2506.02828	null
2025-06-03	AI-Driven Vehicle Condition Monitoring with Cell-Aware Edge Service Migration	Charalampos Kalalas et.al.	2506.02785	null
2025-06-03	SAMJ: Fast Image Annotation on ImageJ/Fiji via Segment Anything Model	Carlos Garcia-Lopez-de-Haro et.al.	2506.02783	null
2025-06-03	RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS	Chuanyu Fu et.al.	2506.02751	null
2025-06-03	Collective Intelligence Outperforms Individual Talent: A Case Study in League of Legends	Angelo Josey Caldeira et.al.	2506.02706	null
2025-06-03	A Pretrained Probabilistic Transformer for City-Scale Traffic Volume Prediction	Shiyu Shen et.al.	2506.02654	null
2025-06-03	From Prompts to Protection: Large Language Model-Enabled In-Context Learning for Smart Public Safety UAV	Yousef Emami et.al.	2506.02649	null
2025-06-03	Phase Topology Stability of an Optical Vortex via an Electrically Controlled Twist-Planar Oriented Liquid Crystal Fresnel Lens	Elena Melnikova et.al.	2506.02632	null
2025-06-03	HORUS: A Mixed Reality Interface for Managing Teams of Mobile Robots	Omotoye Shamsudeen Adekoya et.al.	2506.02622	null
2025-06-03	Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models	Safaa Abdullahi Moallim Mohamud et.al.	2506.02615	null
2025-05-30	PB&J: Peanut Butter and Joints for Damped Articulation	Avery S. Williamson et.al.	2505.24860	link
2025-05-30	Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation	Yingchaojie Feng et.al.	2505.24754	link
2025-05-30	Neural Network-based Universal Formulas for Control	Pol Mestres et.al.	2505.24744	null
2025-05-30	Efficient Text Encoders for Labor Market Analysis	Jens-Joris Decorte et.al.	2505.24640	null
2025-05-30	Co-designed Quantum Discrete Adiabatic Linear System Solver Via Dynamic Circuits	Boxuan Ai et.al.	2505.24626	null
2025-05-30	Frequency-Domain Joint Monitoring of Differential Group Delay and Dependent Loss of Optical Single- and Few-Mode Fiber Channels Based on CAZAC Sequences	Linsheng Fan et.al.	2505.24589	null
2025-05-30	Fine-tuning for Data-enabled Predictive Control of Noisy Systems by Reinforcement Learning	Jinbao Wang et.al.	2505.24572	null
2025-05-30	Airborne Neural Network	Paritosh Ranjan et.al.	2505.24513	null
2025-05-30	How can AI reduce fall injuries in the workplace?	Nicholas Cartocci et.al.	2505.24507	null
2025-05-30	Enhancing the Accuracy of Spatio-Temporal Models for Wind Speed Prediction by Incorporating Bias-Corrected Crowdsourced Data	Eamonn Organ et.al.	2505.24506	link
2025-05-30	Real-time Fall Prevention system for the Next-generation of Workers	Nicholas Cartocci et.al.	2505.24487	null
2025-05-30	Boosting Automatic Exercise Evaluation Through Musculoskeletal Simulation-Based IMU Data Augmentation	Andreas Spilz et.al.	2505.24415	null
2025-05-30	SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation	Yuqi Fan et.al.	2505.24390	link
2025-05-30	Spatiotemporal Analysis of Forest Machine Operations Using 3D Video Classification	Maciej Wielgosz et.al.	2505.24375	null
2025-05-30	A Novel Coronary Artery Registration Method Based on Super-pixel Particle Swarm Optimization	Peng Qi et.al.	2505.24351	null
2025-05-30	A 3D Mobile Crowdsensing Framework for Sustainable Urban Digital Twins	Taku Yamazaki et.al.	2505.24348	null
2025-05-30	DTR: Delaunay Triangulation-based Racing for Scaled Autonomous Racing	Luca Tognoni et.al.	2505.24320	null
2025-05-30	A Novel Discrete Memristor-Coupled Heterogeneous Dual-Neuron Model and Its Application in Multi-Scenario Image Encryption	Yi Zou et.al.	2505.24294	null
2025-05-30	Proactive Guidance of Multi-Turn Conversation in Industrial Search	Xiaoyu Li et.al.	2505.24251	null
2025-05-30	MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition	Chengxi Deng et.al.	2505.24224	null
2025-05-29	AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views	Lihan Jiang et.al.	2505.23716	null
2025-05-29	From Connectivity to Autonomy: The Dawn of Self-Evolving Communication Systems	Zeinab Nezami et.al.	2505.23710	null
2025-05-29	Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better	Danny Driess et.al.	2505.23705	null
2025-05-29	DiCoFlex: Model-agnostic diverse counterfactuals with flexible control	Oleksii Furman et.al.	2505.23700	null
2025-05-29	Errors in Stereo Geometry Induce Distance Misperception	Raffles Xingqi Zhu et.al.	2505.23685	null
2025-05-29	Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model	Rachel Cummings et.al.	2505.23682	null
2025-05-29	Performance Analysis of Wireless Communication Systems Assisted by Fluid Reconfigurable Intelligent Surfaces	Farshad Rostami Ghadi et.al.	2505.23680	null
2025-05-29	Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model	Qingyu Shi et.al.	2505.23606	link
2025-05-29	MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning	Linqiang Guo et.al.	2505.23596	null
2025-05-29	Scalable decoding protocols for fast transversal logic in the surface code	Mark L. Turner et.al.	2505.23567	null
2025-05-29	The CASE Framework -- A New Architecture for Participatory Research and Digital Health Surveillance	Marco Hirsch et.al.	2505.23516	null
2025-05-29	DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration	Sanberk Serbest et.al.	2505.23515	null
2025-05-29	Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents	Zhejian Yang et.al.	2505.23450	null
2025-05-29	Enhanced DACER Algorithm with High Diffusion Efficiency	Yinuo Wang et.al.	2505.23426	null
2025-05-29	When water phase matters: its effect on the stopping cross section for proton therapy and astrophysics	F. Matias et.al.	2505.23396	null
2025-05-29	CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection	Woojin Shin et.al.	2505.23317	null
2025-05-29	Investigating A Geometrical Solution to the Vergence-Accommodation Conflict for Targeted Movements in Virtual Reality	Xiaoye Michael Wang et.al.	2505.23310	null
2025-05-29	MathArena: Evaluating LLMs on Uncontaminated Math Competitions	Mislav Balunović et.al.	2505.23281	link
2025-05-29	Wireless Agentic AI with Retrieval-Augmented Multimodal Semantic Perception	Guangyuan Liu et.al.	2505.23275	null
2025-05-29	Context-Aware Semantic Communication for the Wireless Networks	Guangyuan Liu et.al.	2505.23249	null
2025-05-29	The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector	Aixuan Li et.al.	2505.22499	null
2025-05-29	YH-MINER: Multimodal Intelligent System for Natural Ecological Reef Metric Extraction	Mingzhuang Wang et.al.	2505.22250	null
2025-05-28	VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models	Ce Zhang et.al.	2505.22654	null
2025-05-28	VR-Based Control of Multi-Copter Operation	Jack T. Hughes et.al.	2505.22599	null
2025-05-28	A Graph-Based Laser Path Solver Algorithm for Virtual Reality Laboratory Simulations	Andreas Müller et.al.	2505.22540	null
2025-05-28	AI instructional agent improves student's perceived learner control and learning outcome: empirical evidence from a randomized controlled trial	Fei Qin et.al.	2505.22526	null
2025-05-28	CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs	Mohamed R. Elshamy et.al.	2505.22469	null
2025-05-28	STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering	Zehao Li et.al.	2505.22400	null
2025-05-28	Learning to Pursue AC Optimal Power Flow Solutions with Feasibility Guarantees	Damola Ajeyemi et.al.	2505.22399	null
2025-05-28	UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments	Wancai Zheng et.al.	2505.22335	null
2025-05-28	Versatile Cardiovascular Signal Generation with a Unified Diffusion Transformer	Zehua Chen et.al.	2505.22306	null
2025-05-28	Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device	Zixuan Li et.al.	2505.22229	null
2025-05-28	ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation	Jiawen Yu et.al.	2505.22159	null
2025-05-28	Streaming Remote rendering services: Comparison of QUIC-based and WebRTC Protocols	Daniel Mejías et.al.	2505.22132	null
2025-05-28	Real-Time Blind Defocus Deblurring for Earth Observation: The IMAGIN-e Mission Approach	Alejandro D. Mousist et.al.	2505.22128	null
2025-05-28	Leveraging 5G Physical Layer Monitoring for Adaptive Remote Rendering in XR Applications	Inhar Yeregui et.al.	2505.22123	null
2025-05-28	A simulation framework for autonomous lunar construction work	Mattias Linde et.al.	2505.22091	null
2025-05-28	High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models	Tristan S. W. Stevens et.al.	2505.22090	null
2025-05-28	Cognitively-Inspired Emergent Communication via Knowledge Graphs for Assisting the Visually Impaired	Ruxiao Chen et.al.	2505.22087	null
2025-05-28	On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition	Shujie HU et.al.	2505.22072	null
2025-05-27	Visual Product Graph: Bridging Visual Products And Composite Images For End-to-End Style Recommendations	Yue Li Du et.al.	2505.21454	null
2025-05-27	Hume: Introducing System-2 Thinking in Visual-Language-Action Model	Haoming Song et.al.	2505.21432	null
2025-05-27	Autonomous Multi-Modal LLM Agents for Treatment Planning in Focused Ultrasound Ablation Surgery	Lina Zhao et.al.	2505.21418	null
2025-05-27	AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs	Xuanwen Ding et.al.	2505.21389	link
2025-05-27	A first look at ROS 2 applications written in asynchronous Rust	Martin Škoudlil et.al.	2505.21323	null
2025-05-27	Assured Autonomy with Neuro-Symbolic Perception	R. Spencer Hallyburton et.al.	2505.21322	null
2025-05-27	Data-Driven Cellular Mobility Management via Bayesian Optimization and Reinforcement Learning	Mohamed Benzaghta et.al.	2505.21249	null
2025-05-27	Towards Quantum Simulation of Meson Scattering in a Z2 Lattice Gauge Theory	Yahui Chai et.al.	2505.21240	null
2025-05-27	3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics-Based Appearance-Medium Decoupling	Jieyu Yuan et.al.	2505.21238	null
2025-05-27	Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models	Xudong Tan et.al.	2505.21200	null
2025-05-27	Constructive community race: full-density spiking neural network model drives neuromorphic computing	Johanna Senk et.al.	2505.21185	null
2025-05-27	Hybrid Machine Learning and Mathematical Modeling for Tumor Dynamics Prediction: Comparing SPIONs against mNP-FDG	Amit K Chattopadhyay et.al.	2505.21094	null
2025-05-27	All-optical discrete illumination-based compressed ultrafast photography	Long Cheng et.al.	2505.21086	null
2025-05-27	Modeling of Water Evaporation in Hydrogels from Aspect of Mechanical Analytics	Zehua Yu et.al.	2505.21075	null
2025-05-27	Nonreciprocal and long-range three-body interactions in Bose-Einstein condensates induced by optical feedback	Yi-Qing Zhang et.al.	2505.21044	null
2025-05-27	CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians	Weihang Liu et.al.	2505.21041	null
2025-05-27	ClearSphere: Multi-Earphone Synergy for Enhanced Conversational Clarity	Lixing He et.al.	2505.21004	null
2025-05-27	CNN-Based Channel Map Estimation for Movable Antenna Systems	Yitai Huang et.al.	2505.21001	null
2025-05-27	SCALOFT: An Initial Approach for Situation Coverage-Based Safety Analysis of an Autonomous Aerial Drone in a Mine Environment	Nawshin Mannan Proma et.al.	2505.20969	null
2025-05-27	YOLO-FireAD: Efficient Fire Detection via Attention-Guided Inverted Residual Learning and Dual-Pooling Feature Preservation	Weichao Pan et.al.	2505.20884	null
2025-05-26	Understanding and Supporting Co-viewing Comedy in VR with Embodied Expressive Avatars	Ryo Ohara et.al.	2505.20082	null
2025-05-26	M3DHMR: Monocular 3D Hand Mesh Recovery	Yihong Lin et.al.	2505.20058	null
2025-05-26	Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion	Zheqi Lv et.al.	2505.20053	link
2025-05-26	Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs	Artem Vazhentsev et.al.	2505.20045	null
2025-05-26	Optimizing edge AI models on HPC systems with the edge in the loop	Marcel Aach et.al.	2505.19995	link
2025-05-26	A Cooperative Aerial System of A Payload Drone Equipped with Dexterous Rappelling End Droid for Cluttered Space Pickup	Wenjing Ren et.al.	2505.19980	null
2025-05-26	Dynamically Learned Test-Time Model Routing in Language Model Zoos with Service Level Guarantees	Herbert Woisetschläger et.al.	2505.19947	null
2025-05-26	Weather-Magician: Reconstruction and Rendering Framework for 4D Weather Synthesis In Real Time	Chen Sang et.al.	2505.19919	null
2025-05-26	EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM	Shuang Ao et.al.	2505.19905	null
2025-05-26	Adaptive Indexing for Approximate Query Processing in Exploratory Data Analysis	Stavros Maroulis et.al.	2505.19872	null
2025-05-26	PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints	Shuo Wang et.al.	2505.19842	link
2025-05-26	A Cost-efficient Credit-Based Shaper Deployment Framework for Time-Sensitive Networks	Santiago Torres-Borda et.al.	2505.19771	null
2025-05-26	GeoPF: Infusing Geometry into Potential Fields for Reactive Planning in Non-trivial Environments	Yuhe Gong et.al.	2505.19688	null
2025-05-26	A Fluorescent Material Model for Non-Spectral Editing & Rendering	Belcour Laurent et.al.	2505.19672	null
2025-05-26	Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling	Haiyang Sun et.al.	2505.19669	null
2025-05-26	Autonomous Flights inside Narrow Tunnels	Luqi Wang et.al.	2505.19657	link
2025-05-26	Software Engineering for Self-Adaptive Robotics: A Research Agenda	Shaukat Ali et.al.	2505.19629	null
2025-05-26	Indoor Air Quality Detection Robot Model Based on the Internet of Things (IoT)	Anggiat Mora Simamora et.al.	2505.19600	link
2025-05-26	Situationally-Aware Dynamics Learning	Alejandro Murillo-Gonzalez et.al.	2505.19574	null
2025-05-26	LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer	Rasoul Zahedifar et.al.	2505.19567	null
2025-05-23	VideoGameBench: Can Vision-Language Models complete popular video games?	Alex L. Zhang et.al.	2505.18134	null
2025-05-23	ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework	Lisheng Huang et.al.	2505.18105	link
2025-05-23	SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios	Simon Malzard et.al.	2505.18048	null
2025-05-23	Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation	Li Zhong et.al.	2505.18039	null
2025-05-23	Clinical Validation of Deep Learning for Real-Time Tissue Oxygenation Estimation Using Spectral Imaging	Jens De Winne et.al.	2505.18010	null
2025-05-23	Re-evaluation of Logical Specification in Behavioural Verification	Radoslaw Klimek et.al.	2505.17979	null
2025-05-23	Evolving Machine Learning: A Survey	Ignacio Cabrera Martin et.al.	2505.17902	null
2025-05-23	Geometric Shape Modelling and Volume Estimation of Dry Bulk Cargo Piles using a Single Image	Debanshu Ratha et.al.	2505.17896	null
2025-05-23	Toward Optimal ANC: Establishing Mutual Information Lower Bound	François Derrida et.al.	2505.17877	null
2025-05-23	Light-Driven Bound State of Interacting Impurities in a Dirac-Like Bath	Vinayak M. Kulkarni et.al.	2505.17811	null
2025-05-23	DialogXpert: Driving Intelligent and Emotion-Aware Conversations through Online Value-Based Reinforcement Learning with LLM Priors	Tazeek Bin Abdur Rakib et.al.	2505.17795	null
2025-05-23	Real-time calibrations for future detectors at FAIR	Valentin Kladov et.al.	2505.17781	null
2025-05-23	Sec5GLoc: Securing 5G Indoor Localization via Adversary-Resilient Deep Learning Architecture	Ildi Alla et.al.	2505.17776	link
2025-05-23	HRSim: An agent-based simulation platform for high-capacity ride-sharing services	Wang Chen et.al.	2505.17758	link
2025-05-23	Instruct2See: Learning to Remove Any Obstructions Across Distributions	Junhang Li et.al.	2505.17649	null
2025-05-23	MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation	Jihan Yao et.al.	2505.17613	null
2025-05-23	A Unified Multi-Scale Attention-Based Network for Automatic 3D Segmentation of Lung Parenchyma & Nodules In Thoracic CT Images	Muhammad Abdullah et.al.	2505.17602	link
2025-05-23	JELAI: Integrating AI and Learning Analytics in Jupyter Notebooks	Manuel Valle Torre et.al.	2505.17593	null
2025-05-23	Distance Estimation in Outdoor Driving Environments Using Phase-only Correlation Method with Event Cameras	Masataka Kobayashi et.al.	2505.17582	null
2025-05-23	Direct Feature Access -- Scaling Network Traffic Feature Collection to Terabit Speed	Lukas Froschauer et.al.	2505.17573	null
2025-05-22	Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models	Junjie Xiong et.al.	2505.16957	null
2025-05-22	From Reality to Virtual Worlds: The Role of Photogrammetry in Game Development	Santiago Berrezueta-Guzman et.al.	2505.16951	null
2025-05-22	Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype	Nikola Tankovic et.al.	2505.16918	null
2025-05-22	Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships	Kerem Oktar et.al.	2505.16899	null
2025-05-22	FlashBack: Consistency Model-Accelerated Shared Autonomy	Luzhe Sun et.al.	2505.16892	null
2025-05-22	Arbor-TVB: A Novel Multi-Scale Co-Simulation Framework with a Case Study on Neural-Level Seizure Generation and Whole-Brain Propagation	Thorsten Hater et.al.	2505.16861	null
2025-05-22	Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate	Hanglei Zhang et.al.	2505.16845	null
2025-05-22	SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving	Xuesong Chen et.al.	2505.16805	null
2025-05-22	Detecting Fake News Belief via Skin and Blood Flow Signals	Gennie Nguyen et.al.	2505.16730	null
2025-05-22	SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding	Sushant Gautam et.al.	2505.16630	null
2025-05-22	Recursive Offloading for LLM Serving in Multi-tier Networks	Zhiyuan Wu et.al.	2505.16502	link
2025-05-22	Human-like Semantic Navigation for Autonomous Driving using Knowledge Representation and Large Language Models	Augusto Luis Ballardini et.al.	2505.16498	null
2025-05-22	InspectionV3: Enhancing Tobacco Quality Assessment with Deep Convolutional Neural Networks for Automated Workshop Management	Yao Wei et.al.	2505.16485	null
2025-05-22	Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems	Song Jin et.al.	2505.16429	null
2025-05-22	Dynamic Caustics by Ultrasonically Modulated Liquid Surface	Koki Nagakura et.al.	2505.16397	null
2025-05-22	Quantum-Driven Multihead Inland Waterbody Detection With Transformer-Encoded CYGNSS Delay-Doppler Map Data	Chia-Hsiang Lin et.al.	2505.16391	null
2025-05-22	Observing dynamics of distinct structural transitions in trapped-ion clusters	Akhil Ayyadevara et.al.	2505.16378	null
2025-05-22	Multimodal Generative AI for Story Point Estimation in Software Development	Mohammad Rubyet Islam et.al.	2505.16290	null
2025-05-22	Energy Spectra of Secondary Particles Induced by Solar Energetic Proton Events and Magnetospheric Effects	A. Chilingarian et.al.	2505.16269	null
2025-05-22	Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models	Kalindi Singh et.al.	2505.16261	null
2025-05-21	Direct Detection of Cosmic Walls with Paleo Detectors	Wen Yin et.al.	2505.15764	null
2025-05-21	Majorana Zero Modes in a Heterogenous Structure of Topological and Trivial Domains in FeSe$_{1-x}$Te$_x$	Prashant Gupta et.al.	2505.15745	null
2025-05-21	iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification	Sarfraz Ahmad et.al.	2505.15730	link
2025-05-21	Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model	Ke Hu et.al.	2505.15670	null
2025-05-21	Lithium Intercalation in the Anisotropic van der Waals Magnetic Semiconductor CrSBr	Kseniia Mosina et.al.	2505.15663	null
2025-05-21	Self-powered smart contact lenses: a multidisciplinary approach to micro-scale energy and 900 MHz - 1.1 GHz bandwidth microfabricated loop antennas communication systems	Patrice Salzenstein et.al.	2505.15593	null
2025-05-21	VP Lab: a PEFT-Enabled Visual Prompting Laboratory for Semantic Segmentation	Niccolo Avogaro et.al.	2505.15592	null
2025-05-21	Decreasing Utilization of Systems with Multi-Rate Cause-Effect Chains While Reducing End-to-End Latencies	Luiz Maia et.al.	2505.15546	null
2025-05-21	Exploiting Age of Information in Network Digital Twins for AI-driven Real-Time Link Blockage Detection	Michele Zhu et.al.	2505.15519	null
2025-05-21	AI-empowered Real-Time Line-of-Sight Identification via Network Digital Twins	Michele Zhu et.al.	2505.15478	null
2025-05-21	FAV-NSS: An HIL Framework for Accelerating Validation of Automotive Network Security Strategies	Changhong Li et.al.	2505.15393	null
2025-05-21	EVA: Expressive Virtual Avatars from Multi-view Videos	Hendrik Junkawitsch et.al.	2505.15385	null
2025-05-21	Real-Time Detection of Insider Threats Using Behavioral Analytics and Deep Evidential Clustering	Anas Ali et.al.	2505.15383	null
2025-05-21	RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation	Naman Patel et.al.	2505.15373	null
2025-05-21	AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals	Stefan Pasch et.al.	2505.15365	null
2025-05-21	Human in the Loop Adaptive Optimization for Improved Time Series Forecasting	Malik Tiomoko et.al.	2505.15354	link
2025-05-21	Subgap pumping of antiferromagnetic Mott insulators: photoexcitation mechanisms and applications	Radu Andrei et.al.	2505.15343	null
2025-05-21	High-Throughput Mechanical Characterization of Giant Unilamellar Vesicles by Real-Time Deformability Cytometry	Maximilian Kloppe et.al.	2505.15341	null
2025-05-21	LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models	Qianyue Hao et.al.	2505.15293	null
2025-05-21	LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval	Zhenyu Ning et.al.	2505.15269	null
2025-05-20	Emerging Properties in Unified Multimodal Pretraining	Chaorui Deng et.al.	2505.14683	null
2025-05-20	NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search	Sunhao Dai et.al.	2505.14680	null
2025-05-20	Beyond Words: Multimodal LLM Knows When to Speak	Zikai Liao et.al.	2505.14654	null
2025-05-20	Separatrix configurations in holomorphic flows	Nicolas Kainz et.al.	2505.14594	null
2025-05-20	Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities	Parthasaarathy Sudarsanam et.al.	2505.14562	null
2025-05-20	Automated, Cross-Layer Root Cause Analysis of 5G Video-Conferencing Quality Degradation	Fan Yi et.al.	2505.14540	null
2025-05-20	PAST: Phonetic-Acoustic Speech Tokenizer	Nadav Har-Tuv et.al.	2505.14470	null
2025-05-20	Efficient Configuration-Constrained Tube MPC via Variables Restriction and Template Selection	Filippo Badalamenti et.al.	2505.14440	link
2025-05-20	Two Empirical Studies on Audiovisual Semiotics of Uncertainty	Sita Vriend et.al.	2505.14379	null
2025-05-20	Information-optimal measurement: From fixed sampling protocols to adaptive spectroscopy	J. Schroeder et.al.	2505.14364	null
2025-05-20	Local Minima Prediction using Dynamic Bayesian Filtering for UGV Navigation in Unstructured Environments	Seung Hun Lee et.al.	2505.14337	null
2025-05-20	Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach	Umberto Cappellazzo et.al.	2505.14336	null
2025-05-20	Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion	Tiehan Cui et.al.	2505.14316	null
2025-05-20	Timely CPU Scheduling for Computation-intensive Status Updates	Mengqiu Zhou et.al.	2505.14307	null
2025-05-20	SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors	Maheep Chaudhary et.al.	2505.14300	null
2025-05-20	AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis	Eirini Panteli et.al.	2505.14285	null
2025-05-20	Hybrid Adaptive Modeling in Process Monitoring: Leveraging Sequence Encoders and Physics-Informed Neural Networks	Mouad Elaarabi et.al.	2505.14252	null
2025-05-20	Visual Agentic Reinforcement Fine-Tuning	Ziyu Liu et.al.	2505.14246	link
2025-05-20	Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks	Sizhe Yuen et.al.	2505.14212	null
2025-05-20	Dynamic Replanning for Improved Public Transport Routing	Abdallah Abuaisha et.al.	2505.14193	null
2025-05-20	QSVM-QNN: Quantum Support Vector Machine Based Quantum Neural Network Learning Algorithm for Brain-Computer Interfacing Systems	Bikash K. Behera et.al.	2505.14192	null
2025-05-20	Gaming Strategies in European Imbalance Settlement Mechanisms	Seyed Soroush Karimi Madahi et.al.	2505.14133	null
2025-05-20	Place Recognition: A Comprehensive Review, Current Challenges and Future Directions	Zhenyu Li et.al.	2505.14068	link
2025-05-19	Quantum Hardware-in-the-Loop for Optimal Power Flow in Renewable-Integrated Power Systems	Zeynab Kaseb et.al.	2505.13356	null
2025-05-19	Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity	Sharanya Venkatesh et.al.	2505.13350	null
2025-05-19	Level Generation with Quantum Reservoir Computing	João S. Ferreira et.al.	2505.13287	null
2025-05-19	MAGI-1: Autoregressive Video Generation at Scale	Sand.ai et.al.	2505.13211	link
2025-05-19	Combinatorial Sample- and Back-Focal-Plane (BFP) Imaging. Pt. I: Instrument and acquisition parameters affecting BFP images and their analysis	Omer Shavit et.al.	2505.13190	null
2025-05-19	ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models	Zihao Cheng et.al.	2505.13176	null
2025-05-19	A conformally mapped numerical wave tank supporting piston and flap wavemakers	Andreas Holm Akselsen et.al.	2505.13154	null
2025-05-19	Ocean wave spectrum reconstruction from HF radar data and its application to wave height estimation	Kaede Watanabe et.al.	2505.13132	null
2025-05-19	Constraint-Aware Diffusion Guidance for Robotics: Real-Time Obstacle Avoidance for Autonomous Racing	Hao Ma et.al.	2505.13131	null
2025-05-19	Adaptive Image Restoration for Video Surveillance: A Real-Time Approach	Muhammad Awais Amin et.al.	2505.13130	null
2025-05-19	Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation	Guo Chen et.al.	2505.13094	null
2025-05-19	PPTNet: A Hybrid Periodic Pattern-Transformer Architecture for Traffic Flow Prediction and Congestion Identification	Hongrui Kou et.al.	2505.13047	link
2025-05-19	Ultrafast Laser Induces Macroscopic Symmetry-Breaking of Diamond Color Centers	Yang Gao et.al.	2505.12989	null
2025-05-19	Regularized Model Predictive Control	Komeil Nosrati et.al.	2505.12977	null
2025-05-19	Fast, Not Fancy: Rethinking G2P with Rich Data and Rule-Based Models	Mahta Fetrat Qharabagh et.al.	2505.12973	link
2025-05-19	Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection	Zihan Xiong et.al.	2505.12966	null
2025-05-19	Effects of the Auto-Correlation of Delays on the Age of Information: A Gaussian Process Framework	Atsushi Inoie et.al.	2505.12885	null
2025-05-19	Optimization of Hybrid Quantum-Classical Algorithms	Lian Remme et.al.	2505.12853	null
2025-05-19	Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs	Zhuo Yang et.al.	2505.12833	null
2025-05-19	Rethinking Features-Fused-Pyramid-Neck for Object Detection	Hulin Li et.al.	2505.12820	link
2025-05-16	msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML	Zhaolan Huang et.al.	2505.11483	link
2025-05-16	REACT: Runtime-Enabled Active Collision-avoidance Technique for Autonomous Driving	Heye Huang et.al.	2505.11474	null
2025-05-16	Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space	Ali Rabiee et.al.	2505.11366	link
2025-05-16	Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models	Keunwoo Peter Yu et.al.	2505.11326	link
2025-05-16	Time-dependent Hole States in Multiconfigurational Time-Dependent Hartree-Fock Approaches: Applications in Photoionization of Water Molecule	Zhao-Han Zhang et.al.	2505.11319	null
2025-05-16	Diffusion Learning with Partial Agent Participation and Local Updates	Elsa Rizk et.al.	2505.11307	null
2025-05-16	MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection	Shrutarv Awasthi et.al.	2505.11282	link
2025-05-16	Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models	Camille Couturier et.al.	2505.11271	null
2025-05-16	Learning traffic flows: Graph Neural Networks for Metamodelling Traffic Assignment	Oskar Bohn Lassen et.al.	2505.11230	null
2025-05-16	Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition	Bo Yue et.al.	2505.11175	null
2025-05-16	Maximizing Asynchronicity in Event-based Neural Networks	Haiqing Hao et.al.	2505.11165	null
2025-05-16	Sonification of entanglement dynamics in many-qubit systems	Juliette Tudoce et.al.	2505.11159	null
2025-05-16	Open-Source Multi-Viewpoint Surgical Telerobotics	Guido Caccianiga et.al.	2505.11142	null
2025-05-16	A Multi-modal Fusion Network for Terrain Perception Based on Illumination Aware	Rui Wang et.al.	2505.11066	null
2025-05-16	Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking	Changlun Li et.al.	2505.11065	link
2025-05-16	Leveraging Real-Time Data Analysis and Multiple Kernel Learning for Manufacturing of Innovative Steels	Wolfgang Rannetbauer et.al.	2505.11024	null
2025-05-16	DRL-Based Injection Molding Process Parameter Optimization for Adaptive and Profitable Production	Joon-Young Kim et.al.	2505.10988	null
2025-05-16	GROQLoco: Generalist and RObot-agnostic Quadruped Locomotion Control using Offline Datasets	Narayanan PP et.al.	2505.10973	null
2025-05-16	Vaiage: A Multi-Agent Solution to Personalized Travel Planning	Binwen Liu et.al.	2505.10922	null
2025-05-16	Automated Identification of Logical Errors in Programs: Advancing Scalable Analysis of Student Misconceptions	Muntasir Hoq et.al.	2505.10913	null
2025-05-15	An AI-driven framework for the prediction of personalised health response to air pollution	Nazanin Zounemat Kermani et.al.	2505.10556	null
2025-05-15	Real-Time Out-of-Distribution Failure Prevention via Multi-Modal Reasoning	Milan Ganai et.al.	2505.10547	null
2025-05-15	LibIQ: Toward Real-Time Spectrum Classification in O-RAN dApps	Filippo Olimpieri et.al.	2505.10537	link
2025-05-15	Internal State Estimation in Groups via Active Information Gathering	Xuebo Ji et.al.	2505.10415	null
2025-05-15	Two-Stage Generative Model for Intracranial Aneurysm Meshes with Morphological Marker Conditioning	Wenhao Ding et.al.	2505.10407	link
2025-05-15	Schreier-Coset Graph Propagation	Aryan Mishra et.al.	2505.10392	null
2025-05-15	Arbitrarily Small Execution-Time Certificate: What was Missed in Analog Optimization	Liang Wu et.al.	2505.10366	link
2025-05-15	FactsR: A Safer Method for Producing High Quality Healthcare Documentation	Victor Petrén Bach Hansen et.al.	2505.10360	null
2025-05-15	Optimizing Electric Bus Charging Scheduling with Uncertainties Using Hierarchical Deep Reinforcement Learning	Jiaju Qi et.al.	2505.10296	null
2025-05-15	From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making	Dubai Li et.al.	2505.10282	link
2025-05-15	AttentionGuard: Transformer-based Misbehavior Detection for Secure Vehicular Platoons	Hexu Li et.al.	2505.10273	null
2025-05-15	Defect Detection in Photolithographic Patterns Using Deep Learning Models Trained on Synthetic Data	Prashant P. Shinde et.al.	2505.10192	null
2025-05-15	KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems	Jieke Lin et.al.	2505.10183	null
2025-05-15	Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence	Xiang He et.al.	2505.10176	link
2025-05-15	High-performance local automaton decoders for defect matching in 1D	Louis Paletta et.al.	2505.10162	null
2025-05-15	CFARNet: Learning-Based High-Resolution Multi-Target Detection for Rainbow Beam Radar	Qiushi Liang et.al.	2505.10150	null
2025-05-15	VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality	Xuechang Tu et.al.	2505.10144	link
2025-05-15	IMITATE: Image Registration with Context for unknown time frame recovery	Ziad Kheil et.al.	2505.10124	link
2025-05-15	Learning Virtual Machine Scheduling in Cloud Computing through Language Agents	JieHao Wu et.al.	2505.10117	null
2025-05-15	LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2	Jongmin Jung et.al.	2505.10101	null
2025-05-14	UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing	Yung-Hsuan Lai et.al.	2505.09615	link
2025-05-14	Quantum simulation of bubble nucleation across a quantum phase transition	De Luo et.al.	2505.09607	null
2025-05-14	Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net	Dongyi He et.al.	2505.09521	link
2025-05-14	Wearable Tracking of Eye and Body Movements During Breaching Training: Towards Real-Time Blast Injury Monitoring	Jeremy P. Kemmerer et.al.	2505.09508	null
2025-05-14	Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput	Bo Zhang et.al.	2505.09498	null
2025-05-14	Decentralized Nonlinear Model Predictive Control-Based Flock Navigation with Real-Time Obstacle Avoidance in Unknown Obstructed Environments	Nuthasith Gerdpratoom et.al.	2505.09434	null
2025-05-14	UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units	Huakun Liu et.al.	2505.09393	link
2025-05-14	Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform	Qinghui Liu et.al.	2505.09380	link
2025-05-14	ARCANE -- Early Detection of Interplanetary Coronal Mass Ejections	H. T. Rüdisser et.al.	2505.09365	link
2025-05-14	APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression	Srinivas Ravuri et.al.	2505.09356	link
2025-05-14	Neural Video Compression using 2D Gaussian Splatting	Lakshya Gupta et.al.	2505.09324	null
2025-05-14	Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging	Hongjin Qian et.al.	2505.09316	null
2025-05-14	Robot-Assisted Drone Recovery on a Wavy Surface Using Error-State Kalman Filter and Receding Horizon Model Predictive Control	Yimou Wu et.al.	2505.09145	null
2025-05-14	Quantum Error-Corrected Computation of Molecular Energies	Kentaro Yamamoto et.al.	2505.09133	null
2025-05-14	Non-equilibrium scalar fields at finite temperature and density	Sebastian Mendizabal et.al.	2505.09104	null
2025-05-14	OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions	Yuhang Wang et.al.	2505.09092	link
2025-05-14	Modeling Interdependent Cybersecurity Threats Using Bayesian Networks: A Case Study on In-Vehicle Infotainment Systems	Sangita Sridar et.al.	2505.09048	null
2025-05-14	RT-cache: Efficient Robot Trajectory Retrieval System	Owen Kwon et.al.	2505.09040	null
2025-05-14	Multiparty Selective Disclosure using Attribute-Based Encryption	Shigenori Ohashi et.al.	2505.09034	null
2025-05-13	Enhancing Aerial Combat Tactics through Hierarchical Multi-Agent Reinforcement Learning	Ardian Selmonaj et.al.	2505.08995	null
2025-05-13	Aya Vision: Advancing the Frontier of Multilingual Multimodality	Saurabh Dash et.al.	2505.08751	null
2025-05-13	A Study of Data-driven Methods for Inventory Optimization	Lee Yeung Ping et.al.	2505.08673	null
2025-05-13	Claycode: Stylable and Deformable 2D Scannable Codes	Marco Maida et.al.	2505.08666	null
2025-05-13	ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking	Haofeng Liu et.al.	2505.08581	link
2025-05-13	End-to-End Multi-Task Policy Learning from NMPC for Quadruped Locomotion	Anudeep Sajja et.al.	2505.08574	null
2025-05-13	Extract the Best, Discard the Rest: CSI Feedback with Offline Large AI Models	Jialin Zhuang et.al.	2505.08566	null
2025-05-13	Towards Digital Twin in Flood Forecasting with Data Assimilation Satellite Earth Observations -- A Proof-of-Concept in the Alzette Catchment	Thanh Huy Nguyen et.al.	2505.08553	null
2025-05-13	The RaspGrade Dataset: Towards Automatic Raspberry Ripeness Grading with Deep Learning	Mohamed Lamine Mekhalfi et.al.	2505.08537	null
2025-05-13	Diffusion-assisted Model Predictive Control Optimization for Power System Real-Time Operation	Linna Xu et.al.	2505.08535	null
2025-05-13	Towards Resilient SDA: Graph Theory and Cooperative Control in Distributed Network Architectures	Nesrine Benchoubane et.al.	2505.08520	null
2025-05-13	Isolation Forest in Novelty Detection Scenario	Adam Ulrich et.al.	2505.08489	null
2025-05-13	BAT: Benchmark for Auto-bidding Task	Alexandra Khirianova et.al.	2505.08485	link
2025-05-13	Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions	Lata Pangtey et.al.	2505.08464	null
2025-05-13	Measurements of molecular size and shape on a chip	Xin Zhu et.al.	2505.08452	null
2025-05-13	Anisotropic fluctuations of momentum and angular momentum of heavy quarks in the pre-equilibrium stage of pA collisions at the LHC	Gabriele Parisi et.al.	2505.08441	null
2025-05-13	MDF: Multi-Modal Data Fusion with CNN-Based Object Detection for Enhanced Indoor Localization Using LiDAR-SLAM	Saqi Hussain Kalan et.al.	2505.08388	null
2025-05-13	FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units	Jian Wang et.al.	2505.08294	null
2025-05-13	Ground-based Observations of Temporal Variation of Cosmic Ray Spectrum during Forbush Decreases	W. Mitthumsiri et.al.	2505.08248	null
2025-05-13	Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning	Yunyue Wei et.al.	2505.08238	null
2025-05-13	Online differentially private inference in stochastic gradient descent	Jinhan Xie et.al.	2505.08227	null
2025-05-13	VTutor for High-Impact Tutoring at Scale: Managing Engagement and Real-Time Multi-Screen Monitoring with P2P Connections	Eason Chen et.al.	2505.07736	null
2025-05-12	MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering	Rushi Qiang et.al.	2505.07782	link
2025-05-12	Robo-Taxi Fleet Coordination with Accelerated High-Capacity Ridepooling	Xinling Li et.al.	2505.07776	null
2025-05-12	Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems	Tomasz Szydlo et.al.	2505.07755	null
2025-05-12	Gameplay Highlights Generation	Vignesh Edithal et.al.	2505.07721	null
2025-05-12	Hybrid Control Strategies for Safe and Adaptive Robot-Assisted Dressing	Yasmin Rafiq et.al.	2505.07710	null
2025-05-12	Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications	Biel Tura Vecino et.al.	2505.07701	null
2025-05-12	Verified Purely Functional Catenable Real-Time Deques	Jules Viennot et.al.	2505.07681	null
2025-05-12	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	Hang Wu et.al.	2505.07680	null
2025-05-12	Intuitive Human-Robot Interfaces Leveraging on Autonomy Features for the Control of Highly-redundant Robots	Davide Torielli et.al.	2505.07668	link
2025-05-12	Neural Brain: A Neuroscience-inspired Framework for Embodied Agents	Jian Liu et.al.	2505.07634	link
2025-05-12	Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions	Yi Zhang et.al.	2505.07611	null
2025-05-12	AgentFlow: Resilient Adaptive Cloud-Edge Framework for Multi-Agent Coordination	Ching Han Chen et.al.	2505.07603	null
2025-05-12	Decoding Chess Puzzle Play and Standard Cognitive Tasks for BCI: A Low-Cost EEG Study	Matthew Russell et.al.	2505.07592	null
2025-05-12	Privacy-Preserving Real-Time Vietnamese-English Translation on iOS using Edge AI	Cong Le et.al.	2505.07583	null
2025-05-12	Superstring entanglement at finite temperature and its Hagedorn behavior	Daniel Luiz Nedel et.al.	2505.07567	null
2025-05-12	Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs	Kamil Jeziorek et.al.	2505.07556	null
2025-05-12	GIFStream: 4D Gaussian-based Immersive Video with Feature Stream	Hao Li et.al.	2505.07539	null
2025-05-12	Convex Trajectory Optimization via Monomial Coordinates Transcription for Cislunar Rendezvous	Omar Regantini et.al.	2505.07521	null
2025-05-12	Lightweight Multispectral Crop-Weed Segmentation for Precision Agriculture	Zeynep Galymzhankyzy et.al.	2505.07444	null
2025-05-09	A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows	Linjiang Cao et.al.	2505.06178	null
2025-05-09	Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework	Alice Rueda et.al.	2505.06151	null
2025-05-09	S2MNet: Speckle-To-Mesh Net for Three-Dimensional Cardiac Morphology Reconstruction via Echocardiogram	Xilin Gong et.al.	2505.06105	null
2025-05-09	HashKitty: Distributed Password Analysis	Pedro Antunes et.al.	2505.06084	link
2025-05-09	Centralized Decision-Making for Platooning By Using SPaT-Driven Reference Speeds	Melih Yazgan et.al.	2505.06071	null
2025-05-09	Context Informed Incremental Learning Improves Myoelectric Control Performance in Virtual Reality Object Manipulation Tasks	Gabriel Gagné et.al.	2505.06064	link
2025-05-09	Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates	Rodrigo Diaz et.al.	2505.05940	link
2025-05-09	Priority-Driven Safe Model Predictive Control Approach to Autonomous Driving Applications	Francesco Prignoli et.al.	2505.05933	null
2025-05-09	Multi-armed Bandit for Stochastic Shortest Path in Mixed Autonomy	Yu Bai et.al.	2505.05878	null
2025-05-09	DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning under Two-sided Incomplete Information	Yun Xin et.al.	2505.05842	null
2025-05-09	Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency	Xinyu Liang et.al.	2505.05796	null
2025-05-09	Quantitative Hardness Assessment with Vision-based Tactile Sensing for Fruit Classification and Grasping	Zhongyuan Liao et.al.	2505.05725	null
2025-05-08	An Efficient Transport-Based Dissimilarity Measure for Time Series Classification under Warping Distortions	Akram Aldroubi et.al.	2505.05676	null
2025-05-08	Adaptive Stress Testing Black-Box LLM Planners	Neeloy Chakraborty et.al.	2505.05665	null
2025-05-08	UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes	Mark C. Eid et.al.	2505.05643	null
2025-05-08	LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities	Kalyan Nakka et.al.	2505.05619	link
2025-05-08	Trading Under Uncertainty: A Distribution-Based Strategy for Futures Markets Using FutureQuant Transformer	Wenhao Guo et.al.	2505.05595	null
2025-05-08	Flight Validation of Learning-Based Trajectory Optimization for the Astrobee Free-Flyer	Somrita Banerjee et.al.	2505.05588	null
2025-05-08	Steepest Descent Density Control for Compact 3D Gaussian Splatting	Peihao Wang et.al.	2505.05587	null
2025-05-08	Quantum-network nodes with real-time noise mitigation using spectator qubits	S. J. H. Loenen et.al.	2505.05582	null
2025-05-08	SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation	Yonwoo Choi et.al.	2505.05475	link
2025-05-08	StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant	Haibo Wang et.al.	2505.05467	null
2025-05-08	EDmamba: A Simple yet Effective Event Denoising Method with State Space Model	Ciyu Ruan et.al.	2505.05391	null
2025-05-08	OcularAge: A Comparative Study of Iris and Periocular Images for Pediatric Age Estimation	Naveenkumar G Venkataswamy et.al.	2505.05374	null
2025-05-08	Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization	Sooyoung Park et.al.	2505.05343	link
2025-05-08	Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors	Zunjie Zhu et.al.	2505.05336	null
2025-05-08	Advanced Stock Market Prediction Using Long Short-Term Memory Networks: A Comprehensive Deep Learning Framework	Rajneesh Chaudhary et.al.	2505.05325	null
2025-05-08	SmartTrap: Automated Precision Experiments with Optical Tweezers	Martin Selin et.al.	2505.05290	null
2025-05-08	CV-MP: Max-Pressure Control in Heterogeneously Distributed and Partially Connected Vehicle Environments	Chaopeng Tan et.al.	2505.05258	null
2025-05-08	Adaptive Biased User Scheduling for Heterogeneous Wireless Federate Learning Network	Changxiang Wu et.al.	2505.05231	null
2025-05-08	PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting	Elad Feldman et.al.	2505.05183	null
2025-05-08	Multi-agent Embodied AI: Advances and Future Directions	Zhaohan Feng et.al.	2505.05108	null
2025-05-08	Pairing Real-Time Piano Transcription with Symbol-level Tracking for Precise and Robust Score Following	Silvan Peter et.al.	2505.05078	null
2025-05-08	xTrace: A Facial Expressive Behaviour Analysis Tool for Continuous Affect Recognition	Mani Kumar Tellamekala et.al.	2505.05043	null
2025-05-08	Reality-infused Deep Learning Framework via Angle-resolved Metasurfaces	Wei Chen et.al.	2505.05011	null
2025-05-08	StabStitch++: Unsupervised Online Video Stitching with Spatiotemporal Bidirectional Warps	Lang Nie et.al.	2505.05001	link
2025-05-08	Robust Model-Based In-Hand Manipulation with Integrated Real-Time Motion-Contact Planning and Tracking	Yongpeng Jiang et.al.	2505.04978	null
2025-05-08	The candidates of 2α condensate around the 16O nucleus studied by the real-time evolution method	Y. M. Htet et.al.	2505.04975	null
2025-05-08	AI and Vision based Autonomous Navigation of Nano-Drones in Partially-Known Environments	Mattia Sartori et.al.	2505.04972	null
2025-05-08	Real-Time Model Predictive Control of Vehicles with Convex-Polygon-Aware Collision Avoidance in Tight Spaces	Haruki Kojima et.al.	2505.04935	null
2025-05-07	EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	Zhenghao Xing et.al.	2505.04623	link
2025-05-07	Dynamic Network Flow Optimization for Task Scheduling in PTZ Camera Surveillance Systems	Mohammad Merati et.al.	2505.04596	null
2025-05-07	Runtime Advocates: A Persona-Driven Framework for Requirements@Runtime Decision Support	Demetrius Hernandez et.al.	2505.04551	null
2025-05-07	Edge-GPU Based Face Tracking for Face Detection and Recognition Acceleration	Asma Baobaid et.al.	2505.04524	null
2025-05-07	Leveraging Simultaneous Usage of Edge GPU Hardware Engines for Video Face Detection and Recognition	Asma Baobaid et.al.	2505.04502	null
2025-05-07	Estimating Dynamic Soft Continuum Robot States From Boundaries	Tongjia Zheng et.al.	2505.04491	null
2025-05-07	"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments	Ziyi Zhang et.al.	2505.04488	null
2025-05-07	Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration	Shigeki Karita et.al.	2505.04457	link
2025-05-07	Meta-Learning Driven Lightweight Phase Shift Compression for IRS-Assisted Wireless Systems	Xianhua Yu et.al.	2505.04453	null
2025-05-07	Phase Shift Information Compression in IRS-aided Wireless Systems: Challenges and Opportunities	Xianhua Yu et.al.	2505.04449	null
2025-05-07	SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer	Young-Hu Park et.al.	2505.04394	null
2025-05-07	Predicting Road Surface Anomalies by Visual Tracking of a Preceding Vehicle	Petr Jahoda et.al.	2505.04392	null
2025-05-07	Design and Evaluation of an NDN-Based Network for Distributed Digital Twins	Chen Chen et.al.	2505.04326	null
2025-05-07	Massive MIMO: Instantaneous versus Statistical CSI-Based Power Allocation	Zahra Mobini et.al.	2505.04294	null
2025-05-07	Integrated Airline Fleet and Crew Recovery through Local Search	Philip de Bruin et.al.	2505.04274	null
2025-05-07	RGB-Event Fusion with Self-Attention for Collision Prediction	Pietro Bonazzi et.al.	2505.04258	link
2025-05-07	Multi-Agent Reinforcement Learning-based Cooperative Autonomous Driving in Smart Intersections	Taoyuan Yu et.al.	2505.04231	null
2025-05-07	An Enhanced YOLOv8 Model for Real-Time and Accurate Pothole Detection and Measurement	Mustafa Yurdakul et.al.	2505.04207	null
2025-05-07	Spatial-Wavelength Multiplexing Reliable Photonic Integrated General-Purpose Analog Computing System	Tao Zhu et.al.	2505.04197	null
2025-05-07	Beyond Task Performance: Human Experience in Human-Robot Collaboration	Sean Kille et.al.	2505.04182	null
2025-05-06	VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model	Zuwei Long et.al.	2505.03739	link
2025-05-06	AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control	Jialong Li et.al.	2505.03738	null
2025-05-06	Frenet Corridor Planner: An Optimal Local Path Planning Framework for Autonomous Driving	Faizan M. Tariq et.al.	2505.03695	null
2025-05-06	RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration	Huajie Tan et.al.	2505.03673	link
2025-05-06	PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model	Y. B. Wang et.al.	2505.03603	null
2025-05-06	LlamaFirewall: An open source guardrail system for building secure AI agents	Sahana Chennabasappa et.al.	2505.03574	null
2025-05-06	Real-Time Person Image Synthesis Using a Flow Matching Model	Jiwoo Jeong et.al.	2505.03562	link
2025-05-06	Rapid AI-based generation of coverage paths for dispensing applications	Simon Baeuerle et.al.	2505.03560	null
2025-05-06	Real-time small area estimation of food security in Zimbabwe: integrating mobile-phone and face-to-face surveys using joint multilevel regression and poststratification	Sahoko Ishida et.al.	2505.03517	link
2025-05-06	Learning-based Homothetic Tube MPC	Yulong Gao et.al.	2505.03482	link
2025-05-06	A generalised non-linear reconstructor for all Fourier-type wavefront sensors	Victoria Laidlaw et.al.	2505.03477	null
2025-05-06	Simulation to Reality: Testbeds and Architectures for Connected and Automated Vehicles	David Klüner et.al.	2505.03472	null
2025-05-06	Mitigating Backdoor Triggered and Targeted Data Poisoning Attacks in Voice Authentication Systems	Alireza Mohammadi et.al.	2505.03455	null
2025-05-06	Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework	Sanam Nayab et.al.	2505.03409	null
2025-05-06	Quantum Feature Space of a Qubit Coupled to an Arbitrary Bath	Chris Wise et.al.	2505.03397	link
2025-05-06	DroidRetriever: An Autonomous Navigation and Information Integration System Facilitating Mobile Sensemaking	Yiheng Bian et.al.	2505.03364	null
2025-05-06	GUAVA: Generalizable Upper Body 3D Gaussian Avatar	Dongbin Zhang et.al.	2505.03351	null
2025-05-06	Artificial Behavior Intelligence: Technology, Challenges, and Future Directions	Kanghyun Jo et.al.	2505.03315	null
2025-05-06	An Active Inference perspective on Neurofeedback Training	Côme Annicchiarico et.al.	2505.03308	null
2025-05-06	Model Predictive Fuzzy Control: A Hierarchical Multi-Agent Control Architecture for Outdoor Search-and-Rescue Robots	Craig Maxwell et.al.	2505.03257	null
2025-05-05	Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow	Jai Prakash Veerla et.al.	2505.02780	link
2025-05-05	Teaching the social media generation: rethinking learning without sacrificing quality	Sepinoud Azimi et.al.	2505.02770	null
2025-05-05	Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play	Yemin Shi et.al.	2505.02707	link
2025-05-05	Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation	Haotian Chen et.al.	2505.02690	null
2025-05-05	Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints	Shubham Vaishnav et.al.	2505.02640	null
2025-05-05	LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis	Qingkai Fang et.al.	2505.02625	link
2025-05-05	Wise Goose Chase: A Predictive Path Planning Algorithm for Dynamic Rebalancing in Ride-Hailing Systems	Avalpreet Singh Brar et.al.	2505.02603	null
2025-05-05	Maximal Compatibility Matching for Preference-Aware Ride-Hailing Systems	Avalpreet Singh Brar et.al.	2505.02599	null
2025-05-05	LiDAR-Inertial SLAM-Based Navigation and Safety-Oriented AI-Driven Control System for Skid-Steer Robots	Mehdi Heydari Shahna et.al.	2505.02598	null
2025-05-05	Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things	Yang Li et.al.	2505.02597	null
2025-05-05	HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction	Muhammad Haris Khan et.al.	2505.02569	null
2025-05-05	Machine-Learning-Powered Neural Interfaces for Smart Prosthetics and Diagnostics	MohammadAli Shaeri et.al.	2505.02516	null
2025-05-05	ReeM: Ensemble Building Thermodynamics Model for Efficient HVAC Control via Hierarchical Reinforcement Learning	Yang Deng et.al.	2505.02439	null
2025-05-05	Towards Effective Issue Assignment using Online Machine Learning	Athanasios Michailoudis et.al.	2505.02437	link
2025-05-05	Encrypted Federated Search Using Homomorphic Encryption	Om Rathod et.al.	2505.02409	null
2025-05-05	A Real-Time Control Barrier Function-Based Safety Filter for Motion Planning with Arbitrary Road Boundary Constraints	Jianye Xu et.al.	2505.02395	link
2025-05-05	Sloshing suppression with a controlled elastic baffle via deep reinforcement learning and SPH simulation	Mai Ye et.al.	2505.02354	null
2025-05-05	Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering	Jihao Zhao et.al.	2505.02311	link
2025-05-04	RNBF: Real-Time RGB-D Based Neural Barrier Functions for Safe Robotic Navigation	Satyajeet Das et.al.	2505.02294	null
2025-05-04	Real-time Spatial Retrieval Augmented Generation for Urban Environments	David Nazareno Campo et.al.	2505.02271	null
2025-05-02	FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research	Yan Miao et.al.	2505.01383	null
2025-05-02	An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube	Pengda Mao et.al.	2505.01380	null
2025-05-02	Closing the Loop: A Systematic Review of Experience-Driven Game Adaptation	Phil Lopes et.al.	2505.01351	null
2025-05-02	How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios	Satvik Venkatesh et.al.	2505.01338	null
2025-05-02	Early Detection of Patient Deterioration from Real-Time Wearable Monitoring System	Lo Pang-Yun Ting et.al.	2505.01305	null
2025-05-02	Contactless pulse rate assessment: Results and insights for application in driving simulator	Đorđe D. Nešković et.al.	2505.01299	null
2025-05-02	FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing	Gaoxiang Cong et.al.	2505.01263	null
2025-05-02	CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment	Edson Araujo et.al.	2505.01237	link
2025-05-02	Efficient Vision-based Vehicle Speed Estimation	Andrej Macko et.al.	2505.01203	null
2025-05-02	Machine learning-based prediction of species mass fraction and flame characteristics in partially premixed turbulent jet flame	Amirali Shateri et.al.	2505.01201	null
2025-05-02	AGRO: An Autonomous AI Rover for Precision Agriculture	Simar Ghumman et.al.	2505.01200	null
2025-05-02	A Secured Triad of IoT, Machine Learning, and Blockchain for Crop Forecasting in Agriculture	Najmus Sakib Sizan et.al.	2505.01196	null
2025-05-02	Fast Flow-based Visuomotor Policies via Conditional Optimal Transport Couplings	Andreas Sochopoulos et.al.	2505.01179	null
2025-05-02	Empirical Comparison of Lightweight Forecasting Models for Seasonal and Non-Seasonal Time Series	Thanh Son Nguyen et.al.	2505.01163	null
2025-05-02	Machine Learning for Physical Simulation Challenge Results and Retrospective Analysis: Power Grid Use Case	Milad Leyli-Abadi et.al.	2505.01156	null
2025-05-02	In-Situ Growth and Ionic Switching Behavior of Single-Crystalline Silver Iodide Nanoflakes	Amir Parsi et.al.	2505.01062	null
2025-05-02	Model Tensor Planning	An T. Le et.al.	2505.01059	link
2025-05-02	Kinetic roughening transition of ice crystals and its implications during recrystallization	Jorge H. Melillo et.al.	2505.01055	null
2025-05-02	Tightly Coupled Range Inertial Odometry and Mapping with Exact Point Cloud Downsampling	Kenji Koide et.al.	2505.01017	null
2025-05-02	Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks	Jiaqi Zhang et.al.	2505.00990	link
2025-05-01	A Practical Framework for Simulating Time-Resolved Spectroscopy Based on a Real-time Dyson Expansion	Cian Reeves et.al.	2505.00667	null
2025-05-01	Open-Source LLM-Driven Federated Transformer for Predictive IoV Management	Yazan Otoum et.al.	2505.00651	null
2025-05-01	Deep Learning Assisted Outer Volume Removal for Highly-Accelerated Real-Time Dynamic MRI	Merve Gülle et.al.	2505.00643	null
2025-05-01	Fully passive quantum random number generation with untrusted light	KaiWei Qiu et.al.	2505.00636	null
2025-05-01	A Novel Feature-Aware Chaotic Image Encryption Scheme For Data Security and Privacy in IoT and Edge Networks	Muhammad Shahbaz Khan et.al.	2505.00593	null
2025-05-01	Bridging Cultural and Digital Divides: A Low-Latency JackTrip Framework for Equitable Music Education in the Global South	Tiange Zhou et.al.	2505.00550	null
2025-05-01	Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks	Xinyu Wang et.al.	2505.00530	null
2025-05-01	UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces	Alaa Saleh et.al.	2505.00472	null
2025-05-01	HoneyWin: High-Interaction Windows Honeypot in Enterprise Environment	Yan Lin Aung et.al.	2505.00465	null
2025-05-01	Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos	Xia Yuan et.al.	2505.00421	null
2025-05-01	Multi-dimensional optical imaging on a chip	Liheng Bian et.al.	2505.00408	link
2025-05-01	Stealth Signals: Multi-Discriminator GANs for Covert Communications Against Diverse Wardens	Afan Ali et.al.	2505.00399	null
2025-05-01	Urban Air Mobility as a System of Systems: An LLM-Enhanced Holonic Approach	Ahmed R. Sadik et.al.	2505.00368	null
2025-05-01	Edge Large AI Models: Revolutionizing 6G Networks	Zixin Wang et.al.	2505.00321	null
2025-05-01	Avatar Communication Provides More Efficient Online Social Support Than Text Communication	Masanori Takano et.al.	2505.00287	null
2025-05-01	Empowering Agentic Video Analytics Systems with Video Language Models	Yuxuan Yan et.al.	2505.00254	null
2025-05-01	LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems	Yazan Otoum et.al.	2505.00240	null
2025-04-30	Real-Time Brain-Computer Interface Control of Walking Exoskeleton with Bilateral Sensory Feedback	Jeffrey Lim et.al.	2505.00219	null
2025-04-30	PSN Game: Game-theoretic Planning via a Player Selection Network	Tianyu Qiu et.al.	2505.00213	null
2025-04-30	Generative Machine Learning in Adaptive Control of Dynamic Manufacturing Processes: A Review	Suk Ki Lee et.al.	2505.00210	null
2025-04-30	A Survey of Interactive Generative Video	Jiwen Yu et.al.	2504.21853	null
2025-04-30	Differentiable Room Acoustic Rendering with Multi-View Vision Priors	Derong Jin et.al.	2504.21847	null
2025-04-30	Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields	Yixin Gao et.al.	2504.21814	null
2025-04-30	WebThinker: Empowering Large Reasoning Models with Deep Research Capability	Xiaoxi Li et.al.	2504.21776	link
2025-04-30	Smart Environmental Monitoring of Marine Pollution using Edge AI	Mohamed Moursi et.al.	2504.21759	null
2025-04-30	TheraQuest: A Gamified, LLM-Powered Simulation for Massage Therapy Training	Shengqian Wang et.al.	2504.21735	null
2025-04-30	MovementVR: An open-source tool for the study of motor control and learning in virtual reality	Cristina Rossi et.al.	2504.21696	null
2025-04-30	Enhancing Health Mention Classification Performance: A Study on Advancements in Parameter Efficient Tuning	Reem Abdel-Salam et.al.	2504.21685	null
2025-04-30	Effect of eccentric mixing parameters on chaotic characteristics and mixing time for viscous liquid based on sound decibels	Ronfgang Wang et.al.	2504.21621	null
2025-04-30	Real Time Semantic Segmentation of High Resolution Automotive LiDAR Scans	Hannes Reichert et.al.	2504.21602	link
2025-04-30	Real-time Program Evaluation using Anytime-valid Rank Tests	Sam van Meer et.al.	2504.21595	null
2025-04-30	Toward Realization of Low-Altitude Economy Networks: Core Architecture, Integrated Technologies, and Future Directions	Yixian Wang et.al.	2504.21583	null
2025-04-30	Scientific Workflow Scheduling in Cloud Considering Cold Start and Variable Pricing Model	Suvarthi Sarkar et.al.	2504.21536	null
2025-04-30	Efficient Conversational Search via Topical Locality in Dense Retrieval	Cristina Ioana Muntean et.al.	2504.21507	link
2025-04-30	Turning a Disposable Bronchoscope into a Dynamic Speckle Imaging Tool: Yes, It Works	Aurélien Plyer et.al.	2504.21469	null
2025-04-30	Integration of a Synthetic Molecular Motor Into a Rotary DNA Nanostructure: A Framework for Single-Molecule Actuation	Seham Helmi et.al.	2504.21434	null
2025-04-30	Enhanced Semi-Supervised Stamping Process Monitoring with Physically-Informed Feature Extraction	Jianyu Zhang et.al.	2504.21389	null
2025-04-30	DGFNet: End-to-End Audio-Visual Source Separation Based on Dynamic Gating Fusion	Yinfeng Yu et.al.	2504.21366	null
2025-04-30	ImaginateAR: AI-Assisted In-Situ Authoring in Augmented Reality	Jaewook Lee et.al.	2504.21360	null
2025-04-30	Generative QoE Modeling: A Lightweight Approach for Telecom Networks	Vinti Nayar et.al.	2504.21353	null
2025-04-29	Real-Time Wayfinding Assistant for Blind and Low-Vision Users	Dabbrata Das et.al.	2504.20976	null
2025-04-29	SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features	Mete Erdogan et.al.	2504.20970	null
2025-04-29	AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security	Zikui Cai et.al.	2504.20965	link
2025-04-29	Mìmir: A real-time interactive visualization library for CUDA programs	Francisco Carter et.al.	2504.20937	null
2025-04-29	SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings	Florian Vahl et.al.	2504.20808	null
2025-04-29	Integrating Human Feedback into a Reinforcement Learning-Based Framework for Adaptive User Interfaces	Daniel Gaspar-Figueiredo et.al.	2504.20782	null
2025-04-29	An Online Cross-layered Defense Strategy with Bandwidth Allocation for Multi-channel Systems under DoS Attacks	Liheng Wan et.al.	2504.20762	null
2025-04-29	Confidence-based Intent Prediction for Teleoperation in Bimanual Robotic Suturing	Zhaoyang Jacopo Hu et.al.	2504.20761	null
2025-04-29	Graph-Based Fault Diagnosis for Rotating Machinery: Adaptive Segmentation and Structural Feature Integration	Moirangthem Tiken Singh et.al.	2504.20756	null
2025-04-29	Formal and Empirical Study of Metadata-Based Profiling for Resource Management in the Computing Continuum	Andrea Morichetta et.al.	2504.20740	link
2025-04-29	Intelligent Task Offloading in VANETs: A Hybrid AI-Driven Approach for Low-Latency and Energy Efficiency	Tariq Qayyum et.al.	2504.20735	null
2025-04-29	A High-Granularity Proton CT Enhanced by Track Discrimination	Huang-Chao Shi et.al.	2504.20698	null
2025-04-29	Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion	Zesheng Wang et.al.	2504.20685	null
2025-04-29	Quantum Computation for Jets in Heavy Ion Collisions	Wenyang Qian et.al.	2504.20683	null
2025-04-29	FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection	Yao Xiao et.al.	2504.20670	null
2025-04-29	Quantum-Enhanced Hybrid Reinforcement Learning Framework for Dynamic Path Planning in Autonomous Systems	Sahil Tomar et.al.	2504.20660	null
2025-04-29	PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval	Zihan Niu et.al.	2504.20624	null
2025-04-29	Information Retrieval in the Age of Generative AI: The RGB Model	Michele Garetto et.al.	2504.20610	link
2025-04-29	WakeLoc: An Ultra-Low Power, Accurate and Scalable On-Demand RTLS using Wake-Up Radios	Silvano Cortesi et.al.	2504.20545	null
2025-04-29	Digital Twin-Empowered Cooperative Autonomous Car-sharing Services: Proof-of-Concept	Kazuma Nonomura et.al.	2504.20542	null
2025-04-29	Feelbert: A Feedback Linearization-based Embedded Real-Time Quadrupedal Locomotion Framework	Aristide Emanuele Casucci et.al.	2504.19965	null
2025-04-28	Learning Streaming Video Representation via Multitask Training	Yibin Yan et.al.	2504.20041	null
2025-04-28	HJRNO: Hamilton-Jacobi Reachability with Neural Operators	Yankai Li et.al.	2504.19989	null
2025-04-28	Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach	Keyhan Rayati et.al.	2504.19985	null
2025-04-28	Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose	Narges Rashvand et.al.	2504.19970	null
2025-04-28	Automated decision-making for dynamic task assignment at scale	Riccardo Lo Bianco et.al.	2504.19933	link
2025-04-28	NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks	Chia-Yu Hung et.al.	2504.19854	null
2025-04-28	Optimizing the Charging of Open Quantum Batteries using Long Short-Term Memory-Driven Reinforcement Learning	Shadab Zakavati et.al.	2504.19840	null
2025-04-28	Optimal real-time dynamic treatment regimes with application to oxytocin use in preventing postpartum hemorrhage	Haiyan Zhu et.al.	2504.19831	null
2025-04-28	Digital Twin-based Out-of-Distribution Detection in Autonomous Vessels	Erblin Isaku et.al.	2504.19816	null
2025-04-28	Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model	Muzammil Behzad et.al.	2504.19739	null
2025-04-28	The ATLAS of Traffic Lights: A Reliable Perception Framework for Autonomous Driving	Rupert Polley et.al.	2504.19722	null
2025-04-28	Advances in Approximate Bayesian Inference for Models in Epidemiology	Xiahui Li et.al.	2504.19698	null
2025-04-28	GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning	Juyi Sheng et.al.	2504.19683	null
2025-04-28	Neuronal correlations shape the scaling behavior of memory capacity and nonlinear computational capability of recurrent neural networks	Shotaro Takasu et.al.	2504.19657	null
2025-04-28	Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM	Leon Davies et.al.	2504.19654	null
2025-04-28	GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM	Leon Davies et.al.	2504.19653	null
2025-04-28	Robot Motion Planning using One-Step Diffusion with Noise-Optimized Approximate Motions	Tomoharu Aizu et.al.	2504.19652	null
2025-04-28	QFDNN: A Resource-Efficient Variational Quantum Feature Deep Neural Networks for Fraud Detection and Loan Prediction	Subham Das et.al.	2504.19632	null
2025-04-28	ARMOR: Adaptive Meshing with Reinforcement Optimization for Real-time 3D Monitoring in Unexposed Scenes	Yizhe Zhang et.al.	2504.19624	null
2025-04-25	Automating Nanoindentation: Optimizing Workflows for Precision and Accuracy	Vivek Chawla et.al.	2504.18525	null
2025-04-25	Online Distributed Queue Length Estimation	Aditya Bhaskara et.al.	2504.18503	null
2025-04-25	A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression	Muzaffar Qureshi et.al.	2504.18463	null
2025-04-25	Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning	Tewodros Alemu Ayall et.al.	2504.18451	null
2025-04-25	Online learning to accelerate nonlinear PDE solvers: applied to multiphase porous media flow	Vinicius L S Silva et.al.	2504.18414	null
2025-04-25	Virial theorem for rigidly rotating matter	Sourav Dey et.al.	2504.18388	null
2025-04-25	Renewable-Colocated Green Hydrogen Production: Optimal Scheduling and Profitability	Siying Li et.al.	2504.18368	null
2025-04-25	SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations	Shuting Zhao et.al.	2504.18332	null
2025-04-25	STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting	Yunze Deng et.al.	2504.18318	null
2025-04-25	Design and Evaluation of a UGV-Based Robotic Platform for Precision Soil Moisture Remote Sensing	Ilektra Tsimpidi et.al.	2504.18284	null
2025-04-25	Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator	Minjae Kang et.al.	2504.18283	null
2025-04-25	SecCityVR: Visualization and Collaborative Exploration of Software Vulnerabilities in Virtual Reality	Dennis Wüppelman et.al.	2504.18238	null
2025-04-25	Time and Frequency Domain-based Anomaly Detection in Smart Meter Data for Distribution Network Studies	Petar Labura et.al.	2504.18231	null
2025-04-25	Sampling-Based Grasp and Collision Prediction for Assisted Teleoperation	Simon Manschitz et.al.	2504.18186	null
2025-04-25	PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models	Michel Gokan Khan et.al.	2504.18165	link
2025-04-25	Evaluation of Distimation's Real-world Performance on a Superconducting Quantum Computer	Hikaru Yokomori et.al.	2504.18141	null
2025-04-25	Study on Real-Time Road Surface Reconstruction Using Stereo Vision	Deepak Ghimire et.al.	2504.18112	null
2025-04-25	Teleportation-based Speed Meter for Precision Measurement	Yohei Nishino et.al.	2504.18111	null
2025-04-25	Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation	Weipeng Tan et.al.	2504.18087	null
2025-04-25	Phonon-Assisted Radiative Lifetimes and Exciton Dynamics from First Principles	Chunhao Guo et.al.	2504.18071	null
2025-04-24	Replay to Remember: Retaining Domain Knowledge in Streaming Language Models	Sneh Pillai et.al.	2504.17780	null
2025-04-24	Disaggregated Deep Learning via In-Physics Computing at Radio Frequency	Zhihui Gao et.al.	2504.17752	null
2025-04-24	BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring	Asier Bikandi et.al.	2504.17693	null
2025-04-24	Optimized Cloud Resource Allocation Using Genetic Algorithms for Energy Efficiency and QoS Assurance	Caroline Panggabean et.al.	2504.17675	null
2025-04-24	Unifying Complementarity Constraints and Control Barrier Functions for Safe Whole-Body Robot Control	Rafael I. Cabral Muchacho et.al.	2504.17647	null
2025-04-24	Beyond Labels: Zero-Shot Diabetic Foot Ulcer Wound Segmentation with Self-attention Diffusion Models and the Potential for Text-Guided Customization	Abderrachid Hamrani et.al.	2504.17628	null
2025-04-24	TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System	Zheng Wei et.al.	2504.17598	null
2025-04-24	RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network	Boyue Xu et.al.	2504.17595	null
2025-04-24	A Multi-Agent, Laxity-Based Aggregation Strategy for Cost-Effective Electric Vehicle Charging and Local Transformer Overload Prevention	Kristoffer Christensen et.al.	2504.17575	null
2025-04-24	Flying through cluttered and dynamic environments with LiDAR	Huajie Wu et.al.	2504.17569	null
2025-04-24	IRA: Adaptive Interest-aware Representation and Alignment for Personalized Multi-interest Retrieval	Youngjune Lee et.al.	2504.17529	null
2025-04-24	Adaptive Orchestration of Modular Generative Information Access Systems	Mohanna Hoveyda et.al.	2504.17454	link
2025-04-24	Storing and Querying Evolving Graphs in NoSQL Storage Models	Alexandros Spitalas et.al.	2504.17438	null
2025-04-24	StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies	Xu Wang et.al.	2504.17401	null
2025-04-24	Inverse-Designed Metasurfaces for Wavefront Restoration in Under-Display Camera Systems	Jaegang Jo et.al.	2504.17368	null
2025-04-24	TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos	Linli Yao et.al.	2504.17343	link
2025-04-24	Bridging Optical Sensing and Wearable Health Monitoring: A Functionalized Plasmonic Nanopillar for Non-Invasive Sweat Glucose Detection	Ling Liu et.al.	2504.17339	null
2025-04-24	EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy	Haodi Yao et.al.	2504.17280	null
2025-04-24	MV-Crafter: An Intelligent System for Music-guided Video Generation	Chuer Chen et.al.	2504.17267	null
2025-04-24	Symbolic Representation for Any-to-Any Generative Tasks	Jiaqi Chen et.al.	2504.17261	null
2025-04-24	Fast Online Adaptive Neural MPC via Meta-Learning	Yu Mei et.al.	2504.16369	link
2025-04-23	Meta-Learning Online Dynamics Model Adaptation in Off-Road Autonomous Driving	Jacob Levy et.al.	2504.16923	null
2025-04-23	An Accelerated Camera 3DMA Framework for Efficient Urban GNSS Multipath Estimation	Shiyao Lv et.al.	2504.16906	null
2025-04-23	Reconfigurable Intelligent Surface Control for a Moving Receiver	Hamed Radpour et.al.	2504.16874	null
2025-04-23	Graph2Nav: 3D Object-Relation Graph Generation to Robot Navigation	Tixiao Shan et.al.	2504.16782	null
2025-04-23	Evaluation Framework for AI Systems in "the Wild"	Sarah Jabbour et.al.	2504.16778	null
2025-04-23	Deep photonic reservoir computer for nonlinear equalization of 16-level quadrature amplitude modulation signals	Rui-Qian Li et.al.	2504.16769	null
2025-04-23	Beating the break-even point with autonomous quantum error correction	Yi Li et.al.	2504.16746	null
2025-04-23	PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands	Pei Lin et.al.	2504.16649	null
2025-04-23	Bridging Econometrics and AI: VaR Estimation via Reinforcement Learning and GARCH Models	Fredy Pokou et.al.	2504.16635	null
2025-04-23	Data-Assimilated Model-Based Reinforcement Learning for Partially Observed Chaotic Flows	Defne E. Ozan et.al.	2504.16588	null
2025-04-23	PsyCounAssist: A Full-Cycle AI-Powered Psychological Counseling Assistant System	Xianghe Liu et.al.	2504.16573	null
2025-04-23	A Collaborative Intrusion Detection System Using Snort IDS Nodes	Tom Davies et.al.	2504.16550	null
2025-04-23	6G EdgeAI: Performance Evaluation and Analysis	Chien-Sheng Yang et.al.	2504.16529	null
2025-04-23	Intelligent Depression Prevention via LLM-Based Dialogue Analysis: Overcoming the Limitations of Scale-Dependent Diagnosis through Precise Emotional Pattern Recognition	Zhenguang Zhong et.al.	2504.16504	null
2025-04-23	FeedQUAC: Quick Unobtrusive AI-Generated Commentary	Tao Long et.al.	2504.16416	null
2025-04-23	Circinus: Efficient Query Planner for Compound ML Serving	Banruo Liu et.al.	2504.16397	null
2025-04-23	Fast and Modular Whole-Body Lagrangian Dynamics of Legged Robots with Changing Morphology	Sahand Farghdani et.al.	2504.16383	null
2025-04-23	SILM: A Subjective Intent Based Low-Latency Framework for Multiple Traffic Participants Joint Trajectory Prediction	Qu Weiming et.al.	2504.16377	null
2025-04-23	Revisiting Radar Camera Alignment by Contrastive Learning for 3D Object Detection	Linhua Kong et.al.	2504.16368	null
2025-04-22	PRIME: Fast Primal-Dual Feedback Optimization for Markets with Application to Optimal Power Flow	Nicholas Julian Behr et.al.	2504.16048	link
2025-04-22	A Comparative and Measurement-Based Study on Real-Time Network KPI Extraction Methods for 5G and Beyond Applications	Batuhan Kaplan et.al.	2504.16039	null
2025-04-22	LLMs meet Federated Learning for Scalable and Secure IoT Management	Yazan Otoum et.al.	2504.16032	null
2025-04-22	LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale	Joya Chen et.al.	2504.16030	null
2025-04-22	A UAV-Aided Digital Twin Framework for IoT Networks with High Accuracy and Synchronization	Ghofran Khalaf et.al.	2504.15967	null
2025-04-22	FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation	Zebin Yao et.al.	2504.15958	link
2025-04-22	Monocular inspection of spacecraft under illumination constraints and avoidance regions	Tochukwu Elijah Ogri et.al.	2504.15954	null
2025-04-22	Real-time raw signal genomic analysis using fully integrated memristor hardware	Peiyi He et.al.	2504.15934	link
2025-04-22	Learning the Spoofability of Limit Order Books With Interpretable Probabilistic Neural Networks	Timothée Fabre et.al.	2504.15908	null
2025-04-22	RaSCL: Radar to Satellite Crossview Localization	Blerim Abdullai et.al.	2504.15899	null
2025-04-22	An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search	Karim Essalmi et.al.	2504.15869	null
2025-04-22	Adaptive PCA-Based Outlier Detection for Multi-Feature Time Series in Space Missions	Jonah Ekelund et.al.	2504.15846	null
2025-04-22	Characterization and ex vivo application of flexible 2D scintillating coatings in ultra-high dose rate electron beams for FLASH radiotherapy	Verdi Vanreusel et.al.	2504.15824	null
2025-04-22	Microstructure and Manipulation: Quantifying Pump-and-Dump Dynamics in Cryptocurrency Markets	Mahya Karbalaii et.al.	2504.15790	null
2025-04-22	Enhancing Tennis Training with Real-Time Swing Data Visualisation in Immersive Virtual Reality	Ryan Najami et.al.	2504.15746	null
2025-04-22	You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection	Jun Dong et.al.	2504.15694	null
2025-04-22	Comparative Analysis of Evolutionary Algorithms for Energy-Aware Production Scheduling	Sascha C Burmeister et.al.	2504.15672	null
2025-04-22	Symbolic Runtime Verification and Adaptive Decision-Making for Robot-Assisted Dressing	Yasmin Rafiq et.al.	2504.15666	null
2025-04-22	Neural Kinematic Bases for Fluids	Yibo Liu et.al.	2504.15657	null
2025-04-22	A Vision-Enabled Prosthetic Hand for Children with Upper Limb Disabilities	Md Abdul Baset Sarker et.al.	2504.15654	null
2025-04-21	StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians	Cailin Zhuang et.al.	2504.15281	null
2025-04-21	DRAWER: Digital Reconstruction and Articulation With Environment Realism	Hongchi Xia et.al.	2504.15278	null
2025-04-21	Impulsive pattern recognition of a myoelectric hand via Dynamic Time Warping	Mustafa Can Kadilar et.al.	2504.15256	null
2025-04-21	Scalable Discrete Event Simulation Tool for Large-Scale Cyber-Physical Energy Systems: Advancing System Efficiency and Scalability	Khandaker Akramul Haque et.al.	2504.15198	null
2025-04-21	Time-Series Analysis on Edge-AI Hardware for Healthcare Monitoring	Jinhai Hu et.al.	2504.15178	null
2025-04-21	Audio-Visual Class-Incremental Learning for Fish Feeding Intensity Assessment in Aquaculture	Meng Cui et.al.	2504.15171	null
2025-04-21	Neural ATTF: A Scalable Solution to Lifelong Multi-Agent Path Planning	Kushal Shah et.al.	2504.15130	null
2025-04-21	A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment	Kangyao Huang et.al.	2504.15129	null
2025-04-21	Robust and Real-time Surface Normal Estimation from Stereo Disparities using Affine Transformations	Csongor Csanad Kariko et.al.	2504.15121	null
2025-04-21	Muon Imaging of Hydrotreatment Towers	Rafael Armando Martínez-Rivero et.al.	2504.15103	null
2025-04-21	NeuGaze: Reshaping the future BCI	Yiqian Yang et.al.	2504.15101	link
2025-04-21	VistaDepth: Frequency Modulation With Bias Reweighting For Enhanced Long-Range Depth Estimation	Mingxia Zhan et.al.	2504.15095	null
2025-04-21	Reconfiguration and Real-Time Operation of Networked Microgrids Under Load Uncertainty	Hannah Moring et.al.	2504.15084	null
2025-04-21	Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides	Jinghua Zhao et.al.	2504.15066	null
2025-04-21	Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical Interconnects	Benshan Wang et.al.	2504.15044	null
2025-04-21	Dual Utilization of Perturbation for Stream Data Publication under Local Differential Privacy	Rong Du et.al.	2504.14993	null
2025-04-21	3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations	Yating Wang et.al.	2504.14967	null
2025-04-21	Dynamic Graph-Like Learning with Contrastive Clustering on Temporally-Factored Ship Motion Data for Imbalanced Sea State Estimation in Autonomous Vessel	Kexin Wang et.al.	2504.14907	null
2025-04-21	Distributed Time-Varying Gaussian Regression via Kalman Filtering	Nicola Taddei et.al.	2504.14900	link
2025-04-21	Physics-Aware Compression of Plasma Distribution Functions with GPU-Accelerated Gaussian Mixture Models	Andong Hu et.al.	2504.14897	null
2025-04-18	ChatNekoHacker: Real-Time Fan Engagement with Conversational Agents	Takuya Sera et.al.	2504.13793	null
2025-04-18	Equi-Euler GraphNet: An Equivariant, Temporal-Dynamics Informed Graph Neural Network for Dual Force and Trajectory Prediction in Multi-Body Systems	Vinay Sharma et.al.	2504.13768	null
2025-04-18	Realizing string breaking dynamics in a $Z_2$ lattice gauge theory on quantum hardware	Constantia Alexandrou et.al.	2504.13760	null
2025-04-18	Intelligent Interaction Strategies for Context-Aware Cognitive Augmentation	Xiangrong et.al.	2504.13684	null
2025-04-18	Lightweight LiDAR-Camera 3D Dynamic Object Detection and Multi-Class Trajectory Prediction	Yushen He et.al.	2504.13647	link
2025-04-18	SupResDiffGAN a new approach for the Super-Resolution task	Dawid Kopeć et.al.	2504.13622	null
2025-04-18	FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking	Ying Wang et.al.	2504.13604	link
2025-04-18	Memristive chaotic circuit for information processing through time	Manuel Escudero et.al.	2504.13600	null
2025-04-18	RAG Without the Lag: Interactive Debugging for Retrieval-Augmented Generation Pipelines	Quentin Romero Lauro et.al.	2504.13587	null
2025-04-18	Estimating constraints on cosmological parameters via the canonical and the differential redshift drift with SKA HI 21-cm observations	Jiangang Kang et.al.	2504.13583	null
2025-04-18	MAAM: A Lightweight Multi-Agent Aggregation Module for Efficient Image Classification Based on the MindSpore Framework	Zhenkai Qin et.al.	2504.13574	null
2025-04-18	Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content	Azmarah Rizvi et.al.	2504.13545	null
2025-04-18	Can Local Representation Alignment RNNs Solve Temporal Tasks?	Nikolay Manchev et.al.	2504.13531	null
2025-04-18	Neural Ganglion Sensors: Learning Task-specific Event Cameras Inspired by the Neural Circuit of the Human Retina	Haley M. So et.al.	2504.13457	null
2025-04-18	RT-HDIST: Ray-Tracing Core-based Hausdorff Distance Computation	YoungWoo Kim et.al.	2504.13436	null
2025-04-18	POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation	Evans Xu Han et.al.	2504.13392	null
2025-04-17	Multi-Sensor Fusion-Based Mobile Manipulator Remote Control for Intelligent Smart Home Assistance	Xiao Jin et.al.	2504.13370	null
2025-04-17	AI-Empowered Integrated Sensing and Communications	Mojtaba Vaezi et.al.	2504.13363	null
2025-04-17	Physical Reservoir Computing in Hook-Shaped Rover Wheel Spokes for Real-Time Terrain Identification	Xiao Jin et.al.	2504.13348	null
2025-04-17	Adaptive AI decision interface for autonomous electronic material discovery	Yahao Dai et.al.	2504.13344	null
2025-04-17	Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems	Ivica Kostric et.al.	2504.13095	link
2025-04-17	EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance	Yang Yue et.al.	2504.13065	link
2025-04-17	Pose and Facial Expression Transfer by using StyleGAN	Petr Jahoda et.al.	2504.13021	null
2025-04-17	GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration	Rendong Zhang et.al.	2504.12999	link
2025-04-17	New Frontiers in Muon-Spin Spectroscopy Using Si-Pixel Detectors	Heiko Augustin et.al.	2504.12993	null
2025-04-17	Efficient Chebyshev Reconstruction for the Anisotropic Equilibrium Model in Magnetic Particle Imaging	Christine Droigk et.al.	2504.12981	null
2025-04-17	Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs	Youyi Zhan et.al.	2504.12909	link
2025-04-17	Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation	Yuyang Li et.al.	2504.12908	link
2025-04-17	Market-Driven Flexibility Provision: A Tri-Level Optimization Approach for Carbon Reduction	Shijie Pan et.al.	2504.12877	null
2025-04-17	AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering	Michael Steiner et.al.	2504.12811	null
2025-04-17	Distributed Intelligent Sensing and Communications for 6G: Architecture and Use Cases	Kyriakos Stylianopoulos et.al.	2504.12765	null
2025-04-17	Biasing the Driving Style of an Artificial Race Driver for Online Time-Optimal Maneuver Planning	Sebastiano Taddei et.al.	2504.12744	null
2025-04-17	Chinese-Vicuna: A Chinese Instruction-following Llama-based Model	Chenghao Fan et.al.	2504.12737	null
2025-04-17	Incorporating a Deep Neural Network into Moving Horizon Estimation for Embedded Thermal Torque Derating of an Electric Machine	Alexander Winkler et.al.	2504.12736	null
2025-04-17	Embodied Neuromorphic Control Applied on a 7-DOF Robotic Manipulator	Ziqi Wang et.al.	2504.12702	link
2025-04-17	Predicting Driver's Perceived Risk: a Model Based on Semi-Supervised Learning Strategy	Siwei Huang et.al.	2504.12665	null
2025-04-17	Autonomous Drone for Dynamic Smoke Plume Tracking	Srijan Kumar Pal et.al.	2504.12664	null
2025-04-17	AdaptoVision: A Multi-Resolution Image Recognition Model for Robust and Scalable Classification	Md. Sanaullah Chowdhury Lameya Sabrin et.al.	2504.12652	null
2025-04-17	Observation of the Axion quasiparticle in 2D MnBi$_2$Te$_4$	Jian-Xiang Qiu et.al.	2504.12572	null
2025-04-17	Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions	Yifei Dong et.al.	2504.11967	null
2025-04-17	Real-Time Reconstruction of Ground Motion During Small Magnitude Earthquakes: A Pilot Study	Youngkyu Kim et.al.	2504.11752	null
2025-04-16	Decision-based AI Visual Navigation for Cardiac Ultrasounds	Andy Dimnaku et.al.	2504.12535	null
2025-04-16	SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians	Liam Schoneveld et.al.	2504.12292	null
2025-04-16	An Evaluation of N-Gram Selection Strategies for Regular Expression Indexing in Contemporary Text Analysis Tasks	Ling Zhang et.al.	2504.12251	link
2025-04-16	Data Assimilation for Robust UQ Within Agent-Based Simulation on HPC Systems	Adam Spannaus et.al.	2504.12228	null
2025-04-16	Deep Generative Models for Bayesian Inference on High-Rate Sensor Data: Applications in Automotive Radar and Medical Imaging	Tristan S. W. Stevens et.al.	2504.12154	null
2025-04-16	GripMap: An Efficient, Spatially Resolved Constraint Framework for Offline and Online Trajectory Planning in Autonomous Racing	Frederik Werner et.al.	2504.12115	null
2025-04-16	Self-Supervised Traversability Learning with Online Prototype Adaptation for Off-Road Autonomous Driving	Yafeng Bu et.al.	2504.12109	null
2025-04-16	A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions	Rahima Khanam et.al.	2504.11995	null
2025-04-16	The Evolution of Zero Trust Architecture (ZTA) from Concept to Implementation	Md Nasiruzzaman et.al.	2504.11984	null
2025-04-16	Flow Intelligence: Robust Feature Matching via Temporal Signature Correlation	Jie Wang et.al.	2504.11949	null
2025-04-16	Mind2Matter: Creating 3D Models from EEG Signals	Xia Deng et.al.	2504.11936	link
2025-04-16	Broadening Participation through Physical Computing: Replicating Sensor-Based Programming Workshops for Rural Students in Sri Lanka	Poornima Meegammana et.al.	2504.11913	null
2025-04-16	Trajectory Dispersion Control for Precision Landing Guidance of Reusable Rockets	Xinglun Chen et.al.	2504.11894	null
2025-04-16	Real-Time Shape Estimation of Tensegrity Structures Using Strut Inclination Angles	Tufail Ahmad Bhat et.al.	2504.11868	null
2025-04-16	Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery	Namitha Liyanage et.al.	2504.11805	null
2025-04-16	TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion	Yiran Wang et.al.	2504.11773	null
2025-04-16	Polarisation-Inclusive Spiking Neural Networks for Real-Time RFI Detection in Modern Radio Telescopes	Nicholas J. Pritchard et.al.	2504.11720	link
2025-04-16	A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models	Kuiyuan Ding et.al.	2504.11696	null
2025-04-16	3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians	Zeming Wei et.al.	2504.11218	link
2025-04-16	Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance	Shangyu Liu et.al.	2504.11197	null
2025-04-16	A Real-time Anomaly Detection Method for Robots based on a Flexible and Sparse Latent Space	Taewook Kang et.al.	2504.11170	null
2025-04-15	Real-time Object and Event Detection Service through Computer Vision and Edge Computing	Marcos Mendes et.al.	2504.11662	null
2025-04-15	TextArena	Leon Guertler et.al.	2504.11442	link
2025-04-15	Predicting Wave Dynamics using Deep Learning with Multistep Integration Inspired Attention and Physics-Based Loss Decomposition	Indu Kant Deo et.al.	2504.11433	null
2025-04-15	HeatSense: Intelligent Thermal Anomaly Detection for Securing NoC-Enabled MPSoCs	Mahdi Hasanzadeh et.al.	2504.11421	null
2025-04-15	Sensitivity Analysis of State Space Models for Scrap Composition Estimation in EAF and BOF	Yiqing Zhou et.al.	2504.11319	null
2025-04-15	Hybrid Compton-PET Imaging for ion-range verification: A Preclinical Study for Proton-, Helium-, and Carbon-Therapy at HIT	Javier Balibrea-Correa et.al.	2504.11273	null
2025-04-15	Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach	Xiaoxiao Ma et.al.	2504.11262	null
2025-04-15	Focal Split: Untethered Snapshot Depth from Differential Defocus	Junjie Luo et.al.	2504.11202	null
2025-04-15	QAMA: Quantum annealing multi-head attention operator with classical deep learning framework	Peng Du et.al.	2504.11083	null
2025-04-15	Intraoperative perfusion assessment by continuous, low-latency hyperspectral light-field imaging: development, methodology, and clinical application	Stefan Kray et.al.	2504.10953	null
2025-04-15	A Signal Matrix-Based Local Flaw Detection Framework for Steel Wire Ropes Using Convolutional Neural Networks	Siyu You et.al.	2504.10952	null
2025-04-15	Design and Verification of a Synchronus First In First Out (FIFO)	Yatheeswar Penta et.al.	2504.10901	null
2025-04-15	ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping	Shun Iwase et.al.	2504.10857	null
2025-04-15	Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition	Naoto Nishida et.al.	2504.10849	null
2025-04-15	LightFormer: A lightweight and efficient decoder for remote sensing image segmentation	Sihang Chen et.al.	2504.10834	null
2025-04-15	Hallucination-Aware Generative Pretrained Transformer for Cooperative Aerial Mobility Control	Hyojun Ahn et.al.	2504.10831	null
2025-04-15	SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures	Kuang Yuan et.al.	2504.10793	null
2025-04-15	ATLASv2: LLM-Guided Adaptive Landmark Acquisition and Navigation on the Edge	Mikolaj Walczak et.al.	2504.10784	null
2025-04-15	Diversity-Fair Online Selection	Ming Hu et.al.	2504.10389	null
2025-04-15	LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis	Hao Sun et.al.	2504.10331	null
2025-04-15	WildLive: Near Real-time Visual Wildlife Tracking onboard UAVs	Nguyen Ngoc Dat et.al.	2504.10165	null
2025-04-15	TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models	Jaewoo Lee et.al.	2504.09897	link
2025-04-14	DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting	Zeren Jiang et.al.	2504.10486	link
2025-04-14	HybridCollab: Unifying In-Person and Remote Collaboration for Cardiovascular Surgical Planning in Mobile Augmented Reality	Pratham Darrpan Mehta et.al.	2504.10440	null
2025-04-14	Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone	Pietro Bonazzi et.al.	2504.10400	link
2025-04-14	Patch and Shuffle: A Preprocessing Technique for Texture Classification in Autonomous Cementitious Fabrication	Jeremiah Giordani et.al.	2504.10353	null
2025-04-14	SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model	Zongcan Ding et.al.	2504.10320	null
2025-04-14	CAT: A Conditional Adaptation Tailor for Efficient and Effective Instance-Specific Pansharpening on Real-World Data	Tianyu Xin et.al.	2504.10242	null
2025-04-14	ROSFD: Robust Online Streaming Fraud Detection with Resilience to Concept Drift in Data Streams	Vivek Yelleti et.al.	2504.10229	null
2025-04-14	Unleashing Expert Opinion from Social Media for Stock Prediction	Wanyun Zhou et.al.	2504.10078	link
2025-04-14	DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction	Kiana Hoshanfar et.al.	2504.10070	null
2025-04-14	Time for Timed Monitorability	Thomas M. Grosen et.al.	2504.10008	null
2025-04-14	VR MRI Training for Adolescents: A Comparative Study of Gamified VR, Passive VR, 360 Video, and Traditional Educational Video	Yue Yang et.al.	2504.09955	null
2025-04-14	Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization	Haiyong Yu et.al.	2504.09927	null
2025-04-14	Fusing Bluetooth with Pedestrian Dead Reckoning: A Floor Plan-Assisted Positioning Approach	Wenxuan Pan et.al.	2504.09905	null
2025-04-14	LiteTracker: Leveraging Temporal Causality for Accurate Low-latency Tissue Tracking	Mert Asim Karaoglu et.al.	2504.09904	null
2025-04-14	MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling	Yunpeng Tan et.al.	2504.09878	null
2025-04-14	CKMImageNet: A Dataset for AI-Based Channel Knowledge Map Towards Environment-Aware Communication and Sensing	Zijian Wu et.al.	2504.09849	null
2025-04-14	RINGO: Real-time Navigation with a Guiding Trajectory for Aerial Manipulators in Unknown Environments	Zhaopeng Zhang et.al.	2504.08338	null
2025-04-11	TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning	Hang Ni et.al.	2504.08694	null
2025-04-11	Safe Flow Matching: Robot Motion Planning with Control Barrier Functions	Xiaobing Dai et.al.	2504.08661	null
2025-04-11	TinyCenterSpeed: Efficient Center-Based Object Detection for Autonomous Racing	Neil Reichlin et.al.	2504.08655	link
2025-04-11	Enhancing Neutrino Reconstruction in Water-Cherenkov Air Shower Arrays Using Multi-Photosensors	J. Alvarez-Muñiz et.al.	2504.08652	null
2025-04-11	TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration	Matteo Spanio et.al.	2504.08624	link
2025-04-11	Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies	Vineeth Sai Narajala et.al.	2504.08623	null
2025-04-11	FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment	Sebastián Barbas Laina et.al.	2504.08603	null
2025-04-11	POD-Based Sparse Stochastic Estimation of Wind Turbine Blade Vibrations	Lorenzo Schena et.al.	2504.08505	null
2025-04-11	AI-Driven Smart Sportswear for Real-Time Fitness Monitoring Using Textile Strain Sensors	Chenyu Tang et.al.	2504.08500	null
2025-04-11	A Comparative Study of Recommender Systems under Big Data Constraints	Arimondo Scrivano et.al.	2504.08457	null
2025-04-11	Muon-Accelerated Attention Distillation for Real-Time Edge Synthesis via Optimized Latent Diffusion	Weiye Chen et.al.	2504.08451	link
2025-04-11	The Composite Visual-Laser Navigation Method Applied in Indoor Poultry Farming Environments	Jiafan Lu et.al.	2504.08431	null
2025-04-11	Light-YOLOv8-Flame: A Lightweight High-Performance Flame Detection Algorithm	Jiawei Lan et.al.	2504.08389	null
2025-04-11	MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft	Junliang Guo et.al.	2504.08388	null
2025-04-11	PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation	Arman Khaledian et.al.	2504.08386	null
2025-04-11	DRIP: DRop unImportant data Points -- Enhancing Machine Learning Efficiency with Grad-CAM-Based Real-Time Data Prioritization for On-Device Training	Marcus Rüb et.al.	2504.08364	null
2025-04-11	Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing	Tobias Pfandzelter et.al.	2504.08337	link
2025-04-11	Towards a Digital Twin of Noisy Quantum Computers: Calibration-Driven Emulation of Transmon Qubits	Ronny Müller et.al.	2504.08313	null
2025-04-11	Gigabit-rate Quantum Key Distribution on Integrated Photonic Chips	Si Qi Ng et.al.	2504.08298	null
2025-04-10	A Review of HPC-Accelerated CFD in National Security and Defense	James Afful et.al.	2504.07837	null
2025-04-10	A Hybrid Semantic RAN Protocol Stack Design for 6G System and Its Implementation	Luhan Wang et.al.	2504.07829	null
2025-04-10	MMLA: Multi-Environment, Multi-Species, Low-Altitude Aerial Footage Dataset	Jenna Kline et.al.	2504.07744	null
2025-04-10	A Novel Deep Learning Approach for Emulating Computationally Expensive Postfire Debris Flows	Palak Patel et.al.	2504.07736	null
2025-04-10	Finite-temperature real-time properties of magnetic polarons in two-dimensional quantum antiferromagnets	Toni Guthardt et.al.	2504.07715	null
2025-04-10	Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases	Andrés Bell-Navas et.al.	2504.07606	link
2025-04-10	Tuning chirality amplitude at ultrafast timescales	Hiroki Ueda et.al.	2504.07599	null
2025-04-10	MUFFLER: Secure Tor Traffic Obfuscation with Dynamic Connection Shuffling and Splitting	Minjae Seo et.al.	2504.07543	null
2025-04-10	Intelligent DoS and DDoS Detection: A Hybrid GRU-NTM Approach to Network Security	Caroline Panggabean et.al.	2504.07478	null
2025-04-10	Nonlinear Optimal Guidance for Intercepting Moving Targets	Han Wang et.al.	2504.07430	null
2025-04-10	ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement	Anning Hu et.al.	2504.07418	null
2025-04-10	WK-Pnet: FM-Based Positioning via Wavelet Packet Decomposition and Knowledge Distillation	Shilian Zheng et.al.	2504.07399	null
2025-04-10	MicroNAS: An Automated Framework for Developing a Fall Detection System	Seyed Mojtaba Mohasel et.al.	2504.07397	null
2025-04-09	CiMBA: Accelerating Genome Sequencing through On-Device Basecalling via Compute-in-Memory	William Andrew Simon et.al.	2504.07298	null
2025-04-09	Data-Enabled Neighboring Extremal: Case Study on Model-Free Trajectory Tracking for Robotic Arm	Amin Vahidi-Moghaddam et.al.	2504.07292	null
2025-04-09	Enabling Continuous 5G Connectivity in Aircraft through Low Earth Orbit Satellites	Raúl Parada et.al.	2504.07262	null
2025-04-09	Visual-Aware Speech Recognition for Noisy Scenarios	Lakshmipathi Balaji et.al.	2504.07229	null
2025-04-09	Discovery of extreme Quasi-Periodic Eruptions in a newly accreting massive black hole	Lorena Hernández-García et.al.	2504.07169	null
2025-04-09	OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens	Jiacheng Liu et.al.	2504.07096	null
2025-04-09	FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution	Gene Chou et.al.	2504.07093	link
2025-04-09	Cerebral blood flow monitoring using a deep learning implementation of the two-layer DCS analytical model with a 512 $\times$ 512 SPAD array	Mingliang Pan et.al.	2504.06997	null
2025-04-09	Audio-visual Event Localization on Portrait Mode Short Videos	Wuyang Liu et.al.	2504.06884	null
2025-04-09	Determining Fetal Orientations From Blind Sweep Ultrasound Video	Jakub Maciej Wiśniewski et.al.	2504.06836	null
2025-04-09	Integrated Sensing and Communications Over the Years: An Evolution Perspective	Di Zhang et.al.	2504.06830	null
2025-04-09	SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering	Hanxiao Sun et.al.	2504.06815	link
2025-04-09	Modeling and analysis methods for early detection of leakage points in gas transmission systems	Ilgar Aliyev et.al.	2504.06809	null
2025-04-09	How do Copilot Suggestions Impact Developers' Frustration and Productivity?	Emanuela Guglielmi et.al.	2504.06808	null
2025-04-09	Controllable Automatic Foley Artist	Roi Benita et.al.	2504.06778	link
2025-04-09	Bridging Research and Standardization: Innovations and Methodology for 6G Standard Contributions	Francesca Conserva et.al.	2504.06682	null
2025-04-09	Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making	Kaifeng Wang et.al.	2504.06670	null
2025-04-09	Robust and Noise-resilient Long-Term Prediction of Spatiotemporal Data Using Variational Mode Graph Neural Networks with 3D Attention	Osama Ahmad et.al.	2504.06660	null
2025-04-09	InstantSticker: Realistic Decal Blending via Disentangled Object Reconstruction	Yi Zhang et.al.	2504.06620	null
2025-04-09	InteractRank: Personalized Web-Scale Search Pre-Ranking with Cross Interaction Features	Sujay Khandagale et.al.	2504.06609	link
2025-04-09	Overcoming Dynamic Environments: A Hybrid Approach to Motion Planning for Manipulators	Ho Minh Quang Ngo et.al.	2504.06596	null
2025-04-09	NAPER: Fault Protection for Real-Time Resource-Constrained Deep Neural Networks	Rian Adam Rajagede et.al.	2504.06591	null
2025-04-09	A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication	Xiao-Hang Jiang et.al.	2504.06561	link
2025-04-09	ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity	Long Chen et.al.	2504.06512	null
2025-04-09	Equivalent Circuit Modeling of a Lumped-element Loaded Metasurface under Arbitrary Incidence and Polarization	Athanasios Nousiou et.al.	2504.06501	null
2025-04-08	A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response	Mark Timmons et.al.	2504.06241	null
2025-04-08	Accessible and Pedagogically-Grounded Explainability for Human-Robot Interaction: A Framework Based on UDL and Symbolic Interfaces	Francisco J. Rodríguez Lera et.al.	2504.06189	link
2025-04-08	Efficient algorithms to solve atom reconfiguration problems. III. The bird and batching algorithms and other parallel implementations on GPUs	Fouad Afiouni et.al.	2504.06182	null
2025-04-08	Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks	Xufang Zhao et.al.	2504.06165	null
2025-04-08	Accelerating Vehicle Routing via AI-Initialized Genetic Algorithms	Ido Greenberg et.al.	2504.06126	null
2025-04-08	Safe Interaction via Monte Carlo Linear-Quadratic Games	Benjamin A. Christie et.al.	2504.06124	link
2025-04-08	A Robust Real-Time Lane Detection Method with Fog-Enhanced Feature Fusion for Foggy Conditions	Ronghui Zhang et.al.	2504.06121	null
2025-04-08	Real-Time LaCAM	Runzhe Liang et.al.	2504.06091	null
2025-04-08	$L_\textrm{dT}$: An ionospheric activity index based on distributions in GNSS-derived TEC rates of change	Paul Kinsler et.al.	2504.06056	null
2025-04-08	Modular Soft Wearable Glove for Real-Time Gesture Recognition and Dynamic 3D Shape Reconstruction	Huazhi Dong et.al.	2504.05983	null
2025-04-08	An Empirical Study of GPT-4o Image Generation Capabilities	Sixiang Chen et.al.	2504.05979	link
2025-04-08	Context-aware Rate Adaptation for Predictive Flying Networks using Contextual Bandits	Ruben Queiros et.al.	2504.05964	null
2025-04-08	Hybrid Control as a Proxy for Detection and Mitigation of Sensor Attacks in Cooperative Driving	Mischa Huisman et.al.	2504.05958	link
2025-04-08	InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Control	Ruixiang Wu et.al.	2504.05946	null
2025-04-08	Réduire le bruit grâce à la réalité augmentée sonore -- Auditory Concealer	Clara Boukhemia et.al.	2504.05847	null
2025-04-08	Negotiating Strict Latency Limits for Dynamic Real-Time Services in Vehicular Time-Sensitive Networks	Timo Häckel et.al.	2504.05793	null
2025-04-08	Residual U-Net for accurate and efficient prediction of hemodynamics in two-dimensional asymmetric stenosis	Xintong Zou et.al.	2504.05778	null
2025-04-08	A Lightweight Multi-Module Fusion Approach for Korean Character Recognition	Inho Jake Park et.al.	2504.05770	null
2025-04-08	Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation	Zhihua Xu et.al.	2504.05746	null
2025-04-08	Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting	Jee Won Lee et.al.	2504.05740	null
2025-04-08	REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning	Jihyun Lee et.al.	2504.04956	null
2025-04-07	Using Physiological Measures, Gaze, and Facial Expressions to Model Human Trust in a Robot Partner	Haley N. Green et.al.	2504.05291	null
2025-04-07	RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception	Hui Zhang et.al.	2504.05287	null
2025-04-07	A Telecentric Offset Reflective Imaging System (TORIS) for Terahertz Imaging and Spectroscopy	Pouyan Rezapoor et.al.	2504.05267	null
2025-04-07	From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models	German Barquero et.al.	2504.05265	null
2025-04-07	Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation	Jiaming Chen et.al.	2504.05225	link
2025-04-07	LLM-Alignment Live-Streaming Recommendation	Yueyang Liu et.al.	2504.05217	null
2025-04-07	Post-Training Language Models for Continual Relation Extraction	Sefika Efeoglu et.al.	2504.05214	null
2025-04-07	Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification	Yasuhiro Yao et.al.	2504.05148	link
2025-04-07	Decentralized Semantic Federated Learning for Real-Time Public Safety Tasks: Challenges, Methods, and Directions	Baosheng Li et.al.	2504.05107	null
2025-04-07	SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation	Stephen Brade et.al.	2504.05106	null
2025-04-07	Speech-to-Trajectory: Learning Human-Like Verbal Guidance for Robot Motion	Eran Beeri Bamani et.al.	2504.05084	null
2025-04-07	AI-Driven Tactical Communications and Networking for Defense: A Survey and Emerging Trends	Victor Monzon Baeza et.al.	2504.05071	null
2025-04-07	SILVIA: Ultra-precision formation flying demonstration for space-based interferometry	Takahiro Ito et.al.	2504.05001	null
2025-04-07	Transforming Future Data Center Operations and Management via Physical AI	Zhiwei Cao et.al.	2504.04982	null
2025-04-07	Boosting Relational Deep Learning with Pretrained Tabular Models	Veronica Lachi et.al.	2504.04934	link
2025-04-07	Real-time tuneable bright bonding plasmonic modes in Ga nanostructures	Renu Raman Sahu et.al.	2504.04922	null
2025-04-07	Parallelization is All System Identification Needs: End-to-end Vibration Diagnostics on a multi-core RISC-V edge device	Amirhossein Kiamarzi et.al.	2504.04884	null
2025-04-07	Closed-Loop Neural Operator-Based Observer of Traffic Density	Alice Harting et.al.	2504.04873	null
2025-04-07	Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM	Zhicong Sun et.al.	2504.04844	link
2025-04-04	CAMINO: Cloud-native Autonomous Management and Intent-based Orchestrator	Konstantinos Antonakoglou et.al.	2504.03586	null
2025-04-04	The building blocks of software work explain coding careers and language popularity	Xiangnan Feng et.al.	2504.03581	null
2025-04-04	Online Traffic Density Estimation using Physics-Informed Neural Networks	Dennis Wilkman et.al.	2504.03483	null
2025-04-04	DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models	Sathish Kumar et.al.	2504.03423	null
2025-04-04	NeRFlex: Resource-aware Real-time High-quality Rendering of Complex Scenes on Mobile Devices	Zhe Wang et.al.	2504.03415	null
2025-04-04	An Efficient GPU-based Implementation for Noise Robust Sound Source Localization	Zirui Lin et.al.	2504.03373	null
2025-04-04	Stance-Driven Multimodal Controlled Statement Generation: New Dataset and Task	Bingqian Wang et.al.	2504.03295	null
2025-04-04	Mitigating the Impact of Electrode Shift on Classification Performance in Electromyography-Based Motion Prediction Using Sliding-Window Normalization	Taichi Tanaka et.al.	2504.03196	null
2025-04-04	Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion	Zeyang Zheng et.al.	2504.03171	link
2025-04-04	Water Mapping and Change Detection Using Time Series Derived from the Continuous Monitoring of Land Disturbance Algorithm	Huong Pham et.al.	2504.03170	null
2025-04-04	Performance-Aware Control of Modular Batteries For Fast Frequency Response	Yutong He et.al.	2504.03150	null
2025-04-04	A Human Digital Twin Architecture for Knowledge-based Interactions and Context-Aware Conversations	Abdul Mannan Mohammed et.al.	2504.03147	null
2025-04-04	Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation	Fa-Ting Hong et.al.	2504.02542	link
2025-04-03	Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization	Haishan Wang et.al.	2504.03059	link
2025-04-03	Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks	Hyun-Ho Choi et.al.	2504.03052	link
2025-04-03	Emotion Recognition Using Convolutional Neural Networks	Shaoyuan Xu et.al.	2504.03010	null
2025-04-03	Generating Diverse Audio-Visual 360 Soundscapes for Sound Event Localization and Detection	Adrian S. Roman et.al.	2504.02988	link
2025-04-03	Level Up Peer Review in Education: Investigating genAI-driven Gamification system and its influence on Peer Feedback Effectiveness	Rafal Wlodarski et.al.	2504.02962	null
2025-04-03	LiDAR-based Object Detection with Real-time Voice Specifications	Anurag Kulkarni et.al.	2504.02920	link
2025-04-03	Bubbles in a box: Eliminating edge nucleation in cold-atom simulators of vacuum decay	Alexander C. Jenkins et.al.	2504.02829	null
2025-04-03	Dynamic Directional Routing of Freight in the Physical Internet	Sahrish Jaleel Shaikh et.al.	2504.02722	null
2025-04-03	UAV-Assisted 5G Networks: Mobility-Aware 3D Trajectory Optimization and Resource Allocation for Dynamic Environments	Asad Mahmood et.al.	2504.02613	null
2025-04-03	Human-Centered Development of an Explainable AI Framework for Real-Time Surgical Risk Surveillance	Andrea E Davidson et.al.	2504.02551	null
2025-04-03	Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting	Simon Hirsch et.al.	2504.02518	link
2025-04-03	Industrial Internet Robot Collaboration System and Edge Computing Optimization	Qian Zuo et.al.	2504.02492	null
2025-04-03	Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision	Xiaofeng Han et.al.	2504.02477	null
2025-04-03	MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM	Renwu Li et.al.	2504.02437	null
2025-04-03	OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication	Zhongjian Wang et.al.	2504.02433	null
2025-04-03	**Life
