Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-07-23 | RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction | Yuqing Lan et.al. | 2507.17594v1 | null |
2025-07-23 | Physics-based Human Pose Estimation from a Single Moving RGB Camera | Ayce Idil Aytekin et.al. | 2507.17406v1 | null |
2025-07-21 | Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors | Mohamed Adjel et.al. | 2507.16850v1 | null |
2025-07-22 | Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers | Batu Candan et.al. | 2507.16214v1 | null |
2025-07-21 | TONUS: Neuromorphic human pose estimation for artistic sound co-creation | Jules Lecomte et.al. | 2507.15734v1 | null |
2025-07-21 | Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing | Boni Hu et.al. | 2507.15683v1 | null |
2025-07-21 | Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images | JunYing Huang et.al. | 2507.15496v1 | null |
2025-07-20 | 3-Dimensional CryoEM Pose Estimation and Shift Correction Pipeline | Kaishva Chintan Shah et.al. | 2507.14924v1 | null |
2025-07-20 | An Evaluation of DUSt3R/MASt3R/VGGT 3D Reconstruction on Photogrammetric Aerial Blocks | Xinyi Wu et.al. | 2507.14798v1 | null |
2025-07-22 | AI-Enhanced Precision in Sport Taekwondo: Increasing Fairness, Speed, and Trust in Competition (FST.ai) | Keivan Shariatmadar et.al. | 2507.14657v2 | null |
2025-07-18 | C-DOG: Training-Free Multi-View Multi-Object Association in Dense Scenes Without Visual Feature via Connected δ-Overlap Graphs | Yung-Hong Sun et.al. | 2507.14095v1 | null |
2025-07-21 | PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations | Yu Wei et.al. | 2507.13891v2 | null |
2025-07-18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | Yuechen Xie et.al. | 2507.13673v1 | null |
2025-07-17 | | Yifan Wang et.al. | 2507.13347v1 | null |
2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314v1 | null |
2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145v1 | null |
2025-07-17 | AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability | Tomohiro Suzuki et.al. | 2507.12905v1 | null |
2025-07-17 | From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation | Mengxi Liu et.al. | 2507.12884v1 | null |
2025-07-19 | SpatialTrackerV2: 3D Point Tracking Made Easy | Yuxi Xiao et.al. | 2507.12462v2 | null |
2025-07-16 | Spontaneous Spatial Cognition Emerges during Egocentric Video Viewing through Non-invasive BCI | Weichen Dai et.al. | 2507.12417v1 | null |
2025-07-16 | Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation | Antonio Finocchiaro et.al. | 2507.12292v1 | null |
2025-07-16 | UniLGL: Learning Uniform Place Recognition for FOV-limited/Panoramic LiDAR Global Localization | Hongming Shen et.al. | 2507.12194v1 | null |
2025-07-16 | BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images | Davide Di Nucci et.al. | 2507.12095v1 | null |
2025-07-16 | SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation | Beining Xu et.al. | 2507.12027v1 | null |
2025-07-16 | SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring | Kaustav Chanda et.al. | 2507.11910v1 | null |
2025-07-15 | GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft | Weizhao Ma et.al. | 2507.11077v1 | null |
2025-07-15 | Joint angle model based learning to refine kinematic human pose estimation | Chang Peng et.al. | 2507.11075v1 | null |
2025-07-14 | Raci-Net: Ego-vehicle Odometry Estimation in Adverse Weather Conditions | Mohammadhossein Talebi et.al. | 2507.10376v1 | null |
2025-07-14 | Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures | Xinlong Ding et.al. | 2507.10265v1 | null |
2025-07-14 | ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users | Xiangyu Yin et.al. | 2507.10223v1 | null |
2025-07-13 | VST-Pose: A Velocity-Integrated Spatiotemporal Attention Network for Human WiFi Pose Estimation | Xinyu Zhang et.al. | 2507.09672v1 | null |
2025-07-13 | EHPE: A Segmented Architecture for Enhanced Hand Pose Estimation | Bolun Zheng et.al. | 2507.09560v1 | null |
2025-07-13 | Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding | Yanchen Wang et.al. | 2507.09513v1 | null |
2025-07-12 | PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | Dewen Zhang et.al. | 2507.09139v1 | null |
2025-07-10 | RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration | Chong Cheng et.al. | 2507.08136v1 | null |
2025-07-10 | SCREP: Scene Coordinate Regression and Evidential Learning-based Perception-Aware Trajectory Generation | Juyeop Han et.al. | 2507.07467v1 | null |
2025-07-09 | g2o vs. Ceres: Optimizing Scan Matching in Cartographer SLAM | Quanjie Qiu et.al. | 2507.07142v1 | null |
2025-07-09 | Smartphone Exergames with Real-Time Markerless Motion Capture: Challenges and Trade-offs | Mathieu Phosanarack et.al. | 2507.06669v1 | null |
2025-07-09 | MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Yifan Yang et.al. | 2507.06662v1 | null |
2025-07-09 | Failure Forecasting Boosts Robustness of Sim2Real Rhythmic Insertion Policies | Yuhan Liu et.al. | 2507.06519v1 | null |
2025-07-09 | Mask6D: Masked Pose Priors For 6D Object Pose Estimation | Yuechen Xie et.al. | 2507.06486v1 | null |
2025-07-08 | SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Yegyu Han et.al. | 2507.05751v1 | null |
2025-07-08 | Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting | Mohsi Jawaid et.al. | 2507.05698v1 | null |
2025-07-07 | W2W: A Simulated Exploration of IMU Placement Across the Human Body for Designing Smarter Wearable | Lala Shakti Swarup Ray et.al. | 2507.05532v1 | null |
2025-07-07 | UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment | Zeqi Luo et.al. | 2507.04814v1 | null |
2025-07-06 | Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference | Niels Leadholm et.al. | 2507.04494v1 | null |
2025-07-09 | Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM | Xiaolei Lang et.al. | 2507.04004v2 | null |
2025-07-05 | Accurate Pose Estimation Using Contact Manifold Sampling for Safe Peg-in-Hole Insertion of Complex Geometries | Abhay Negi et.al. | 2507.03925v1 | null |
2025-07-02 | Markerless Stride Length estimation in Athletic using Pose Estimation with monocular vision | Patryk Skorupski et.al. | 2507.03016v1 | null |
2025-07-03 | Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning | Buzhen Huang et.al. | 2507.02565v1 | null |
2025-07-03 | IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning | Abiam Remache González et.al. | 2507.02519v1 | null |
2025-07-03 | 3D Heart Reconstruction from Sparse Pose-agnostic 2D Echocardiographic Slices | Zhurong Chen et.al. | 2507.02411v1 | null |
2025-07-03 | LMPNet for Weakly-supervised Keypoint Discovery | Pei Guo et.al. | 2507.02308v1 | null |
2025-07-02 | What does really matter in image goal navigation? | Gianluca Monaci et.al. | 2507.01667v1 | null |
2025-07-01 | 2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration | Kathy Zhuang et.al. | 2507.01206v1 | null |
2025-07-04 | Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations | Shivansh Patel et.al. | 2507.00990v2 | null |
2025-07-01 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Hao Xing et.al. | 2507.00752v1 | null |
2025-07-01 | LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment | Juelin Zhu et.al. | 2507.00659v1 | null |
2025-06-30 | Computer Vision for Objects used in Group Work: Challenges and Opportunities | Changsoo Jung et.al. | 2507.00224v1 | null |
2025-06-30 | Validation of AI-Based 3D Human Pose Estimation in a Cyber-Physical Environment | Lisa Marie Otto et.al. | 2506.23739v1 | null |
2025-06-30 | MGPRL: Distributed Multi-Gaussian Processes for Wi-Fi-based Multi-Robot Relative Localization in Large Indoor Environments | Sai Krishna Ghanta et.al. | 2506.23514v1 | null |
2025-06-29 | TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints | Zhen Tan et.al. | 2506.23207v1 | null |
2025-06-28 | Deterministic Object Pose Confidence Region Estimation | Jinghao Wang et.al. | 2506.22720v1 | null |
2025-06-27 | Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration | Noora Sassali et.al. | 2506.22116v1 | null |
2025-06-27 | Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras | Petr Hruby et.al. | 2506.22069v1 | null |
2025-06-24 | ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes | Chenhao Zhang et.al. | 2506.21629v1 | null |
2025-06-26 | EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting | Taoyu Wu et.al. | 2506.21420v1 | null |
2025-06-26 | CURL-SLAM: Continuous and Compact LiDAR Mapping | Kaicheng Zhang et.al. | 2506.21077v1 | null |
2025-06-27 | DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation | Wenzhou Lyu et.al. | 2506.21034v2 | null |
2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et.al. | 2506.20795v1 | null |
2025-06-26 | Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception | Eric C. Joyce et.al. | 2506.20045v2 | null |
2025-06-24 | Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images | Stephanie Käs et.al. | 2506.19747v1 | null |
2025-06-23 | RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base | Kuanning Wang et.al. | 2506.18856v1 | null |
2025-06-19 | Reproducible Evaluation of Camera Auto-Exposure Methods in the Field: Platform, Benchmark and Lessons Learned | Olivier Gamache et.al. | 2506.18844v1 | null |
2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825v1 | null |
2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et.al. | 2506.17119v1 | link |
2025-06-20 | Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | Teng Guo et.al. | 2506.17110v1 | null |
2025-06-20 | LunarLoc: Segment-Based Global Localization on the Moon | Annika Thomas et.al. | 2506.16940v1 | link |
2025-06-19 | ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models | Puhao Li et.al. | 2506.16211v1 | null |
2025-06-19 | STAR-Pose: Efficient Low-Resolution Video Human Pose Estimation via Spatial-Temporal Adaptive Super-Resolution | Yucheng Jin et.al. | 2506.16061v1 | null |
2025-06-19 | KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping | Kowndinya Boyalakuntla et.al. | 2506.15945v1 | null |
2025-06-19 | Beyond Audio and Pose: A General-Purpose Framework for Video Synchronization | Yosub Shin et.al. | 2506.15937v1 | null |
2025-06-18 | Improving Robotic Manipulation: Techniques for Object Pose Estimation, Accommodating Positional Uncertainty, and Disassembly Tasks from Examples | Viral Rasik Galaiya et.al. | 2506.15865v1 | null |
2025-06-18 | PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps | Kirill Muravyev et.al. | 2506.15849v1 | null |
2025-06-18 | Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models | Andela Ilic et.al. | 2506.15290v1 | null |
2025-06-18 | RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | Qingsong Yan et.al. | 2506.15242v1 | null |
2025-06-17 | PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation | Ming Xu et.al. | 2506.14596v1 | link |
2025-06-17 | Non-Overlap-Aware Egocentric Pose Estimation for Collaborative Perception in Connected Autonomy | Hong Huang et.al. | 2506.14180v1 | null |
2025-06-17 | TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Indoor Localization and Mapping | Jeewon Kim et.al. | 2506.14178v1 | null |
2025-06-16 | Diffusion-based Inverse Observation Model for Artificial Skin | Ante Maric et.al. | 2506.13986v1 | null |
2025-06-16 | ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning | Yunchu Zhang et.al. | 2506.13867v1 | null |
2025-06-16 | PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images | Lingteng Qiu et.al. | 2506.13766v1 | null |
2025-06-16 | JENGA: Object selection and pose estimation for robotic grasping from a stack | Sai Srinivas Jeevanandam et.al. | 2506.13425v1 | null |
2025-06-16 | Automatic Multi-View X-Ray/CT Registration Using Bone Substructure Contours | Roman Flepp et.al. | 2506.13292v1 | null |
2025-06-16 | DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | Sebastian Janampa et.al. | 2506.13027v1 | link |
2025-06-15 | A large-scale, physically-based synthetic dataset for satellite pose estimation | Szabolcs Velkei et.al. | 2506.12782v1 | null |
2025-06-13 | ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation | Jayjun Lee et.al. | 2506.12239v1 | null |
2025-06-10 | Monocular 3D Hand Pose Estimation with Implicit Camera Alignment | Christos Pantazopoulos et.al. | 2506.11133v1 | link |
2025-06-12 | Occlusion-Aware 3D Hand-Object Pose Estimation with Masked AutoEncoders | Hui Yang et.al. | 2506.10816v1 | null |
2025-06-12 | In-Hand Object Pose Estimation via Visual-Tactile Fusion | Felix Nonnengießer et.al. | 2506.10787v1 | null |
2025-06-11 | Fluoroscopic Shape and Pose Tracking of Catheters with Custom Radiopaque Markers | Jared Lawson et.al. | 2506.09934v1 | null |
2025-06-11 | EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks | Athinoulla Konstantinou et.al. | 2506.09895v1 | link |
2025-06-11 | Accurate and efficient zero-shot 6D pose estimation with frozen foundation models | Andrea Caraffa et.al. | 2506.09784v1 | null |
2025-06-11 | CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings | Mattia Nardon et.al. | 2506.09699v1 | null |
2025-06-10 | Princeton365: A Diverse Dataset with Accurate Camera Pose | Karhan Kayan et.al. | 2506.09035v1 | null |
2025-06-10 | ArrowPose: Segmentation, Detection, and 5 DoF Pose Estimation Network for Colorless Point Clouds | Frederik Hagelskjaer et.al. | 2506.08699v1 | null |
2025-06-09 | UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References | Ming-Feng Li et.al. | 2506.07996v1 | null |
2025-06-09 | Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation | Yijie Deng et.al. | 2506.07338v1 | null |
2025-06-10 | From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Pablo Acuaviva et.al. | 2506.07280v2 | null |
2025-06-08 | GoTrack: Generic 6DoF Object Pose Refinement and Tracking | Van Nguyen Nguyen et.al. | 2506.07155v1 | null |
2025-06-08 | UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment | Wentao Zhao et.al. | 2506.07013v1 | null |
2025-06-07 | Deep Inertial Pose: A deep learning approach for human pose estimation | Sara M. Cerqueira et.al. | 2506.06850v1 | null |
2025-06-06 | Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments | Mingrui Li et.al. | 2506.05965v1 | null |
2025-06-06 | SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction | Yuchao Zheng et.al. | 2506.05935v1 | null |
2025-06-06 | CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy | Jiakai Zhang et.al. | 2506.05864v1 | null |
2025-06-06 | You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping | Jingshun Huang et.al. | 2506.05719v1 | null |
2025-06-05 | On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images | Andreas Meuleman et.al. | 2506.05558v1 | null |
2025-06-05 | Rectified Point Flow: Generic Point Cloud Pose Estimation | Tao Sun et.al. | 2506.05282v1 | null |
2025-06-05 | Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline | Zihan Xu et.al. | 2506.05117v1 | link |
2025-06-05 | CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | Lukas Picek et.al. | 2506.04931v1 | null |
2025-06-05 | SupeRANSAC: One RANSAC to Rule Them All | Daniel Barath et.al. | 2506.04803v1 | link |
2025-06-05 | LGM-Pose: A Lightweight Global Modeling Network for Real-time Human Pose Estimation | Biao Guo et.al. | 2506.04561v1 | null |
2025-06-04 | Photoreal Scene Reconstruction from an Egocentric Device | Zhaoyang Lv et.al. | 2506.04444v1 | link |
2025-06-04 | cuVSLAM: CUDA accelerated visual odometry | Alexander Korovko et.al. | 2506.04359v1 | link |
2025-06-04 | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | Tianyu Huang et.al. | 2506.04225v1 | null |
2025-06-04 | Accelerating SfM-based Pose Estimation with Dominating Set | Joji Joseph et.al. | 2506.03667v1 | null |
2025-06-03 | Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation | Mingjie Wei et.al. | 2506.02853v1 | link |
2025-06-03 | GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Shufan Qing et.al. | 2506.02736v1 | link |
2025-06-02 | Rig3R: Rig-Aware Conditioning for Learned 3D Reconstruction | Samuel Li et.al. | 2506.02265v1 | null |
2025-06-02 | E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models | Wenyan Cong et.al. | 2506.01933v1 | null |
2025-06-02 | SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation | Sang-Eun Lee et.al. | 2506.01691v1 | null |
2025-06-01 | TIGeR: Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction | Yiyao Huang et.al. | 2506.00953v1 | null |
2025-05-31 | XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Junwen Huang et.al. | 2506.00599v1 | null |
2025-05-30 | Lazy Heuristic Search for Solving POMDPs with Expensive-to-Compute Belief Transitions | Muhammad Suhail Saleem et.al. | 2506.00285v1 | null |
2025-05-30 | 6D Pose Estimation on Point Cloud Data through Prior Knowledge Integration: A Case Study in Autonomous Disassembly | Chengzhi Wu et.al. | 2505.24669v1 | null |
2025-05-30 | Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data | Marios Glytsos et.al. | 2505.24636v1 | null |
2025-05-30 | PCIE_Pose Solution for EgoExo4D Pose and Proficiency Estimation Challenge | Feng Chen et.al. | 2505.24411v1 | null |
2025-05-29 | Pose-free 3D Gaussian splatting via shape-ray estimation | Youngju Na et.al. | 2505.22978v1 | null |
2025-05-28 | TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Dynamic Objects | Wen Yang et.al. | 2505.22882v1 | null |
2025-05-28 | 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians | Hidenobu Matsuki et.al. | 2505.22859v1 | null |
2025-05-28 | MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism | Yanyi Qu et.al. | 2505.22555v1 | null |
2025-05-28 | Event-based Egocentric Human Pose Estimation in Dynamic Environment | Wataru Ikeda et.al. | 2505.22007v1 | null |
2025-05-27 | Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation | Zenghao Zheng et.al. | 2505.21309v1 | null |
2025-05-29 | ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction | Adeela Islam et.al. | 2505.21117v2 | null |
2025-05-27 | HS-SLAM: A Fast and Hybrid Strategy-Based SLAM Approach for Low-Speed Autonomous Driving | Bingxiang Kang et.al. | 2505.20906v1 | null |
2025-05-27 | Mamba-Driven Topology Fusion for Monocular 3-D Human Pose Estimation | Zenghao Zheng et.al. | 2505.20611v1 | null |
2025-05-28 | HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval | Matthew Hong et.al. | 2505.20455v2 | null |
2025-05-25 | Learning the Contact Manifold for Accurate Pose Estimation During Peg-in-Hole Insertion of Complex Geometries | Abhay Negi et.al. | 2505.19215v1 | null |
2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652v1 | link |
2025-05-24 | An Inertial Sequence Learning Framework for Vehicle Speed Estimation via Smartphone IMU | Xuan Xiao et.al. | 2505.18490v1 | null |
2025-05-23 | Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance | Jack Goffinet et.al. | 2505.18342v1 | null |
2025-05-23 | To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models | Simone Gaisbauer et.al. | 2505.17973v1 | link |
2025-05-23 | Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | Ming Hu et.al. | 2505.17677v1 | null |
2025-05-23 | PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation | Uyoung Jeong et.al. | 2505.17475v1 | link |
2025-05-22 | Towards Texture- And Shape-Independent 3D Keypoint Estimation in Birds | Valentin Schmuker et.al. | 2505.16633v1 | null |
2025-05-22 | GMatch: Geometry-Constrained Feature Matching for RGB-D Object Pose Estimation | Ming Yang et.al. | 2505.16144v1 | null |
2025-05-21 | Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | Yihang Li et.al. | 2505.15098v1 | null |
2025-05-20 | UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction | Nisarga Nilavadi et.al. | 2505.14866v1 | null |
2025-05-19 | Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Ruoyu Wang et.al. | 2505.13440v1 | link |
2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et.al. | 2505.13436v1 | null |
2025-05-19 | The Way Up: A Dataset for Hold Usage Detection in Sport Climbing | Anna Maschek et.al. | 2505.12854v1 | null |
2025-05-17 | Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation | Niaz Ahmad et.al. | 2505.12130v1 | null |
2025-05-17 | Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation | Zhiying Li et.al. | 2505.12009v1 | null |
2025-05-17 | ElderFallGuard: Real-Time IoT and Computer Vision-Based Fall Detection System for Elderly Safety | Tasrifur Riahi et.al. | 2505.11845v1 | null |
2025-05-16 | SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision | Utsav Rai et.al. | 2505.11439v1 | null |
2025-05-16 | MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection | Shrutarv Awasthi et.al. | 2505.11282v1 | link |
2025-05-16 | PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation | Saad Manzur et.al. | 2505.10888v1 | link |
2025-05-16 | RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects | Jaeguk Kim et.al. | 2505.10841v1 | null |
2025-05-14 | UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units | Huakun Liu et.al. | 2505.09393v1 | link |
2025-05-14 | APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression | Srinivas Ravuri et.al. | 2505.09356v1 | link |
2025-05-13 | Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation | Shuyuan Yang et.al. | 2505.08875v1 | null |
2025-05-12 | Sleep Position Classification using Transfer Learning for Bed-based Pressure Sensors | Olivier Papillon et.al. | 2505.08111v1 | null |
2025-05-07 | Pose Estimation for Intra-cardiac Echocardiography Catheter via AI-Based Anatomical Understanding | Jaeyoung Huh et.al. | 2505.07851v1 | null |
2025-05-12 | Enabling Privacy-Aware AI-Based Ergonomic Analysis | Sander De Coninck et.al. | 2505.07306v1 | null |
2025-05-13 | Human Motion Prediction via Test-domain-aware Adaptation with Easily-available Human Motions Estimated from Videos | Katsuki Shimbo et.al. | 2505.07301v2 | null |
2025-05-12 | When Dance Video Archives Challenge Computer Vision | Philippe Colantoni et.al. | 2505.07249v1 | null |
2025-05-10 | CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments | Shehryar Khattak et.al. | 2505.06483v1 | null |
2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182v1 | null |
2025-05-08 | Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors | Zunjie Zhu et.al. | 2505.05336v1 | null |
2025-05-08 | Improving Global Motion Estimation in Sparse IMU-based Motion Capture with Physics | Xinyu Yi et.al. | 2505.05010v1 | null |
2025-05-08 | An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects | Utsav Rai et.al. | 2505.04962v1 | null |
2025-05-07 | Comparison of Visual Trackers for Biomechanical Analysis of Running | Luis F. Gomez et.al. | 2505.04713v1 | null |
2025-05-07 | Do We Still Need to Work on Odometry for Autonomous Driving? | Cedric Le Gentil et.al. | 2505.04438v1 | null |
2025-05-07 | HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation | Yajie Fu et.al. | 2505.04276v1 | link |
2025-05-07 | One2Any: One-Reference 6D Pose Estimation for Any Object | Mengya Liu et.al. | 2505.04109v1 | null |
2025-05-06 | Polar Coordinate-Based 2D Pose Prior with Neural Distance Field | Qi Gan et.al. | 2505.03445v1 | null |
2025-05-06 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Yepeng Liu et.al. | 2505.03422v1 | link |
2025-05-06 | Artificial Behavior Intelligence: Technology, Challenges, and Future Directions | Kanghyun Jo et.al. | 2505.03315v1 | null |
2025-05-05 | Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation | Haotian Chen et.al. | 2505.02690v1 | null |
2025-05-05 | Corr2Distrib: Making Ambiguous Correspondences an Ally to Predict Reliable 6D Pose Distributions | Asma Brazi et.al. | 2505.02501v1 | null |
2025-05-05 | Finger Pose Estimation for Under-screen Fingerprint Sensor | Xiongjun Guan et.al. | 2505.02481v1 | link |
2025-05-05 | 6D Pose Estimation on Spoons and Hands | Kevin Tan et.al. | 2505.02335v1 | null |
2025-05-04 | Continuous Normalizing Flows for Uncertainty-Aware Human Pose Estimation | Shipeng Liu et.al. | 2505.02287v1 | null |
2025-05-04 | A Birotation Solution for Relative Pose Problems | Hongbo Zhao et.al. | 2505.02025v1 | null |
2025-05-03 | Near-field 5D Pose Estimation using Reconfigurable Intelligent Surfaces | Srikar Sharma Sadhu et.al. | 2505.01829v1 | null |
2025-05-03 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | Junhao Shi et.al. | 2505.01799v1 | null |
2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729v1 | null |
2025-05-02 | T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph | Qingyu Xian et.al. | 2505.01207v1 | null |
2025-05-02 | 3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer | Kamel Aouaidjia et.al. | 2505.01003v1 | link |
2025-05-01 | Are Minimal Radial Distortion Solvers Really Necessary for Relative Pose Estimation? | Viktor Kocur et.al. | 2505.00866v1 | link |
2025-05-01 | P2P-Insole: Human Pose Estimation Using Foot Pressure Distribution and Motion Sensors | Atsuya Watanabe et.al. | 2505.00755v1 | null |
2025-05-01 | Dietary Intake Estimation via Continuous 3D Reconstruction of Food | Wallace Lee et.al. | 2505.00606v1 | null |
2025-05-02 | InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | Nguyen Hoang Khoi Tran et.al. | 2505.00512v2 | null |
2025-04-30 | Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling | Stavrow A. Bahnam et.al. | 2504.21695v1 | null |
2025-04-30 | Multiview Point Cloud Registration via Optimization in an Autoencoder Latent Space | Luc Vedrenne et.al. | 2504.21467v1 | null |
2025-04-29 | Dance Style Recognition Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21166v1 | null |
2025-04-29 | Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining | Weizhen He et.al. | 2504.20800v1 | null |
2025-04-29 | A Survey on Event-based Optical Marker Systems | Nafiseh Jabbari Tofighi et.al. | 2504.20736v1 | null |
2025-04-29 | Large-scale visual SLAM for in-the-wild videos | Shuo Sun et.al. | 2504.20496v1 | null |
2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379v2 | null |
2025-05-01 | PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking | Xiatao Sun et.al. | 2504.20359v2 | null |
2025-04-28 | Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM | Leon Davies et.al. | 2504.19654v1 | null |
2025-04-28 | GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM | Leon Davies et.al. | 2504.19653v1 | null |
2025-04-28 | Category-Level and Open-Set Object Pose Estimation for Robotics | Peter Hönig et.al. | 2504.19572v1 | null |
2025-04-25 | Certifiably-Correct Mapping for Safe Navigation Despite Odometry Drift | Devansh R. Agrawal et.al. | 2504.18713v1 | null |
2025-04-25 | SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations | Shuting Zhao et.al. | 2504.18332v1 | null |
2025-04-25 | S3MOT: Monocular 3D Object Tracking with Selective State Space Model | Zhuohao Yan et.al. | 2504.18068v1 | null |
2025-04-22 | SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos | Yuxin Yao et.al. | 2504.17810v1 | null |
2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788v1 | null |
2025-04-24 | A Guide to Structureless Visual Localization | Vojtech Panek et.al. | 2504.17636v1 | null |
2025-04-24 | Object Pose Estimation by Camera Arm Control Based on the Next Viewpoint Estimation | Tomoki Mizuno et.al. | 2504.17424v1 | null |
2025-04-24 | Bias-Eliminated PnP for Stereo Visual Odometry: Provably Consistent and Large-Scale Localization | Guangyang Zeng et.al. | 2504.17410v1 | null |
2025-04-23 | WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks | Younggeol Cho et.al. | 2504.16655v1 | null |
2025-04-23 | Assessing the Feasibility of Internet-Sourced Video for Automatic Cattle Lameness Detection | Md Fahimuzzman Sohan et.al. | 2504.16404v1 | null |
2025-04-22 | SignX: The Foundation Model for Sign Recognition | Sen Fang et.al. | 2504.16315v1 | null |
2025-04-22 | GADS: A Super Lightweight Model for Head Pose Estimation | Menan Velayuthan et.al. | 2504.15751v1 | null |
2025-04-21 | Field Report on Ground Penetrating Radar for Localization at the Mars Desert Research Station | Anja Sheppard et.al. | 2504.15455v1 | null |
2025-04-21 | Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation | Yike Zhang et.al. | 2504.15329v1 | link |
2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et.al. | 2504.15280v1 | link |
2025-04-21 | Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation | Xiao Zhang et.al. | 2504.15134v1 | null |
2025-04-20 | Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction | Weirong Chen et.al. | 2504.14516v1 | null |
2025-04-20 | SG-Reg: Generalizable and Efficient Scene Graph Registration | Chuhao Liu et.al. | 2504.14440v1 | link |
2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803v1 | null |
2025-04-18 | Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction | Wenyu Li et.al. | 2504.13419v1 | null |
2025-04-17 | ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation | Hongyu Li et.al. | 2504.13179v1 | null |
2025-04-18 | ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos | Zetong Zhang et.al. | 2504.13167v2 | null |
2025-04-17 | Unsupervised Cross-Domain 3D Human Pose Estimation via Pseudo-Label-Guided Global Transforms | Jingjing Liu et.al. | 2504.12699v1 | null |
2025-04-16 | MobilePoser: Real-Time Full-Body Pose Estimation and 3D Human Translation from IMUs in Mobile Consumer Devices | Vasco Xu et.al. | 2504.12492v1 | link |
2025-04-16 | Diffusion Based Robust LiDAR Place Recognition | Benjamin Krummenacher et.al. | 2504.12412v1 | null |
2025-04-16 | Regist3R: Incremental Registration with Stereo Foundation Model | Sidun Liu et.al. | 2504.12356v1 | null |
2025-04-16 | CoMotion: Concurrent Multi-person 3D Motion | Alejandro Newell et.al. | 2504.12186v1 | link |
2025-04-16 | No Fuss, Just Function -- A Proposal for Non-Intrusive Full Body Tracking in XR for Meaningful Spatial Interactions | Elisabeth Mayer et.al. | 2504.11987v1 | null |
2025-04-16 | An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World | Xingwu Ji et.al. | 2504.11698v1 | link |
2025-04-17 | CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image | Jingshun Huang et.al. | 2504.11230v2 | null |
2025-04-15 | DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention | Haohan Chen et.al. | 2504.11160v1 | null |
2025-04-14 | MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model | Jian Liu et.al. | 2504.10433v1 | link |
2025-04-14 | Benchmarking 3D Human Pose Estimation Models Under Occlusions | Filipa Lino et.al. | 2504.10350v1 | null |
2025-04-15 | Differentially Private 2D Human Pose Estimation | Kaushik Bhargav Sivangi et.al. | 2504.10190v2 | null |
2025-04-14 | TT3D: Table Tennis 3D Reconstruction | Thomas Gossard et.al. | 2504.10035v1 | null |
2025-04-14 | Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations | Katja Ludwig et.al. | 2504.09953v1 | null |
2025-04-14 | NeRF-Based Transparent Object Grasping Enhanced by Shape Priors | Yi Han et.al. | 2504.09868v1 | null |
2025-04-13 | EasyREG: Easy Depth-Based Markerless Registration and Tracking using Augmented Reality Device for Surgical Guidance | Yue Yang et.al. | 2504.09498v1 | null |
2025-04-12 | SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow | Qingyuan Wang et.al. | 2504.09160v1 | null |
2025-04-12 | A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds | Jizong Peng et.al. | 2504.09129v1 | null |
2025-04-12 | BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | Jeongwan On et.al. | 2504.09097v1 | null |
2025-04-11 | The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation | Masashi Hatano et.al. | 2504.08654v1 | null |
2025-04-11 | MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction | Ian Noronha et.al. | 2504.08646v1 | link |
2025-04-11 | Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review | Claudio Cimarelli et.al. | 2504.08588v1 | null |
2025-04-11 | Multi-person Physics-based Pose Estimation for Combat Sports | Hossein Feiz et.al. | 2504.08175v1 | null |
2025-04-10 | Towards Unconstrained 2D Pose Estimation of the Human Spine | Muhammad Saif Ullah Khan et.al. | 2504.08110v1 | null |
2025-04-10 | BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation | Yuanhong Yu et.al. | 2504.07955v1 | null |
2025-04-09 | DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates | Akash Jadhav et.al. | 2504.07335v1 | null |
2025-04-09 | Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation | Yu Qi et.al. | 2504.06961v1 | null |
2025-04-09 | GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes | Seunghyeok Back et.al. | 2504.06866v1 | link |
2025-04-09 | Setup-Invariant Augmented Reality for Teaching by Demonstration with Surgical Robots | Alexandre Banks et.al. | 2504.06677v1 | link |
2025-04-09 | HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network | Hu Cui et.al. | 2504.06638v1 | null |
2025-04-08 | Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation | Sarosij Bose et.al. | 2504.05789v1 | null |
2025-04-08 | SAP-CoPE: Social-Aware Planning using Cooperative Pose Estimation with Infrastructure Sensor Nodes | Minghao Ning et.al. | 2504.05727v1 | link |
2025-04-08 | POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction | Songyan Zhang et.al. | 2504.05692v1 | link |
2025-04-10 | Learning Affine Correspondences by Integrating Geometric Constraints | Pengju Sun et.al. | 2504.04834v2 | link |
2025-04-10 | A Convex and Global Solution for the P | Jiayi Su et.al. | 2504.04445v2 | null |
2025-04-05 | 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS | Zhisheng Huang et.al. | 2504.04294v1 | null |
2025-04-02 | A Geometric Approach For Pose and Velocity Estimation Using IMU and Inertial/Body-Frame Measurements | Sifeddine Benahmed et.al. | 2504.03764v1 | null |
2025-04-04 | Robust Human Registration with Body Part Segmentation on Noisy Point Clouds | Kai Lascheit et.al. | 2504.03602v1 | null |
2025-04-04 | Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video | Jiaxin Guo et.al. | 2504.03198v1 | null |
2025-04-03 | Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks | Hyun-Ho Choi et.al. | 2504.03052v1 | link |
2025-04-03 | BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation | Van Nguyen Nguyen et.al. | 2504.02812v1 | null |
2025-04-03 | PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Lihua Liu et.al. | 2504.02617v1 | link |
2025-04-02 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Mingrui Ye et.al. | 2504.01764v1 | link |
2025-04-02 | ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue | Thomas Pritchard et.al. | 2504.01261v1 | link |
2025-04-01 | AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline | Lei Wang et.al. | 2504.00394v1 | null |
2025-03-31 | Easi3R: Estimating Disentangled Motion from DUSt3R Without Training | Xingyu Chen et.al. | 2503.24391v1 | link |
2025-03-31 | LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds | Masahiko Tsuji et.al. | 2503.23664v1 | null |
2025-03-30 | PhysPose: Refining 6D Object Poses with Physical Constraints | Martin Malenický et.al. | 2503.23587v1 | null |
2025-03-30 | Improving Indoor Localization Accuracy by Using an Efficient Implicit Neural Map Representation | Haofei Kuang et.al. | 2503.23480v1 | link |
2025-03-30 | SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation | Pranjal Paul et.al. | 2503.23465v1 | null |
2025-03-30 | HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation | Hongwei Zheng et.al. | 2503.23331v1 | null |
2025-03-29 | Incorporating GNSS Information with LIDAR-Inertial Odometry for Accurate Land-Vehicle Localization | Jintao Cheng et.al. | 2503.23199v1 | null |
2025-03-29 | FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video | Andrea Boscolo Camiletto et.al. | 2503.23094v1 | null |
2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et.al. | 2503.22363v1 | null |
2025-03-28 | GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion | Li-Heng Chen et.al. | 2503.22349v1 | null |
2025-03-27 | NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications | Kibon Ku et.al. | 2503.21958v1 | null |
2025-03-27 | Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video | David Yifan Yao et.al. | 2503.21761v1 | link |
2025-03-27 | Reconstructing Humans with a Biomechanically Accurate Skeleton | Yan Xia et.al. | 2503.21751v1 | null |
2025-03-27 | OccRobNet : Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation | Mallika Garg et.al. | 2503.21723v1 | null |
2025-03-27 | RapidPoseTriangulation: Multi-view Multi-person Whole-body Human Pose Triangulation in a Millisecond | Daniel Bermuth et.al. | 2503.21692v1 | null |
2025-03-27 | STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM | Yongxu Wang et.al. | 2503.21425v1 | null |
2025-03-27 | Lidar-only Odometry based on Multiple Scan-to-Scan Alignments over a Moving Window | Aaron Kurda et.al. | 2503.21293v1 | null |
2025-03-27 | Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2503.21140v1 | link |
2025-03-26 | DINeMo: Learning Neural Mesh Models with no 3D Annotations | Weijie Guo et.al. | 2503.20220v1 | null |
2025-03-25 | Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Yuke Lou et.al. | 2503.20118v1 | null |
2025-03-25 | Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders | Paul Koch et.al. | 2503.19947v1 | link |
2025-03-25 | Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing | Lukas Mack et.al. | 2503.19893v1 | null |
2025-03-25 | Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving | Yusen Xie et.al. | 2503.19713v1 | link |
2025-03-25 | DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios | Xiangting Meng et.al. | 2503.19625v1 | null |
2025-03-25 | Pose-Based Fall Detection System: Efficient Monitoring on Standard CPUs | Vinayak Mali et.al. | 2503.19501v1 | null |
2025-03-25 | Multi-modal 3D Pose and Shape Estimation with Computed Tomography | Mingxiao Tu et.al. | 2503.19405v1 | null |
2025-03-25 | From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting | Zhiwei Huang et.al. | 2503.19358v1 | null |
2025-03-25 | Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation | Zhuoran Zhao et.al. | 2503.19307v1 | link |
2025-03-25 | Any6D: Model-free 6D Pose Estimation of Novel Objects | Taeyeop Lee et.al. | 2503.18673v2 | null |
2025-03-24 | Structure-Aware Correspondence Learning for Relative Pose Estimation | Yihan Chen et.al. | 2503.18671v1 | null |
2025-03-24 | TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos | Kazuhiro Yamada et.al. | 2503.18282v1 | link |
2025-03-23 | Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning | Xiang Fang et.al. | 2503.17938v1 | null |
2025-03-22 | Co-op: Correspondence-based Novel Object Pose Estimation | Sungphill Moon et.al. | 2503.17731v1 | null |
2025-03-21 | Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image | Jerred Chen et.al. | 2503.17358v1 | null |
2025-03-21 | Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors | Wonbong Jang et.al. | 2503.17316v1 | null |
2025-03-20 | ContactFusion: Stochastic Poisson Surface Maps from Visual and Contact Sensing | Aditya Kamireddypalli et.al. | 2503.16592v1 | null |
2025-03-20 | Probabilistic Prompt Distribution Learning for Animal Pose Estimation | Jiyong Rao et.al. | 2503.16120v1 | link |
2025-03-20 | PoseTraj: Pose-Aware Trajectory Control in Video Diffusion | Longbin Ji et.al. | 2503.16068v1 | null |
2025-03-20 | Automating 3D Dataset Generation with Neural Radiance Fields | P. Schulz et.al. | 2503.15997v1 | link |
2025-03-20 | Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras | Beilei Cui et.al. | 2503.15917v1 | null |
2025-03-19 | EdgeRegNet: Edge Feature-based Multimodal Registration Network between Images and LiDAR Point Clouds | Yuanchao Yue et.al. | 2503.15284v1 | link |
2025-03-20 | GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation | Zinqin Huang et.al. | 2503.15110v2 | link |
2025-03-20 | Distilling 3D distinctive local descriptors for 6D pose estimation | Amir Hamza et.al. | 2503.15106v2 | null |
2025-03-18 | Validation of Human Pose Estimation and Human Mesh Recovery for Extracting Clinically Relevant Motion Data from Videos | Kai Armstrong et.al. | 2503.14760v1 | null |
2025-03-18 | SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model | Yucheng Mao et.al. | 2503.14463v1 | null |
2025-03-18 | SCJD: Sparse Correlation and Joint Distillation for Efficient 3D Human Pose Estimation | Weihong Chen et.al. | 2503.14097v1 | null |
2025-03-18 | Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach | Tianshu Wu et.al. | 2503.14051v1 | null |
2025-03-19 | Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation | Huan Ren et.al. | 2503.13926v2 | null |
2025-03-20 | STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans | Shashikant Verma et.al. | 2503.13344v2 | link |
2025-03-17 | UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation | Yinqiao Wang et.al. | 2503.13303v1 | null |
2025-03-17 | Uncertainty-Aware Knowledge Distillation for Compact and Efficient 6DoF Pose Estimation | Nassim Ali Ousalah et.al. | 2503.13053v1 | null |
2025-03-17 | PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data | ChangHee Yang et.al. | 2503.13025v1 | null |
2025-03-15 | Gun Detection Using Combined Human Pose and Weapon Appearance | Amulya Reddy Maligireddy et.al. | 2503.12215v1 | null |
2025-03-15 | TACO: Taming Diffusion for in-the-wild Video Amodal Completion | Ruijie Lu et.al. | 2503.12049v1 | null |
2025-03-14 | Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation | Hiroyasu Akada et.al. | 2503.11652v1 | null |
2025-03-14 | Online Test-time Adaptation for 3D Human Pose Estimation: A Practical Perspective with Estimated 2D Poses | Qiuxia Lin et.al. | 2503.11194v1 | null |
2025-03-14 | Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching | Ruochen Hou et.al. | 2503.11020v1 | null |
2025-03-13 | Clothes-Changing Person Re-identification Based On Skeleton Dynamics | Asaf Joseph et.al. | 2503.10759v1 | null |
2025-03-13 | Consistent multi-animal pose estimation in cattle using dynamic Kalman filter based tracking | Maarten Perneel et.al. | 2503.10450v1 | link |
2025-03-13 | 6D Object Pose Tracking in Internet Videos for Robotic Manipulation | Georgy Ponimatkin et.al. | 2503.10307v1 | null |
2025-03-13 | VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames | Zhiqi Li et.al. | 2503.10286v1 | null |
2025-03-12 | Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting | Weiquan Wang et.al. | 2503.09640v1 | null |
2025-03-12 | GenHPE: Generative Counterfactuals for 3D Human Pose Estimation with Radio Frequency Signals | Shuokang Huang et.al. | 2503.09537v1 | null |
2025-03-12 | MonoSLAM: Robust Monocular SLAM with Global Structure Optimization | Bingzheng Jiang et.al. | 2503.09296v1 | null |
2025-03-12 | Better Together: Unified Motion Capture and 3D Avatar Reconstruction | Arthur Moreau et.al. | 2503.09293v1 | null |
2025-03-11 | Acoustic Neural 3D Reconstruction Under Pose Drift | Tianxiang Lin et.al. | 2503.08930v1 | null |
2025-03-11 | Keypoint Semantic Integration for Improved Feature Matching in Outdoor Agricultural Environments | Rajitha de Silva et.al. | 2503.08843v1 | null |
2025-03-11 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673v1 | null |
2025-03-11 | SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving | Akshat Ghiya et.al. | 2503.08016v1 | null |
2025-03-10 | Better Pose Initialization for Fast and Robust 2D/3D Pelvis Registration | Yehyun Suh et.al. | 2503.07767v1 | null |
2025-03-10 | HumanMM: Global Human Motion Recovery from Multi-shot Videos | Yuhong Zhang et.al. | 2503.07597v1 | link |
2025-03-11 | AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements | Calvin Yeung et.al. | 2503.07499v2 | link |
2025-03-10 | Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey | Chuqi Wang et.al. | 2503.07278v1 | null |
2025-03-12 | Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion | Mona Sheikh Zeinoddin et.al. | 2503.07204v2 | null |
2025-03-10 | Multi-Modal 3D Mesh Reconstruction from Images and Text | Melvin Reka et.al. | 2503.07190v1 | null |
2025-03-11 | PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM | Alan Dao et.al. | 2503.07111v2 | null |
2025-03-09 | AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation | Yang Zou et.al. | 2503.06660v1 | null |
2025-03-08 | NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features | Hongjia Zhai et.al. | 2503.06117v1 | null |
2025-03-08 | Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision | David C. Jeong et.al. | 2503.06089v1 | null |
2025-03-08 | ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features | Shan An et.al. | 2503.05995v1 | link |
2025-03-07 | Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments | Zekai Liang et.al. | 2503.05953v1 | null |
2025-03-07 | Novel Object 6D Pose Estimation with a Single Reference View | Jian Liu et.al. | 2503.05578v1 | link |
2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et.al. | 2503.05365v1 | null |
2025-03-07 | Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects | Justin Yu et.al. | 2503.05189v1 | null |
2025-03-07 | SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting | Linqi Yang et.al. | 2503.05174v1 | null |
2025-03-07 | GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting | Zheng Zhou et.al. | 2503.05161v1 | null |
2025-03-06 | MarsLGPR: Mars Rover Localization with Ground Penetrating Radar | Anja Sheppard et.al. | 2503.04944v1 | null |
2025-03-09 | ReynoldsFlow: Exquisite Flow Estimation via Reynolds Transport Theorem | Yu-Hsi Chen et.al. | 2503.04500v2 | link |
2025-03-05 | Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames | Jun Yang et.al. | 2503.03726v1 | null |
2025-03-05 | Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements | Carlo Dindorf et.al. | 2503.03717v1 | null |
2025-03-05 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Thomas Pöllabauer et.al. | 2503.03655v1 | null |
2025-03-05 | Tiny Lidars for Manipulator Self-Awareness: Sensor Characterization and Initial Localization Experiments | Giammarco Caroleo et.al. | 2503.03449v1 | null |
2025-03-05 | Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments | Jie Deng et.al. | 2503.03373v1 | link |
2025-03-05 | Supervised Visual Docking Network for Unmanned Surface Vehicles Using Auto-labeling in Real-world Water Environments | Yijie Chu et.al. | 2503.03282v1 | null |
2025-03-05 | SCORE: Saturated Consensus Relocalization in Semantic Line Maps | Haodong Jiang et.al. | 2503.03254v1 | link |
2025-03-04 | Monocular Person Localization under Camera Ego-motion | Yu Zhan et.al. | 2503.02916v1 | null |
2025-03-04 | PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers | Wooju Lee et.al. | 2503.02388v1 | null |
2025-03-04 | DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Haoyuan Li et.al. | 2503.02223v1 | link |
2025-03-04 | Zero-Shot Sim-to-Real Visual Quadrotor Control with Hard Constraints | Yan Miao et.al. | 2503.02198v1 | null |
2025-03-03 | Constraint-Based Modeling of Dynamic Entities in 3D Scene Graphs for Robust SLAM | Marco Giberna et.al. | 2503.02050v1 | null |
2025-03-05 | Category-level Meta-learned NeRF Priors for Efficient Object Mapping | Saad Ejaz et.al. | 2503.01582v2 | null |
2025-03-03 | RUSSO: Robust Underwater SLAM with Sonar Optimization against Visual Degradation | Shu Pan et.al. | 2503.01434v1 | null |
2025-03-03 | ecg2o: A Seamless Extension of g2o for Equality-Constrained Factor Graph Optimization | Anas Abdelkarim et.al. | 2503.01311v1 | link |
2025-03-03 | Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Xiaolong Yu et.al. | 2503.01254v1 | link |
2025-03-04 | Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction | Haolin Wang et.al. | 2503.00397v2 | null |
2025-03-01 | BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds | Yuto Shibata et.al. | 2503.00389v1 | null |
2025-02-28 | BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Jing-Yuan Chang et.al. | 2502.21085v1 | link |
2025-02-28 | Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints | Masoumeh Chapariniya et.al. | 2502.20803v1 | null |
2025-02-27 | Cutting-edge 3D reconstruction solutions for underwater coral reef images: A review and comparison | Jiageng Zhong et.al. | 2502.20154v1 | null |
2025-02-27 | BEV-DWPVO: BEV-based Differentiable Weighted Procrustes for Low Scale-drift Monocular Visual Odometry on Ground | Yufei Wei et.al. | 2502.20078v1 | null |
2025-02-28 | SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation | Zijie Zhou et.al. | 2502.20077v2 | link |
2025-02-27 | RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges | Thibaut Loiseau et.al. | 2502.19955v1 | null |
2025-02-27 | QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Elkhan Ismayilzada et.al. | 2502.19769v1 | null |
2025-02-27 | Accurate Pose Estimation for Flight Platforms based on Divergent Multi-Aperture Imaging System | Shunkun Liang et.al. | 2502.19708v1 | null |
2025-02-26 | Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects | Petri Mäkinen et.al. | 2502.19169v1 | null |
2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373v1 | null |
2025-02-25 | Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation | Tianyang Xu et.al. | 2502.18214v1 | link |
2025-02-24 | V-HOP: Visuo-Haptic 6D Object Pose Tracking | Hongyu Li et.al. | 2502.17434v1 | null |
2025-02-23 | Orchestrating Joint Offloading and Scheduling for Low-Latency Edge SLAM | Yao Zhang et.al. | 2502.16495v1 | null |
2025-02-23 | DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion | Jianbin Jiao et.al. | 2502.16419v1 | link |
2025-02-21 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Sicheng Yu et.al. | 2502.15633v1 | null |
2025-02-21 | SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training | Nie Lin et.al. | 2502.15251v1 | link |
2025-02-21 | Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation | Thoa Thieu et.al. | 2502.15179v1 | null |
2025-02-20 | Design of a Visual Pose Estimation Algorithm for Moon Landing | Atakan Süslü et.al. | 2502.14942v1 | null |
2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931v1 | null |
2025-02-19 | EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation | Zixuan Fang et.al. | 2502.14061v1 | null |
2025-02-19 | Active Illumination for Visual Ego-Motion Estimation in the Dark | Francesco Crocetti et.al. | 2502.13708v1 | null |
2025-02-19 | Object-Pose Estimation With Neural Population Codes | Heiko Hoffmann et.al. | 2502.13403v1 | null |
2025-02-18 | Spatiotemporal Multi-Camera Calibration using Freely Moving People | Sang-Eun Lee et.al. | 2502.12546v1 | null |
2025-02-18 | Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation | Kaiwen Ren et.al. | 2502.12535v1 | null |
2025-02-19 | FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views | Shangzhan Zhang et.al. | 2502.12138v2 | null |
2025-02-17 | Enhancing Transparent Object Pose Estimation: A Fusion of GDR-Net and Edge Detection | Tessa Pulli et.al. | 2502.12027v1 | null |
2025-02-17 | SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking | Zijian Wu et.al. | 2502.11534v1 | null |
2025-02-18 | VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS | Ming Meng et.al. | 2502.10729v2 | link |
2025-02-15 | Semantics-aware Test-time Adaptation for 3D Human Pose Estimation | Qiuxia Lin et.al. | 2502.10724v1 | null |
2025-02-15 | Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video | Runyang Feng et.al. | 2502.10616v1 | null |
2025-02-14 | HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation | Yibo Liu et.al. | 2502.10606v1 | null |
2025-02-14 | Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models | Chenrui Tie et.al. | 2502.10090v1 | link |
2025-02-13 | Metamorphic Testing for Pose Estimation Systems | Matias Duran et.al. | 2502.09460v1 | null |
2025-02-13 | BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization | Qiwei Wang et.al. | 2502.09080v1 | null |
2025-02-14 | Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks | Zijian Huang et.al. | 2502.08865v2 | null |
2025-02-12 | LIR-LIVO: A Lightweight,Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Shujie Zhou et.al. | 2502.08676v1 | link |
2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449v1 | null |
2025-02-11 | GaRLIO: Gravity enhanced Radar-LiDAR-Inertial Odometry | Chiyun Noh et.al. | 2502.07703v1 | link |
2025-02-11 | Matrix3D: Large Photogrammetry Model All-in-One | Yuanxun Lu et.al. | 2502.07685v1 | null |
2025-02-08 | Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment | Maneesha Wickramasuriya et.al. | 2502.05409v1 | null |
2025-02-06 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation | Nathan Louis et.al. | 2502.04483v1 | link |
2025-02-06 | GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation | Weihang Li et.al. | 2502.04293v1 | null |
2025-02-06 | Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Yuhui Jin et.al. | 2502.03877v1 | null |
2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510v1 | null |
2025-02-04 | Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Jian Liu et.al. | 2502.02525v1 | link |
2025-02-03 | CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation | Xiao Lin et.al. | 2502.01312v1 | null |
2025-02-03 | Enhancing Feature Tracking Reliability for Visual Navigation using Real-Time Safety Filter | Dabin Kim et.al. | 2502.01092v1 | null |
2025-02-03 | ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking | Jianqiu Chen et.al. | 2502.01004v1 | null |
2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115v1 | null |
2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034v1 | link |
2025-01-30 | SimpleDepthPose: Fast and Reliable Human Pose Estimation with RGBD-Images | Daniel Bermuth et.al. | 2501.18478v1 | link |
2025-01-29 | Online Trajectory Replanner for Dynamically Grasping Irregular Objects | Minh Nhat Vu et.al. | 2501.17968v1 | null |
2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751v1 | null |
2025-01-27 | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | Hoosang Lee et.al. | 2501.16146v1 | null |
2025-01-27 | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | Jialun Cai et.al. | 2501.15763v1 | null |
2025-01-25 | Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos | Zhen-Hui Dong et.al. | 2501.15096v1 | null |
2025-01-25 | SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos | Yingying Jiao et.al. | 2501.15073v1 | null |
2025-01-24 | 3D/2D Registration of Angiograms using Silhouette-based Differentiable Rendering | Taewoong Lee et.al. | 2501.14918v1 | link |
2025-01-24 | Light3R-SfM: Towards Feed-forward Structure-from-Motion | Sven Elflein et.al. | 2501.14914v1 | null |
2025-01-24 | Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction | Bo Sun et.al. | 2501.14896v1 | null |
2025-01-24 | Optimizing Grasping Precision for Industrial Pick-and-Place Tasks Through a Novel Visual Servoing Approach | Khairidine Benali et.al. | 2501.14557v1 | null |
2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502v1 | null |
2025-01-24 | Optimizing Human Pose Estimation Through Focused Human and Joint Regions | Yingying Jiao et.al. | 2501.14439v1 | null |
2025-01-24 | Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation | Haipeng Chen et.al. | 2501.14356v1 | null |
2025-01-24 | HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting | Javier Yu et.al. | 2501.14147v1 | null |
2025-01-23 | Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass | Jianing Yang et.al. | 2501.13928v1 | link |
2025-01-23 | EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs | Yizhe Lv et.al. | 2501.13805v1 | link |
2025-01-23 | VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM | Gyuhyeon Pak et.al. | 2501.13402v1 | null |
2025-01-22 | Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects | Louis Aberdeen et.al. | 2501.13009v1 | null |
2025-01-21 | BlanketGen2-Fit3D: Synthetic Blanket Augmentation Towards Improving Real-World In-Bed Blanket Occluded Human Pose Estimation | Tamás Karácsony et.al. | 2501.12318v1 | null |
2025-01-19 | Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation | Shibang Liu et.al. | 2501.11069v1 | null |
2025-01-18 | RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing | Mehrad Mortazavi et.al. | 2501.10621v1 | link |
2025-01-17 | landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images | Jef Jonkers et.al. | 2501.10098v1 | link |
2025-01-16 | A New Teacher-Reviewer-Student Framework for Semi-supervised 2D Human Pose Estimation | Wulian Yun et.al. | 2501.09565v1 | null |
2025-01-21 | Towards Robust and Realistic Human Pose Estimation via WiFi Signals | Yang Chen et.al. | 2501.09411v2 | link |
2025-01-16 | RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects | Zhen Luo et.al. | 2501.09307v1 | null |
2025-01-16 | BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module | Dongzhihan Wang et.al. | 2501.08659v2 | null |
2025-01-14 | Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion | Cesare Davide Pace et.al. | 2501.08446v1 | link |
2025-01-14 | Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation | Hansoo Park et.al. | 2501.08408v1 | null |
2025-01-14 | Predicting 4D Hand Trajectory from Monocular Videos | Yufei Ye et.al. | 2501.08329v1 | null |
2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188v1 | null |
2025-01-14 | AgentPose: Progressive Distribution Alignment via Feature Agent for Human Pose Distillation | Feng Zhang et.al. | 2501.08088v1 | null |
2025-01-14 | Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation | Feng Zhang et.al. | 2501.08038v1 | null |
2025-01-14 | BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos | Farnoosh Koleini et.al. | 2501.07800v1 | null |
2025-01-13 | Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation | Yaqing Ding et.al. | 2501.07742v1 | link |
2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399v1 | null |
2025-01-13 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Tze Ho Elden Tse et.al. | 2501.07100v1 | null |
2025-01-10 | eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events | Shuolong Chen et.al. | 2501.05688v1 | link |
2025-01-09 | Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Yifan Yu et.al. | 2501.05446v1 | link |
2025-01-09 | From Simple to Complex Skills: The Case of In-Hand Object Reorientation | Haozhi Qi et.al. | 2501.05439v1 | null |
2025-01-11 | Towards Balanced Continual Multi-Modal Learning in Human Pose Estimation | Jiaxuan Peng et.al. | 2501.05264v2 | null |
2025-01-08 | KN-LIO: Geometric Kinematics and Neural Field Coupled LiDAR-Inertial Odometry | Zhong Wang et.al. | 2501.04263v1 | null |
2025-01-07 | OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints | Mingjie Pan et.al. | 2501.03841v1 | null |
2025-01-10 | MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer | Junsheng Luan et.al. | 2501.03630v2 | null |
2025-01-07 | TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes | Alakh Aggarwal et.al. | 2501.03525v1 | link |
2025-01-06 | Mobile Augmented Reality Framework with Fusional Localization and Pose Estimation | Songlin Hou et.al. | 2501.03336v1 | null |
2025-01-06 | SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation | Haozheng Xu et.al. | 2501.02990v1 | null |
2025-01-06 | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | Jinglei Zhang et.al. | 2501.02973v1 | null |
2025-01-06 | Spiking monocular event based 6D pose estimation for space application | Jonathan Courtois et.al. | 2501.02916v1 | null |
2025-01-06 | Universal Features Guided Zero-Shot Category-Level Object Pose Estimation | Wentian Qu et.al. | 2501.02831v1 | null |
2025-01-06 | Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation | Arindam Dutta et.al. | 2501.02773v1 | null |
2025-01-06 | WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation | Tianjian Jiang et.al. | 2501.02771v1 | null |
2025-01-05 | LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments | Haosong Yue et.al. | 2501.02580v1 | link |
2025-01-04 | ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle | Yinchuan Wang et.al. | 2501.02166v1 | link |
2025-01-03 | TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jiajie Liu et.al. | 2501.01770v1 | link |
2025-01-03 | Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery | Baoru Huang et.al. | 2501.01752v1 | null |
2025-01-03 | Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions | Xincheng Shuai et.al. | 2501.01425v2 | null |
2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409v1 | null |
2025-01-02 | L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild | Soumyaratna Debnath et.al. | 2501.01174v1 | null |
2024-12-31 | Relative Pose Observability Analysis Using Dual Quaternions | Nicholas B. Andrews et.al. | 2501.00657v1 | null |
2024-12-31 | VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception | Zhaoliang Wan et.al. | 2501.00510v1 | null |
2024-12-30 | Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields | Evgenii Kruzhkov et.al. | 2412.20976v1 | null |
2024-12-30 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | Hrishikesh Gupta et.al. | 2412.20830v1 | link |
2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803v1 | null |
2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767v1 | null |
2024-12-30 | Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study | Boris Bačić et.al. | 2412.20733v1 | link |
2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538v1 | link |
2024-12-28 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Shuo Wang et.al. | 2412.20082v1 | null |
2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056v1 | link |
2024-12-27 | Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation | Guangsheng Xu et.al. | 2412.19676v1 | link |
2024-12-27 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Xudong Cai et.al. | 2412.19518v1 | null |
2024-12-26 | Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos | Changwoon Choi et.al. | 2412.19089v1 | null |
2024-12-23 | Reconstructing People, Places, and Cameras | Lea Müller et.al. | 2412.17806v1 | link |
2024-12-22 | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | Zhaoxing Zhang et.al. | 2412.16923v1 | link |
2024-12-21 | EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose | Yung-Hong Sun et.al. | 2412.16742v1 | null |
2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454v1 | null |
2024-12-20 | Can Generative Video Models Help Pose Estimation? | Ruojin Cai et.al. | 2412.16155v1 | null |
2024-12-20 | Monkey Transfer Learning Can Improve Human Pose Estimation | Bradley Scott et.al. | 2412.15966v1 | null |
2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212v1 | null |
2024-12-13 | IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education | Roberto Daza et.al. | 2412.14195v1 | link |
2024-12-18 | Level-Set Parameters: Novel Representation for 3D Shape Analysis | Huan Lei et.al. | 2412.13502v1 | null |
2024-12-18 | Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation | Xiaoqi An et.al. | 2412.13454v1 | link |
2024-12-17 | CondiMen: Conditional Multi-Person Mesh Recovery | Brégier Romain et.al. | 2412.13058v1 | null |
2024-12-17 | ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Wangyu Xue et.al. | 2412.12675v1 | null |
2024-12-16 | Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion | Adam Bethell et.al. | 2412.11420v1 | null |
2024-12-13 | ExeChecker: Where Did I Go Wrong? | Yiwen Gu et.al. | 2412.10573v1 | link |
2024-12-11 | CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty | Harry Zhang et.al. | 2412.10431v1 | null |
2024-12-13 | RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting | Lizhi Bai et.al. | 2412.09868v1 | null |
2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621v1 | null |
2024-12-12 | FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction | Jiale Xu et.al. | 2412.09573v1 | null |
2024-12-11 | BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation | Shengze Wang et.al. | 2412.08640v1 | null |
2024-12-12 | Drift-free Visual SLAM using Digital Twins | Roxane Merat et.al. | 2412.08496v2 | null |
2024-12-11 | Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Siyan Dong et.al. | 2412.08376v1 | link |
2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746v1 | null |
2024-12-09 | MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Zhenggang Tang et.al. | 2412.06974v1 | null |
2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488v1 | link |
2024-12-09 | Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation | Marsha Mariya Kappan et.al. | 2412.06227v1 | null |
2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821v1 | null |
2024-12-05 | ProPLIKS: Probabilistic 3D human body pose estimation | Karthik Shetty et.al. | 2412.04665v1 | null |
2024-12-05 | DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction | Ben Kaye et.al. | 2412.04464v1 | null |
2024-12-05 | Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation | Alan Li et.al. | 2412.04279v1 | null |
2024-12-04 | Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis | Qitao Zhao et.al. | 2412.03570v1 | null |
2024-12-06 | NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images | Lingen Li et.al. | 2412.03517v2 | null |
2024-12-05 | A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks | Proma Hossain Progga et.al. | 2412.03498v2 | null |
2024-12-04 | MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras | Huai Yu et.al. | 2412.03146v1 | link |
2024-12-04 | An indoor DSO-based ceiling-vision odometry system for indoor industrial environments | Abdelhak Bougouffa et.al. | 2412.02950v1 | null |
2024-12-03 | EgoCast: Forecasting Egocentric Human Pose in the Wild | Maria Escobar et.al. | 2412.02903v1 | null |
2024-12-02 | emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Sasha Salter et.al. | 2412.02725v1 | link |
2024-12-03 | ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Miroslav Purkrabek et.al. | 2412.02254v1 | link |
2024-12-03 | Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images | Xiangyong Lu et.al. | 2412.02197v1 | link |
2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | Ting-Ruen Wei et.al. | 2412.02066v1 | null |
2024-12-02 | Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Miroslav Purkrabek et.al. | 2412.01562v1 | link |
2024-12-02 | 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting | Yufeng Jin et.al. | 2412.01543v1 | null |
2024-12-02 | HandOS: 3D Hand Reconstruction in One Stage | Xingyu Chen et.al. | 2412.01537v1 | null |
2024-12-02 | SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames | Yuxuan Zhou et.al. | 2412.01500v1 | link |
2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422v1 | null |
2024-12-02 | Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures | Qiyuan Shen et.al. | 2412.01299v1 | null |
2024-12-02 | CRISP: Object Pose and Shape Estimation with Test-Time Adaptation | Jingnan Shi et.al. | 2412.01052v1 | null |
2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492v1 | null |
2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Yang You et.al. | 2411.19458v1 | link |
2024-11-28 | GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289v1 | null |
2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167v1 | null |
2024-11-28 | Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations | Tjark Behrens et.al. | 2411.19162v1 | link |
2024-11-28 | Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation | Mathias Hudoba de Badyn et.al. | 2411.19033v1 | null |
2024-11-28 | Waterfall Transformer for Multi-person Pose Estimation | Navin Ranjan et.al. | 2411.18944v1 | null |
2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673v2 | null |
2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377v1 | null |
2024-11-27 | Manual-PA: Learning 3D Part Assembly from Instruction Diagrams | Jiahao Zhang et.al. | 2411.18011v1 | null |
2024-11-26 | Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors | Ziang Xu et.al. | 2411.17790v1 | null |
2024-11-26 | Geometric Point Attention Transformer for 3D Shape Reassembly | Jiahan Li et.al. | 2411.17788v1 | null |
2024-11-26 | RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training | Raktim Gautam Goswami et.al. | 2411.17662v1 | link |
2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432v1 | null |
2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240v1 | link |
2024-11-28 | SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting | Gyeongjin Kang et.al. | 2411.17190v3 | null |
2024-11-26 | GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation | Xin Liu et.al. | 2411.17174v1 | null |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668v1 | null |
2024-11-25 | Edge Weight Prediction For Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2411.16665v1 | link |
2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443v1 | link |
2024-11-25 | One Diffusion to Generate Them All | Duong H. Le et.al. | 2411.16318v1 | link |
2024-11-25 | UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image | Xingyu Liu et.al. | 2411.16106v1 | null |
2024-11-24 | Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching | Yujing Sun et.al. | 2411.15860v1 | link |
2024-11-24 | PEnG: Pose-Enhanced Geo-Localisation | Tavis Shore et.al. | 2411.15742v1 | link |
2024-11-22 | Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications | Changseob Song et.al. | 2411.15366v1 | null |
2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913v1 | null |
2024-11-22 | mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect | Shuting Hu et.al. | 2411.14656v1 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347v1 | link |
2024-11-21 | SEMPose: A Single End-to-end Network for Multi-object Pose Estimation | Xin Liu et.al. | 2411.14002v1 | null |
2024-11-21 | Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain | Vidya Sudevan et.al. | 2411.13988v1 | null |
2024-11-21 | Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework | Vidya Sudevan et.al. | 2411.13962v1 | null |
2024-11-20 | Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation | Rahm Ranjan et.al. | 2411.13716v1 | null |
2024-11-20 | Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction | Yi Gu et.al. | 2411.13620v1 | null |
2024-11-19 | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference | Seong Jong Yoo et.al. | 2411.13607v1 | link |
2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | Weicai Ye et.al. | 2411.13291v1 | null |
2024-11-20 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | Yuchen Yang et.al. | 2411.13026v1 | link |
2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676v1 | null |
2024-11-15 | SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Yutao Tang et.al. | 2411.12592v1 | link |
2024-11-19 | GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | Teli Ma et.al. | 2411.12286v1 | null |
2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Yunong Liu et.al. | 2411.11409v1 | link |
2024-11-15 | USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting | Kang Chen et.al. | 2411.10504v1 | link |
2024-11-13 | ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening | Hojun Jang et.al. | 2411.09435v1 | null |
2024-11-13 | Generalized Pose Space Embeddings for Training In-the-Wild using Analysis-by-Synthesis | Dominik Borer et.al. | 2411.08603v1 | null |
2024-11-13 | DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization | Yueming Xu et.al. | 2411.08373v1 | null |
2024-11-16 | RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation | Shuocheng Yang et.al. | 2411.07699v2 | link |
2024-11-12 | Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction | Rotem Atari et.al. | 2411.07644v1 | null |
2024-11-12 | Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions | Shuwei Xing et.al. | 2411.07495v1 | null |
2024-11-08 | Acoustic-based 3D Human Pose Estimation Robust to Human Position | Yusuke Oumi et.al. | 2411.07165v1 | null |
2024-11-11 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Junho Kim et.al. | 2411.06869v1 | null |
2024-11-11 | GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting | Daehan Lee et.al. | 2411.06766v1 | link |
2024-11-11 | GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction | Shizhe Yuan et.al. | 2411.06725v1 | null |
2024-11-10 | Magnetic Field Aided Vehicle Localization with Acceleration Correction | Mrunmayee Deshpande et.al. | 2411.06543v1 | null |
2024-11-10 | Visuotactile-Based Learning for Insertion with Compliant Hands | Osher Azulay et.al. | 2411.06408v1 | link |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734v1 | null |
2024-11-08 | DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Rafael Berral-Soler et.al. | 2411.05552v1 | link |
2024-11-08 | Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map | Chanuk Yang et.al. | 2411.05497v1 | null |
2024-11-08 | Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements | Kunrui Ze et.al. | 2411.05481v1 | null |
2024-11-07 | Social EgoMesh Estimation | Luca Scofano et.al. | 2411.04598v1 | link |
2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory | Ali K. AlShami et.al. | 2411.04501v1 | null |
2024-11-08 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386v2 | null |
2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807v3 | null |
2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724v1 | null |
2024-11-05 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Seunggeun Chi et.al. | 2411.03561v1 | null |
2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086v1 | null |
2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804v1 | null |
2024-11-03 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | Miso Lee et.al. | 2411.01443v1 | link |
2024-11-04 | 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction | Jongmin Lee et.al. | 2411.00543v2 | null |
2024-10-31 | Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis | Brody McNutt et.al. | 2411.00196v1 | null |
2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207v1 | link |
2024-11-06 | SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation | Aditya Agarwal et.al. | 2410.23643v2 | null |
2024-10-30 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | HyunJun Jung et.al. | 2410.22715v1 | link |
2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | Hanqing Jiang et.al. | 2410.22213v1 | null |
2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128v1 | link |
2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | Zhoujie Xu et.al. | 2410.22079v1 | null |
2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743v1 | link |
2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153v1 | null |
2024-10-29 | BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Chih-Hsiang Hsu et.al. | 2410.20731v2 | link |
2024-11-01 | RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior | Mingjiang Liang et.al. | 2410.20358v2 | null |
2024-10-27 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions | Rawal Khirodkar et.al. | 2410.20294v1 | null |
2024-10-26 | Neural Fields in Robotics: A Survey | Muhammad Zubair Irshad et.al. | 2410.20220v1 | link |
2024-10-25 | DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems | Muhammad Zaeem Shahzad et.al. | 2410.19336v1 | null |
2024-10-24 | Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Junyi Chen et.al. | 2410.18962v1 | null |
2024-10-24 | VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation | Daniel Bermuth et.al. | 2410.18723v1 | link |
2024-10-23 | Robust Two-View Geometry Estimation with Implicit Differentiation | Vladislav Pyatov et.al. | 2410.17983v1 | link |
2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725v1 | link |
2024-10-21 | Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers | Andrea Berra et.al. | 2410.15802v1 | null |
2024-10-21 | ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | Tao Tang et.al. | 2410.15582v1 | link |
2024-10-20 | Neural Active Structure-from-Motion in Dark and Textureless Environment | Kazuto Ichimaru et.al. | 2410.15378v1 | null |
2024-10-20 | POSE: Pose estimation Of virtual Sync Exhibit system | Hao-Tang Tsui et.al. | 2410.15343v1 | link |
2024-10-18 | Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2410.14565v1 | null |
2024-10-18 | Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior | Calvin-Khang Ta et.al. | 2410.14540v1 | null |
2024-10-18 | Sim2real Cattle Joint Estimation in 3D point clouds | Okour Mohammad et.al. | 2410.14419v1 | null |
2024-10-18 | Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping | Renguang Chen et.al. | 2410.14161v1 | null |
2024-10-15 | From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images | Junyang Wu et.al. | 2410.13896v1 | null |
2024-10-17 | DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions | Edison P. Velasco-Sánchez et.al. | 2410.13541v1 | null |
2024-10-17 | Object Pose Estimation Using Implicit Representation For Transparent Objects | Varun Burde et.al. | 2410.13465v1 | null |
2024-10-16 | Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation | Francesco Evangelisti et.al. | 2410.12679v1 | null |
2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834v1 | null |
2024-10-18 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | Xinyan Chen et.al. | 2410.10167v2 | null |
2024-10-13 | Occluded Human Pose Estimation based on Limb Joint Augmentation | Gangtao Han et.al. | 2410.09885v1 | null |
2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467v1 | null |
2024-10-12 | Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis | Qianyi Deng et.al. | 2410.09312v1 | link |
2024-10-11 | CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation | Jianyu Zhao et.al. | 2410.09010v1 | link |
2024-10-11 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Christian Schmidt et.al. | 2410.08743v1 | link |
2024-10-10 | Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation | Felix Petersen et.al. | 2410.08125v1 | null |
2024-10-10 | Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation | Maria Makarova et.al. | 2410.07801v1 | null |
2024-10-10 | Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos | Cuong Le et.al. | 2410.07795v1 | link |
2024-10-12 | Autonomous Driving in Unstructured Environments: How Far Have We Come? | Chen Min et.al. | 2410.07701v2 | link |
2024-10-10 | Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks | Minxing Zhang et.al. | 2410.07670v1 | null |
2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | Yunzhi Lin et.al. | 2410.06694v1 | null |
2024-10-08 | SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging | Ziyang Chen et.al. | 2410.06028v1 | link |
2024-10-08 | AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry | Thomas Jantos et.al. | 2410.05996v1 | null |
2024-10-08 | Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation? | Charalambos Tzamos et.al. | 2410.05984v1 | link |
2024-10-08 | FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance | Ruocheng Wang et.al. | 2410.05791v1 | null |
2024-10-07 | Comparison of marker-less 2D image-based methods for infant pose estimation | Lennart Jahn et.al. | 2410.04980v1 | null |
2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | Mehwish Ghafoor et.al. | 2410.04574v1 | link |
2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419v1 | null |
2024-10-05 | Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis | Juan Ignacio Bravo Pérez-Villar et.al. | 2410.04298v1 | link |
2024-10-05 | A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems | Nikola Radulov et.al. | 2410.04242v1 | link |
2024-10-04 | Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos | Ziyu Wang et.al. | 2410.03858v1 | null |
2024-10-04 | Universal Global State Estimation for Inertial Navigation Systems | Sifeddine Benahmed et.al. | 2410.03846v1 | null |
2024-10-04 | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Junyi Zhang et.al. | 2410.03825v1 | null |
2024-10-04 | Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images | Ci Li et.al. | 2410.03438v1 | null |
2024-10-04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | Hao Zhang et.al. | 2410.03174v1 | null |
2024-10-04 | CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization | Shigemichi Matsuzaki et.al. | 2410.03054v1 | null |
2024-10-03 | Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Nikolaos Stathoulopoulos et.al. | 2410.02643v1 | link |
2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237v1 | null |
2024-10-02 | SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment | Xingyu Ji et.al. | 2410.01618v1 | null |
2024-10-02 | SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network | Ahmed Tawfik Aboukhadra et.al. | 2410.01293v1 | null |
2024-10-01 | Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models | Jerry Yan et.al. | 2410.01061v1 | null |
2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713v1 | link |
2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589v1 | null |
2024-09-30 | Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations | Muhammad Saif Ullah Khan et.al. | 2409.20469v1 | null |
2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237v1 | null |
2024-09-30 | PuzzleBoard: A New Camera Calibration Pattern with Position Encoding | Peer Stelldinger et.al. | 2409.20127v1 | link |
2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111v1 | null |
2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986v1 | null |
2024-09-29 | PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond | Chen Song et.al. | 2409.19772v1 | link |
2024-09-29 | GelSlim 4.0: Focusing on Touch and Reproducibility | Andrea Sipos et.al. | 2409.19770v1 | null |
2024-09-27 | Robust Proximity Operations using Probabilistic Markov Models | Deep Parikh et.al. | 2409.19062v1 | null |
2024-09-27 | Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras | Yipeng Lu et.al. | 2409.18673v1 | null |
2024-09-27 | DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences | Jingwei Song et.al. | 2409.18457v1 | null |
2024-09-30 | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Mengchen Zhang et.al. | 2409.18261v2 | link |
2024-09-26 | AI-Powered Augmented Reality for Satellite Assembly, Integration and Test | Alvaro Patricio et.al. | 2409.18101v1 | null |
2024-09-27 | Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes | Katja Ludwig et.al. | 2409.17671v2 | null |
2024-09-25 | Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits | Shaoxiong Yao et.al. | 2409.17389v1 | null |
2024-09-25 | Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots | Zhichao Liu et.al. | 2409.17116v1 | null |
2024-09-25 | Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles | Ran Jing et.al. | 2409.17111v1 | null |
2024-09-25 | Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation | Lucas Carvalho de Lima et.al. | 2409.16680v1 | null |
2024-09-25 | FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation | Jingyi Tang et.al. | 2409.16600v1 | null |
2024-09-25 | Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots | Masoud Dayani Najafabadi et.al. | 2409.16595v1 | link |
2024-09-24 | PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings | Sutharsan Mahendren et.al. | 2409.15832v1 | null |
2024-09-24 | LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Ruida Zhang et.al. | 2409.15727v1 | link |
2024-09-23 | Framework for Robust Localization of UUVs and Mapping of Net Pens | David Botta et.al. | 2409.15475v1 | null |
2024-09-23 | FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Guoyang Zhao et.al. | 2409.15054v1 | link |
2024-09-23 | BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach | Stefano Puliti et.al. | 2409.14755v1 | link |
2024-09-23 | ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps | Haiming Gao et.al. | 2409.14723v1 | link |
2024-09-22 | Tactile Functasets: Neural Implicit Representations of Tactile Datasets | Sikai Li et.al. | 2409.14592v1 | null |
2024-09-22 | AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way | Sining Huang et.al. | 2409.14577v1 | null |
2024-09-22 | DROP: Dexterous Reorientation via Online Planning | Albert H. Li et.al. | 2409.14562v1 | null |
2024-09-21 | Combining Absolute and Semi-Generalized Relative Poses for Visual Localization | Vojtech Panek et.al. | 2409.14269v1 | null |
2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et.al. | 2409.11870v1 | link |
2024-09-18 | End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation | Thomas Pöllabauer et.al. | 2409.11819v1 | null |
2024-09-18 | Bridging Domain Gap for Flight-Ready Spaceborne Vision | Tae Ha Park et.al. | 2409.11661v1 | null |
2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512v1 | null |
2024-09-17 | Training Datasets Generation for Machine Learning: Application to Vision Based Navigation | Jérémy Lebreton et.al. | 2409.11383v1 | null |
2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340v1 | link |
2024-09-17 | ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges | Thien-Minh Nguyen et.al. | 2409.11122v1 | link |
2024-09-17 | Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB | Alessandro Simoni et.al. | 2409.11104v1 | null |
2024-09-21 | HGSLoc: 3DGS-based Heuristic Camera Pose Refinement | Zhongyan Niu et.al. | 2409.10925v2 | null |
2024-09-17 | Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter | Deep Parikh et.al. | 2409.10815v1 | null |
2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441v1 | null |
2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419v1 | link |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357v1 | null |
2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et.al. | 2409.10095v1 | null |
2024-09-15 | Precise Pick-and-Place using Score-Based Diffusion Networks | Shih-Wei Guo et.al. | 2409.09725v1 | null |
2024-09-15 | Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild | Nie Lin et.al. | 2409.09714v1 | null |
2024-09-15 | Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision | Deep Parikh et.al. | 2409.09665v1 | null |
2024-09-15 | A Scalable Tabletop Satellite Automation Testbed: Design And Experiments | Deep Parikh et.al. | 2409.09633v1 | null |
2024-09-14 | MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry | Yuheng Qiu et.al. | 2409.09479v1 | null |
2024-09-14 | Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM | Haoying Li et.al. | 2409.09410v1 | null |
2024-09-13 | Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry | Yunus Bilge Kurt et.al. | 2409.08769v1 | link |
2024-09-13 | WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users | Yunzhi Li et.al. | 2409.08494v1 | link |
2024-09-12 | Bayesian Inverse Graphics for Few-Shot Concept Learning | Octavio Arriaga et.al. | 2409.08351v1 | link |
2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269v1 | null |
2024-09-12 | Covariance Intersection-based Invariant Kalman Filtering(DInCIKF) for Distributed Pose Estimation | Haoying Li et.al. | 2409.07933v1 | null |
2024-09-12 | GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions | Liang Feng et.al. | 2409.07798v1 | null |
2024-09-12 | GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution | Liang Feng et.al. | 2409.07752v1 | null |
2024-09-11 | FaVoR: Features via Voxel Rendering for Camera Relocalization | Vincenzo Polizzi et.al. | 2409.07571v1 | link |
2024-09-11 | Benchmarking 2D Egocentric Hand Pose Datasets | Olga Taran et.al. | 2409.07337v1 | null |
2024-09-11 | iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation | Shuolong Chen et.al. | 2409.07116v1 | link |
2024-09-11 | Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry | Anbo Tao et.al. | 2409.06948v1 | null |
2024-09-13 | A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch | Haodong Zheng et.al. | 2409.06912v2 | null |
2024-09-11 | Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Shishir Reddy Vutukur et.al. | 2409.06683v2 | link |
2024-09-10 | PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation | Ginger Delmas et.al. | 2409.06535v1 | null |
2024-09-10 | Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation | Mohsi Jawaid et.al. | 2409.06240v1 | null |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-08 | HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions | Jianping Li et.al. | 2409.05006v1 | null |
2024-09-06 | Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands | Yotam Erel et.al. | 2409.04397v1 | null |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196v1 | link |
2024-09-06 | Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics | Woojin Cho et.al. | 2409.04033v1 | null |
2024-09-06 | Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments | Therese Joseph et.al. | 2409.03998v1 | null |
2024-09-09 | The Influence of Faulty Labels in Data Sets on Human Pose Estimation | Arnold Schwarz et.al. | 2409.03887v2 | null |
2024-09-05 | MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation | Philipp Quentin et.al. | 2409.03556v1 | null |
2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245v1 | null |
2024-09-01 | Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach | Wenjun Huang et.al. | 2409.02715v1 | null |
2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581v1 | null |
2024-09-03 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao et.al. | 2409.02224v1 | null |
2024-09-03 | Deep learning for objective estimation of Parkinsonian tremor severity | Felipe Duque-Quiceno et.al. | 2409.02011v1 | null |
2024-09-03 | SPiKE: 3D Human Pose from Point Cloud Sequences | Irene Ballester et.al. | 2409.01879v1 | link |
2024-09-02 | Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds | Mohammed H. AlSharif et.al. | 2409.01002v1 | null |
2024-09-01 | Detection, Recognition and Pose Estimation of Tabletop Objects | Sanjuksha Nirgude et.al. | 2409.00869v1 | null |
2024-09-01 | DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation | Huixin Zhang et.al. | 2409.00744v1 | link |
2024-09-01 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds | Ziqiang Dang et.al. | 2409.00736v1 | null |
2024-08-31 | ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action | Longyun Liao et.al. | 2409.00449v1 | null |
2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373v3 | null |
2024-08-30 | BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities | Boris Meden et.al. | 2408.17297v1 | null |
2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168v1 | null |
2024-09-01 | Generic Objects as Pose Probes for Few-Shot View Synthesis | Zhirui Gao et.al. | 2408.16690v2 | null |
2024-08-29 | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation | Yuchen Che et.al. | 2408.16547v1 | link |
2024-08-29 | GRPose: Learning Graph Relations for Human Image Generation with Pose Priors | Xiangchen Yin et.al. | 2408.16540v1 | link |
2024-08-28 | Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators | Nikita Kister et.al. | 2408.16536v1 | null |
2024-08-28 | Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation | Laura Bragagnolo et.al. | 2408.15810v1 | link |
2024-08-30 | Addressing the challenges of loop detection in agricultural environments | Nicolás Soncini et.al. | 2408.15761v2 | link |
2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750v1 | null |
2024-08-28 | Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction | Salma Salimi et.al. | 2408.15717v1 | null |
2024-08-26 | Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model | Abu Saleh Musa Miah et.al. | 2408.14111v1 | null |
2024-08-25 | InterTrack: Tracking Human Object Interaction without Object Templates | Xianghui Xie et.al. | 2408.13953v1 | null |
2024-08-24 | Temporally-consistent 3D Reconstruction of Birds | Johannes Hägerlind et.al. | 2408.13629v1 | null |
2024-08-24 | Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation | Jianing Song et.al. | 2408.13587v1 | null |
2024-08-27 | Sapiens: Foundation for Human Vision Models | Rawal Khirodkar et.al. | 2408.12569v3 | null |
2024-08-21 | GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting | Wanshui Gan et.al. | 2408.11447v1 | link |
2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085v1 | link |
2024-08-20 | ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data | Elia Bonetto et.al. | 2408.10831v1 | null |
2024-08-20 | MPL: Lifting 3D Human Pose from Multi-view 2D Poses | Seyed Abolfazl Ghasemzadeh et.al. | 2408.10805v1 | link |
2024-08-19 | RUMI: Rummaging Using Mutual Information | Sheng Zhong et.al. | 2408.10450v1 | null |
2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195v1 | null |
2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037v1 | link |
2024-08-19 | Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation | Qianhui Men et.al. | 2408.09931v1 | null |
2024-08-18 | OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare | Chen Long-fei et.al. | 2408.09409v1 | null |
2024-08-17 | An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface | Kevin Jose Thomas et.al. | 2408.09311v1 | link |
2024-08-16 | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation | Hao Tang et.al. | 2408.09042v1 | null |
2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | Wei Sun et.al. | 2408.08723v1 | null |
2024-08-16 | SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis | Xingyue Lin et.al. | 2408.08623v1 | null |
2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312v1 | null |
2024-08-15 | Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation | Varun Burde et.al. | 2408.08234v1 | link |
2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202v1 | null |
2024-08-15 | Your Turn: Real-World Turning Angle Estimation for Parkinson's Disease Severity Assessment | Qiushuo Cheng et.al. | 2408.08182v1 | null |
2024-08-15 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang et.al. | 2408.07975v1 | null |
2024-08-15 | GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Yutong Wang et.al. | 2408.07917v1 | link |
2024-08-13 | Grasping by Hanging: a Learning-Free Grasping Detection Method for Previously Unseen Objects | Wanze Li et.al. | 2408.06734v1 | null |
2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648v1 | null |
2024-08-12 | UniT: Unified Tactile Representation for Robot Learning | Zhengtong Xu et.al. | 2408.06481v1 | link |
2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336v1 | null |
2024-08-12 | CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments | Yanpeng Jia et.al. | 2408.05981v1 | null |
2024-08-12 | PAFormer: Part Aware Transformer for Person Re-identification | Hyeono Jung et.al. | 2408.05918v1 | null |
2024-08-11 | SABER-6D: Shape Representation Based Implicit Object Pose Estimation | Shishir Reddy Vutukur et.al. | 2408.05867v1 | null |
2024-08-10 | Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis | Zhongche Qu et.al. | 2408.05635v1 | null |
2024-08-10 | Anticipation through Head Pose Estimation: a preliminary study | Federico Figari Tomenotti et.al. | 2408.05516v1 | null |
2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979v1 | null |
2024-08-07 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Yunlong Huang et.al. | 2408.03540v1 | link |
2024-08-06 | Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera | Zibin Liu et.al. | 2408.03225v1 | link |
2024-08-06 | Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW | Elia Cereda et.al. | 2408.03168v1 | null |
2024-08-06 | BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications | G. Manni et.al. | 2408.03078v1 | link |
2024-08-07 | Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network | Xinyi Zhang et.al. | 2408.02922v2 | null |
2024-08-05 | Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises | Aleksa Marusic et.al. | 2408.02855v1 | null |
2024-08-05 | Joint-Motion Mutual Learning for Pose Estimation in Videos | Sifan Wu et.al. | 2408.02285v1 | null |
2024-08-04 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | Feichi Lu et.al. | 2408.02110v1 | null |
2024-08-04 | Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem | Tian Zhan et.al. | 2408.01945v1 | null |
2024-08-03 | MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions | Rahul Islam et.al. | 2408.01850v1 | null |
2024-08-03 | BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles | Lun Luo et.al. | 2408.01841v1 | link |
2024-08-03 | E | Yunshan Qi et.al. | 2408.01840v1 | null |
2024-08-03 | Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality | Leina Elansary et.al. | 2408.01728v1 | null |
2024-08-03 | Stimulating Imagination: Towards General-purpose Object Rearrangement | Jianyang Wu et.al. | 2408.01655v1 | null |
2024-08-02 | Full-range Head Pose Geometric Data Augmentations | Huei-Chung Hu et.al. | 2408.01566v1 | null |
2024-07-31 | Adapting Skills to Novel Grasps: A Self-Supervised Approach | Georgios Papagiannis et.al. | 2408.00178v1 | null |
2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117v1 | null |
2024-07-30 | StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset | Chaofan Huo et.al. | 2407.20545v1 | link |
2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | Wencan Cheng et.al. | 2407.20542v1 | link |
2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | Batu Candan et.al. | 2407.20515v1 | null |
2024-07-29 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | Kieran Saunders et.al. | 2407.20437v1 | null |
2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497v1 | link |
2024-07-26 | Flexible graph convolutional network for 3D human pose estimation | Abu Taib Mohammed Shahjahan et.al. | 2407.19077v1 | link |
2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590v1 | null |
2024-07-28 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438v2 | link |
2024-07-24 | Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments | Wei Gao et.al. | 2407.17078v1 | null |
2024-07-30 | DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction | Xiaobiao Du et.al. | 2407.16988v2 | link |
2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961v1 | null |
2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560v1 | link |
2024-07-23 | Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features | Romeo Valentin et.al. | 2407.16223v1 | null |
2024-07-23 | Optimal camera-robot pose estimation in linear time from points and lines | Guangyang Zeng et.al. | 2407.16151v1 | null |
2024-07-23 | 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images | Jie Zhao et.al. | 2407.16137v1 | null |
2024-07-21 | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Zheng Chong et.al. | 2407.15886v1 | link |
2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791v1 | null |
2024-07-22 | Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection | Kangqi Ma et.al. | 2407.15771v1 | null |
2024-07-22 | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model | Matteo Bortolon et.al. | 2407.15484v1 | null |
2024-07-23 | Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions | Yihao Ai et.al. | 2407.15451v2 | link |
2024-07-22 | avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality | Dizhi Ma et.al. | 2407.15373v1 | null |
2024-07-20 | From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM | Lorenzo Montano-Oliván et.al. | 2407.14797v1 | null |
2024-07-19 | ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation | Luke Bidulka et.al. | 2407.14605v1 | null |
2024-07-19 | 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry | Sungho Chun et.al. | 2407.14136v1 | link |
2024-07-18 | RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | Yuan-Hao Ho et.al. | 2407.13930v1 | null |
2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537v2 | link |
2024-07-18 | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator | Yujia Liang et.al. | 2407.13483v1 | link |
2024-07-17 | SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization | Yiyang Chen et.al. | 2407.12667v1 | link |
2024-07-17 | Invertible Neural Warp for NeRF | Shin-Fang Chng et.al. | 2407.12354v1 | null |
2024-07-16 | NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models | Francesco Milano et.al. | 2407.12207v1 | link |
2024-07-16 | Monocular pose estimation of articulated surgical instruments in open surgery | Robert Spektor et.al. | 2407.12138v1 | null |
2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736v2 | link |
2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321v1 | link |
2024-07-15 | A BlueROV2-based platform for underwater mapping experiments | Tudor Alinei-Poiana et.al. | 2407.10901v1 | link |
2024-07-15 | LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning | Zhuozhu Jian et.al. | 2407.10782v1 | null |
2024-07-15 | Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis | Antoine Legrand et.al. | 2407.10762v1 | null |
2024-07-16 | GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation | Haonan Wang et.al. | 2407.10756v2 | null |
2024-07-15 | Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs | Nicholas Carlotti et.al. | 2407.10661v1 | null |
2024-07-15 | Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function | Giulia Panconi et.al. | 2407.10590v1 | null |
2024-07-14 | 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects | Weiming Zhi et.al. | 2407.10331v1 | null |
2024-07-16 | psifx -- Psychological and Social Interactions Feature Extraction Package | Guillaume Rochette et.al. | 2407.10266v2 | null |
2024-07-14 | PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation | Nermin Samet et.al. | 2407.10220v1 | link |
2024-07-14 | 3DEgo: 3D Editing on the Go! | Umar Khalid et.al. | 2407.10102v1 | null |
2024-07-12 | iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning | Tom Fischer et.al. | 2407.09271v1 | link |
2024-07-12 | HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation | Manuel Birlo et.al. | 2407.09215v1 | null |
2024-07-12 | KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting | Andrew Jeong et.al. | 2407.08909v1 | null |
2024-07-11 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation | Tao Jiang et.al. | 2407.08634v1 | link |
2024-07-11 | SRPose: Two-view Relative Pose Estimation with Sparse Keypoints | Rui Yin et.al. | 2407.08199v1 | link |
2024-07-11 | SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM | Neng Wang et.al. | 2407.08106v1 | link |
2024-07-10 | RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects | Jiahao Nick Li et.al. | 2407.08081v1 | null |
2024-07-10 | Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization | Jinjie Mai et.al. | 2407.08023v1 | link |
2024-07-10 | Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation | Junjia Han et.al. | 2407.07389v1 | null |
2024-07-09 | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images | Chuanrui Zhang et.al. | 2407.06984v1 | null |
2024-07-09 | Computer vision tasks for intelligent aerospace missions: An overview | Huilin Chen et.al. | 2407.06513v1 | null |
2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597v1 | null |
2024-07-10 | On the power of data augmentation for head pose estimation | Michael Welter et.al. | 2407.05357v2 | link |
2024-07-07 | SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning | Yi Feng et.al. | 2407.05283v1 | link |
2024-07-05 | Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos | Leonhard Sommer et.al. | 2407.04384v1 | link |
2024-07-04 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | Laiyan Ding et.al. | 2407.04041v1 | link |
2024-07-04 | Markerless Multi-view 3D Human Pose Estimation: a survey | Ana Filipa Rodrigues Nogueira et.al. | 2407.03817v1 | null |
2024-07-04 | A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios | Zikang Yuan et.al. | 2407.03590v1 | link |
2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990v1 | null |
2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918v1 | link |
2024-07-02 | SUPER: Seated Upper Body Pose Estimation using mmWave Radars | Bo Zhang et.al. | 2407.02455v1 | null |
2024-07-02 | ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction | Bo Qian et.al. | 2407.02129v1 | null |
2024-07-02 | Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | Nicola Messina et.al. | 2407.02104v1 | null |
2024-07-01 | Active Human Pose Estimation via an Autonomous UAV Agent | Jingxi Chen et.al. | 2407.01811v1 | null |
2024-07-01 | RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields | Haochen Jiang et.al. | 2407.01303v1 | link |
2024-07-01 | Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization | Ruofei Bai et.al. | 2407.01013v1 | link |
2024-06-30 | Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation | Adnan Abdullah et.al. | 2407.00848v1 | null |
2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518v1 | link |
2024-06-28 | Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Mots'oehli et.al. | 2407.00252v1 | null |
2024-06-28 | EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans | Nicola Garau et.al. | 2406.19726v1 | null |
2024-06-28 | CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services | DongKi Noh et.al. | 2406.19634v1 | null |
2024-06-27 | Multimodal Visual-haptic pose estimation in the presence of transient occlusion | Michael Zechmair et.al. | 2406.19323v1 | null |
2024-06-27 | Human Modelling and Pose Estimation Overview | Pawel Knap et.al. | 2406.19290v1 | null |
2024-06-26 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Yuan Gao et.al. | 2406.18453v1 | link |
2024-06-27 | Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods | Filipe Gama et.al. | 2406.17382v2 | null |
2024-06-24 | High-resolution open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2406.16384v1 | null |
2024-06-23 | Breaking the Frame: Image Retrieval by Visual Overlap Prediction | Tong Wei et.al. | 2406.16204v1 | link |
2024-06-21 | Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe | Sandeep Singh Sengar et.al. | 2406.15649v1 | link |
2024-06-24 | Investigating the impact of 2D gesture representation on co-speech gesture generation | Teo Guichoux et.al. | 2406.15111v2 | null |
2024-06-20 | Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | Moira Shooter et.al. | 2406.14412v1 | null |
2024-06-20 | PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | Sihan Ma et.al. | 2406.14367v1 | null |
2024-06-19 | NeRF-Feat: 6D Object Pose Estimation using Feature Rendering | Shishir Reddy Vutukur et.al. | 2406.13796v1 | null |
2024-06-19 | CNN Based Flank Predictor for Quadruped Animal Species | Vanessa Suessle et.al. | 2406.13588v1 | null |
2024-06-19 | MVSBoost: An Efficient Point Cloud-based 3D Reconstruction | Umair Haroon et.al. | 2406.13515v1 | null |
2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464v1 | null |
2024-06-18 | Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings | Ruijie Tang et.al. | 2406.13048v1 | null |
2024-06-17 | Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization | Huaiji Zhou et.al. | 2406.11766v1 | null |
2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743v1 | null |
2024-06-17 | SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking | Tianhong Catherine Yu et.al. | 2406.11645v1 | null |
2024-06-14 | Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization | Wonho Song et.al. | 2406.11599v1 | null |
2024-06-15 | MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception | M. Mahbubur Rahman et.al. | 2406.10708v1 | link |
2024-06-15 | Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference | Shayan Shekarforoush et.al. | 2406.10455v1 | null |
2024-06-14 | The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences | Bria Long et.al. | 2406.10447v1 | null |
2024-06-14 | OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics | Yoni Gozlan et.al. | 2406.09788v1 | null |
2024-06-13 | ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Wufei Ma et.al. | 2406.09613v1 | link |
2024-06-13 | Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV | Maneesha Wickramasuriya et.al. | 2406.09260v1 | link |
2024-06-14 | Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning | Huy Hoang Nguyen et.al. | 2406.09039v2 | null |
2024-06-14 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394v2 | link |
2024-06-12 | Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization | Jiaxin Deng et.al. | 2406.08001v1 | null |
2024-06-12 | IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes | Fengtian Lang et.al. | 2406.07937v1 | link |
2024-06-12 | From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers | Swaminathan Gurumurthy et.al. | 2406.07785v1 | link |
2024-06-12 | SPIN: Spacecraft Imagery for Navigation | Javier Montalvo et.al. | 2406.07500v2 | link |
2024-06-11 | Realistic Data Generation for 6D Pose Estimation of Surgical Instruments | Juan Antonio Barragan et.al. | 2406.07328v1 | link |
2024-06-11 | SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale | Shester Gueuwou et.al. | 2406.06907v1 | null |
2024-06-10 | Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation | Shenghao Li et.al. | 2406.06374v1 | link |
2024-06-08 | A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks | Muhammad Suhail Saleem et.al. | 2406.05522v1 | null |
2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340v1 | link |
2024-06-06 | Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | Jiyao Zhang et.al. | 2406.04316v1 | null |
2024-06-05 | Hi5: 2D Hand Pose Estimation with Zero Human Annotation | Masum Hasan et.al. | 2406.03599v1 | null |
2024-06-05 | Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices | Xingjian Yang et.al. | 2406.02977v1 | null |
2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509v1 | null |
2024-06-04 | HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model | Yu Tian et.al. | 2406.01914v1 | null |
2024-06-03 | A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios | Enrico Martini et.al. | 2406.01832v1 | link |
2024-06-01 | Equivariant amortized inference of poses for cryo-EM | Larissa de Ruijter et.al. | 2406.01630v1 | null |
2024-06-03 | 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information | Sihan Wen et.al. | 2406.01196v1 | null |
2024-06-01 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation | Matan Rusanovsky et.al. | 2406.00384v1 | link |
2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084v1 | null |
2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614v1 | null |
2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531v1 | null |
2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173v1 | null |
2024-05-28 | World Models for General Surgical Grasping | Hongbin Lin et.al. | 2405.17940v1 | null |
2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421v1 | link |
2024-05-27 | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding | Niloofar Azizi et.al. | 2405.17397v1 | null |
2024-05-27 | | Weiquan Wang et.al. | 2405.17016v1 | null |
2024-05-27 | Clustering-based Learning for UAV Tracking and Pose Estimation | Jiaping Xiao et.al. | 2405.16867v1 | null |
2024-05-26 | Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge | Tianchen Deng et.al. | 2405.16464v1 | link |
2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008v1 | null |
2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731v1 | link |
2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467v1 | link |
2024-05-21 | Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos | Jayroop Ramesh et.al. | 2405.13235v1 | link |
2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728v1 | null |
2024-05-21 | PoseGravity: Pose Estimation from Points and Lines with Axis Prior | Akshay Chandrasekhar et.al. | 2405.12646v1 | link |
2024-05-19 | Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.12247v1 | null |
2024-05-20 | AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | Calvin Yeung et.al. | 2405.12070v1 | link |
2024-05-19 | Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Christiaan G. A. Viviers et.al. | 2405.11677v1 | link |
2024-05-19 | Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.11448v1 | null |
2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257v1 | null |
2024-05-18 | MotionGS : Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129v1 | link |
2024-05-17 | Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation | Yongliang Lin et.al. | 2405.10557v1 | null |
2024-05-16 | Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder | Mohamed Ilyes Lakhal et.al. | 2405.10423v1 | null |
2024-05-17 | Toon3D: Seeing Cartoons from a New Perspective | Ethan Weber et.al. | 2405.10320v2 | null |
2024-05-15 | Task-adaptive Q-Face | Haomiao Sun et.al. | 2405.09059v1 | null |
2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483v1 | link |
2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434v1 | null |
2024-05-13 | Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Jian Liu et.al. | 2405.07801v1 | link |
2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429v1 | link |
2024-05-11 | TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization | Zhen Tan et.al. | 2405.07027v1 | link |
2024-05-11 | AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation | Xingxu Li et.al. | 2405.06959v1 | null |
2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845v1 | link |
2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241v1 | null |
2024-05-10 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Haixin Shi et.al. | 2405.05858v2 | null |
2024-05-09 | Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion | Huanyu Tian et.al. | 2405.05817v1 | null |
2024-05-09 | NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM | Yiping Xie et.al. | 2405.05807v1 | null |
2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526v1 | null |
2024-05-08 | Adversary-Guided Motion Retargeting for Skeleton Anonymization | Thomas Carr et.al. | 2405.05428v1 | null |
2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216v1 | link |
2024-05-08 | ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion | Bing Zhu et.al. | 2405.05164v1 | null |
2024-05-08 | GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation | Ivan Bilić et.al. | 2405.04890v1 | null |
2024-05-07 | Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation | Jenny Wang et.al. | 2405.04609v1 | null |
2024-05-07 | Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map | Yuxuan Xia et.al. | 2405.04290v1 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969v1 | null |
2024-05-07 | Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints | Xiongjun Guan et.al. | 2405.03959v1 | link |
2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689v1 | null |
2024-05-06 | Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors | Amit Moryossef et.al. | 2405.03545v1 | link |
2024-05-05 | Multi-hop graph transformer network for 3D human pose estimation | Zaedul Islam et.al. | 2405.03055v1 | null |
2024-05-05 | Blending Distributed NeRFs with Tri-stage Robust Pose Optimization | Baijun Ye et.al. | 2405.02880v1 | null |
2024-05-03 | WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD | Xuxin Cheng et.al. | 2405.02241v1 | link |
2024-05-03 | Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation | Xianzhou Zeng et.al. | 2405.02114v1 | link |
2024-05-03 | An Onboard Framework for Staircases Modeling Based on Point Clouds | Chun Qing et.al. | 2405.01918v1 | null |
2024-05-06 | ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness | Deegan Atha et.al. | 2405.01673v2 | null |
2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472v1 | null |
2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284v1 | null |
2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112v1 | null |
2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107v1 | null |
2024-05-04 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066v2 | null |
2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600v1 | null |
2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541v1 | link |
2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401v1 | link |
2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279v1 | link |
2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174v1 | null |
2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628v1 | null |
2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395v1 | null |
2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394v1 | null |
2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837v1 | null |
2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685v1 | null |
2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215v1 | null |
2024-04-25 | WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users | William Huang et.al. | 2404.17063v1 | link |
2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802v1 | null |
2024-04-25 | DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation | Leandro Di Bella et.al. | 2404.16558v1 | null |
2024-04-25 | Efficient Solution of Point-Line Absolute Pose | Petr Hruby et.al. | 2404.16552v1 | link |
2024-04-25 | COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images | Panagiotis Sapoutzoglou et.al. | 2404.16471v1 | link |
2024-04-25 | MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter | Kenji Koide et.al. | 2404.16370v1 | null |
2024-04-24 | 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement | Filipa Lino et.al. | 2404.16136v1 | link |
2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276v1 | link |
2024-04-25 | Domain adaptive pose estimation via multi-level alignment | Yugan Chen et.al. | 2404.14885v2 | link |
2024-04-23 | Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking | Kexin Meng et.al. | 2404.14835v1 | null |
2024-04-23 | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues | Vandad Davoodnia et.al. | 2404.14634v1 | null |
2024-04-22 | DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation | Yonghao Dang et.al. | 2404.14025v1 | link |
2024-04-23 | CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory | Yunlong Ran et.al. | 2404.13896v2 | null |
2024-04-21 | Resampling-free Particle Filters in High-dimensions | Akhilan Boopathy et.al. | 2404.13698v1 | link |
2024-04-20 | EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment | Guanghao Li et.al. | 2404.13346v1 | link |
2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440v1 | null |
2024-04-18 | Gait Recognition from Highly Compressed Videos | Andrei Niculae et.al. | 2404.12183v1 | null |
2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144v1 | link |
2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205v1 | null |
2024-04-17 | GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement | Linfang Zheng et.al. | 2404.11139v1 | null |
2024-04-17 | CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation | Lianyu Hu et.al. | 2404.11111v1 | link |
2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880v1 | null |
2024-04-16 | Invariant Kalman Filtering with Noise-Free Pseudo-Measurements | Sven Goffin et.al. | 2404.10687v1 | null |
2024-04-16 | The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement | Gabriele Trivigno et.al. | 2404.10438v1 | null |
2024-04-16 | GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling | Huantao Ren et.al. | 2404.10213v1 | null |
2024-04-16 | LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark | Avinash Upadhyay et.al. | 2404.10212v1 | link |
2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748v1 | null |
2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308v1 | link |
2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928v1 | link |
2024-04-16 | 3D Human Scan With A Moving Event Camera | Kai Kohyama et.al. | 2404.08504v2 | null |
2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649v1 | null |
2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603v1 | null |
2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124v1 | null |
2024-04-10 | MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints | Bedirhan Uguz et.al. | 2404.07094v1 | null |
2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926v1 | null |
2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337v1 | link |
2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050v1 | null |
2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705v1 | null |
2024-04-08 | Learning a Category-level Object Pose Estimator without Pose Annotations | Fengrui Tian et.al. | 2404.05626v1 | null |
2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518v1 | link |
2024-04-08 | Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks | Maksym Ivashechkin et.al. | 2404.05414v1 | null |
2024-04-08 | STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs | Kush Hari et.al. | 2404.05151v1 | null |
2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193v1 | null |
2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518v1 | link |
2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256v1 | null |
2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159v1 | link |
2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567v1 | null |
2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544v1 | link |
2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125v1 | null |
2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041v1 | link |
2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891v1 | null |
2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691v1 | null |
2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676v1 | null |
2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658v2 | link |
2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132v1 | null |
2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251v1 | null |
2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031v1 | null |
2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926v2 | link |
2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527v1 | link |
2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791v1 | link |
2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259v1 | null |
2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104v1 | null |
2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080v1 | link |
2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893v1 | link |
2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837v1 | link |
2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen et.al. | 2403.17827v1 | null |
2024-03-26 | System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners | Felix Esser et.al. | 2403.17788v1 | null |
2024-03-25 | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Remy Sabathier et.al. | 2403.17103v1 | link |
2024-03-25 | Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging | Mahdieh Dashtbani Moghari et.al. | 2403.16490v1 | null |
2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428v1 | link |
2024-03-25 | A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups | Yixiao Ge et.al. | 2403.16411v1 | null |
2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400v1 | link |
2024-03-24 | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments | Abdelrahman Younes et.al. | 2403.16238v1 | null |
2024-03-24 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | Junqiao Fan et.al. | 2403.16198v1 | null |
2024-03-23 | UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation | Yuliang Guo et.al. | 2403.15705v1 | link |
2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612v1 | link |
2024-03-22 | Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times | Sepehr Sabeti et.al. | 2403.15571v1 | null |
2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333v1 | null |
2024-03-22 | WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization | Jialu Wang et.al. | 2403.15272v1 | null |
2024-03-22 | DITTO: Demonstration Imitation by Trajectory Transformation | Nick Heppert et.al. | 2403.15203v1 | link |
2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048v1 | null |
2024-03-22 | Trajectory Regularization Enhances Self-Supervised Geometric Representation | Jiayun Wang et.al. | 2403.14973v1 | link |
2024-03-21 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743v1 | link |
2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | Ruyi Lian et.al. | 2403.14559v1 | null |
2024-03-23 | Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset | Andrea Avogaro et.al. | 2403.14447v2 | null |
2024-03-21 | Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests | Haedam Oh et.al. | 2403.14326v1 | null |
2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Francesco Di Felice et.al. | 2403.14279v1 | null |
2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | Chen Zhao et.al. | 2403.13683v1 | link |
2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2403.13647v1 | link |
2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery | Mayura Manawadu et.al. | 2403.13434v1 | null |
2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | Yamin Mao et.al. | 2403.13405v1 | null |
2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | Qiaojun Yu et.al. | 2403.13365v1 | null |
2024-03-20 | MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination | Weiying Wang et.al. | 2403.13348v1 | null |
2024-03-19 | FaceXFormer: A Unified Transformer for Facial Analysis | Kartik Narayan et.al. | 2403.12960v1 | link |
2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959v1 | link |
2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728v1 | link |
2024-03-19 | IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model | Matteo Bortolon et.al. | 2403.12682v1 | null |
2024-03-19 | In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing | Mingrui Yu et.al. | 2403.12676v1 | null |
2024-03-19 | Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | Xiaoben Li et.al. | 2403.12440v1 | null |
2024-03-20 | Human Mesh Recovery from Arbitrary Multi-view Images | Xiaoben Li et.al. | 2403.12434v2 | link |
2024-03-19 | XPose: eXplainable Human Pose Estimation | Luyu Qiu et.al. | 2403.12370v1 | null |
2024-03-18 | HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | Mengqi Zhang et.al. | 2403.12011v1 | null |
2024-03-18 | Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction | Wolfgang Fuhl et.al. | 2403.11665v1 | null |
2024-03-18 | An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation | Zewen Xu et.al. | 2403.11639v1 | null |
2024-03-18 | LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Yang Yang et.al. | 2403.11627v1 | link |
2024-03-18 | GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects | Sungphill Moon et.al. | 2403.11510v1 | null |
2024-03-17 | A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation | Qucheng Peng et.al. | 2403.11310v1 | link |
2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247v1 | link |
2024-03-16 | Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty | Lakshadeep Naik et.al. | 2403.10874v1 | null |
2024-03-16 | DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation | Christopher Kolios et.al. | 2403.10773v1 | null |
2024-03-15 | GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation | Dingding Cai et.al. | 2403.10683v1 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990v1 | null |
2024-03-14 | ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image | Fangqiang Ding et.al. | 2403.09871v1 | null |
2024-03-14 | BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects | Tomas Hodan et.al. | 2403.09799v1 | null |
2024-03-14 | Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR | Sebastián Barbas Laina et.al. | 2403.09596v1 | null |
2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | Pawel Knap et.al. | 2403.09437v1 | null |
2024-03-14 | LM2D: Lyrics- and Music-Driven Dance Synthesis | Wenjie Yin et.al. | 2403.09407v1 | null |
2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | Ding-Tao Huang et.al. | 2403.09317v1 | link |
2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309v1 | null |
2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650v1 | null |
2024-03-15 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586v2 | null |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156v1 | link |
2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125v1 | null |
2024-03-12 | MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation | Yuelong Li et.al. | 2403.08019v1 | link |
2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | Kira Wursthorn et.al. | 2403.07741v1 | null |
2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535v1 | link |
2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | Bowen Liu et.al. | 2403.07437v1 | null |
2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | Yike Zhang et.al. | 2403.07219v1 | null |
2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | Zhengyi Luo et.al. | 2403.06862v1 | null |
2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577v1 | null |
2024-03-10 | Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation | Paweł A. Pierzchlewicz et.al. | 2403.06164v1 | link |
2024-03-10 | Diffusion Models Trained with Large Data Are Transferable Visual Models | Guangkai Xu et.al. | 2403.06090v1 | link |
2024-03-08 | Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm | Ziyu Zhang et.al. | 2403.05666v1 | null |
2024-03-11 | Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation | Tarek Bouazza et.al. | 2403.05450v2 | null |
2024-03-07 | Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps | Ivana Collado-Gonzalez et.al. | 2403.04936v1 | null |
2024-03-07 | That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | Georgi Pramatarov et.al. | 2403.04755v1 | null |
2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | Qingyuan Cai et.al. | 2403.04444v1 | link |
2024-03-09 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu et.al. | 2403.04381v2 | link |
2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | Chris Rockwell et.al. | 2403.03221v1 | null |
2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | Yannan He et.al. | 2403.03122v1 | null |
2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111v1 | null |
2024-03-05 | Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps | Timothy Chen et.al. | 2403.02751v1 | link |
2024-03-04 | PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station | Cunyi Yin et.al. | 2403.01913v1 | link |
2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813v1 | null |
2024-03-03 | MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images | Junwen Huang et.al. | 2403.01517v1 | null |
2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263v1 | null |
2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110v1 | null |
2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988v1 | null |
2024-03-04 | TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049v2 | null |
2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062v2 | null |
2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844v1 | link |
2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330v1 | link |
2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320v1 | null |
2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196v1 | link |
2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066v1 | link |
2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504v1 | null |
2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062v1 | link |
2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640v1 | null |
2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607v1 | null |
2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308v1 | null |
2024-02-25 | XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras | Arnav Mishra et.al. | 2402.16175v1 | null |
2024-02-25 | VOLoc: Visual Place Recognition by Querying Compressed Lidar Map | Xudong Cai et.al. | 2402.15961v1 | link |
2024-02-24 | CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge | Xiao Lin et.al. | 2402.15726v1 | null |
2024-02-23 | Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones | Matteo Risso et.al. | 2402.15273v1 | null |
2024-02-22 | Cameras as Rays: Pose Estimation via Ray Diffusion | Jason Y. Zhang et.al. | 2402.14817v1 | null |
2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | Jialun Pei et.al. | 2402.14461v1 | link |
2024-02-22 | VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning | Jingyao Li et.al. | 2402.14456v1 | null |
2024-02-22 | Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks | Daniel Holmberg et.al. | 2402.14400v1 | link |
2024-02-22 | Secure Navigation using Landmark-based Localization in a GPS-denied Environment | Ganesh Sapkota et.al. | 2402.14280v1 | null |
2024-02-21 | SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings | Rishabh Bajpai et.al. | 2402.14143v1 | null |
2024-02-21 | High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks | Luca Crupi et.al. | 2402.13756v1 | null |
2024-02-21 | EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization | Zhendong Xiao et.al. | 2402.13537v1 | null |
2024-02-20 | DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation | Takuya Ikeda et.al. | 2402.12647v1 | link |
2024-02-19 | Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment | Ganesh Sapkota et.al. | 2402.12551v1 | null |
2024-02-18 | Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training | Huayi Zhou et.al. | 2402.11566v1 | link |
2024-02-17 | Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review | Merryn D. Constable et.al. | 2402.11288v1 | null |
2024-02-17 | Dense Matchers for Dense Tracking | Tomáš Jelínek et.al. | 2402.11287v1 | null |
2024-02-16 | Occlusion Resilient 3D Human Pose Estimation | Soumava Kumar Roy et.al. | 2402.11036v1 | null |
2024-02-16 | 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Tsung-Wei Ke et.al. | 2402.10885v1 | null |
2024-02-15 | Lester: rotoscope animation through video object segmentation and tracking | Ruben Tous et.al. | 2402.09883v1 | link |
2024-02-15 | Foul prediction with estimated poses from soccer broadcast video | Jiale Fang et.al. | 2402.09650v1 | null |
2024-02-16 | IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture | Varun Ramani et.al. | 2402.08923v2 | null |
2024-02-13 | Are Semi-Dense Detector-Free Methods Good at Matching Local Features? | Matthieu Vilain et.al. | 2402.08671v1 | null |
2024-02-13 | Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities | Syed S. Ahmed et.al. | 2402.08566v1 | null |
2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | Khang Truong Giang et.al. | 2402.08359v1 | link |
2024-02-12 | Extending 3D body pose estimation for robotic-assistive therapies of autistic children | Laura Santos et.al. | 2402.08006v1 | null |
2024-02-12 | GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance | Shiyu Li et.al. | 2402.07677v1 | link |
2024-02-12 | UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments | Ahmed Radwan et.al. | 2402.07537v1 | null |
2024-02-09 | Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Peter Hönig et.al. | 2402.06436v1 | null |
2024-02-08 | Real-time Holistic Robot Pose Estimation with Unknown States | Shikun Ban et.al. | 2402.05655v1 | link |
2024-02-08 | Extending 6D Object Pose Estimators for Stereo Vision | Thomas Pöllabauer et.al. | 2402.05610v1 | null |
2024-02-09 | NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction | Zhongqun Zhang et.al. | 2402.05532v2 | null |
2024-02-07 | Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training | Thomas Pöllabauer et.al. | 2402.04979v1 | null |
2024-02-07 | 4-Dimensional deformation part model for pose estimation using Kalman filter constraints | Enrique Martinez-Berti et.al. | 2402.04953v1 | null |
2024-02-07 | STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation | Peter Hönig et.al. | 2402.04878v1 | link |
2024-02-05 | A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model | Murad Hasan et.al. | 2402.03417v1 | null |
2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Mingrui Li et.al. | 2402.03246v1 | link |
2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | Yujing Sun et.al. | 2402.02800v1 | link |
2024-02-04 | Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation | Ti Wang et.al. | 2402.02339v1 | null |
2024-02-01 | mmID: High-Resolution mmWave Imaging for Human Identification | Sakila S. Jayaweera et.al. | 2402.00996v1 | null |
2024-02-01 | In-Bed Pose Estimation: A Review | Ziya Ata Yazıcı et.al. | 2402.00700v1 | null |
2024-02-01 | WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness | Mateus Valverde Gasparino et.al. | 2402.00683v1 | link |
2024-02-02 | CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration | Daniele Cattaneo et.al. | 2402.00129v2 | null |
2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083v1 | link |
2024-01-30 | Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics | Nastaran Darabi et.al. | 2401.17481v1 | null |
2024-01-30 | MESA: Matching Everything by Segmenting Anything | Yesheng Zhang et.al. | 2401.16741v1 | null |
2024-01-30 | Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers | Jianbin Jiao et.al. | 2401.16700v1 | link |
2024-01-29 | Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation | Jaewoo Park et.al. | 2401.16284v1 | null |
2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173v1 | link |
2024-01-28 | Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras | Yu-Jhe Li et.al. | 2401.15616v1 | null |
2024-01-30 | Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization | Kihoon Shin et.al. | 2401.15313v2 | null |
2024-01-26 | Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones | Beatrice Alessandra Motetti et.al. | 2401.15236v1 | null |
2024-01-26 | SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras | Hanz Cuevas-Velasquez et.al. | 2401.14785v1 | null |
2024-01-24 | Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter | Dongmyoung Lee et.al. | 2401.13405v1 | null |
2024-01-24 | Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry | Qi Cai et.al. | 2401.13357v1 | null |
2024-01-23 | SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization | Mingyang Li et.al. | 2401.13076v1 | link |
2024-01-24 | RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos | Hongchi Xia et.al. | 2401.12592v2 | null |
2024-01-26 | MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR | Changkun Liu et.al. | 2401.11511v2 | null |
2024-01-19 | SCENES: Subpixel Correspondence Estimation With Epipolar Supervision | Dominik A. Kloepfer et.al. | 2401.10886v1 | null |
2024-01-19 | Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation | Prakhar Kaushik et.al. | 2401.10848v1 | null |
2024-01-22 | TEXterity: Tactile Extrinsic deXterity | Antonia Bronars et.al. | 2401.10230v2 | null |
2024-01-18 | Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework | Junkun Jiang et.al. | 2401.09836v1 | link |
2024-01-17 | DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing | Hao Qu et.al. | 2401.09160v1 | null |
2024-01-17 | PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Yue Pan et.al. | 2401.09101v1 | link |
2024-01-16 | AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization | Qi Liao et.al. | 2401.08360v1 | null |
2024-01-16 | S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera | Thanh Nguyen Canh et.al. | 2401.08134v1 | null |
2024-01-15 | Collaboratively Self-supervised Video Representation Learning for Action Recognition | Jie Zhang et.al. | 2401.07584v1 | null |
2024-01-14 | 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework | Fan Zhang et.al. | 2401.07251v1 | null |
2024-01-11 | On the representation and methodology for wide and short range head pose estimation | Alejandro Cobo et.al. | 2401.05807v1 | link |
2024-01-10 | Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects | Tianhang Cheng et.al. | 2401.05236v1 | link |
2024-01-10 | Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits | Helena Russello et.al. | 2401.05202v1 | null |
2024-01-10 | Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton | Hongbo Kang et.al. | 2401.04921v1 | link |
2024-01-15 | Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker | Jingtao Sun et.al. | 2401.04377v2 | link |
2024-01-07 | RHOBIN Challenge: Reconstruction of Human Object Interaction | Xianghui Xie et.al. | 2401.04143v1 | null |
2024-01-08 | D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement | Danqi Yan et.al. | 2401.03914v1 | null |
2024-01-07 | Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems | Victor Adewopo et.al. | 2401.03587v1 | null |
2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa et.al. | 2401.02383v1 | null |
2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher et.al. | 2401.02357v1 | null |
2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer et.al. | 2401.02281v1 | link |
2024-01-03 | Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique | Ekram Alam et.al. | 2401.01587v1 | link |
2024-01-05 | PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization | Jiaming He et.al. | 2401.01081v2 | link |
2023-12-30 | 3D Human Pose Perception from Egocentric Stereo Videos | Hiroyasu Akada et.al. | 2401.00889v1 | null |
2024-01-01 | Geometry Depth Consistency in RGBD Relative Pose Estimation | Sourav Kumar et.al. | 2401.00639v1 | null |
2023-12-30 | A comprehensive framework for occluded human pose estimation | Linhao Xu et.al. | 2401.00155v1 | null |
2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029v2 | null |
2023-12-29 | MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments | Andrew Fishberg et.al. | 2312.17731v1 | link |
2023-12-28 | iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views | Chin-Hsuan Wu et.al. | 2312.17250v1 | link |
2023-12-28 | EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion | Jianping Jiang et.al. | 2312.16933v1 | null |
2023-12-28 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Zikang Yuan et.al. | 2312.16800v1 | link |
2023-12-28 | L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry | Feiya Li et.al. | 2312.16787v1 | null |
2023-12-27 | HMP: Hand Motion Priors for Pose and Shape Estimation from Video | Enes Duran et.al. | 2312.16737v1 | null |
2023-12-27 | Camera calibration for the surround-view system: a benchmark and dataset | L Qin et.al. | 2312.16499v1 | null |
2023-12-24 | TEMP3D: Temporally Continuous 3D Human Pose Estimation Under Occlusions | Rohit Lal et.al. | 2312.16221v1 | link |
2023-12-26 | Graph Context Transformation Learning for Progressive Correspondence Pruning | Junwen Guo et.al. | 2312.15971v1 | link |
2023-12-25 | Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose Estimation | Feng Zhou et.al. | 2312.15636v1 | null |
2023-12-25 | APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Yuxiang Yang et.al. | 2312.15612v1 | link |
2023-12-23 | PACE: Pose Annotations in Cluttered Environments | Yang You et.al. | 2312.15130v1 | link |
2023-12-22 | PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF | Mohsen Gholami et.al. | 2312.14915v1 | link |
2023-12-22 | Harnessing Diffusion Models for Visual Perception with Meta Prompts | Qiang Wan et.al. | 2312.14733v1 | link |
2023-12-22 | Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization | Joaquin Rodriguez et.al. | 2312.14697v1 | link |
2023-12-22 | PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer | Neha Sengar et.al. | 2312.14577v1 | null |
2023-12-22 | Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning | Jay Shenoy et.al. | 2312.14432v1 | null |
2023-12-21 | 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera | Christen Millerdurai et.al. | 2312.14157v1 | null |
2023-12-21 | DUSt3R: Geometric 3D Vision Made Easy | Shuzhe Wang et.al. | 2312.14132v1 | link |
2023-12-20 | NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields | Jens Naumann et.al. | 2312.13471v1 | null |
2023-12-20 | Brain-Inspired Visual Odometry: Balancing Speed and Interpretability through a System of Systems Approach | Habib Boloorchi Tabrizi et.al. | 2312.13162v1 | link |
2023-12-18 | Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics | Yesukhei Jagvaral et.al. | 2312.11707v1 | null |
2023-12-18 | Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface | Vicu-Mihalis Maer et.al. | 2312.11401v1 | null |
2023-12-17 | SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation | Xiaoqi An et.al. | 2312.10758v1 | link |
2023-12-17 | PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields | Boming Zhao et.al. | 2312.10649v1 | null |
2023-12-15 | SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation | David C. Jeong et.al. | 2312.10195v1 | link |
2023-12-14 | iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching | Yuan Sun et.al. | 2312.09031v1 | null |
2023-12-14 | Scene 3-D Reconstruction System in Scattering Medium | Zhuoyifan Zhang et.al. | 2312.09005v1 | null |
2023-12-14 | CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming | Kian Eng Ong et.al. | 2312.08764v1 | link |
2023-12-20 | PnP for Two-Dimensional Pose Estimation | Joshua Wang et.al. | 2312.08488v2 | link |
2023-12-13 | Pose and shear-based tactile servoing | John Lloyd et.al. | 2312.08411v1 | null |
2023-12-13 | FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Bowen Wen et.al. | 2312.08344v1 | link |
2023-12-13 | Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation | Arul Selvam Periyasamy et.al. | 2312.08268v1 | null |
2023-12-13 | CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation | Eugenio Chisari et.al. | 2312.08240v1 | null |
2023-12-13 | C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation | Florian Fervers et.al. | 2312.08060v1 | null |
2023-12-13 | Three-Filters-to-Normal+: Revisiting Discontinuity Discrimination in Depth-to-Normal Translation | Jingwei Yang et.al. | 2312.07964v1 | null |
2023-12-13 | Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users | Tianxun Zhou et.al. | 2312.07854v1 | null |
2023-12-12 | RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | Peng Lu et.al. | 2312.07526v1 | link |
2023-12-12 | COLMAP-Free 3D Gaussian Splatting | Yang Fu et.al. | 2312.07504v1 | link |
2023-12-12 | RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments | Pavel Petracek et.al. | 2312.07337v1 | link |
2023-12-12 | Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs | Sunghwan Hong et.al. | 2312.07246v1 | link |
2023-12-12 | Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation | Yuchen Yang et.al. | 2312.07051v1 | link |
2023-12-12 | Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation | Nikhil Kashyap et.al. | 2312.06965v1 | null |
2023-12-12 | Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer's Mice | Soham Bafana et.al. | 2312.06914v1 | link |
2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865v1 | link |
2023-12-11 | Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input | Trung-Hieu Hoang et.al. | 2312.06797v1 | null |
2023-12-11 | 3D Hand Pose Estimation in Egocentric Images in the Wild | Aditya Prakash et.al. | 2312.06583v1 | null |
2023-12-11 | PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation | Zhiyu Pan et.al. | 2312.06409v1 | null |
2023-12-11 | ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation | Cédric Rommel et.al. | 2312.06386v1 | link |
2023-12-10 | From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation | Javier Tirado-Garín et.al. | 2312.05995v1 | link |
2023-12-09 | You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception | Sheng Jin et.al. | 2312.05525v1 | link |
2023-12-07 | Image and AIS Data Fusion Technique for Maritime Computer Vision Applications | Emre Gülsoylu et.al. | 2312.05270v1 | link |
2023-12-07 | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection | Kohei Yamashita et.al. | 2312.04527v1 | null |
2023-12-07 | Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images | Yiqun Zhang et.al. | 2312.04236v1 | null |
2023-12-06 | Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning | Xinshun Wang et.al. | 2312.03703v1 | link |
2023-12-06 | Cooperative Probabilistic Trajectory Forecasting under Occlusion | Anshul Nayak et.al. | 2312.03296v1 | null |
2023-12-05 | A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis | Niccolò Bisagno et.al. | 2312.02613v1 | null |
2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593v1 | link |
2023-12-05 | PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation | Geonhyup Lee et.al. | 2312.02531v1 | null |
2023-12-04 | GenEM: Physics-Informed Generative Cryo-Electron Microscopy | Jiakai Zhang et.al. | 2312.02235v1 | null |
2023-12-02 | Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors | Yu Zhang et.al. | 2312.02196v1 | link |
2023-12-04 | iMatching: Imperative Correspondence Learning | Zitong Zhan et.al. | 2312.02141v1 | link |
2023-12-04 | SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Nikhil Keetha et.al. | 2312.02126v1 | link |
2023-12-04 | Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection | Xubin Zhong et.al. | 2312.01713v1 | null |
2023-12-05 | Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Yizhou Wang et.al. | 2312.01697v2 | link |
2023-12-04 | Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks | Yan Xu et.al. | 2312.01561v1 | null |
2023-12-01 | Object 6D pose estimation meets zero-shot learning | Andrea Caraffa et.al. | 2312.00947v1 | null |
2023-12-01 | Open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2312.00690v1 | null |
2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500v1 | null |
2023-12-01 | Learning Unorthogonalized Matrices for Rotation Estimation | Kerui Gu et.al. | 2312.00462v1 | null |
2023-11-30 | PoseGPT: Chatting about 3D Human Pose | Yao Feng et.al. | 2311.18836v1 | null |
2023-11-30 | FoundPose: Unseen Object Pose Estimation with Foundation Features | Evin Pınar Örnek et.al. | 2311.18809v1 | null |
2023-11-30 | Pose Estimation and Tracking for ASIST | Ari Goodman et.al. | 2311.18665v1 | null |
2023-11-29 | A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem | Wolfgang Hoegele et.al. | 2311.18107v1 | null |
2023-11-29 | Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2311.17891v1 | link |
2023-11-29 | Cinematic Behavior Transfer via NeRF-based Differentiable Filming | Xuekun Jiang et.al. | 2311.17754v1 | null |
2023-11-29 | PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens | Sebastian Stapf et.al. | 2311.17504v1 | null |
2023-11-28 | On the Calibration of Human Pose Estimation | Kerui Gu et.al. | 2311.17105v1 | null |
2023-11-28 | Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence | Junyi Zhang et.al. | 2311.17034v1 | link |
2023-11-28 | HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors | Shutong Zhang et.al. | 2311.16552v1 | null |
2023-11-28 | Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | Jian Wang et.al. | 2311.16495v1 | null |
2023-11-24 | UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning | Zhongyu Jiang et.al. | 2311.16477v1 | null |
2023-11-27 | DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization | Zhaoyang Xia et.al. | 2311.16060v1 | link |
2023-11-27 | Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid | Yukai Tang et.al. | 2311.15962v1 | null |
2023-11-27 | Computer Vision for Carriers: PATRIOT | Ari Goodman et.al. | 2311.15914v1 | null |
2023-11-27 | SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Jiehong Lin et.al. | 2311.15707v1 | link |
2023-11-24 | RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling | Xiaoyue Wan et.al. | 2311.14242v1 | null |
2023-11-23 | Appearance-based gaze estimation enhanced with synthetic images using deep neural networks | Dmytro Herashchenko et.al. | 2311.14175v1 | link |
2023-11-23 | GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence | Van Nguyen Nguyen et.al. | 2311.14155v1 | link |
2023-11-23 | GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence | Pengyuan Wang et.al. | 2311.13777v1 | null |
2023-11-22 | HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation | Chengpeng Wu et.al. | 2311.13615v1 | link |
2023-11-24 | Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor | Di Wang et.al. | 2311.12778v2 | null |
2023-11-21 | HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation | Yongliang Lin et.al. | 2311.12588v1 | link |
2023-11-21 | CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems | Young-Hee Lee et.al. | 2311.12580v1 | null |
2023-11-21 | HCA-Net: Hierarchical Context Attention Network for Intervertebral Disc Semantic Labeling | Afshin Bozorgpour et.al. | 2311.12486v1 | link |
2023-11-21 | Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency | Christian Keilstrup Ingwersen et.al. | 2311.12421v1 | null |
2023-11-20 | Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models | Pooya Fayyazsanavi et.al. | 2311.12128v1 | link |
2023-11-20 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | Wenhao Li et.al. | 2311.12028v1 | link |
2023-11-20 | SniffyArt: The Dataset of Smelling Persons | Mathias Zinnen et.al. | 2311.11888v1 | null |
2023-11-21 | Robot Hand-Eye Calibration using Structure-from-Motion | Nicolas Andreff et.al. | 2311.11808v2 | null |
2023-11-18 | SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation | Yamei Chen et.al. | 2311.11125v1 | link |
2023-11-18 | Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment | Parth Rawal et.al. | 2311.11039v1 | null |
2023-11-18 | Multiple View Geometry Transformers for 3D Human Pose Estimation | Ziwei Liao et.al. | 2311.10983v1 | link |
2023-11-18 | Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process | Zixun Huang et.al. | 2311.10918v1 | null |
2023-11-17 | BiHRNet: A Binary high-resolution network for Human Pose Estimation | Zhicheng Zhang et.al. | 2311.10296v1 | null |
2023-11-16 | Match and Locate: low-frequency monocular odometry based on deep feature matching | Stepan Konev et.al. | 2311.10034v1 | null |
2023-11-16 | LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters | Yibin Wu et.al. | 2311.09887v1 | link |
2023-11-16 | Improved TokenPose with Sparsity | Anning Li et.al. | 2311.09653v1 | null |
2023-11-16 | Pseudo-keypoints RKHS Learning for Self-supervised 6DoF Pose Estimation | Yangzheng Wu et.al. | 2311.09500v1 | null |
2023-11-15 | NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios | En-Te Lin et.al. | 2311.09269v1 | link |
2023-11-15 | Range-Visual-Inertial Sensor Fusion for Micro Aerial Vehicle Localization and Navigation | Abhishek Goudar et.al. | 2311.09056v1 | link |
2023-11-14 | LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping | Sujal Vijayaraghavan et.al. | 2311.08438v1 | null |
2023-11-13 | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | Ziyi Lin et.al. | 2311.07575v1 | link |
2023-11-13 | Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers | Luca Lach et.al. | 2311.07257v1 | link |
2023-11-10 | CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM | Ruben Sanchez-Garcia et.al. | 2311.06194v1 | link |
2023-11-10 | 2D Image head pose estimation via latent space regression under occlusion settings | José Celestino et.al. | 2311.06038v1 | link |
2023-11-10 | Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous | Ziwei Wang et.al. | 2311.05992v1 | null |
2023-11-10 | A Practical Guide to Implementing Off-Axis Stereo Projection Using Existing Ray Tracing Libraries | Stefan Zellmann et.al. | 2311.05887v1 | link |
2023-11-09 | Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking | Mederic Fourmy et.al. | 2311.05344v1 | null |
2023-11-09 | Spatial Attention-based Distribution Integration Network for Human Pose Estimation | Sihan Gao et.al. | 2311.05323v1 | null |
2023-11-09 | SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing | Arunkumar Rathinam et.al. | 2311.05310v1 | null |
2023-11-09 | Differentiable Cloth Parameter Identification and State Estimation in Manipulation | Dongzhe Zheng et.al. | 2311.05141v1 | null |
2023-11-09 | POISE: Pose Guided Human Silhouette Extraction under Occlusions | Arindam Dutta et.al. | 2311.05077v1 | link |
2023-11-08 | Active Transfer Learning for Efficient Video-Specific Human Pose Estimation | Hiromu Taketsugu et.al. | 2311.05041v1 | link |
2023-11-08 | 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud | Jianchao Ci et.al. | 2311.04699v1 | null |
2023-11-09 | Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations | Xiaoting Yin et.al. | 2311.04591v2 | link |
2023-11-08 | Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images | Nishant Jain et.al. | 2311.04521v1 | null |
2023-11-08 | PLV-IEKF: Consistent Visual-Inertial Odometry using Points, Lines, and Vanishing Points | Tong Hua et.al. | 2311.04477v1 | null |
2023-11-08 | UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields | Injae Kim et.al. | 2311.03784v2 | link |
2023-11-06 | A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation | Qitao Zhao et.al. | 2311.03312v1 | null |
2023-11-06 | Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction | Silvia Romero-Azpitarte et.al. | 2311.03146v1 | null |
2023-11-06 | Simultaneous Time Synchronization and Mutual Localization for Multi-robot System | Xiangyong Wen et.al. | 2311.02948v1 | null |
2023-11-06 | Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation | Xueyan Oh et.al. | 2311.02900v1 | null |
2023-11-06 | Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning | Nobline Yoo et.al. | 2311.02815v1 | link |
2023-11-03 | Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression | Jiaqi Wu et.al. | 2311.01782v1 | link |
2023-11-03 | Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation | Jiaqi Wu et.al. | 2311.01770v1 | null |
2023-11-02 | Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors | Gabriele M. Caddeo et.al. | 2311.01380v1 | link |
2023-11-01 | A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios | Wenyang Hu et.al. | 2311.00401v1 | null |
2023-10-31 | HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception | Junkun Yuan et.al. | 2310.20695v1 | link |
2023-10-31 | Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior | Qingqing Zhao et.al. | 2310.20249v1 | null |
2023-10-30 | FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | Chaoyu Chen et.al. | 2310.19293v1 | null |
2023-10-29 | Distributed Nonlinear Filtering using Triangular Transport Maps | Daniel Grange et.al. | 2310.19000v1 | null |
2023-10-29 | TIC-TAC: A Framework To Learn And Evaluate Your Covariance | Megh Shukla et.al. | 2310.18953v1 | link |
2023-10-29 | Improving Multi-Person Pose Tracking with A Confidence Network | Zehua Fu et.al. | 2310.18920v1 | null |
2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874v1 | null |
2023-10-28 | Enhancing Grasping Performance of Novel Objects through an Improved Fine-Tuning Process | Xiao Hu et.al. | 2310.18569v1 | null |
2023-10-27 | ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation | Michael Zechmair et.al. | 2310.18009v1 | null |
2023-10-26 | Learning Extrinsic Dexterity with Parameterized Manipulation Primitives | Shih-Min Yang et.al. | 2310.17785v1 | null |
2023-10-26 | 6-DoF Stability Field via Diffusion Models | Takuma Yoneda et.al. | 2310.17649v1 | null |
2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359v1 | null |
2023-10-26 | Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs | Ryota Tanaka et.al. | 2310.17193v1 | link |
2023-10-25 | Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers | Gerald Ebmer et.al. | 2310.16618v1 | null |
2023-10-25 | ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors | Xiaoxuan Ma et.al. | 2310.16447v1 | link |
2023-10-25 | MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | Soroush Mehraban et.al. | 2310.16288v1 | link |
2023-10-25 | TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer | Xiao Lin et.al. | 2310.16279v1 | null |
2023-10-23 | Converting Depth Images and Point Clouds for Feature-based Pose Estimation | Robert Lösch et.al. | 2310.14924v1 | link |
2023-10-23 | Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings | Hazem Youssef et.al. | 2310.14914v1 | null |
2023-10-23 | Player Re-Identification Using Body Part Appearences | Mahesh Bhosale et.al. | 2310.14469v1 | null |
2023-10-20 | LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly | Bowen Fu et.al. | 2310.13819v1 | null |
2023-10-20 | FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer | Xinyu Zhang et.al. | 2310.13605v1 | null |
2023-10-20 | ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation | Zhehan Li et.al. | 2310.13324v1 | link |
2023-10-20 | CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants | Shaoan Wang et.al. | 2310.13320v1 | link |
2023-10-19 | Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey | Lijuan Zhou et.al. | 2310.13039v1 | null |
2023-10-19 | FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects | Mayank Lunayach et.al. | 2310.12974v1 | link |
2023-10-18 | Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation | Bosang Kim et.al. | 2310.12189v1 | null |
2023-10-18 | One-Shot Imitation Learning: A Pose Estimation Perspective | Pietro Vitiello et.al. | 2310.12077v1 | null |
2023-10-18 | ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map | Ahmed Tawfik Aboukhadra et.al. | 2310.11811v1 | null |
2023-10-17 | Holistic Parking Slot Detection with Polygon-Shaped Representations | Lihao Wang et.al. | 2310.11629v1 | null |
2023-10-17 | Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication | Chelsey Edge et.al. | 2310.11536v1 | null |
2023-10-18 | AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths | Jiaxin Wei et.al. | 2310.09982v2 | link |
2023-10-15 | Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior | Xiaotong Chen et.al. | 2310.09956v1 | null |
2023-10-15 | Socially reactive navigation models for mobile robots in dynamic environments | Ricarte Ribeiro et.al. | 2310.09916v1 | link |
2023-10-15 | MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection | David C. Jeong et.al. | 2310.09757v1 | link |
2023-10-16 | IMU Preintegration for Multi-Robot Systems in the Presence of Bias and Communication Constraints | Mohammed Ayman Shalaby et.al. | 2310.08686v2 | null |
2023-10-12 | Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor | Ozdemir Can Kara et.al. | 2310.08398v1 | null |
2023-10-12 | Multimodal Active Measurement for Human Mesh Recovery in Close Proximity | Takahiro Maeda et.al. | 2310.08116v1 | link |
2023-10-12 | X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention | Yixuan Zhou et.al. | 2310.08042v1 | link |
2023-10-12 | PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction | Jia-Wang Bian et.al. | 2310.07449v2 | link |
2023-10-11 | SAGE-ICP: Semantic Information-Assisted ICP | Jiaming Cui et.al. | 2310.07237v1 | link |
2023-10-11 | DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation | Rong Wang et.al. | 2310.07206v1 | link |
2023-10-12 | FABind: Fast and Accurate Protein-Ligand Binding | Qizhi Pei et.al. | 2310.06763v2 | link |
2023-10-10 | EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation | Baichuan Huang et.al. | 2310.06751v1 | null |
2023-10-09 | Augmenting Vision-Based Human Pose Estimation with Rotation Matrix | Milad Vazan et.al. | 2310.06068v1 | null |
2023-10-07 | Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles | Elton F. de S. Soares et.al. | 2310.04837v1 | null |
2023-10-10 | 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction | Zhishan Zhou et.al. | 2310.04769v2 | null |
2023-10-06 | SwimXYZ: A large-scale dataset of synthetic swimming motions and videos | Fiche Guénolé et.al. | 2310.04360v1 | null |
2023-10-05 | BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields | Ágoston István Csehi et.al. | 2310.03563v1 | null |
2023-10-05 | 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation | Chen Zhao et.al. | 2310.03534v1 | null |
2023-10-05 | RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation | Boshi An et.al. | 2310.03478v1 | null |
2023-10-05 | Cyber Physical System Information Collection: Robot Location and Navigation Method Based on QR Code | Hongwei Li et.al. | 2310.03470v1 | null |
2023-10-04 | Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC | Hongyi Fan et.al. | 2310.02719v1 | null |
2023-10-05 | USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields | Moyang Li et.al. | 2310.02687v2 | link |
2023-10-03 | Beyond the Benchmark: Detecting Diverse Anomalies in Videos | Yoav Arad et.al. | 2310.01904v1 | link |
2023-10-03 | MFOS: Model-Free & One-Shot Object Pose Estimation | JongMin Lee et.al. | 2310.01897v1 | null |
2023-10-02 | LEAP: Liberate Sparse-view 3D Modeling from Camera Poses | Hanwen Jiang et.al. | 2310.01410v1 | link |
2023-10-02 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404v1 | link |
2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527v3 | link |
2023-09-30 | Diff-DOPE: Differentiable Deep Object Pose Estimation | Jonathan Tremblay et.al. | 2310.00463v1 | null |
2023-09-29 | Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration | Jungseok Hong et.al. | 2310.00146v1 | null |
2023-09-29 | Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation | Zhuoran Yu et.al. | 2310.00099v1 | null |
2023-09-29 | Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head | Qian Wu et.al. | 2309.17143v1 | link |
2023-09-29 | AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi | Yunjiao Zhou et.al. | 2309.16964v1 | null |
2023-09-28 | End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon | Guillaume Bono et.al. | 2309.16634v1 | null |
2023-09-28 | Off-the-shelf bin picking workcell with visual pose estimation: A case study on the world robot summit 2018 kitting task | Frederik Hagelskjær et.al. | 2309.16221v1 | null |
2023-09-28 | Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing | Lu Dai et.al. | 2309.16189v1 | null |
2023-09-28 | Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation | Sameer Pai et.al. | 2309.16170v1 | null |
2023-09-28 | CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting | Shaoxiang Guo et.al. | 2309.16140v1 | null |
2023-09-28 | A Modular Bio-inspired Robotic Hand with High Sensitivity | Chao Liu et.al. | 2309.16081v1 | null |
2023-09-27 | Handbook on Leveraging Lines for Two-View Relative Pose Estimation | Petr Hruby et.al. | 2309.16040v1 | null |
2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023v1 | null |
2023-09-27 | Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range | Xinran Li et.al. | 2309.15367v1 | null |
2023-09-26 | Unsupervised Reconstruction of 3D Human Pose Interactions From 2D Poses Alone | Peter Hardy et.al. | 2309.14865v1 | null |
2023-09-26 | Learning Vision-Based Bipedal Locomotion for Challenging Terrain | Helei Duan et.al. | 2309.14594v1 | null |
2023-09-25 | Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators | Yinan Meng et.al. | 2309.14279v1 | null |
2023-09-25 | Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics | Philipp Quentin et.al. | 2309.14265v1 | null |
2023-09-25 | BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation | Uyoung Jeong et.al. | 2309.14072v1 | link |
2023-09-24 | Towards Subcentimeter Accuracy Digital-Twin Tracking via An RGBD-based Transformer Model and A Comprehensive Mobile Dataset | Zixun Huang et.al. | 2309.13570v1 | link |
2023-09-21 | ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding | Yu Cheng et.al. | 2309.12183v1 | null |
2023-09-21 | ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers | Philipp Ausserlechner et.al. | 2309.11986v1 | null |
2023-09-21 | Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views | Taeho Kang et.al. | 2309.11962v1 | link |
2023-09-21 | A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose | Qingtian Wu et.al. | 2309.11773v1 | null |
2023-09-20 | Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation | Krishna Kanth Nakka et.al. | 2309.11667v1 | null |
2023-09-20 | Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering | Tae Ha Park et.al. | 2309.11645v1 | null |
2023-09-20 | OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving | Heng Li et.al. | 2309.11011v1 | link |
2023-09-19 | Language-Conditioned Affordance-Pose Detection in 3D Point Clouds | Toan Nguyen et.al. | 2309.10911v1 | null |
2023-09-19 | MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings | Surbhi Madan et.al. | 2309.10765v1 | link |
2023-09-19 | SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction | Anilkumar Swamy et.al. | 2309.10748v1 | null |
2023-09-20 | GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild | Simon Schaefer et.al. | 2309.10369v2 | null |
2023-09-19 | RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery | Jiaxin Wei et.al. | 2309.10255v1 | link |
2023-09-18 | Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation | Kathia Melbouci et.al. | 2309.09934v1 | null |
2023-09-18 | Application-driven Validation of Posteriors in Inverse Problems | Tim J. Adler et.al. | 2309.09764v1 | null |
2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563v1 | null |
2023-09-18 | Sparse and Privacy-enhanced Representation for Human Pose Estimation | Ting-Ying Lin et.al. | 2309.09515v1 | null |
2023-09-19 | RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation | Lijun Li et.al. | 2309.09301v2 | link |
2023-09-16 | Optimal Initialization Strategies for Range-Only Trajectory Estimation | Abhishek Goudar et.al. | 2309.09011v1 | null |
2023-09-16 | DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF | Mert Asim Karaoglu et.al. | 2309.08927v1 | link |
2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914v1 | link |
2023-09-15 | Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild | Sungchan Park et.al. | 2309.08644v1 | null |
2023-09-15 | YCB-Ev: Event-vision dataset for 6DoF object pose estimation | Pavel Rojtberg et.al. | 2309.08482v1 | link |
2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086v1 | null |
2023-09-14 | Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success | Gergely Sóti et.al. | 2309.08040v1 | null |
2023-09-14 | TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting | Rohan Choudhury et.al. | 2309.07910v1 | null |
2023-09-14 | Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation | Thorsten Hempel et.al. | 2309.07654v1 | link |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471v1 | link |
2023-09-14 | Unleashing the Power of Depth and Pose Estimation Neural Networks by Designing Compatible Endoscopic Images | Junyang Wu et.al. | 2309.07390v1 | null |
2023-09-13 | LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation | Peter Hardy et.al. | 2309.07243v1 | null |
2023-09-13 | 3D Active Metric-Semantic SLAM | Yuezhan Tao et.al. | 2309.06950v1 | null |
2023-09-11 | ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion | Hongyu Li et.al. | 2309.05662v1 | null |
2023-09-11 | Towards Intuitive HMI for UAV Control | Filip Zoric et.al. | 2309.05460v1 | null |
2023-09-12 | FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild | Jiong Wang et.al. | 2309.05073v2 | link |
2023-09-09 | Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation | Boyuan Jiang et.al. | 2309.04756v1 | link |
2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750v1 | link |
2023-09-08 | Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry | Akankshya Kar et.al. | 2309.04147v1 | null |
2023-09-07 | ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation | Hui Zhang et.al. | 2309.03891v1 | null |
2023-09-05 | An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria | Upender Kalwa et.al. | 2309.03235v1 | null |
2023-09-05 | A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments | W. Jacob Wagner et.al. | 2309.02569v1 | null |
2023-09-05 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction | Youmin Zhang et.al. | 2309.02436v1 | link |
2023-09-05 | DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation | Lei Zhou et.al. | 2309.01925v1 | link |
2023-09-04 | On the Query Strategies for Efficient Online Active Distillation | Michele Boldo et.al. | 2309.01612v1 | null |
2023-09-04 | DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion | Cédric Rommel et.al. | 2309.01575v1 | null |
2023-09-06 | Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation | Hanbing Liu et.al. | 2309.01365v2 | link |
2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324v1 | null |
2023-09-03 | BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking | Dorian F. Henning et.al. | 2309.01236v1 | null |
2023-09-02 | Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis | Jerrin Bright et.al. | 2309.01010v1 | null |
2023-09-01 | Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture | Shaohua Pan et.al. | 2309.00310v1 | link |
2023-08-31 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild | Manuel Kaufmann et.al. | 2308.16894v1 | link |
2023-08-31 | SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects | Ning Gao et.al. | 2308.16528v1 | null |
2023-08-30 | Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports | İrem Üstek et.al. | 2308.16325v1 | link |
2023-08-30 | SignDiff: Learning Diffusion Models for American Sign Language Production | Sen Fang et.al. | 2308.16082v1 | null |
2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984v1 | link |
2023-08-30 | Reconstructing Groups of People with Hypergraph Relational Reasoning | Buzhen Huang et.al. | 2308.15844v1 | link |
2023-08-29 | 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking | Urs Waldmann et.al. | 2308.15316v1 | link |
2023-08-29 | Spatio-temporal MLP-graph network for 3D human pose estimation | Tanvir Hassan et.al. | 2308.15313v1 | link |
2023-08-29 | Pose-Free Neural Radiance Fields via Implicit Pose Regularization | Jiahui Zhang et.al. | 2308.15049v1 | null |
2023-08-28 | R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras | Aron Schmied et.al. | 2308.14713v1 | null |
2023-08-28 | Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson's Disease | Gabriela T. Acevedo Trebbau et.al. | 2308.14679v1 | null |
2023-08-28 | Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera | Jun Yang et.al. | 2308.14665v1 | null |
2023-08-28 | CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment | Pengcheng Dong et.al. | 2308.14324v1 | null |
2023-08-27 | LDL: Line Distance Functions for Panoramic Localization | Junho Kim et.al. | 2308.13989v1 | link |
2023-08-26 | Prior-guided Source-free Domain Adaptation for Human Pose Estimation | Dripta S. Raychaudhuri et.al. | 2308.13954v1 | null |
2023-08-26 | Vision-Based Human Pose Estimation via Deep Learning: A Survey | Gongjin Lan et.al. | 2308.13872v1 | null |
2023-08-24 | POCO: 3D Pose and Shape Estimation with Confidence | Sai Kumar Dwivedi et.al. | 2308.12965v1 | link |
2023-08-24 | Robot Pose Nowcasting: Forecast the Future to Improve the Present | Alessandro Simoni et.al. | 2308.12914v1 | null |
2023-08-23 | Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map | Timothy D Barfoot et.al. | 2308.12418v1 | null |
2023-08-22 | Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape | Jiacong Xu et.al. | 2308.11737v1 | null |
2023-08-22 | TrackFlow: Multi-Object Tracking with Normalizing Flows | Gianluca Mancusi et.al. | 2308.11513v1 | null |
2023-08-22 | A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles | Yusheng Wang et.al. | 2308.11492v1 | null |
2023-08-22 | PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation | Soubarna Banik et.al. | 2308.11440v1 | null |
2023-08-22 | Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views | Wentian Qu et.al. | 2308.11198v1 | null |
2023-08-21 | Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images | Tze Ho Elden Tse et.al. | 2308.11015v1 | null |
2023-08-21 | Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data | Patrick Ruhkamp et.al. | 2308.10627v1 | null |
2023-08-21 | GaitPT: Skeletons Are All You Need For Gait Recognition | Andy Catruna et.al. | 2308.10623v1 | null |
2023-08-21 | Approximately Equivariant Graph Networks | Ningyuan Huang et.al. | 2308.10436v1 | link |
2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411v1 | null |
2023-08-20 | Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video | Yingxuan You et.al. | 2308.10305v1 | link |
2023-08-20 | OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision | Shujie Zhang et.al. | 2308.10146v1 | link |
2023-08-19 | 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation | Yi Zhang et.al. | 2308.10123v1 | link |
2023-08-19 | Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation | Yang Hai et.al. | 2308.10016v1 | link |
2023-08-19 | UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning | Meiqi Sun et.al. | 2308.09953v1 | null |
2023-08-22 | Scene-Aware Feature Matching | Xiaoyong Lu et.al. | 2308.09949v2 | null |
2023-08-18 | PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation | Hanbing Liu et.al. | 2308.09678v1 | link |
2023-08-18 | Improving 3D Pose Estimation for Sign Language | Maksym Ivashechkin et.al. | 2308.09525v1 | null |
2023-08-18 | Denoising Diffusion for 3D Hand Pose Estimation from Images | Maksym Ivashechkin et.al. | 2308.09523v1 | null |
2023-08-18 | ResQ: Residual Quantization for Video Perception | Davide Abati et.al. | 2308.09511v1 | null |
2023-08-17 | MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices | Dongyang Yu et.al. | 2308.09084v1 | null |
2023-08-17 | Pedestrian Environment Model for Automated Driving | Adrian Holzbock et.al. | 2308.09080v1 | link |
2023-08-17 | Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction | Yuhao Yang et.al. | 2308.08518v2 | null |
2023-08-16 | View Consistent Purification for Accurate Cross-View Localization | Shan Wang et.al. | 2308.08110v1 | null |
2023-08-15 | Learning Better Keypoints for Multi-Object 6DoF Pose Estimation | Yangzheng Wu et.al. | 2308.07827v1 | link |
2023-08-14 | Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation | Huan Liu et.al. | 2308.07313v1 | link |
2023-08-12 | 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion | Guirong Zhuo et.al. | 2308.06573v1 | null |
2023-08-17 | EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes | Jiaxi Jiang et.al. | 2308.06493v2 | null |
2023-08-11 | Aggressive Aerial Grasping using a Soft Drone with Onboard Perception | Samuel Ubellacker et.al. | 2308.06351v1 | null |
2023-08-11 | VERF: Runtime Monitoring of Pose Estimation with Neural Radiance Fields | Dominic Maggio et.al. | 2308.05939v1 | null |
2023-08-10 | Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations | Frederike Dümbgen et.al. | 2308.05783v1 | link |
2023-08-10 | KS-APR: Keyframe Selection for Robust Absolute Pose Regression | Changkun Liu et.al. | 2308.05459v1 | null |
2023-08-10 | How-to Augmented Lagrangian on Factor Graphs | Barbara Bazzana et.al. | 2308.05444v1 | null |
2023-08-10 | Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation | Jun Zhou et.al. | 2308.05438v1 | link |
2023-08-10 | Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR | Changkun Liu et.al. | 2308.05394v1 | null |
2023-08-10 | Double-chain Constraints for 3D Human Pose Estimation in Images and Videos | Hongbo Kang et.al. | 2308.05298v1 | link |
2023-08-09 | ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction | Weijie Chen et.al. | 2308.04956v1 | null |
2023-08-07 | SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention | Efimia Panagiotaki et.al. | 2308.03718v1 | link |
2023-08-07 | A Horse with no Labels: Self-Supervised Horse Pose Estimation from Unlabelled Images and Synthetic Prior | Jose Sosa et.al. | 2308.03411v1 | null |
2023-08-06 | Source-free Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2308.03202v1 | link |
2023-08-04 | Diffusion-Augmented Depth Prediction with Sparse Annotations | Jiaqi Li et.al. | 2308.02283v1 | null |
2023-08-04 | DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field | Haowen Wang et.al. | 2308.02239v1 | null |
2023-08-07 | Robust Self-Supervised Extrinsic Self-Calibration | Takayuki Kanai et.al. | 2308.02153v2 | null |
2023-08-03 | Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter | Luca Crupi et.al. | 2308.01833v1 | null |
2023-08-03 | Active Acoustic Sensing for Robot Manipulation | Shihan Lu et.al. | 2308.01600v1 | null |
2023-08-02 | HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions | Andrew Guo et.al. | 2308.01477v1 | null |
2023-08-06 | Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes | Bohao Fan et.al. | 2308.00628v2 | link |
2023-08-01 | Markerless human pose estimation for biomedical applications: a survey | Andrea Avogaro et.al. | 2308.00519v1 | null |
2023-08-01 | Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches | Pia Hanfeld et.al. | 2308.00344v1 | link |
2023-08-01 | Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis | Asish Bera et.al. | 2308.00323v1 | null |
2023-08-01 | Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF) | Chaochao Zhou et.al. | 2308.00214v1 | null |
2023-07-31 | Lightweight Super-Resolution Head for Human Pose Estimation | Haonan Wang et.al. | 2307.16765v1 | link |
2023-07-31 | DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation | Runyang Feng et.al. | 2307.16687v1 | null |
2023-07-30 | Touch if it's transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction | Prajval Kumar Murali et.al. | 2307.16254v1 | null |
2023-07-30 | Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems | Cen Liu et.al. | 2307.16117v1 | link |
2023-07-29 | Iterative Graph Filtering Network for 3D Human Pose Estimation | Zaedul Islam et.al. | 2307.16074v1 | link |
2023-07-29 | HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation | Zuyan Liu et.al. | 2307.16061v1 | null |
2023-07-29 | Effective Whole-body Pose Estimation with Two-stages Distillation | Zhendong Yang et.al. | 2307.15880v1 | link |
2023-07-28 | TrackAgent: 6D Object Tracking via Reinforcement Learning | Konstantin Röhrl et.al. | 2307.15671v1 | null |
2023-07-28 | Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation | Jaime Corsetti et.al. | 2307.15514v1 | link |
2023-07-28 | Robust Visual Sim-to-Real Transfer for Robotic Manipulation | Ricardo Garcia et.al. | 2307.15320v1 | null |
2023-07-27 | Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving | Peter Bauer et.al. | 2307.14889v1 | null |
2023-07-26 | Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control | Yijiong Lin et.al. | 2307.14510v1 | null |
2023-07-28 | CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor | Alexandros Filotheou et.al. | 2307.14247v2 | link |
2023-07-26 | Deep Robust Multi-Robot Re-localisation in Natural Environments | Milad Ramezani et.al. | 2307.13950v1 | null |
2023-07-25 | Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior | Jose Sosa et.al. | 2307.13361v1 | null |
2023-07-23 | TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation | Huijie Zhang et.al. | 2307.12400v1 | null |
2023-07-25 | FDCT: Fast Depth Completion for Transparent Objects | Tianan Li et.al. | 2307.12274v2 | link |
2023-07-22 | Challenges for Monocular 6D Object Pose Estimation in Robotics | Stefan Thalhammer et.al. | 2307.12172v1 | null |
2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116v1 | link |
2023-07-22 | Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence | Yang Tian et.al. | 2307.12106v1 | link |
2023-07-26 | LAMP: Leveraging Language Prompts for Multi-person Pose Estimation | Shengnan Hu et.al. | 2307.11934v2 | link |
2023-07-21 | YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation | Arul Selvam Periyasamy et.al. | 2307.11550v1 | null |
2023-07-21 | KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation | Ivano Donadi et.al. | 2307.11543v1 | link |
2023-07-21 | Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots | Mihir Kulkarni et.al. | 2307.11522v1 | null |
2023-07-20 | SimCol3D -- 3D Reconstruction during Colonoscopy Challenge | Anita Rau et.al. | 2307.11261v1 | link |
2023-07-20 | MSQNet: Actor-agnostic Action Recognition with Multi-modal Query | Anindya Mondal et.al. | 2307.10763v1 | link |
2023-07-19 | POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities | Rui Wang et.al. | 2307.10387v1 | link |
2023-07-18 | ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting | Hongwei Zheng et.al. | 2307.09026v1 | null |
2023-07-17 | Human Emergency Detection during Autonomous Hospital Transports | Andreas Zachariae et.al. | 2307.08359v1 | link |
2023-07-17 | Self-supervised Monocular Depth Estimation: Let's Talk About The Weather | Kieran Saunders et.al. | 2307.08357v1 | null |
2023-07-20 | Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer | Yujiao Shi et.al. | 2307.08015v3 | link |
2023-07-15 | Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents | Ke Cao et.al. | 2307.07763v1 | null |
2023-07-13 | Haptic-guided assisted telemanipulation approach for grasping desired objects from heaps | Maxime Adjigble et.al. | 2307.07053v1 | null |
2023-07-13 | Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data | Miroslav Purkrábek et.al. | 2307.06737v1 | link |
2023-07-12 | Deep learning-based estimation of whole-body kinematics from multi-view images | Kien X. Nguyen et.al. | 2307.05896v1 | link |
2023-07-12 | GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human | Bruce X. B. Yu et.al. | 2307.05853v1 | link |
2023-07-09 | TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement | Mahmoud Abdulsalam et.al. | 2307.05561v1 | null |
2023-07-11 | ResMatch: Residual Attention Learning for Local Feature Matching | Yuxin Deng et.al. | 2307.05180v1 | link |
2023-07-07 | Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation | Jessica Yin et.al. | 2307.03839v1 | null |
2023-07-07 | Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation | Zhongyu Jiang et.al. | 2307.03833v1 | link |
2023-07-07 | Equivariant Single View Pose Prediction Via Induced and Restricted Representations | Owen Howell et.al. | 2307.03704v1 | null |
2023-07-07 | RCDN -- Robust X-Corner Detection Algorithm based on Advanced CNN Model | Ben Chen et.al. | 2307.03505v1 | null |
2023-07-06 | Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning | Christian Jauch et.al. | 2307.03007v1 | null |
2023-07-06 | Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive | Eran Bamani et.al. | 2307.02949v1 | null |
2023-07-06 | A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition | Orhan Konak et.al. | 2307.02906v1 | null |
2023-07-04 | Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones | Elia Cereda et.al. | 2307.01559v1 | null |
2023-07-03 | Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach | Dongyang Yu et.al. | 2307.01004v1 | null |
2023-07-01 | Automatic Solver Generator for Systems of Laurent Polynomial Equations | Evgeniy Martyushev et.al. | 2307.00320v1 | link |
2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306v1 | link |
2023-06-30 | GIRA: Gaussian Mixture Models for Inference and Robot Autonomy | Kshitij Goel et.al. | 2307.00071v1 | link |
2023-06-30 | Towards the extraction of robust sign embeddings for low resource sign language recognition | Mathieu De Coster et.al. | 2306.17558v1 | null |
2023-06-30 | Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle | Václav Pritzl et.al. | 2306.17544v1 | link |
2023-06-30 | Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization | Stephen Hausler et.al. | 2306.17529v1 | null |
2023-06-29 | ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models | Weihao Cheng et.al. | 2306.17140v1 | null |
2023-06-29 | Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation | Zhongwei Qiu et.al. | 2306.17074v1 | null |
2023-06-28 | Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects | Alireza Rezazadeh et.al. | 2306.15858v1 | null |
2023-06-09 | Data-Link: High Fidelity Manufacturing Datasets for Model2Real Transfer under Industrial Settings | Sunny Katyara et.al. | 2306.05766v1 | null |
2023-05-28 | Counter-Hypothetical Particle Filters for Single Object Pose Tracking | Elizabeth A. Olson et.al. | 2305.17828v1 | null |
2023-05-25 | Enhanced 6D Pose Estimation for Robotic Fruit Picking | Marco Costanzo et.al. | 2305.15856v1 | null |
2023-05-22 | You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example | Walter Goodwin et.al. | 2305.12626v1 | null |
2023-05-18 | Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose | Yichen Zhang et.al. | 2305.10808v1 | link |
2023-05-08 | RelPose++: Recovering 6D Poses from Sparse-view Observations | Amy Lin et.al. | 2305.04926v1 | link |
2023-04-17 | Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation | Elena Govi et.al. | 2304.08230v1 | link |
2023-03-28 | CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects | Nick Heppert et.al. | 2303.15782v1 | link |
2023-03-23 | Prior-free Category-level Pose Estimation with Implicit Space Transformation | Jianhui Liu et.al. | 2303.13479v1 | link |
2023-06-21 | 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics | Maximilian Ulmer et.al. | 2303.13241v3 | null |
2023-03-22 | Rigidity-Aware Detection for 6D Object Pose Estimation | Yang Hai et.al. | 2303.12396v1 | link |
2023-03-22 | Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation | Heng Yang et.al. | 2303.12246v1 | link |
2023-03-21 | Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation | Fulin Liu et.al. | 2303.11516v1 | link |
2023-03-18 | SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations | Boyan Wan et.al. | 2303.10346v1 | null |
2023-03-12 | Module-Wise Network Quantization for 6D Object Pose Estimation | Saqib Javed et.al. | 2303.06753v1 | link |
2023-03-09 | SpyroPose: Importance Sampling Pyramids for Object Pose Distribution Estimation in SE(3) | Rasmus Laurvig Haugaard et.al. | 2303.05308v1 | null |
2023-03-03 | Depth-based 6DoF Object Pose Estimation using Swin Transformer | Zhujun Li et.al. | 2303.02133v1 | link |
2023-03-02 | Canonical mapping as a general-purpose object descriptor for robotic manipulation | Benjamin Joffe et.al. | 2303.01331v1 | null |
2023-02-14 | MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation | Dingding Cai et.al. | 2302.07300v1 | null |
2023-02-14 | Model-Based Underwater 6D Pose Estimation from RGB | Davide Sapienza et.al. | 2302.06821v1 | null |
2023-02-02 | A Projective Geometric View for 6D Pose Estimation in mmWave MIMO Systems | Shengqiang Shen et.al. | 2302.00227v2 | null |
2023-01-31 | Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile Sensors | Gabriele M. Caddeo et.al. | 2301.13667v1 | link |
2023-01-19 | Learning ultrasound plane pose regression: assessing generalized pose coordinates in the fetal brain | Chiara Di Vece et.al. | 2301.08317v1 | null |
2023-01-19 | RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation | Leonard Bruns et.al. | 2301.08147v1 | link |
2022-12-21 | HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios | HyunJun Jung et.al. | 2212.10428v2 | link |
2022-12-13 | MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare | Yann Labbé et.al. | 2212.06870v1 | null |
2022-12-11 | Context-aware 6D Pose Estimation of Known Objects using RGB-D data | Ankit Kumar et.al. | 2212.05560v1 | null |
2023-01-30 | Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation | Wei Chen et.al. | 2212.04632v2 | null |
Publish Date | Title | Authors | PDF | Code |
---|---|---|---|---|
2025-07-20 | Decision PCR: Decision version of the Point Cloud Registration task | Yaojie Zhang et.al. | 2507.14965v1 | null |
2025-07-19 | GPI-Net: Gestalt-Guided Parallel Interaction Network via Orthogonal Geometric Consistency for Robust Point Cloud Registration | Weikang Gu et.al. | 2507.14452v1 | null |
2025-07-16 | A Multi-Level Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning | Hao Chen et.al. | 2507.11938v1 | null |
2025-07-09 | Diff | Juncheng Mu et.al. | 2507.06651v1 | null |
2025-07-07 | Simultaneous Localization and Mapping Using Active mmWave Sensing in 5G NR | Tao Du et.al. | 2507.04662v1 | null |
2025-07-06 | Lidar Variability: A Novel Dataset and Comparative Study of Solid-State and Spinning Lidars | Doumegna Mawuto Koudjo Felix et.al. | 2507.04321v1 | null |
2025-07-03 | TurboReg: TurboClique for Robust and Efficient Point Cloud Registration | Shaocheng Yan et.al. | 2507.01439v2 | null |
2025-06-26 | CA-I2P: Channel-Adaptive Registration Network with Global Optimal Selection | Zhixin Cheng et.al. | 2506.21364v1 | null |
2025-06-18 | Correspondence-Free Multiview Point Cloud Registration via Depth-Guided Joint Optimisation | Yiran Zhou et.al. | 2506.18922v1 | null |
2025-06-16 | MT-PCR: A Hybrid Mamba-Transformer with Spatial Serialization for Hierarchical Point Cloud Registration | Bingxi Liu et.al. | 2506.13183v1 | null |
2025-06-13 | Robust Filtering -- Novel Statistical Learning and Inference Algorithms with Applications | Aamir Hussain Chughtai et.al. | 2506.11530v1 | null |
2025-06-05 | Rectified Point Flow: Generic Point Cloud Pose Estimation | Tao Sun et.al. | 2506.05282v1 | null |
2025-05-30 | A 3D Mobile Crowdsensing Framework for Sustainable Urban Digital Twins | Taku Yamazaki et.al. | 2505.24348v1 | null |
2025-05-23 | A Coarse to Fine 3D LiDAR Localization with Deep Local Features for Long Term Robot Navigation in Large Environments | Míriam Máximo et.al. | 2505.18340v1 | link |
2025-05-22 | D-LIO: 6DoF Direct LiDAR-Inertial Odometry based on Simultaneous Truncated Distance Field Mapping | Lucia Coto-Elena et.al. | 2505.16726v1 | link |
2025-05-19 | Cross-modal feature fusion for robust point cloud registration with ambiguous geometry | Zhaoyi Wang et.al. | 2505.13088v1 | link |
2025-05-17 | MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos | Hongyi Zhou et.al. | 2505.11868v1 | null |
2025-05-15 | VGC-RIO: A Tightly Integrated Radar-Inertial Odometry with Spatial Weighted Doppler Velocity and Local Geometric Constrained RCS Histograms | Jianguang Xiang et.al. | 2505.09103v2 | null |
2025-05-08 | An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects | Utsav Rai et.al. | 2505.04962v1 | null |
2025-05-07 | Registration of 3D Point Sets Using Exponential-based Similarity Matrix | Ashutosh Singandhupe et.al. | 2505.04540v1 | link |
2025-05-08 | FA-KPConv: Introducing Euclidean Symmetries to KPConv via Frame Averaging | Ali Alawieh et.al. | 2505.04485v2 | null |
2025-05-06 | Matching Distance and Geometric Distribution Aided Learning Multiview Point Cloud Registration | Shiqi Li et.al. | 2505.03692v1 | link |
2025-05-04 | Enhancing Lidar Point Cloud Sampling via Colorization and Super-Resolution of Lidar Imagery | Sier Ha et.al. | 2505.02049v1 | null |
2025-05-09 | 3D Hand-Eye Calibration for Collaborative Robot Arm: Look at Robot Base Once | Leihui Li et.al. | 2504.21619v2 | link |
2025-04-30 | Multiview Point Cloud Registration via Optimization in an Autoencoder Latent Space | Luc Vedrenne et.al. | 2504.21467v1 | null |
2025-04-10 | Investigating Vision-Language Model for Point Cloud-based Vehicle Classification | Yiqiao Li et.al. | 2504.08154v1 | null |
2025-04-09 | A Pointcloud Registration Framework for Relocalization in Subterranean Environments | David Akhihiero et.al. | 2504.07231v1 | null |
2025-04-09 | FACT: Multinomial Misalignment Classification for Point Cloud Registration | Ludvig Dillén et.al. | 2504.06627v1 | null |
2025-04-08 | Implementation of a Zed 2i Stereo Camera for High-Frequency Shoreline Change and Coastal Elevation Monitoring | José A. Pilartes-Congo et.al. | 2504.06464v1 | null |
2025-04-02 | Bridge 2D-3D: Uncertainty-aware Hierarchical Registration Network with Domain Alignment | Zhixin Cheng et.al. | 2504.01641v1 | null |
2025-03-21 | R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model | Boyuan Zheng et.al. | 2503.17097v1 | null |
2025-03-21 | ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration | Johan Edstedt et.al. | 2503.17093v1 | link |
2025-03-17 | MT-PCR: Leveraging Modality Transformation for Large-Scale Point Cloud Registration with Limited Overlap | Yilong Wu et.al. | 2503.12833v1 | null |
2025-03-13 | Unlocking Generalization Power in LiDAR Point Cloud Registration | Zhenxuan Zeng et.al. | 2503.10149v1 | link |
2025-03-11 | BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes | Minkyun Seo et.al. | 2503.07940v1 | link |
2025-03-10 | SANDRO: a Robust Solver with a Splitting Strategy for Point Cloud Registration | Michael Adlerstein et.al. | 2503.07743v1 | link |
2025-03-10 | HybridReg: Robust 3D Point Cloud Registration with Hybrid Motions | Keyu Du et.al. | 2503.07019v1 | link |
2025-03-07 | Diff-Reg v2: Diffusion-Based Matching Matrix Estimation for Image Matching and 3D Registration | Qianliang Wu et.al. | 2503.04127v2 | null |
2025-03-04 | HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration | Xiyu Zhang et.al. | 2503.02195v1 | null |
2025-03-02 | Semantic-ICP: Iterative Closest Point for Non-rigid Multi-Organ Point Cloud Registration | Wanwen Chen et.al. | 2503.00972v1 | null |
2025-02-26 | BEV-LIO(LC): BEV Image Assisted LiDAR-Inertial Odometry with Loop Closure | Haoxin Cai et.al. | 2502.19242v1 | link |
2025-02-15 | Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy | Mingyang Zhao et.al. | 2502.10704v1 | link |
2025-02-12 | Fully-Geometric Cross-Attention for Point Cloud Registration | Weijie Wang et.al. | 2502.08285v1 | null |
2025-02-11 | Multiview Point Cloud Registration Based on Minimum Potential Energy for Free-Form Blade Measurement | Zijie Wu et.al. | 2502.07680v1 | null |
2025-02-10 | DefTransNet: A Transformer-based Method for Non-Rigid Point Cloud Registration in the Simulation of Soft Tissue Deformation | Sara Monji-Azad et.al. | 2502.06336v1 | null |
2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510v1 | null |
2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115v1 | null |
2025-01-18 | PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Xiaoshui Huang et.al. | 2501.07762v2 | null |
2025-01-10 | LPRnet: A self-supervised registration network for LiDAR and photogrammetric point clouds | Chen Wang et.al. | 2501.05669v1 | null |
2025-01-09 | LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments | Haosong Yue et.al. | 2501.02580v2 | link |
2025-01-03 | MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using Multi-Instance Point Cloud Registration | Songjie Han et.al. | 2501.02041v1 | null |
2024-12-29 | Towards Explaining Uncertainty Estimates in Point Cloud Registration | Ziyuan Qin et.al. | 2412.20612v1 | null |
2024-12-26 | Resolving the Ambiguity of Complete-to-Partial Point Cloud Registration for Image-Guided Liver Surgery with Patches-to-Partial Matching | Zixin Yang et.al. | 2412.19328v1 | null |
2024-12-25 | Cross-PCR: A Robust Cross-Source Point Cloud Registration Framework | Guiyu Zhao et.al. | 2412.18873v1 | null |
2024-12-23 | PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging | Mattias Paul Heinrich et.al. | 2412.17390v1 | null |
2024-12-19 | 3D Registration in 30 Years: A Survey | Jiaqi Yang et.al. | 2412.13735v2 | link |
2024-12-13 | TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes | Yan Xia et.al. | 2412.10308v1 | null |
2024-12-10 | A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM | Zongbo Liao et.al. | 2412.07513v1 | null |
2024-12-07 | AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration | Jiong Lin et.al. | 2412.05507v1 | null |
2024-12-06 | GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration | Yaojie Zhang et.al. | 2412.04855v1 | null |
2024-12-04 | AffordDP: Generalizable Diffusion Policy with Transferable Affordance | Shijie Wu et.al. | 2412.03142v1 | null |
2024-12-04 | QuadricsReg: Large-Scale Point Cloud Registration using Quadric Primitives | Ji Wu et.al. | 2412.02998v1 | null |
2024-12-01 | FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting | Phu Pham et.al. | 2412.00682v1 | null |
2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377v1 | null |
2024-11-22 | EADReg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model for Outdoor Point Cloud Registration | Linrui Gong et.al. | 2411.15271v1 | null |
2024-11-20 | Automatic marker-free registration based on similar tetrahedras for single-tree point clouds | Jing Ren et.al. | 2411.13069v1 | null |
2024-11-19 | 3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality | Hanbeom Chang et.al. | 2411.12514v1 | null |
2024-11-16 | Deep Loss Convexification for Learning Iterative Models | Ziming Zhang et.al. | 2411.10649v1 | null |
2024-11-12 | 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration | Liyuan Zhang et.al. | 2411.07740v1 | link |
2024-11-04 | Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration | Kezheng Xiong et.al. | 2411.01870v1 | link |
2024-10-30 | UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration | Geng Li et.al. | 2410.22909v1 | null |
2024-10-29 | Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy | Rongling Zhang et.al. | 2410.21857v1 | null |
2024-10-29 | Memory-Efficient Point Cloud Registration via Overlapping Region Sampling | Tomoyasu Shimada et.al. | 2410.21753v1 | null |
2024-10-21 | RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration | Pengcheng Shi et.al. | 2410.15682v1 | link |
2024-10-14 | A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration | Renlang Huang et.al. | 2410.10295v1 | link |
2024-10-14 | Kinematic-ICP: Enhancing LiDAR Odometry with Kinematic Constraints for Wheeled Mobile Robots Moving on Planar Surfaces | Tiziano Guadagnino et.al. | 2410.10277v1 | null |
2024-10-10 | LiPO: LiDAR Inertial Odometry for ICP Comparison | Darwin Mick et.al. | 2410.08097v1 | null |
2024-10-08 | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration | Xueyang Kang et.al. | 2410.05729v1 | link |
2024-10-07 | Enhanced Multi-Robot SLAM System with Cross-Validation Matching and Exponential Threshold Keyframe Selection | Ang He et.al. | 2410.05017v1 | null |
2024-10-03 | LoGDesc: Local geometric features aggregation for robust point cloud registration | Karim Slimani et.al. | 2410.02420v1 | link |
2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589v1 | null |
2024-10-01 | TFCT-I2P: Three stream fusion network with color aware transformer for image-to-point cloud registration | Muyao Peng et.al. | 2410.00360v1 | link |
2024-10-06 | KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Hyungtae Lim et.al. | 2409.15615v2 | link |
2024-09-23 | MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies | Haojie Huang et.al. | 2409.15517v1 | null |
2024-09-22 | SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration | Sara Monji-Azad et.al. | 2409.14474v1 | null |
2024-09-27 | FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator | Bang-Shien Chen et.al. | 2409.13978v2 | link |
2024-09-17 | Enhancing the Reliability of LiDAR Point Cloud Sampling: A Colorization and Super-Resolution Approach Based on LiDAR-Generated Images | Sier Ha et.al. | 2409.11532v1 | null |
2024-09-14 | Registration between Point Cloud Streams and Sequential Bounding Boxes via Gradient Descent | Xuesong Li et.al. | 2409.09312v1 | null |
2024-09-11 | Unsupervised Point Cloud Registration with Self-Distillation | Christian Löwens et.al. | 2409.07558v1 | link |
2024-09-10 | Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations | Tejas Anvekar et.al. | 2409.06267v1 | link |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-08 | Sight View Constraint for Robust Point Cloud Registration | Yaojie Zhang et.al. | 2409.05065v1 | null |
2024-08-23 | UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration | Yuval Haitman et.al. | 2408.12380v2 | link |
2024-08-21 | Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild | Turcan Tuna et.al. | 2408.11809v1 | null |
2024-08-20 | LoopSplat: Loop Closure by Registering 3D Gaussian Splats | Liyuan Zhu et.al. | 2408.10154v2 | link |
2024-08-05 | CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2408.02394v1 | null |
2024-08-05 | MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval | Gongxin Yao et.al. | 2408.02392v1 | null |
2024-07-29 | Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning | Ray Zhang et.al. | 2407.20223v1 | null |
2024-07-24 | Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model | Lingjie Su et.al. | 2407.17183v1 | null |
2024-07-23 | SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration | Chien Erh Lin et.al. | 2407.16823v1 | link |
2024-07-19 | PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training | Suyi Chen et.al. | 2407.14054v1 | link |
2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537v2 | link |
2024-07-22 | Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems | Jianzhu Huai et.al. | 2407.11705v2 | null |
2024-07-14 | PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Runzhao Yao et.al. | 2407.10142v1 | link |
2024-07-13 | ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency | Shaocheng Yan et.al. | 2407.09862v1 | link |
2024-07-11 | BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration | Stefanos Pertigkiozoglou et.al. | 2407.08729v1 | null |
2024-07-10 | Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval | Shiqi Li et.al. | 2407.07525v1 | null |
2024-07-08 | SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration | Guiyu Zhao et.al. | 2407.06297v1 | link |
2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597v1 | null |
2024-07-07 | GaussReg: Fast 3D Registration with Gaussian Splatting | Jiahao Chang et.al. | 2407.05254v1 | null |
2024-07-06 | Incremental Multiview Point Cloud Registration | Xiaoya Cheng et.al. | 2407.05021v1 | link |
2024-06-25 | Point Tree Transformer for Point Cloud Registration | Meiling Wang et.al. | 2406.17530v1 | null |
2024-06-17 | Correspondence Free Multivector Cloud Registration using Conformal Geometric Algebra | Francisco Xavier Vasconcelos et.al. | 2406.11732v1 | link |
2024-06-05 | L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration | Yibo Liu et.al. | 2406.03298v1 | link |
2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085v1 | null |
2024-05-26 | NV-LIO: LiDAR-Inertial Odometry using Normal Vectors Towards Robust SLAM in Multifloor Environments | Dongha Chung et.al. | 2405.12563v2 | link |
2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594v1 | null |
2024-05-10 | Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Li Ling et.al. | 2405.06279v1 | link |
2024-05-09 | Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration | Yifan Duan et.al. | 2405.05589v1 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969v1 | null |
2024-05-06 | Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery | Maximilian Weber et.al. | 2405.03314v1 | null |
2024-04-27 | FRAME: A Modular Framework for Autonomous Map-merging: Advancements in the Field | Nikolaos Stathoulopoulos et.al. | 2404.18006v1 | null |
2024-04-22 | PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer | Rui She et.al. | 2404.14034v1 | null |
2024-04-22 | A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning | Yu-Xin Zhang et.al. | 2404.13830v1 | link |
2024-04-09 | Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search | Tianyu Huang et.al. | 2404.06155v1 | link |
2024-04-08 | Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes | Yu Sheng et.al. | 2404.05164v1 | null |
2024-04-06 | Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes | Zhiyuan Yu et.al. | 2404.04557v1 | link |
2024-04-05 | A Ground Mobile Robot for Autonomous Terrestrial Laser Scanning-Based Field Phenotyping | Javier Rodriguez-Sanchez et.al. | 2404.04404v1 | null |
2024-04-01 | FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features | Keisuke Sugiura et.al. | 2404.01237v1 | null |
2024-03-28 | SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks | Yaxu Xie et.al. | 2403.19474v1 | link |
2024-03-26 | Global Point Cloud Registration Network for Large Transformations | Hanz Cuevas-Velasquez et.al. | 2403.18040v1 | link |
2024-03-28 | Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields | Junhong Zhao et.al. | 2403.15981v2 | null |
2024-03-15 | VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering | Guiyu Zhao et.al. | 2403.10085v1 | link |
2024-03-15 | MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings | Yu Du et.al. | 2403.09996v1 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990v1 | null |
2024-03-13 | FastMAC: Stochastic Spectral Sampling of Correspondence Graph | Yifei Zhang et.al. | 2403.08770v1 | link |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156v1 | link |
2024-03-10 | PSS-BA: LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2403.06124v1 | null |
2024-03-27 | Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension | Quan Liu et.al. | 2403.03532v2 | link |
2024-03-15 | RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments | Zhiqiang Chen et.al. | 2402.18934v2 | null |
2024-02-28 | PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers | Seong Hun Lee et.al. | 2402.16598v2 | link |
2024-02-23 | CLIPPER+: A Fast Maximal Clique Algorithm for Robust Global Registration | Kaveh Fathian et.al. | 2402.15464v1 | link |
2024-02-11 | CLIPPER: Robust Data Association without an Initial Guess | Parker C. Lusk et.al. | 2402.07284v1 | null |
2024-02-08 | Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization | Kenji Koide et.al. | 2402.05540v1 | null |
2024-01-16 | Registration of algebraic varieties using Riemannian optimization | Florentin Goyens et.al. | 2401.08562v1 | link |
2024-01-09 | Iterative Feedback Network for Unsupervised Point Cloud Registration | Yifan Xie et.al. | 2401.04357v1 | link |
2024-01-06 | PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations | Rui She et.al. | 2401.03167v1 | null |
2024-01-04 | OptFlow: Fast Optimization-based Scene Flow Estimation without Supervision | Rahul Ahuja et.al. | 2401.02550v1 | null |
2024-01-17 | Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration | Qianliang Wu et.al. | 2401.00436v4 | null |
2023-12-22 | On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods | Anh Duc Nguyen et.al. | 2312.13970v2 | link |
2023-12-20 | D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer | Junjie Gao et.al. | 2312.12970v1 | null |
2023-12-14 | SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | Kezheng Xiong et.al. | 2312.08664v1 | null |
2023-12-11 | PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration | Yue Wu et.al. | 2312.06063v1 | null |
2023-12-05 | DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration | Zhi Chen et.al. | 2312.03053v1 | link |
2023-12-08 | Zero-Shot Point Cloud Registration | Weijie Wang et.al. | 2312.03032v2 | null |
2023-12-05 | A Dynamic Network for Efficient Point Cloud Registration | Yang Ai et.al. | 2312.02877v1 | null |
2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593v1 | link |
2023-12-04 | Rotation-Invariant Rapid TRISO-Fueled Pebble Identification Based on Feature Matching and Point Cloud Registration | Ming Fang et.al. | 2312.02006v1 | null |
2023-12-27 | E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning | Xiuhong Lin et.al. | 2311.18433v2 | link |
2023-11-15 | Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change | Tao Sun et.al. | 2311.09346v1 | null |
2023-11-02 | Transformation Decoupling Strategy based on Screw Theory for Deterministic Point Cloud Registration with Gravity Prior | Xinyi Li et.al. | 2311.01432v1 | link |
2023-11-02 | Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration | Yifan Xie et.al. | 2311.01202v1 | link |
2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874v1 | null |
2023-10-27 | Do we need scan-matching in radar odometry? | Vladimír Kubelka et.al. | 2310.18117v1 | link |
2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359v1 | null |
2023-10-18 | DBDNet:Partial-to-Partial Point Cloud Registration with Dual Branches Decoupling | Shiqi Li et.al. | 2310.11733v1 | null |
2023-10-15 | OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer | Junjie Gao et.al. | 2310.09817v1 | null |
2023-10-09 | FeatSense -- A Feature-based Registration Algorithm with GPU-accelerated TSDF-Mapping Backend for NVIDIA Jetson Boards | Julian Gaal et.al. | 2310.05766v1 | link |
2023-10-09 | Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration | Chunge Bai et.al. | 2310.05504v1 | link |
2023-10-06 | Light-LOAM: A Lightweight LiDAR Odometry and Mapping based on Graph-Matching | Shiquan Yi et.al. | 2310.04162v1 | link |
2023-10-05 | FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators | Haiping Wang et.al. | 2310.03420v1 | link |
2023-10-02 | COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry | Patrick Pfreundschuh et.al. | 2310.01235v1 | link |
2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023v1 | null |
2023-09-27 | Partial Transport for Point-Cloud Registration | Yikun Bai et.al. | 2309.15787v1 | null |
2023-09-27 | KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping | Renlang Huang et.al. | 2309.15394v1 | null |
2023-09-26 | CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration | Shuhao Kang et.al. | 2309.14660v1 | null |
2023-09-20 | AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration | Zheng Dang et.al. | 2309.11170v1 | null |
2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436v1 | link |
2023-09-17 | Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control | Abdullah Altawaitan et.al. | 2309.09163v1 | link |
2023-09-16 | FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization | Nan Ma et.al. | 2309.08966v1 | null |
2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914v1 | link |
2023-09-15 | A Ground Segmentation Method Based on Point Cloud Map for Unstructured Roads | Zixuan Li et.al. | 2309.08164v1 | null |
2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086v1 | null |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471v1 | link |
2023-09-12 | SGFeat: Salient Geometric Feature for Point Cloud Registration | Qianliang Wu et.al. | 2309.06207v1 | null |
2023-09-01 | Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning | Ahmed Hatem et.al. | 2308.16481v2 | null |
2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411v1 | null |
2023-08-18 | DReg-NeRF: Deep Registration for Neural Radiance Fields | Yu Chen et.al. | 2308.09386v1 | link |
2023-08-18 | Overlap Bias Matching is Necessary for Point Cloud Registration | Pengcheng Shi et.al. | 2308.09364v1 | null |
2023-08-10 | Deep Semantic Graph Matching for Large-scale Outdoor Point Clouds Registration | Shaocong Liu et.al. | 2308.05314v1 | null |
2023-08-09 | PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration | Mingzhi Yuan et.al. | 2308.04782v1 | link |
2023-07-25 | GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer | Zheng Qin et.al. | 2308.03768v1 | link |
2023-07-26 | One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration | Yongzhe Yuan et.al. | 2307.14019v1 | null |
2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116v1 | link |
2023-09-12 | ELiOT : End-to-end Lidar Odometry using Transformer Framework | Daegyu Lee et.al. | 2307.11998v4 | null |
2023-08-08 | Density-invariant Features for Distant Point Cloud Registration | Quan Liu et.al. | 2307.09788v2 | link |
2023-07-18 | SphereNet: Learning a Noise-Robust and General Descriptor for Point Cloud Registration | Guiyu Zhao et.al. | 2307.09351v1 | null |
2023-07-14 | CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2307.07142v1 | null |
2023-07-11 | Exact Point Cloud Downsampling for Fast and Accurate Global Trajectory Optimization | Kenji Koide et.al. | 2307.02948v2 | link |
2023-07-03 | Direct Superpoints Matching for Fast and Robust Point Cloud Registration | Aniket Gupta et.al. | 2307.01362v1 | link |
2023-07-04 | A denoised Mean Teacher for domain adaptive point cloud registration | Alexander Bigalke et.al. | 2306.14749v2 | link |
2023-06-20 | End-to-end 2D-3D Registration between Image and LiDAR Point Cloud for Vehicle Localization | Guangming Wang et.al. | 2306.11346v1 | null |
2023-06-14 | ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching | Matthew McDermott et.al. | 2306.08690v1 | link |
2023-06-12 | Volume-DROID: A Real-Time Implementation of Volumetric Mapping with DROID-SLAM | Peter Stratton et.al. | 2306.06850v1 | link |
2023-06-11 | PWR-Align: Leveraging Part-Whole Relationships for Part-wise Rigid Point Cloud Registration in Mixed Reality Applications | Manorama Jha et.al. | 2306.06717v1 | null |
2023-06-07 | Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks | Sara Monji-Azad et.al. | 2306.04701v1 | null |
2023-05-23 | Cross-source Point Cloud Registration: Challenges, Progress and Prospects | Xiaoshui Huang et.al. | 2305.13570v1 | null |
2023-05-19 | Efficient and Deterministic Search Strategy Based on Residual Projections for Point Cloud Registration | Xinyi Li et.al. | 2305.11716v1 | null |
2023-05-18 | 3D Registration with Maximal Cliques | Xiyu Zhang et.al. | 2305.10854v1 | link |
2023-05-05 | HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration | Canhui Tang et.al. | 2305.03487v1 | link |
2023-05-08 | APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction | Quan Liu et.al. | 2305.02893v2 | link |
2023-04-27 | RegHEC: Hand-Eye Calibration via Simultaneous Multi-view Point Clouds Registration of Arbitrary Object | Shiyu Xing et.al. | 2304.14092v1 | link |
2023-04-26 | Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography | Peng Liu et.al. | 2304.13618v1 | link |
2023-04-25 | BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization | Harel Biggie et.al. | 2304.13114v1 | link |
2023-04-18 | SDFReg: Learning Signed Distance Functions for Point Cloud Registration | Leida Zhang et.al. | 2304.08929v1 | null |
2023-04-12 | SiLK -- Simple Learned Keypoints | Pierre Gleize et.al. | 2304.06194v1 | link |
2023-04-11 | TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain | Alexey I. Boyko et.al. | 2304.05342v1 | null |
2023-04-10 | HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion | Yu Wang et.al. | 2304.04508v1 | null |
2023-04-09 | Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos | Shiyang Lu et.al. | 2304.04325v1 | null |
2023-04-09 | DSMNet: Deep High-precision 3D Surface Modeling from Sparse Point Cloud Frames | Changjie Qiu et.al. | 2304.04200v1 | null |
2023-04-02 | Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting | Haiping Wang et.al. | 2304.00467v1 | link |
2023-03-31 | kNN-Res: Residual Neural Network with kNN-Graph coherence for point cloud registration | Muhammad S. Battikh et.al. | 2304.00050v1 | link |
2023-03-31 | RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving | Chenghao Shi et.al. | 2303.18084v1 | null |
2023-04-23 | HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching | Yiheng Li et.al. | 2303.16526v2 | link |
2023-03-27 | Learnable Graph Matching: A Practical Paradigm for Data Association | Jiawei He et.al. | 2303.15414v1 | link |
2023-03-23 | Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration | Guofeng Mei et.al. | 2303.13290v1 | link |
2023-03-22 | RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration | Jiuming Liu et.al. | 2303.12384v1 | link |
2023-03-17 | Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration | Zheng Qin et.al. | 2303.09950v1 | link |
2023-03-14 | RoCNet: 3D Robust Registration of Point-Clouds using Deep Learning | Karim Slimani et.al. | 2303.07963v1 | null |
2023-03-07 | GMCR: Graph-based Maximum Consensus Estimation for Point Cloud Registration | Michael Gentner et.al. | 2303.04032v1 | null |
2023-03-02 | Neural Intrinsic Embedding for Non-rigid Point Cloud Matching | Puhua Jiang et.al. | 2303.01038v1 | null |
2023-03-14 | A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation | Lin Li et.al. | 2302.14511v2 | link |
2023-02-28 | PCR-CG: Point Cloud Registration via Deep Color and Geometry | Yu Zhang et.al. | 2302.14418v1 | link |
2023-02-28 | Efficient Implicit Neural Reconstruction Using LiDAR | Dongyu Yan et.al. | 2302.14363v1 | link |
2023-02-25 | Accurate Gaussian Process Distance Fields with applications to Echolocation and Mapping | Cedric Le Gentil et.al. | 2302.13005v1 | null |
2023-02-14 | Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms | Ningli Xu et.al. | 2302.07184v1 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-07-09 | PointVDP: Learning View-Dependent Projection by Fireworks Rays for 3D Point Cloud Segmentation | Yang Chen et.al. | 2507.06618v1 | null |
2025-07-09 | Ambiguity-aware Point Cloud Segmentation by Adaptive Margin Contrastive Learning | Yang Chen et.al. | 2507.06592v1 | null |
2025-07-07 | All in One: Visual-Description-Guided Unified Point Cloud Segmentation | Zongyan Han et.al. | 2507.05211v1 | null |
2025-06-29 | High-quality Pseudo-labeling for Point Cloud Segmentation with Scene-level Annotation | Lunhao Duan et.al. | 2506.23227v1 | null |
2025-06-26 | TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation | Chade Li et.al. | 2506.20991v1 | null |
2025-06-16 | SRKD: Towards Efficient 3D Point Cloud Segmentation via Structure- and Relation-aware Knowledge Distillation | Yuqi Li et.al. | 2506.17290v1 | null |
2025-06-11 | Enhancing Human-Robot Collaboration: A Sim2Real Domain Adaptation Algorithm for Point Cloud Segmentation in Industrial Environments | Fatemeh Mohammadi Amin et.al. | 2506.09552v1 | null |
2025-06-05 | Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting | Alfred T. Christiansen et.al. | 2506.05009v1 | null |
2025-06-05 | OpenMaskDINO3D : Reasoning 3D Segmentation via Large Language Model | Kunshen Zhang et.al. | 2506.04837v1 | link |
2025-05-25 | Staircase Recognition and Location Based on Polarization Vision | Weifeng Kong et.al. | 2505.19026v1 | null |
2025-05-23 | Generative Data Augmentation for Object Point Cloud Segmentation | Dekai Zhu et.al. | 2505.17783v1 | null |
2025-05-15 | APCoTTA: Continual Test-Time Adaptation for Semantic Segmentation of Airborne LiDAR Point Clouds | Yuan Gao et.al. | 2505.09971v1 | link |
2025-04-26 | WLTCL: Wide Field-of-View 3-D LiDAR Truck Compartment Automatic Localization System | Guodong Sun et.al. | 2504.18870v1 | null |
2025-04-16 | 3D-PointZshotS: Geometry-Aware 3D Point Cloud Zero-Shot Semantic Segmentation Narrowing the Visual-Semantic Gap | Minmin Yang et.al. | 2504.12442v1 | link |
2025-04-09 | UAV Position Estimation using a LiDAR-based 3D Object Detection Method | Uthman Olawoye et.al. | 2504.07028v1 | null |
2025-04-08 | Turin3D: Evaluating Adaptation Strategies under Label Scarcity in Urban LiDAR Segmentation with Semi-Supervised Techniques | Luca Barco et.al. | 2504.05882v1 | null |
2025-04-12 | Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks | Haosheng Li et.al. | 2504.01659v3 | null |
2025-04-12 | ProtoGuard-guided PROPEL: Class-Aware Prototype Enhancement and Progressive Labeling for Incremental 3D Point Cloud Segmentation | Haosheng Li et.al. | 2504.01648v2 | null |
2025-03-24 | DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation | Karim Abou Zeid et.al. | 2503.18944v1 | link |
2025-03-21 | GeoT: Geometry-guided Instance-dependent Transition Matrix for Semi-supervised Tooth Point Cloud Segmentation | Weihao Yu et.al. | 2503.16976v1 | link |
2025-05-20 | Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model | Zhaochong An et.al. | 2503.16282v2 | link |
2025-03-19 | Depth-Aware Range Image-Based Model for Point Cloud Segmentation | Bike Chen et.al. | 2503.14955v1 | null |
2025-03-18 | Deep Unsupervised Segmentation of Log Point Clouds | Fedor Zolotarev et.al. | 2503.14244v1 | null |
2025-03-07 | Joint 3D Point Cloud Segmentation using Real-Sim Loop: From Panels to Trees and Branches | Tian Qiu et.al. | 2503.05630v1 | null |
2025-03-05 | Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters | Julia Hindel et.al. | 2503.03299v1 | null |
2025-03-01 | Explainable LiDAR 3D Point Cloud Segmentation and Clustering for Detecting Airplane-Generated Wind Turbulence | Zhan Qu et.al. | 2503.00518v1 | null |
2025-02-26 | PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments | Yueting Liu et.al. | 2502.15342v3 | link |
2025-02-18 | An Experimental Study of SOTA LiDAR Segmentation Models | Bike Chen et.al. | 2502.12860v1 | null |
2025-01-30 | Ground Awareness in Deep Learning for Large Outdoor Point Cloud Segmentation | Kevin Qiu et.al. | 2501.18246v1 | null |
2025-01-29 | 3DSES: an indoor Lidar point cloud segmentation dataset with real and pseudo-labels from a 3D model | Maxime Mérizette et.al. | 2501.17534v1 | null |
2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502v1 | null |
2025-01-06 | The 2nd Place Solution from the 3D Semantic Segmentation Track in the 2024 Waymo Open Dataset Challenge | Qing Wu et.al. | 2501.05472v1 | null |
2025-01-03 | MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using Multi-Instance Point Cloud Registration | Songjie Han et.al. | 2501.02041v1 | null |
2025-01-18 | Impact of color and mixing proportion of synthetic point clouds on semantic segmentation | Shaojie Zhou et.al. | 2412.19145v2 | link |
2024-12-02 | The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs | Christina Kassab et.al. | 2412.01539v1 | null |
2024-11-30 | Density-aware Global-Local Attention Network for Point Cloud Segmentation | Chade Li et.al. | 2412.00489v1 | null |
2024-11-28 | Textured As-Is BIM via GIS-informed Point Cloud Segmentation | Mohamed S. H. Alabassy et.al. | 2411.18898v1 | null |
2024-11-27 | Towards Cross-device and Training-free Robotic Grasping in 3D Open World | Weiguang Zhao et.al. | 2411.18133v1 | null |
2024-11-20 | BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Umamaheswaran Raman Kumar et.al. | 2411.13251v1 | null |
2024-11-13 | Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model | Yutao Shen et.al. | 2411.08453v1 | null |
2024-11-13 | Multiscale Graph Construction Using Non-local Cluster Features | Reina Kaneko et.al. | 2411.08371v1 | null |
2024-10-30 | Automated Image-Based Identification and Consistent Classification of Fire Patterns with Quantitative Shape Analysis and Spatial Location Identification | Pengkun Liu et.al. | 2410.23105v1 | null |
2024-11-03 | Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation | Zhaochong An et.al. | 2410.22489v2 | link |
2024-10-28 | Exploring contextual modeling with linear complexity for point cloud segmentation | Yong Xien Chng et.al. | 2410.21211v1 | null |
2024-10-14 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | Yanjie Ze et.al. | 2410.10803v1 | link |
2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et.al. | 2410.06725v1 | null |
2024-09-24 | Underground Mapping and Localization Based on Ground-Penetrating Radar | Jinchang Zhang et.al. | 2409.16446v1 | null |
2024-09-22 | Lidar Panoptic Segmentation in an Open World | Anirudh S Chakravarthy et.al. | 2409.14273v1 | link |
2024-09-03 | When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels | Yifan Liu et.al. | 2409.01691v1 | null |
2024-09-03 | Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation | Haodong Wang et.al. | 2409.01662v1 | null |
2024-08-29 | Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment | Liyao Tang et.al. | 2408.16520v1 | link |
2024-08-21 | GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation | Abiao Li et.al. | 2408.11558v1 | link |
2024-08-02 | Trainable Pointwise Decoder Module for Point Cloud Segmentation | Bike Chen et.al. | 2408.01548v1 | null |
2024-07-31 | Fine-grained Metrics for Point Cloud Semantic Segmentation | Zhuheng Lu et.al. | 2407.21289v1 | null |
2024-07-19 | Scale Disparity of Instances in Interactive Point Cloud Segmentation | Chenrui Han et.al. | 2407.14009v1 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761v1 | null |
2024-07-17 | Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation | Ruijie Xu et.al. | 2407.12489v1 | link |
2024-07-17 | HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation | Tianpei Zou et.al. | 2407.12387v1 | link |
2024-07-17 | Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model | Tao Wang et.al. | 2407.12319v1 | null |
2024-07-12 | Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion | Shiqi Tan et.al. | 2407.09697v1 | null |
2024-07-01 | fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Francis Williams et.al. | 2407.01781v1 | null |
2024-06-25 | Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model | Zhuoyuan Li et.al. | 2406.17442v1 | null |
2024-08-04 | Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes | Yong-Qiang Mao et.al. | 2405.19735v2 | null |
2024-05-24 | 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving | Boyi Sun et.al. | 2405.15286v1 | link |
2024-05-25 | Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation | Bike Chen et.al. | 2405.10175v2 | null |
2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et.al. | 2404.10699v1 | link |
2024-04-04 | OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | Francis Engelmann et.al. | 2404.03650v1 | null |
2024-03-28 | RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation | Chongkai Gao et.al. | 2403.19460v1 | null |
2024-05-30 | CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation | Guoyang Zhao et.al. | 2403.16794v2 | link |
2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317v1 | null |
2024-03-11 | 3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data | Xiting Zhao et.al. | 2403.06538v1 | null |
2024-03-11 | Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation | Peng Zhang et.al. | 2403.06401v1 | null |
2024-03-03 | Region-Transformer: Self-Attention Region Based Class-Agnostic Point Cloud Segmentation | Dipesh Gyawali et.al. | 2403.01407v1 | null |
2024-01-29 | Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation | Jie Liu et.al. | 2401.16051v1 | link |
2024-01-19 | Symbol as Points: Panoptic Symbol Spotting via Point-based Representation | Wenlong Liu et.al. | 2401.10556v1 | link |
2023-12-29 | Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation | Xiawei Li et.al. | 2312.16578v2 | link |
2023-12-19 | Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas | Alperen Enes Bayar et.al. | 2312.11880v1 | null |
2023-12-15 | T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning | Weijie Wei et.al. | 2312.10217v1 | link |
2023-12-14 | FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments | Minghao Lu et.al. | 2312.08743v1 | null |
2023-12-12 | Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation | Yuanbin Wang et.al. | 2312.07221v1 | null |
2023-12-11 | Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation | Shaobo Xia et.al. | 2312.06799v1 | null |
2024-01-15 | Provable Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More | Jan Schuchardt et.al. | 2312.02708v2 | null |
2023-11-24 | OneFormer3D: One Transformer for Unified Point Cloud Segmentation | Maxim Kolodiazhnyi et.al. | 2311.14405v1 | null |
2023-11-18 | DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields | Yu Chi et.al. | 2311.12063v1 | link |
2023-11-10 | U3DS | Jiaxu Liu et.al. | 2311.06018v1 | null |
2023-11-06 | Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | Shichao Dong et.al. | 2311.01989v2 | null |
2023-10-19 | 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision | Cheng-Kun Yang et.al. | 2310.12817v1 | null |
2023-10-11 | PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation | Haibo Qiu et.al. | 2310.07743v1 | link |
2023-09-26 | Addressing Data Misalignment in Image-LiDAR Fusion on Point Cloud Segmentation | Wei Jong Yang et.al. | 2309.14932v1 | null |
2023-09-20 | Towards Robust Few-shot Point Cloud Semantic Segmentation | Yating Xu et.al. | 2309.11228v1 | link |
2023-09-20 | Generalized Few-Shot Point Cloud Segmentation Via Geometric Words | Yating Xu et.al. | 2309.11222v1 | link |
2023-08-29 | Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation | Cristiano Saltori et.al. | 2308.14619v2 | link |
2023-08-22 | Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation | Zongyi Xu et.al. | 2308.11166v1 | link |
2023-08-14 | Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid | Alexander Kyuroson et.al. | 2308.07283v1 | null |
2023-08-08 | Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement | Zhenhua Ning et.al. | 2308.03177v2 | link |
2023-07-31 | pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation | Abhishek Kuriyal et.al. | 2307.14777v2 | link |
2023-07-27 | Clustering based Point Cloud Representation Learning for 3D Analysis | Tuo Feng et.al. | 2307.14605v1 | link |
2023-07-20 | See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data | Yuhang Lu et.al. | 2307.10782v1 | null |
2023-07-14 | Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar | Runwei Guan et.al. | 2307.07102v1 | link |
2023-07-08 | BPNet: Bézier Primitive Segmentation on 3D Point Clouds | Rao Fu et.al. | 2307.04013v1 | link |
2023-06-28 | Point2Point : A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction | Athrva Atul Pandhare et.al. | 2306.16306v1 | null |
2023-05-30 | Dynamic Clustering Transformer Network for Point Cloud Segmentation | Dening Lu et.al. | 2306.08073v1 | null |
2023-05-23 | Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation | Shuting He et.al. | 2305.14335v1 | link |
2023-05-22 | Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning | Xiaoxiao Sheng et.al. | 2305.12959v1 | null |
2023-05-17 | Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences | Ahmed J. Afifi et.al. | 2305.09928v1 | null |
2023-05-08 | OctFormer: Octree-based Transformers for 3D Point Clouds | Peng-Shuai Wang et.al. | 2305.03045v2 | link |
2023-05-22 | Urban GeoBIM construction by integrating semantic LiDAR point clouds with as-designed BIM models | Jie Shao et.al. | 2304.11719v2 | null |
2023-04-22 | Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Feng Jiang et.al. | 2304.11393v1 | link |
2023-06-02 | Transformer-Based Visual Segmentation: A Survey | Xiangtai Li et.al. | 2304.09854v2 | link |
2023-04-11 | Feature-assisted interactive geometry reconstruction in 3D point clouds using incremental region growing | Attila Szabo et.al. | 2304.05109v1 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-07-23 | Attention (as Discrete-Time Markov) Chains | Yotam Erel et.al. | 2507.17657v1 | null |
2025-07-23 | Who Attacks, and Why? Using LLMs to Identify Negative Campaigning in 18M Tweets across 19 Countries | Victor Hartman et.al. | 2507.17636v1 | null |
2025-07-23 | Decoding Consumer Preferences Using Attention-Based Language Models | Joshua Foster et.al. | 2507.17564v1 | null |
2025-07-23 | Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning | Xinyao Liu et.al. | 2507.17539v1 | null |
2025-07-23 | Probing Vision-Language Understanding through the Visual Entailment Task: promises and pitfalls | Elena Pitta et.al. | 2507.17467v1 | null |
2025-07-23 | Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models | Shen Tan et.al. | 2507.17379v1 | null |
2025-07-23 | A Conditional Probability Framework for Compositional Zero-shot Learning | Peng Wu et.al. | 2507.17377v1 | null |
2025-07-23 | Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task | Milena Davudova et.al. | 2507.17326v1 | null |
2025-07-23 | Exploring the Potential of LLMs for Serendipity Evaluation in Recommender Systems | Li Kang et.al. | 2507.17290v1 | null |
2025-07-23 | PolarAnything: Diffusion-based Polarimetric Image Synthesis | Kailong Zhang et.al. | 2507.17268v1 | null |
2025-07-22 | Task-Specific Zero-shot Quantization-Aware Training for Object Detection | Changhao Li et.al. | 2507.16782v1 | null |
2025-07-22 | Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support | Fangjian Lei et.al. | 2507.16754v1 | null |
2025-07-22 | CMP: A Composable Meta Prompt for SAM-Based Cross-Domain Few-Shot Segmentation | Shuai Chen et.al. | 2507.16753v1 | null |
2025-07-22 | SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing | Jinbo Hu et.al. | 2507.16724v1 | null |
2025-07-22 | Are Foundation Models All You Need for Zero-shot Face Presentation Attack Detection? | Lazaro Janier Gonzalez-Sole et.al. | 2507.16393v1 | null |
2025-07-22 | Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries | Pengfei Cai et.al. | 2507.16343v1 | null |
2025-07-22 | Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models | Futa Waseda et.al. | 2507.16257v1 | null |
2025-07-22 | LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs | Zitong Xu et.al. | 2507.16193v1 | null |
2025-07-22 | Characterizing Online Activities Contributing to Suicide Mortality among Youth | Aparna Ananthasubramaniam et.al. | 2507.16185v1 | null |
2025-07-22 | PUSA V1.0: Surpassing Wan-I2V with $500 Training Cost by Vectorized Timestep Adaptation | Yaofang Liu et.al. | 2507.16116v1 | null |
2025-07-21 | VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair | Haomin Qi et.al. | 2507.15664v1 | null |
2025-07-21 | Smart Eyes for Silent Threats: VLMs and In-Context Learning for THz Imaging | Nicolas Poggi et.al. | 2507.15576v1 | null |
2025-07-21 | HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation | Qinqian Lei et.al. | 2507.15542v1 | null |
2025-07-21 | One Last Attention for Your Vision-Language Model | Liang Chen et.al. | 2507.15480v1 | null |
2025-07-21 | PDEformer-2: A Versatile Foundation Model for Two-Dimensional Partial Differential Equations | Zhanhong Ye et.al. | 2507.15409v1 | null |
2025-07-21 | Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection | Navid Ayoobi et.al. | 2507.15286v1 | null |
2025-07-21 | A2TTS: TTS for Low Resource Indian Languages | Ayush Singh Bhadoriya et.al. | 2507.15272v1 | null |
2025-07-21 | FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers | Yanbing Zhang et.al. | 2507.15249v1 | null |
2025-07-20 | Deep Generative Models in Condition and Structural Health Monitoring: Opportunities, Limitations and Future Outlook | Xin Yang et.al. | 2507.15026v1 | null |
2025-07-20 | DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis | Yinghao Aaron Li et.al. | 2507.14988v1 | null |
2025-07-18 | Blind Super Resolution with Reference Images and Implicit Degradation Representation | Huu-Phu Do et.al. | 2507.13915v1 | null |
2025-07-18 | SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection | Aleksandr Gashkov et.al. | 2507.13859v1 | null |
2025-07-18 | Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments | Kathrin Korte et.al. | 2507.13846v1 | null |
2025-07-17 | Learning Pluralistic User Preferences through Reinforcement Learning Fine-tuned Summaries | Hyunji Nam et.al. | 2507.13579v1 | null |
2025-07-17 | LoRA-Loop: Closing the Synthetic Replay Cycle for Continual VLM Learning | Kaihong Wang et.al. | 2507.13568v1 | null |
2025-07-17 | Revisiting Prompt Engineering: A Comprehensive Evaluation for LLM-based Personalized Recommendation | Genki Kusano et.al. | 2507.13525v1 | null |
2025-07-17 | Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning | Seyyed Saeid Cheshmi et.al. | 2507.13482v1 | null |
2025-07-17 | "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models | Jing Gu et.al. | 2507.13428v1 | null |
2025-07-17 | Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes | Tyler Loakman et.al. | 2507.13335v1 | null |
2025-07-17 | Detecting LLM-generated Code with Subtle Modification by Adversarial Training | Xin Yin et.al. | 2507.13123v1 | null |
2025-07-17 | GLAD: Generalizable Tuning for Vision-Language Models | Yuqi Peng et.al. | 2507.13089v1 | null |
2025-07-17 | DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning | Rahel Rickenbach et.al. | 2507.12855v1 | null |
2025-07-17 | MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval | Jeong-Woo Park et.al. | 2507.12819v1 | null |
2025-07-17 | Think-Before-Draw: Decomposing Emotion Semantics & Fine-Grained Controllable Expressive Talking Head Generation | Hanlei Shi et.al. | 2507.12761v1 | null |
2025-07-17 | osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning | Fujing Xie et.al. | 2507.12753v1 | null |
2025-07-16 | Reconstruct, Inpaint, Finetune: Dynamic Novel-view Synthesis from Monocular Videos | Kaihua Chen et.al. | 2507.12646v1 | null |
2025-07-16 | Funnel-HOI: Top-Down Perception for Zero-Shot HOI Detection | Sandipan Sarma et.al. | 2507.12628v1 | null |
2025-07-16 | Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models | Felix Nützel et.al. | 2507.12236v1 | null |
2025-07-16 | SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation | Jun Yin et.al. | 2507.11994v1 | null |
2025-07-16 | Style Composition within Distinct LoRA modules for Traditional Art | Jaehyun Lee et.al. | 2507.11986v1 | null |
2025-07-16 | GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models | Zhaohong Huang et.al. | 2507.11969v1 | null |
2025-07-16 | Imbalanced Regression Pipeline Recommendation | Juscimara G. Avelino et.al. | 2507.11901v1 | null |
2025-07-16 | SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling | Andrei Rekesh et.al. | 2507.11818v1 | null |
2025-07-15 | AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles | Matteo Fasulo et.al. | 2507.11764v1 | null |
2025-07-15 | Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning | Fan Shi et.al. | 2507.11761v1 | null |
2025-07-15 | Torsional-GFN: a conditional conformation generator for small molecules | Alexandra Volokhova et.al. | 2507.11759v1 | null |
2025-07-15 | CRABS: A syntactic-semantic pincer strategy for bounding LLM interpretation of Python notebooks | Meng Li et.al. | 2507.11742v1 | null |
2025-07-15 | Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation | Zhen Xu et.al. | 2507.11540v1 | null |
2025-07-15 | HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing | Pan Du et.al. | 2507.11474v1 | null |
2025-07-15 | Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces | Yunhao Yang et.al. | 2507.11352v1 | null |
2025-07-15 | How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study | Che Liu et.al. | 2507.11200v1 | null |
2025-07-15 | MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models | Seif Ahmed et.al. | 2507.11114v1 | null |
2025-07-15 | Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection | Yuhu Bai et.al. | 2507.11003v1 | null |
2025-07-15 | Learning to Tune Like an Expert: Interpretable and Scene-Aware Navigation via MLLM Reasoning and CVAE-Based Adaptation | Yanbo Wang et.al. | 2507.11001v1 | null |
2025-07-15 | MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning | Jugal Gajjar et.al. | 2507.10898v1 | null |
2025-07-14 | LLM-Guided Agentic Object Detection for Open-World Understanding | Furkan Mumcu et.al. | 2507.10844v1 | null |
2025-07-14 | EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Mingxian Lin et.al. | 2507.10548v1 | null |
2025-07-14 | Graph World Model | Tao Feng et.al. | 2507.10539v1 | null |
2025-07-14 | Fine-Grained Zero-Shot Object Detection | Hongxu Ma et.al. | 2507.10358v1 | null |
2025-07-14 | Prompt Informed Reinforcement Learning for Visual Coverage Path Planning | Venkat Margapuri et.al. | 2507.10284v1 | null |
2025-07-14 | Conditional Chemical Language Models are Versatile Tools in Drug Discovery | Lu Zhu et.al. | 2507.10273v1 | null |
2025-07-14 | Natural Language-based Assessment of L2 Oral Proficiency using LLMs | Stefano Bannò et.al. | 2507.10200v1 | null |
2025-07-14 | DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation | Ivan Martinović et.al. | 2507.10118v1 | null |
2025-07-14 | FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text | Bingchao Wang et.al. | 2507.10095v1 | null |
2025-07-14 | MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second | Chenguo Lin et.al. | 2507.10065v1 | null |
2025-07-14 | Automating SPARQL Query Translations between DBpedia and Wikidata | Malte Christian Bartels et.al. | 2507.10045v1 | null |
2025-07-11 | Compress Any Segment Anything Model (SAM) | Juntong Fan et.al. | 2507.08765v1 | null |
2025-07-11 | NL in the Middle: Code Translation with LLMs and Intermediate Representations | Chi-en Amy Tai et.al. | 2507.08627v1 | null |
2025-07-11 | BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis | Shuang Cui et.al. | 2507.08607v1 | null |
2025-07-11 | Unlocking Speech Instruction Data Potential with Query Rewriting | Yonghua Hei et.al. | 2507.08603v1 | null |
2025-07-11 | Visual Semantic Description Generation with MLLMs for Image-Text Matching | Junyu Chen et.al. | 2507.08590v1 | null |
2025-07-11 | Large Multi-modal Model Cartographic Map Comprehension for Textual Locality Georeferencing | Kalana Wijegunarathna et.al. | 2507.08575v1 | null |
2025-07-11 | AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling | Preslav Aleksandrov et.al. | 2507.08567v1 | null |
2025-07-11 | MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling | Jingjing Tang et.al. | 2507.08530v1 | null |
2025-07-11 | SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding | Trung Le et.al. | 2507.08402v1 | null |
2025-07-11 | PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models | Yongjian Zhang et.al. | 2507.08400v1 | null |
2025-07-10 | Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models | Helen Qu et.al. | 2507.08000v1 | null |
2025-07-10 | MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization | Mingkai Jia et.al. | 2507.07997v1 | null |
2025-07-10 | CLIP Won't Learn Object-Attribute Binding from Natural Data and Here is Why | Bijay Gurung et.al. | 2507.07985v1 | null |
2025-07-10 | SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment | Guoxin Zang et.al. | 2507.07939v1 | null |
2025-07-10 | Lost in Pronunciation: Detecting Chinese Offensive Language Disguised by Phonetic Cloaking Replacement | Haotan Guo et.al. | 2507.07640v1 | null |
2025-07-10 | Exploring the Limits of Model Compression in LLMs: A Knowledge Distillation Study on QA Tasks | Joyeeta Datta et.al. | 2507.07630v1 | null |
2025-07-10 | LOSC: LiDAR Open-voc Segmentation Consolidator | Nermin Samet et.al. | 2507.07605v1 | null |
2025-07-10 | Mix-Geneformer: Unified Representation Learning for Human and Mouse scRNA-seq Data | Yuki Nishio et.al. | 2507.07454v1 | null |
2025-07-10 | EscherNet++: Simultaneous Amodal Completion and Scalable View Synthesis through Masked Fine-Tuning and Enhanced Feed-Forward 3D Reconstruction | Xinan Zhang et.al. | 2507.07410v1 | null |
2025-07-10 | Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models | Jikesh Thapa et.al. | 2507.07406v1 | null |
2025-07-09 | Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data | Ke Fan et.al. | 2507.07095v1 | null |
2025-07-09 | Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM | Qiyuan Dai et.al. | 2507.06973v1 | null |
2025-07-09 | MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection | Ziyan Liu et.al. | 2507.06908v1 | null |
2025-07-09 | MADPOT: Medical Anomaly Detection with CLIP Adaptation and Partial Optimal Transport | Mahshid Shiri et.al. | 2507.06733v1 | null |
2025-07-09 | CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs | Garapati Keerthana et.al. | 2507.06715v1 | null |
2025-07-09 | Text-promptable Object Counting via Quantity Awareness Enhancement | Miaojing Shi et.al. | 2507.06679v1 | null |
2025-07-09 | Few-shot Learning on AMS Circuits and Its Application to Parasitic Capacitance Prediction | Shan Shen et.al. | 2507.06538v1 | null |
2025-07-08 | VisioPath: Vision-Language Enhanced Model Predictive Control for Safe Autonomous Navigation in Mixed Traffic | Shanting Wang et.al. | 2507.06441v1 | null |
2025-07-08 | Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification | Murilo Gustineli et.al. | 2507.06093v1 | null |
2025-07-08 | Conditional Multi-Stage Failure Recovery for Embodied Agents | Youmna Farag et.al. | 2507.06016v1 | null |
2025-07-08 | From General Relation Patterns to Task-Specific Decision-Making in Continual Multi-Agent Coordination | Chang Yao et.al. | 2507.06004v1 | null |
2025-07-08 | DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations | Nicholas Popovič et.al. | 2507.05997v1 | null |
2025-07-08 | Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | Haiwen Li et.al. | 2507.05970v1 | null |
2025-07-09 | A Wireless Foundation Model for Multi-Task Prediction | Yucheng Sheng et.al. | 2507.05938v2 | null |
2025-07-08 | Differentiable Reward Optimization for LLM based TTS system | Changfeng Gao et.al. | 2507.05911v1 | null |
2025-07-08 | Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | Léa Dubois et.al. | 2507.05822v1 | null |
2025-07-08 | DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation | Young Hun Kim et.al. | 2507.05627v1 | null |
2025-07-07 | SenseCF: LLM-Prompted Counterfactuals for Intervention and Sensor Data Augmentation | Shovito Barua Soumma et.al. | 2507.05541v1 | null |
2025-07-07 | Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration | Benjamin Li et.al. | 2507.05244v1 | null |
2025-07-07 | In-Context Learning as an Effective Estimator of Functional Correctness of LLM-Generated Code | Susmita Das et.al. | 2507.05200v1 | null |
2025-07-07 | VERITAS: Verification and Explanation of Realness in Images for Transparency in AI Systems | Aadi Srivastava et.al. | 2507.05146v1 | null |
2025-07-07 | An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques | Walid Mohamed Aly et.al. | 2507.05123v1 | null |
2025-07-07 | Multi-modal Representations for Fine-grained Multi-label Critical View of Safety Recognition | Britty Baby et.al. | 2507.05007v1 | null |
2025-07-08 | Do We Really Need Specialization? Evaluating Generalist Text Embeddings for Zero-Shot Recommendation and Search | Matteo Attimonelli et.al. | 2507.05006v2 | null |
2025-07-07 | Harnessing Pairwise Ranking Prompting Through Sample-Efficient Ranking Distillation | Junru Wu et.al. | 2507.04820v1 | null |
2025-07-07 | An analysis of vision-language models for fabric retrieval | Francesco Giuliari et.al. | 2507.04735v1 | null |
2025-07-07 | Why We Feel What We Feel: Joint Detection of Emotions and Their Opinion Triggers in E-commerce | Arnav Attri et.al. | 2507.04708v1 | null |
2025-07-07 | VectorLLM: Human-like Extraction of Structured Building Contours vis Multimodal LLMs | Tao Zhang et.al. | 2507.04664v1 | null |
2025-07-03 | MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real | Renhao Wang et.al. | 2507.02864v1 | null |
2025-07-03 | RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation | Liheng Zhang et.al. | 2507.02792v1 | null |
2025-07-06 | KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs | Yuzhang Xie et.al. | 2507.02773v2 | null |
2025-07-03 | DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Ke-Han Lu et.al. | 2507.02768v1 | null |
2025-07-03 | DexVLG: Dexterous Vision-Language-Grasp Model at Scale | Jiawei He et.al. | 2507.02747v1 | null |
2025-07-03 | Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms | Shiyi Liu et.al. | 2507.02724v1 | null |
2025-07-03 | Learning few-step posterior samplers by unfolding and distillation of diffusion models | Charlesquin Kemajou Mbakam et.al. | 2507.02686v1 | null |
2025-07-03 | A Matrix Variational Auto-Encoder for Variant Effect Prediction in Pharmacogenes | Antoine Honoré et.al. | 2507.02624v1 | null |
2025-07-03 | LLMREI: Automating Requirements Elicitation Interviews with LLMs | Alexander Korn et.al. | 2507.02564v1 | null |
2025-07-03 | IGDNet: Zero-Shot Robust Underexposed Image Enhancement via Illumination-Guided and Denoising | Hailong Yan et.al. | 2507.02445v1 | null |
2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et.al. | 2507.01908v1 | null |
2025-07-02 | Towards Foundation Auto-Encoders for Time-Series Anomaly Detection | Gastón García González et.al. | 2507.01875v1 | null |
2025-07-02 | MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics | Dmytro Kuzmenko et.al. | 2507.01843v1 | null |
2025-07-02 | RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather | Yuran Wang et.al. | 2507.01653v1 | null |
2025-07-02 | Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings | Rifki Afina Putri et.al. | 2507.01645v1 | null |
2025-07-02 | Depth Anything at Any Condition | Boyuan Sun et.al. | 2507.01634v1 | null |
2025-07-02 | NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation | Max Gandyra et.al. | 2507.01463v1 | null |
2025-07-02 | La RoSA: Enhancing LLM Efficiency via Layerwise Rotated Sparse Activation | Kai Liu et.al. | 2507.01299v1 | null |
2025-07-02 | AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation | Xiao Liu et.al. | 2507.01255v1 | null |
2025-07-01 | VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers | Yating Wang et.al. | 2507.01016v1 | null |
2025-06-30 | Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Shubhabrata Mukherjee et.al. | 2506.24039v1 | null |
2025-06-30 | Machine Understanding of Scientific Language | Dustin Wright et.al. | 2506.23990v1 | null |
2025-06-30 | Leveraging the Potential of Prompt Engineering for Hate Speech Detection in Low-Resource Languages | Ruhina Tabasshum Prome et.al. | 2506.23930v1 | null |
2025-06-30 | World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation | Haonan Chen et.al. | 2506.23919v1 | null |
2025-06-30 | Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model | Shiming Chen et.al. | 2506.23822v1 | null |
2025-06-30 | Zero-Shot Contextual Embeddings via Offline Synthetic Corpus Generation | Philip Lippmann et.al. | 2506.23662v1 | null |
2025-06-30 | Blending Concepts with Text-to-Image Diffusion Models | Lorenzo Olearo et.al. | 2506.23630v1 | null |
2025-06-30 | StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection | Yanning Hou et.al. | 2506.23577v1 | null |
2025-06-30 | AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays | Chenlang Yi et.al. | 2506.23467v1 | null |
2025-06-29 | Federated Timeline Synthesis: Scalable and Private Methodology For Model Training and Deployment | Pawel Renc et.al. | 2506.23358v1 | null |
2025-06-27 | Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation | Tao Li et.al. | 2506.22365v1 | null |
2025-06-27 | OutDreamer: Video Outpainting with a Diffusion Transformer | Linhao Zhong et.al. | 2506.22298v1 | null |
2025-06-27 | Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition | Wenhan Wu et.al. | 2506.22179v1 | null |
2025-06-27 | Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation | Jialei Chen et.al. | 2506.22032v1 | null |
2025-06-27 | SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding | Zhao Jin et.al. | 2506.21924v1 | null |
2025-06-27 | ZeroReg3D: A Zero-shot Registration Pipeline for 3D Consecutive Histopathology Image Reconstruction | Juming Xiong et.al. | 2506.21923v1 | null |
2025-06-27 | Embodied Domain Adaptation for Object Detection | Xiangyu Shi et.al. | 2506.21860v1 | null |
2025-06-30 | ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts | Xiaoqi Wang et.al. | 2506.21835v2 | null |
2025-06-26 | WAFT: Warping-Alone Field Transforms for Optical Flow | Yihan Wang et.al. | 2506.21526v1 | null |
2025-06-26 | Lightweight Physics-Informed Zero-Shot Ultrasound Plane Wave Denoising | Hojat Asgariandehkordi et.al. | 2506.21499v1 | null |
2025-06-26 | Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection | Ali Şenol et.al. | 2506.21443v1 | null |
2025-06-26 | SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | Melanie Rieff et.al. | 2506.21355v1 | null |
2025-06-26 | Zero-Shot Learning for Obsolescence Risk Forecasting | Elie Saad et.al. | 2506.21240v1 | null |
2025-06-26 | Efficient Skill Discovery via Regret-Aware Optimization | He Zhang et.al. | 2506.21044v1 | null |
2025-06-26 | EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Xiao Zhang et.al. | 2506.20986v1 | null |
2025-06-27 | DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing | Lingling Cai et.al. | 2506.20967v2 | null |
2025-06-26 | Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models | Donggoo Kang et.al. | 2506.20946v1 | null |
2025-06-25 | MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans | Shubhankar Borse et.al. | 2506.20879v1 | null |
2025-06-25 | Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes | Quintin Myers et.al. | 2506.20822v1 | null |
2025-06-25 | Behavior Foundation Model: Towards Next-Generation Whole-Body Control System of Humanoid Robots | Mingqi Yuan et.al. | 2506.20487v1 | null |
2025-06-25 | HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling | Tobias Vontobel et.al. | 2506.20452v1 | null |
2025-06-25 | Recognizing Surgical Phases Anywhere: Few-Shot Test-time Adaptation and Task-graph Guided Refinement | Kun Yuan et.al. | 2506.20254v1 | null |
2025-06-25 | Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach | Clément L. Canonne et.al. | 2506.20197v1 | null |
2025-06-25 | An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS | Marie Kunešová et.al. | 2506.20190v1 | null |
2025-06-25 | CCRS: A Zero-Shot LLM-as-a-Judge Framework for Comprehensive RAG Evaluation | Aashiq Muhamed et.al. | 2506.20128v1 | null |
2025-06-24 | Universal pre-training by iterated random computation | Peter Bloem et.al. | 2506.20057v1 | null |
2025-06-24 | TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design | Geonwoo Cho et.al. | 2506.19997v1 | null |
2025-06-24 | MILAAP: Mobile Link Allocation via Attention-based Prediction | Yung-Fu Chen et.al. | 2506.19947v1 | null |
2025-06-26 | ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG | Runsheng Wang et.al. | 2506.19815v2 | null |
2025-06-24 | SAM2-SGP: Enhancing SAM2 for Medical Image Segmentation via Support-Set Guided Prompting | Yang Xing et.al. | 2506.19658v1 | null |
2025-06-24 | ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP | Zhiyuan Wang et.al. | 2506.19608v1 | null |
2025-06-24 | Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models | Marcos Estecha-Garitagoitia et.al. | 2506.19483v1 | null |
2025-06-24 | Commander-GPT: Dividing and Routing for Multimodal Sarcasm Detection | Yazhou Zhang et.al. | 2506.19420v1 | null |
2025-06-24 | Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators | Shanda Li et.al. | 2506.19396v1 | null |
2025-06-24 | Zero-Shot Parameter Learning of Robot Dynamics Using Bayesian Statistics and Prior Knowledge | Carsten Reiners et.al. | 2506.19350v1 | null |
2025-06-24 | Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference | Zexiang Guo et.al. | 2506.19303v1 | null |
2025-06-23 | Spiritual-LLM: Gita Inspired Mental Health Therapy In the Era of LLMs | Janak Kapuriya et.al. | 2506.19185v1 | null |
2025-06-23 | EEG Foundation Challenge: From Cross-Task to Cross-Subject EEG Decoding | Bruno Aristimunha et.al. | 2506.19141v1 | null |
2025-06-23 | Universal Video Temporal Grounding with Generative Multi-modal Large Language Models | Zeqian Li et.al. | 2506.18883v1 | null |
2025-06-23 | A Modular Taxonomy for Hate Speech Definitions and Its Impact on Zero-Shot LLM Classification Performance | Matteo Melis et.al. | 2506.18576v1 | null |
2025-06-23 | Standard Applicability Judgment and Cross-jurisdictional Reasoning: A RAG-based Framework for Medical Device Compliance | Yu Han et.al. | 2506.18511v1 | null |
2025-06-23 | Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey | Xinyao Li et.al. | 2506.18504v1 | null |
2025-06-23 | GraspMAS: Zero-Shot Language-driven Grasp Detection with Multi-Agent System | Quang Nguyen et.al. | 2506.18448v1 | null |
2025-06-23 | CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing | Dinh-Khoi Vo et.al. | 2506.18438v1 | null |
2025-06-23 | A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement | Muhammad Azeem Aslam et.al. | 2506.18323v1 | null |
2025-06-23 | Team LA at SCIDOCA shared task 2025: Citation Discovery via relation-based zero-shot retrieval | Trieu An et.al. | 2506.18316v1 | null |
2025-06-23 | GeNeRT: A Physics-Informed Approach to Intelligent Wireless Channel Modeling via Generalizable Neural Ray Tracing | Kejia Bian et.al. | 2506.18295v1 | null |
2025-06-23 | Learning Causal Graphs at Scale: A Foundation Model Approach | Naiyu Yin et.al. | 2506.18285v1 | null |
2025-06-23 | Emergent Temporal Correspondences from Video Diffusion Transformers | Jisu Nam et.al. | 2506.17220v2 | link |
2025-06-20 | Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | Teng Guo et.al. | 2506.17110v1 | null |
2025-06-20 | Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments | Yasir Ali Farrukh et.al. | 2506.16994v1 | null |
2025-06-20 | LunarLoc: Segment-Based Global Localization on the Moon | Annika Thomas et.al. | 2506.16940v1 | link |
2025-06-20 | Single-shot thermometry of simulated Bose-Einstein condensates using artificial intelligence | Jack Griffiths et.al. | 2506.16925v1 | null |
2025-06-20 | With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You | Fabian Gröger et.al. | 2506.16895v1 | null |
2025-06-20 | AnyTraverse: An off-road traversability framework with VLM and human operator in the loop | Sattwik Sahu et.al. | 2506.16826v1 | null |
2025-06-20 | Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation | Chenxu Wang et.al. | 2506.16718v1 | link |
2025-06-20 | LegiGPT: Party Politics and Transport Policy with Large Language Model | Hyunsoo Yun et.al. | 2506.16692v1 | null |
2025-06-19 | History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation | Mobin Habibpour et.al. | 2506.16623v1 | null |
2025-06-18 | Task-Agnostic Experts Composition for Continual Learning | Luigi Quarantiello et.al. | 2506.15566v1 | null |
2025-06-18 | Creating User-steerable Projections with Interactive Semantic Mapping | Artur André Oliveira et.al. | 2506.15479v1 | null |
2025-06-18 | Zero-Shot Reinforcement Learning Under Partial Observability | Scott Jeen et.al. | 2506.15446v1 | null |
2025-06-18 | DeVisE: Behavioral Testing of Medical Large Language Models | Camila Zurdo Tagliabue et.al. | 2506.15339v1 | null |
2025-06-18 | A Comparative Study of Task Adaptation Techniques of Large Language Models for Identifying Sustainable Development Goals | Andrea Cadeddu et.al. | 2506.15208v1 | null |
2025-06-18 | ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections | Ziling Huang et.al. | 2506.15180v1 | null |
2025-06-18 | DyNaVLM: Zero-Shot Vision-Language Navigation System with Dynamic Viewpoints and Self-Refining Graph Memory | Zihe Ji et.al. | 2506.15096v1 | null |
2025-06-17 | From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction? | Shadman Sakib et.al. | 2506.14949v1 | link |
2025-06-17 | BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models | Bharath Dandala et.al. | 2506.14861v1 | link |
2025-06-17 | Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot | Xiang Cheng et.al. | 2506.14641v1 | null |
2025-06-17 | VisLanding: Monocular 3D Perception for UAV Safe Landing via Depth-Normal Synergy | Zhuoyue Tan et.al. | 2506.14525v1 | null |
2025-06-17 | EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Xiaoqi Wang et.al. | 2506.14356v1 | link |
2025-06-17 | ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes | Zeyuan Chen et.al. | 2506.14317v1 | null |
2025-06-17 | Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models | Ben Finkelshtein et.al. | 2506.14291v1 | link |
2025-06-17 | Investigation of Zero-shot Text-to-Speech Models for Enhancing Short-Utterance Speaker Verification | Yiyang Zhao et.al. | 2506.14226v1 | null |
2025-06-17 | Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology | Nafiz Sadman et.al. | 2506.14136v1 | link |
2025-06-17 | Multi-Scale Finetuning for Encoder-based Time Series Foundation Models | Zhongzheng Qiao et.al. | 2506.14087v1 | null |
2025-06-16 | An Interdisciplinary Review of Commonsense Reasoning and Intent Detection | Md Nazmus Sakib et.al. | 2506.14040v1 | null |
2025-06-16 | Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography | Yusdivia Molina-Román et.al. | 2506.13964v1 | null |
2025-06-16 | LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction | Haoru Xue et.al. | 2506.13751v1 | null |
2025-06-16 | OTFusion: Bridging Vision-only and Vision-Language Models via Optimal Transport for Transductive Zero-Shot Learning | Qiyu Xu et.al. | 2506.13723v1 | null |
2025-06-16 | Abstract, Align, Predict: Zero-Shot Stance Detection via Cognitive Inductive Reasoning | Jun Ma et.al. | 2506.13470v1 | null |
2025-06-16 | Zero-Shot Solving of Imaging Inverse Problems via Noise-Refined Likelihood Guided Diffusion Models | Zhen Wang et.al. | 2506.13391v1 | null |
2025-06-16 | TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast | Beilei Cui et.al. | 2506.13387v1 | link |
2025-06-16 | Distinct Computations Emerge From Compositional Curricula in In-Context Learning | Jin Hwa Lee et.al. | 2506.13253v1 | null |
2025-06-16 | PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue | George Shaikovski et.al. | 2506.13063v1 | null |
2025-06-16 | ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Han Zhu et.al. | 2506.13053v1 | link |
2025-06-16 | Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning | Danny Hoang et.al. | 2506.13026v1 | null |
2025-06-15 | Zero-shot denoising via neural compression: Theoretical and algorithmic framework | Ali Zafari et.al. | 2506.12693v1 | link |
2025-06-13 | On the Performance of LLMs for Real Estate Appraisal | Margot Geerts et.al. | 2506.11812v1 | null |
2025-06-13 | Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models | Maximilian Kreutner et.al. | 2506.11798v1 | null |
2025-06-13 | Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation | Divyanshu Mishra et.al. | 2506.11777v1 | link |
2025-06-13 | ExoStart: Efficient learning for dexterous manipulation with sensorized exoskeleton demonstrations | Zilin Si et.al. | 2506.11775v1 | null |
2025-06-13 | Converting Annotated Clinical Cases into Structured Case Report Forms | Pietro Ferrazzi et.al. | 2506.11666v1 | null |
2025-06-13 | Prohibited Items Segmentation via Occlusion-aware Bilayer Modeling | Yunhan Ren et.al. | 2506.11661v1 | link |
2025-06-13 | OV-MAP: Open-Vocabulary Zero-Shot 3D Instance Segmentation Map for Robots | Juno Kim et.al. | 2506.11585v1 | null |
2025-06-13 | Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | Gábor Antal et.al. | 2506.11561v1 | null |
2025-06-13 | Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs | Xiao Xu et.al. | 2506.11515v1 | null |
2025-06-13 | Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation | Tung-Long Vuong et.al. | 2506.11493v1 | null |
2025-06-12 | AIR: Zero-shot Generative Model Adaptation with Iterative Refinement | Guimeng Liu et.al. | 2506.10895v1 | link |
2025-06-12 | The Diffusion Duality | Subham Sekhar Sahoo et.al. | 2506.10892v1 | link |
2025-06-12 | Precise Zero-Shot Pointwise Ranking with LLMs through Post-Aggregated Global Context Information | Kehan Long et.al. | 2506.10859v1 | link |
2025-06-12 | Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches | Andrea Moglia et.al. | 2506.10825v1 | null |
2025-06-12 | Prompts to Summaries: Zero-Shot Language-Guided Video Summarization | Mario Barbara et.al. | 2506.10807v1 | null |
2025-06-12 | Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering | Sai Prasanna Teja Reddy Bogireddy et.al. | 2506.10751v1 | null |
2025-06-13 | IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain | Hong Huang et.al. | 2506.10730v2 | link |
2025-06-12 | Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models | Sangmin Song et.al. | 2506.10504v1 | null |
2025-06-12 | LLMs Are Not Yet Ready for Deepfake Image Detection | Shahroz Tariq et.al. | 2506.10474v1 | null |
2025-06-12 | Using Vision Language Models to Detect Students' Academic Emotion through Facial Expressions | Deliang Wang et.al. | 2506.10334v1 | null |
2025-06-11 | Large Language Models for Toxic Language Detection in Low-Resource Balkan Languages | Amel Muminovic et.al. | 2506.09992v1 | link |
2025-06-11 | V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning | Mido Assran et.al. | 2506.09985v1 | link |
2025-06-11 | Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking | Wuwei Zhang et.al. | 2506.09944v1 | link |
2025-06-11 | Dataset of News Articles with Provenance Metadata for Media Relevance Assessment | Tomas Peterka et.al. | 2506.09847v1 | null |
2025-06-11 | Superstudent intelligence in thermodynamics | Rebecca Loubet et.al. | 2506.09822v1 | null |
2025-06-11 | Do LLMs Give Psychometrically Plausible Responses in Educational Assessments? | Andreas Säuberli et.al. | 2506.09796v1 | null |
2025-06-11 | Accurate and efficient zero-shot 6D pose estimation with frozen foundation models | Andrea Caraffa et.al. | 2506.09784v1 | null |
2025-06-11 | ELBO-T2IAlign: A Generic ELBO-Based Method for Calibrating Pixel-level Text-Image Alignment in Diffusion Models | Qin Zhou et.al. | 2506.09740v1 | null |
2025-06-11 | CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings | Mattia Nardon et.al. | 2506.09699v1 | null |
2025-06-11 | Geometric flow regularization in latent spaces for smooth dynamics with the efficient variations of curvature | Andrew Gracyk et.al. | 2506.09679v1 | null |
2025-06-10 | Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Chenyu Lian et.al. | 2506.08990v1 | link |
2025-06-10 | Hyperbolic Dual Feature Augmentation for Open-Environment | Peilin Yu et.al. | 2506.08906v1 | null |
2025-06-10 | Advancing STT for Low-Resource Real-World Speech | Flavio D'Intino et.al. | 2506.08836v1 | null |
2025-06-10 | Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery | Yuni Susanti et.al. | 2506.08771v1 | link |
2025-06-11 | AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP | Ahmed Hasanaath et.al. | 2506.08768v2 | null |
2025-06-11 | ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts | Ruiran Su et.al. | 2506.08700v2 | link |
2025-06-10 | Orientation Matters: Making 3D Generative Models Orientation-Aligned | Yichong Lu et.al. | 2506.08640v1 | null |
2025-06-10 | Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings | Liyan Xu et.al. | 2506.08592v1 | link |
2025-06-10 | Fairness is Not Silence: Unmasking Vacuous Neutrality in Small Language Models | Sumanth Manduru et.al. | 2506.08487v1 | null |
2025-06-10 | Detecting Harmful Memes with Decoupled Understanding and Guided CoT Reasoning | Fengjun Pan et.al. | 2506.08477v1 | null |
2025-06-09 | StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets | Anh-Quan Cao et.al. | 2506.08013v1 | link |
2025-06-09 | ZeroVO: Visual Odometry with Minimal Assumptions | Lei Lai et.al. | 2506.08005v1 | null |
2025-06-09 | CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray | Mingquan Lin et.al. | 2506.07984v1 | null |
2025-06-09 | LUCIFER: Language Understanding and Context-Infused Framework for Exploration and Behavior Refinement | Dimitris Panagopoulos et.al. | 2506.07915v1 | null |
2025-06-09 | Evaluating Large Language Models on the Frame and Symbol Grounding Problems: A Zero-shot Benchmark | Shoko Oka et.al. | 2506.07896v1 | link |
2025-06-09 | Deep Equivariant Multi-Agent Control Barrier Functions | Nikolaos Bousias et.al. | 2506.07755v1 | null |
2025-06-09 | Language Embedding Meets Dynamic Graph: A New Exploration for Neural Architecture Representation Learning | Haizhao Jing et.al. | 2506.07735v1 | null |
2025-06-09 | Vuyko Mistral: Adapting LLMs for Low-Resource Dialectal Translation | Roman Kyslyi et.al. | 2506.07617v1 | null |
2025-06-09 | MIRA: Medical Time Series Foundation Model for Real-World Health Data | Hao Li et.al. | 2506.07584v1 | null |
2025-06-09 | Efficient Generation of Diverse Cooperative Agents with World Models | Yi Loo et.al. | 2506.07450v1 | null |
2025-06-06 | RecGPT: A Foundation Model for Sequential Recommendation | Yangqin Jiang et.al. | 2506.06270v1 | link |
2025-06-06 | Masked Language Models are Good Heterogeneous Graph Generalizers | Jinyu Yang et.al. | 2506.06157v1 | link |
2025-06-06 | Let's CONFER: A Dataset for Evaluating Natural Language Inference Models on CONditional InFERence and Presupposition | Tara Azin et.al. | 2506.06133v1 | null |
2025-06-06 | Bridging the Gap: In-Context Learning for Modeling Human Disagreement | Benedetta Muscato et.al. | 2506.06113v1 | null |
2025-06-09 | Text-to-LoRA: Instant Transformer Adaption | Rujikorn Charakorn et.al. | 2506.06105v2 | null |
2025-06-06 | Full Conformal Adaptation of Medical Vision-Language Models | Julio Silva-Rodríguez et.al. | 2506.06076v1 | null |
2025-06-06 | Zero-Shot Detection of LLM-Generated Code via Approximated Task Conditioning | Maor Ashkenazi et.al. | 2506.06069v1 | null |
2025-06-06 | LightGTS: A Lightweight General Time Series Forecasting Model | Yihang Wang et.al. | 2506.06005v1 | null |
2025-06-06 | Improving Long-Range Navigation with Spatially-Enhanced Recurrent Memory via End-to-End Reinforcement Learning | Fan Yang et.al. | 2506.05997v1 | null |
2025-06-06 | MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation | Dongjie Fu et.al. | 2506.05952v1 | null |
2025-06-05 | ProRefine: Inference-time Prompt Refinement with Textual Feedback | Deepak Pandita et.al. | 2506.05305v1 | null |
2025-06-05 | RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion | Bardienus P. Duisterhof et.al. | 2506.05285v1 | null |
2025-06-05 | From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos | Animesh Gupta et.al. | 2506.05274v1 | link |
2025-06-05 | Can Foundation Models Generalise the Presentation Attack Detection Capabilities on ID Cards? | Juan E. Tapia et.al. | 2506.05263v1 | null |
2025-06-05 | Towards Vision-Language-Garment Models For Web Knowledge Garment Understanding and Generation | Jan Ackermann et.al. | 2506.05210v1 | null |
2025-06-05 | Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning | Yunsheng Tian et.al. | 2506.05168v1 | null |
2025-06-05 | DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning | Tanmay Parekh et.al. | 2506.05128v1 | null |
2025-06-05 | Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation | Soumitra Ghosh et.al. | 2506.05073v1 | null |
2025-06-05 | Tuning the Right Foundation Models is What you Need for Partial Label Learning | Kuang He et.al. | 2506.05027v1 | link |
2025-06-05 | Structure-Aware Radar-Camera Depth Estimation | Fuyi Zhang et.al. | 2506.05008v1 | null |
2025-06-04 | Object-centric 3D Motion Field for Robot Learning from Human Videos | Zhao-Heng Yin et.al. | 2506.04227v1 | null |
2025-06-04 | Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models | Fangrui Zhu et.al. | 2506.04220v1 | null |
2025-06-04 | OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis | Junting Chen et.al. | 2506.04217v1 | link |
2025-06-04 | MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures | Elena Zamaraeva et.al. | 2506.04195v1 | null |
2025-06-04 | Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints | Utkarsh Utkarsh et.al. | 2506.04171v1 | null |
2025-06-04 | HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | Ryan Langman et.al. | 2506.04152v1 | null |
2025-06-04 | TextAtari: 100K Frames Game Playing with Language Agents | Wenhao Li et.al. | 2506.04098v1 | link |
2025-06-04 | Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion | Seymanur Akti et.al. | 2506.04013v1 | null |
2025-06-04 | Vocabulary-free few-shot learning for Vision-Language Models | Maxime Zanella et.al. | 2506.04005v1 | null |
2025-06-04 | Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages | Utkarsh Pathak et.al. | 2506.03884v1 | null |
2025-06-03 | Native-Resolution Image Synthesis | Zidong Wang et.al. | 2506.03131v1 | null |
2025-06-03 | Zero-Shot Time Series Forecasting with Covariates via In-Context Learning | Andreas Auer et.al. | 2506.03128v1 | null |
2025-06-03 | Targeted Forgetting of Image Subgroups in CLIP Models | Zeliang Zhang et.al. | 2506.03117v1 | null |
2025-06-03 | Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery | Michelle Chen et.al. | 2506.03114v1 | link |
2025-06-03 | FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens | Christian Schlarmann et.al. | 2506.03096v1 | link |
2025-06-03 | DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models | Jiarui Wang et.al. | 2506.03007v1 | null |
2025-06-03 | A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems | Đorđe Klisura et.al. | 2506.02998v1 | null |
2025-06-04 | FlySearch: Exploring how vision-language models explore | Adam Pardyl et.al. | 2506.02896v2 | link |
2025-06-03 | DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization | Geonyoung Lee et.al. | 2506.02858v1 | null |
2025-06-03 | PBR-SR: Mesh PBR Texture Super Resolution from 2D Image Priors | Yujin Chen et.al. | 2506.02846v1 | null |
2025-05-30 | Zero-Shot Chinese Character Recognition with Hierarchical Multi-Granularity Image-Text Aligning | Yinglian Zhu et.al. | 2505.24837v1 | null |
2025-05-30 | Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks | Roksana Goworek et.al. | 2505.24834v1 | null |
2025-05-30 | LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews | Christian Jaumann et.al. | 2505.24757v1 | link |
2025-05-30 | Conformal Prediction for Zero-Shot Models | Julio Silva-Rodríguez et.al. | 2505.24693v1 | link |
2025-05-30 | TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | Xiaorui Wu et.al. | 2505.24672v1 | link |
2025-05-30 | Benchmarking Large Language Models for Cryptanalysis and Mismatched-Generalization | Utsav Maskey et.al. | 2505.24621v1 | null |
2025-05-30 | When Harry Meets Superman: The Role of The Interlocutor in Persona-Based Dialogue Generation | Daniela Occhipinti et.al. | 2505.24613v1 | null |
2025-05-30 | Improving Language and Modality Transfer in Translation by Character-level Modeling | Ioannis Tsiamas et.al. | 2505.24561v1 | null |
2025-05-30 | Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting | Jiahao Wang et.al. | 2505.24511v1 | link |
2025-05-30 | Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning | Amit Peleg et.al. | 2505.24424v1 | null |
2025-05-29 | To Trust Or Not To Trust Your Vision-Language Model's Prediction | Hao Dong et.al. | 2505.23745v1 | link |
2025-05-29 | TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning | Andreas Auer et.al. | 2505.23719v1 | link |
2025-05-29 | AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views | Lihan Jiang et.al. | 2505.23716v1 | null |
2025-05-29 | LoLA: Low-Rank Linear Attention With Sparse Caching | Luke McDermott et.al. | 2505.23666v1 | null |
2025-05-29 | D-AR: Diffusion via Autoregressive Models | Ziteng Gao et.al. | 2505.23660v1 | link |
2025-05-29 | ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs | Mohamed Elaraby et.al. | 2505.23654v1 | null |
2025-05-29 | ZeroSep: Separate Anything in Audio with Zero Training | Chao Huang et.al. | 2505.23625v1 | null |
2025-05-29 | Evaluating AI capabilities in detecting conspiracy theories on YouTube | Leonardo La Rocca et.al. | 2505.23570v1 | link |
2025-05-29 | Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Yu Li et.al. | 2505.23566v1 | link |
2025-05-29 | Spoken question answering for visual queries | Nimrod Shabtay et.al. | 2505.23308v1 | null |
2025-05-28 | Zero-Shot Vision Encoder Grafting via LLM Surrogates | Kaiyu Yue et.al. | 2505.22664v1 | link |
2025-05-28 | Learning Composable Chains-of-Thought | Fangcong Yin et.al. | 2505.22635v1 | null |
2025-05-28 | ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM | Hoang Pham et.al. | 2505.22552v1 | null |
2025-05-28 | Multi-MLLM Knowledge Distillation for Out-of-Context News Detection | Yimeng Gu et.al. | 2505.22517v1 | null |
2025-05-28 | Zero-Shot 3D Visual Grounding from Vision-Language Models | Rong Li et.al. | 2505.22429v1 | null |
2025-05-29 | Logical Consistency is Vital: Neural-Symbolic Information Retrieval for Negative-Constraint Queries | Ganlin Xu et.al. | 2505.22299v2 | link |
2025-05-28 | Compensating for Data with Reasoning: Low-Resource Machine Translation with LLMs | Samuel Frontull et.al. | 2505.22293v1 | null |
2025-05-28 | Domain Adaptation of Attention Heads for Zero-shot Anomaly Detection | Kiyoon Jeong et.al. | 2505.22259v1 | null |
2025-05-28 | 3D Question Answering via only 2D Vision-Language Models | Fengyun Wang et.al. | 2505.22143v1 | null |
2025-05-28 | Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis | Hanbin Ko et.al. | 2505.22079v1 | null |
2025-05-27 | Vision Transformers with Self-Distilled Registers | Yinjie Chen et.al. | 2505.21501v1 | null |
2025-05-27 | M3S-UPD: Efficient Multi-Stage Self-Supervised Learning for Fine-Grained Encrypted Traffic Classification with Unknown Pattern Discovery | Yali Yuan et.al. | 2505.21462v1 | null |
2025-05-27 | Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO | Muzhi Zhu et.al. | 2505.21457v1 | null |
2025-05-27 | Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning | Bidyarthi Paul et.al. | 2505.21354v1 | null |
2025-05-27 | Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies | Felix Chalumeau et.al. | 2505.21236v1 | null |
2025-05-27 | Reason-Align-Respond: Aligning LLM Reasoning with Knowledge Graphs for KGQA | Xiangqing Shen et.al. | 2505.20971v1 | null |
2025-05-27 | Context-Aware Content Moderation for German Newspaper Comments | Felix Krejca et.al. | 2505.20963v1 | null |
2025-05-27 | In Context Learning with Vision Transformers: Case Study | Antony Zhao et.al. | 2505.20872v1 | null |
2025-05-27 | Respond to Change with Constancy: Instruction-tuning with LLM for Non-I.I.D. Network Traffic Classification | Xinjie Lin et.al. | 2505.20866v1 | null |
2025-05-27 | Cold-Start Recommendation with Knowledge-Guided Retrieval-Augmented Generation | Wooseong Yang et.al. | 2505.20773v1 | null |
2025-05-26 | ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | Fotios Lygerakis et.al. | 2505.20032v1 | null |
2025-05-26 | Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain) | Subba Reddy Oota et.al. | 2505.20029v1 | link |
2025-05-26 | ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving | Xueyi Liu et.al. | 2505.20024v1 | link |
2025-05-26 | Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | Rong-Cheng Tu et.al. | 2505.19952v1 | null |
2025-05-26 | Can Visual Encoder Learn to See Arrows? | Naoyuki Terashita et.al. | 2505.19944v1 | null |
2025-05-26 | Multi-Timescale Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning | Wenrui Li et.al. | 2505.19938v1 | null |
2025-05-26 | Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation | Nagito Saito et.al. | 2505.19846v1 | null |
2025-05-26 | MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | Rong-Cheng Tu et.al. | 2505.19707v1 | null |
2025-05-26 | Graph Guided Diffusion: Unified Guidance for Conditional Graph Generation | Victor M. Tenorio et.al. | 2505.19685v1 | null |
2025-05-26 | Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | Liqin Ye et.al. | 2505.19675v1 | link |
2025-05-23 | FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation | Zherui Zhang et.al. | 2505.18053v1 | null |
2025-05-23 | Contrastive Distillation of Emotion Knowledge from LLMs for Zero-Shot Emotion Recognition | Minxue Niu et.al. | 2505.18040v1 | link |
2025-05-23 | Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation | Li Zhong et.al. | 2505.18039v1 | null |
2025-05-23 | LLM assisted web application functional requirements generation: A case study of four popular LLMs over a Mess Management System | Rashmi Gupta et.al. | 2505.18019v1 | null |
2025-05-23 | Diffusion Classifiers Understand Compositionality, but Conditions Apply | Yujin Jeong et.al. | 2505.17955v1 | link |
2025-05-23 | VeriThinker: Learning to Verify Makes Reasoning Model Efficient | Zigeng Chen et.al. | 2505.17941v1 | link |
2025-05-23 | AutoMiSeg: Automatic Medical Image Segmentation via Test-Time Adaptation of Foundation Models | Xingjian Li et.al. | 2505.17931v1 | null |
2025-05-23 | NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling | Bram Grooten et.al. | 2505.17909v1 | null |
2025-05-23 | BLAST: Balanced Sampling Time Series Corpus for Universal Forecasting Models | Zezhi Shao et.al. | 2505.17871v1 | link |
2025-05-23 | Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks | Maureen de Seyssel et.al. | 2505.17747v1 | null |
2025-05-22 | CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning | Jiange Yang et.al. | 2505.17006v1 | null |
2025-05-22 | Native Segmentation Vision Transformers | Guillem Brasó et.al. | 2505.16993v1 | null |
2025-05-22 | Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design | Zhenkun Li et.al. | 2505.16979v1 | null |
2025-05-22 | Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval | Nandan Thakur et.al. | 2505.16967v1 | null |
2025-05-22 | UAV See, UGV Do: Aerial Imagery and Virtual Teach Enabling Zero-Shot Ground Vehicle Repeat | Desiree Fisker et.al. | 2505.16912v1 | null |
2025-05-22 | T2I-ConBench: Text-to-Image Benchmark for Continual Post-training | Zhehao Huang et.al. | 2505.16875v1 | null |
2025-05-22 | Walk&Retrieve: Simple Yet Effective Zero-shot Retrieval-Augmented Generation via Knowledge Graph Walks | Martin Böckling et.al. | 2505.16849v1 | link |
2025-05-22 | LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols | Ziming Liu et.al. | 2505.16821v1 | null |
2025-05-22 | TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning | Florentin Beck et.al. | 2505.16743v1 | link |
2025-05-23 | EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion | Advait Joglekar et.al. | 2505.16691v2 | null |
2025-05-21 | Exploring The Visual Feature Space for Multimodal Neural Decoding | Weihao Xia et.al. | 2505.15755v1 | null |
2025-05-21 | From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems | Xiuchao Sui et.al. | 2505.15685v1 | link |
2025-05-21 | Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | Jiaming Zhou et.al. | 2505.15660v1 | link |
2025-05-21 | Prompt Tuning Vision Language Models with Margin Regularizer for Few-Shot Learning under Distribution Shifts | Debarshi Brahma et.al. | 2505.15506v1 | link |
2025-05-21 | On the Generalization vs Fidelity Paradox in Knowledge Distillation | Suhas Kamasetty Ramesh et.al. | 2505.15442v1 | link |
2025-05-21 | Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning | Junchuan Zhao et.al. | 2505.15402v1 | null |
2025-05-21 | Expanding Zero-Shot Object Counting with Rich Prompts | Huilin Zhu et.al. | 2505.15398v1 | null |
2025-05-21 | RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation | Naman Patel et.al. | 2505.15373v1 | null |
2025-05-21 | Towards Zero-Shot Differential Morphing Attack Detection with Multimodal Large Language Models | Ria Shekhawat et.al. | 2505.15332v1 | null |
2025-05-21 | AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving | Kangan Qian et.al. | 2505.15298v1 | null |
2025-05-20 | SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment | Wonje Jeung et.al. | 2505.14667v1 | null |
2025-05-20 | Void in Language Models | Mani Shemiranifar et.al. | 2505.14467v1 | link |
2025-05-20 | Empowering LLMs in Task-Oriented Dialogues: A Domain-Independent Multi-Agent Framework and Fine-Tuning Strategy | Zihao Feng et.al. | 2505.14299v1 | null |
2025-05-20 | FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | Shaolin Zhu et.al. | 2505.14256v1 | null |
2025-05-20 | UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | Sule Bai et.al. | 2505.14231v1 | null |
2025-05-20 | Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment | Yang Hu et.al. | 2505.14204v1 | null |
2025-05-20 | LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer | Changgu Chen et.al. | 2505.14167v1 | null |
2025-05-20 | Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models | Zahraa Al Sahili et.al. | 2505.14160v1 | null |
2025-05-20 | AudSemThinker: Enhancing Audio-Language Models through Reasoning over Semantics of Sound | Gijs Wijngaard et.al. | 2505.14142v1 | link |
2025-05-20 | SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement | Kuan-Yu Chen et.al. | 2505.14066v1 | null |
2025-05-19 | GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation | Abhay Deshpande et.al. | 2505.13441v1 | null |
2025-05-19 | FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning | Zhuozhao Hu et.al. | 2505.13419v1 | link |
2025-05-19 | From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | Lincan Cai et.al. | 2505.13233v1 | link |
2025-05-20 | StarFT: Robust Fine-tuning of Zero-shot Models via Spuriosity Alignment | Younghyun Kim et.al. | 2505.13232v2 | link |
2025-05-19 | True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | Christoph Jürgen Hemmer et.al. | 2505.13192v1 | null |
2025-05-19 | Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | Zhengrui Ma et.al. | 2505.13181v1 | link |
2025-05-19 | A Case Study of Cross-Lingual Zero-Shot Generalization for Classical Languages in LLMs | V. S. D. S. Mahesh Akavarapu et.al. | 2505.13173v1 | link |
2025-05-19 | Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics | Maksim Bobrin et.al. | 2505.13150v1 | link |
2025-05-20 | Zero-Shot Iterative Formalization and Planning in Partially Observable Environments | Liancheng Gong et.al. | 2505.13126v2 | link |
2025-05-19 | | Francesco Innocenti et.al. | 2505.13124v1 | link |
2025-05-16 | SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision | Utsav Rai et.al. | 2505.11439v1 | null |
2025-05-16 | Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Wenchuan Zhang et.al. | 2505.11404v1 | link |
2025-05-16 | Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space | Ali Rabiee et.al. | 2505.11366v1 | link |
2025-05-16 | LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors | Rao Ma et.al. | 2505.11352v1 | null |
2025-05-16 | Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning | Yuanzhao Zhang et.al. | 2505.11349v1 | null |
2025-05-16 | Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models | Banca Calvo Figueras et.al. | 2505.11341v1 | null |
2025-05-19 | Massive-STEPS: Massive Semantic Trajectories for Understanding POI Check-ins -- Dataset and Benchmarks | Wilson Wongso et.al. | 2505.11239v2 | link |
2025-05-16 | Feasibility with Language Models for Open-World Compositional Zero-Shot Learning | Jae Myung Kim et.al. | 2505.11181v1 | null |
2025-05-16 | Foundation Time-Series AI Model for Realized Volatility Forecasting | Anubha Goel et.al. | 2505.11163v1 | null |
2025-05-16 | | Hao Gu et.al. | 2505.11079v1 | null |
2025-05-15 | Depth Anything with Any Prior | Zehan Wang et.al. | 2505.10565v1 | null |
2025-05-15 | NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning | Le Shi et.al. | 2505.10359v1 | null |
2025-05-15 | MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning | Yue Wang et.al. | 2505.10289v1 | link |
2025-05-15 | Comparing LLM Text Annotation Skills: A Study on Human Rights Violations in Social Media Data | Poli Apollinaire Nemkova et.al. | 2505.10260v1 | link |
2025-05-15 | MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models | Yuncheng Guo et.al. | 2505.10088v1 | link |
2025-05-15 | Advanced Crash Causation Analysis for Freeway Safety: A Large Language Model Approach to Identifying Key Contributing Factors | Ahmed S. Abdelrahman et.al. | 2505.09949v1 | null |
2025-05-14 | Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning | Shaurya Sharthak et.al. | 2505.09738v1 | link |
2025-05-14 | Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling | William Xie et.al. | 2505.09731v1 | null |
2025-05-14 | Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing | Yingjie Ma et.al. | 2505.09484v1 | null |
2025-05-14 | Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records | Yili He et.al. | 2505.09435v1 | null |
2025-05-14 | MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment | Siyuan Yan et.al. | 2505.09372v1 | link |
2025-05-14 | Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Bingxin Ke et.al. | 2505.09358v1 | link |
2025-05-14 | MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning | Bin-Bin Gao et.al. | 2505.09265v1 | null |
2025-05-14 | Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping | Yinuo Wang et.al. | 2505.09252v1 | link |
2025-05-14 | Zero-shot Quantization: A Comprehensive Survey | Minjun Kim et.al. | 2505.09188v1 | null |
2025-05-14 | A Comparative Review of RNA Language Models | He Wang et.al. | 2505.09087v1 | null |
2025-05-14 | Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision | Jiaxuan Chen et.al. | 2505.09085v1 | null |
2025-05-13 | For GPT-4 as with Humans: Information Structure Predicts Acceptability of Long-Distance Dependencies | Nicole Cuneo et.al. | 2505.09005v1 | null |
2025-05-13 | SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models | Suhan Guo et.al. | 2505.08768v1 | null |
2025-05-13 | NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance | Wenzhe Cai et.al. | 2505.08712v1 | null |
2025-05-13 | LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs | K M Sajjadul Islam et.al. | 2505.08704v1 | null |
2025-05-13 | Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness | Reihaneh Mirjalili et.al. | 2505.08627v1 | null |
2025-05-13 | Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World | Yuran Wang et.al. | 2505.08607v1 | null |
2025-05-13 | From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Yifu Yuan et.al. | 2505.08548v1 | link |
2025-05-13 | LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models | Takumi Shibata et.al. | 2505.08498v1 | null |
2025-05-13 | Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions | Lata Pangtey et.al. | 2505.08464v1 | null |
2025-05-13 | Zero-Shot Sim-to-Real Reinforcement Learning for Fruit Harvesting | Emlyn Williams et.al. | 2505.08458v1 | null |
2025-05-13 | Visual Image Reconstruction from Brain Activity via Latent Representation | Yukiyasu Kamitani et.al. | 2505.08429v1 | null |
2025-05-12 | Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models | Songlin Dong et.al. | 2505.07690v1 | null |
2025-05-12 | Multimodal Survival Modeling in the Age of Foundation Models | Steven Song et.al. | 2505.07683v1 | link |
2025-05-12 | TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | Paul Primus et.al. | 2505.07609v1 | null |
2025-05-12 | L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers | Sofia Casarin et.al. | 2505.07300v1 | null |
2025-05-12 | SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models | Peichao Lai et.al. | 2505.07247v1 | link |
2025-05-11 | A Vision-Language Foundation Model for Leaf Disease Identification | Khang Nguyen Quoc et.al. | 2505.07019v1 | link |
2025-05-11 | BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | Panwen Hu et.al. | 2505.06985v1 | null |
2025-05-11 | Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence | Yu Qiao et.al. | 2505.06907v1 | null |
2025-05-11 | Image Classification Using a Diffusion Model as a Pre-Training Model | Kosuke Ukita et.al. | 2505.06890v1 | null |
2025-05-10 | Learning Graph Representation of Agent Diffuser | Youcef Djenouri et.al. | 2505.06761v1 | link |
2025-05-09 | Adapting a Segmentation Foundation Model for Medical Image Classification | Pengfei Gu et.al. | 2505.06217v1 | null |
2025-05-09 | Neuro-Symbolic Concepts | Jiayuan Mao et.al. | 2505.06191v1 | null |
2025-05-09 | MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks | Wenqi Zeng et.al. | 2505.06152v1 | link |
2025-05-09 | Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study | Faeze Ghorbanpour et.al. | 2505.06149v1 | null |
2025-05-09 | ELA-ZSON: Efficient Layout-Aware Zero-Shot Object Navigation Agent with Hierarchical Planning | Jiawei Hou et.al. | 2505.06131v1 | null |
2025-05-12 | LLMs Outperform Experts on Challenging Biology Benchmarks | Lennart Justen et.al. | 2505.06108v2 | null |
2025-05-09 | 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks | Vineet Bhat et.al. | 2505.05800v1 | null |
2025-05-09 | Towards Embodiment Scaling Laws in Robot Locomotion | Bo Ai et.al. | 2505.05753v1 | null |
2025-05-08 | scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction | Qing Wang et.al. | 2505.05612v1 | link |
2025-05-08 | KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification | Qianbo Zang et.al. | 2505.05583v1 | link |
2025-05-08 | Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation | Chao Liao et.al. | 2505.05472v1 | null |
2025-05-08 | Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization | Sooyoung Park et.al. | 2505.05343v1 | link |
2025-05-09 | FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech | Linhan Ma et.al. | 2505.05159v2 | null |
2025-05-08 | CacheFL: Efficient Federated Cache Model Fine-Tuning for Vision-Language Models | Mengjun Yi et.al. | 2505.05130v1 | null |
2025-05-08 | Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction | Xiaowei Zhu et.al. | 2505.05084v1 | null |
2025-05-08 | FG-CLIP: Fine-Grained Visual and Textual Alignment | Chunyu Xie et.al. | 2505.05071v1 | link |
2025-05-08 | Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization | Ajwad Abrar et.al. | 2505.05070v1 | null |
2025-05-08 | Split Matching for Inductive Zero-shot Semantic Segmentation | Jialei Chen et.al. | 2505.05023v1 | null |
2025-05-08 | The Pitfalls of Growing Group Complexity: LLMs and Social Choice-Based Aggregation for Group Recommendations | Cedric Waterschoot et.al. | 2505.05016v1 | null |
2025-05-08 | SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models | Shun Taguchi et.al. | 2505.04911v1 | null |
2025-05-07 | Implicitly Aligning Humans and Autonomous Agents through Shared Task Abstractions | Stéphane Aroca-Ouellette et.al. | 2505.04579v1 | link |
2025-05-07 | Benchmarking LLMs' Swarm intelligence | Kai Ruan et.al. | 2505.04364v1 | link |
2025-05-07 | Neural Representational Consistency Emerges from Probabilistic Neural-Behavioral Representation Alignment | Yu Zhu et.al. | 2505.04331v1 | link |
2025-05-07 | Unmasking the Canvas: A Dynamic Benchmark for Image Generation Jailbreaking and LLM Content Safety | Variath Madhupal Gautham Nair et.al. | 2505.04146v1 | null |
2025-05-07 | Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment | Xueyao Zhang et.al. | 2505.04113v1 | null |
2025-05-06 | Can Large Language Models Predict Parallel Code Performance? | Gregory Bolet et.al. | 2505.03988v1 | null |
2025-05-06 | Frog Soup: Zero-Shot, In-Context, and Sample-Efficient Frogger Agents | Xiang Li et.al. | 2505.03947v1 | link |
2025-05-06 | Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | François Role et.al. | 2505.03703v1 | null |
2025-05-06 | CaRaFFusion: Improving 2D Semantic Segmentation with Camera-Radar Point Cloud Fusion and Zero-Shot Image Inpainting | Huawei Sun et.al. | 2505.03679v1 | null |
2025-05-07 | Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision | Linhan Cao et.al. | 2505.03631v2 | link |
2025-05-06 | CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection | Haoyu Bai et.al. | 2505.03412v1 | null |
2025-05-06 | Interpretable Zero-shot Learning with Infinite Class Concepts | Zihan Ye et.al. | 2505.03361v1 | null |
2025-05-06 | From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection | Guoting Wei et.al. | 2505.03334v1 | null |
2025-05-06 | GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data | Shengliang Deng et.al. | 2505.03233v1 | null |
2025-05-06 | Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability | Lei Wang et.al. | 2505.03097v1 | link |
2025-05-05 | Leveraging Protein Language Model Embeddings for Catalytic Turnover Prediction of Adenylate Kinase Orthologs in a Low-Data Regime | Duncan F. Muir et.al. | 2505.03066v1 | link |
2025-05-05 | Sim2Real Transfer for Vision-Based Grasp Verification | Pau Amargant et.al. | 2505.03046v1 | link |
2025-05-05 | Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models | Yankai Jiang et.al. | 2505.02753v1 | link |
2025-05-06 | Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation | Gerard Pons et.al. | 2505.02737v2 | null |
2025-05-06 | VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery | Bojin Wu et.al. | 2505.02704v2 | link |
2025-05-05 | Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality | Xueguang Ma et.al. | 2505.02466v1 | link |
2025-05-05 | Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey | Chaohua Li et.al. | 2505.02448v1 | null |
2025-05-05 | JTCSE: Joint Tensor-Modulus Constraints and Cross-Attention for Unsupervised Contrastive Learning of Sentence Embeddings | Tianyu Zong et.al. | 2505.02366v1 | link |
2025-05-05 | Advancing Email Spam Detection: Leveraging Zero-Shot Learning and Large Language Models | Ghazaleh Shirvani et.al. | 2505.02362v1 | link |
2025-05-05 | TeDA: Boosting Vision-Language Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment | Zhichuan Wang et.al. | 2505.02325v1 | link |
2025-05-05 | From Course to Skill: Evaluating LLM Performance in Curricular Analytics | Zhen Xu et.al. | 2505.02324v1 | link |
2025-05-04 | Compositional Image-Text Matching and Retrieval by Grounding Entities | Madhukar Reddy Vongala et.al. | 2505.02278v1 | null |
2025-05-02 | FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research | Yan Miao et.al. | 2505.01383v1 | null |
2025-05-05 | Helping Large Language Models Protect Themselves: An Enhanced Filtering and Summarization System | Sheikh Samit Muhaimin et.al. | 2505.01315v2 | null |
2025-05-02 | CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Edson Araujo et.al. | 2505.01237v1 | link |
2025-05-02 | Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM | Lei Zhao et.al. | 2505.01077v1 | null |
2025-05-02 | Multi-agents based User Values Mining for Recommendation | Lijian Chen et.al. | 2505.00981v1 | null |
2025-05-01 | ICQuant: Index Coding enables Low-bit LLM Quantization | Xinlin Li et.al. | 2505.00850v1 | null |
2025-05-01 | HMCF: A Human-in-the-loop Multi-Robot Collaboration Framework Based on Large Language Models | Zhaoxing Li et.al. | 2505.00820v1 | null |
2025-05-01 | Constructing an Optimal Behavior Basis for the Option Keyboard | Lucas N. Alegre et.al. | 2505.00787v1 | null |
2025-05-01 | Reasoning Capabilities and Invariability of Large Language Models | Alessandro Raganato et.al. | 2505.00776v1 | link |
2025-05-01 | Investigating Task Arithmetic for Zero-Shot Information Retrieval | Marco Braga et.al. | 2505.00649v1 | link |
2025-05-01 | Voice Cloning: Comprehensive Survey | Hussam Azzuni et.al. | 2505.00579v1 | null |
2025-05-01 | AI-Driven High-Resolution Cell Segmentation and Quantitative Analysis | Shuang Zhang et.al. | 2505.00578v1 | null |
2025-05-01 | DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation | Zixuan Chen et.al. | 2505.00527v1 | null |
2025-05-01 | Leveraging Pretrained Diffusion Models for Zero-Shot Part Assembly | Ruiyuan Zhang et.al. | 2505.00426v1 | null |
2025-05-01 | Perceptual Implications of Automatic Anonymization in Pathological Speech | Soroosh Tayebi Arasteh et.al. | 2505.00409v1 | null |
2025-04-30 | Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design | Vasudev Sharma et.al. | 2505.00134v1 | null |
2025-04-30 | Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space | Leonhard Sommer et.al. | 2504.21749v1 | link |
2025-04-30 | Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models | Lucas Maisonnave et.al. | 2504.21553v1 | null |
2025-04-30 | Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning | Sangyeon Cho et.al. | 2504.21375v1 | null |
2025-04-30 | Zero-Shot Super-Resolution from Unstructured Data Using a Transformer-Based Neural Operator for Urban Micrometeorology | Yuki Yasuda et.al. | 2504.21361v1 | link |
2025-04-30 | An Evaluation of a Visual Question Answering Strategy for Zero-shot Facial Expression Recognition in Still Images | Modesto Castrillón-Santana et.al. | 2504.21309v1 | null |
2025-04-29 | Graph Synthetic Out-of-Distribution Exposure with Large Language Models | Haoyan Xu et.al. | 2504.21198v1 | null |
2025-04-29 | Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare | Lovedeep Gondara et.al. | 2504.21191v1 | null |
2025-04-29 | GLIP-OOD: Zero-Shot Graph OOD Detection with Foundation Model | Haoyan Xu et.al. | 2504.21186v1 | null |
2025-04-29 | Efficient LLMs with AMP: Attention Heads and MLP Pruning | Leandro Giusti Mugnaini et.al. | 2504.21174v1 | null |
2025-04-30 | Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Tyler McDonald et.al. | 2504.20946v2 | null |
2025-04-29 | An Empirical Study on the Capability of LLMs in Decomposing Bug Reports | Zhiyuan Chen et.al. | 2504.20911v1 | null |
2025-04-29 | JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry | Anum Afzal et.al. | 2504.20849v1 | null |
2025-04-29 | Using LLMs in Generating Design Rationale for Software Architecture Decisions | Xiyu Zhou et.al. | 2504.20781v1 | link |
2025-04-29 | In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer | Zechuan Zhang et.al. | 2504.20690v1 | null |
2025-04-29 | Revisiting the MIMIC-IV Benchmark: Experiments Using Language Models for Electronic Health Records | Jesus Lovon et.al. | 2504.20547v1 | null |
2025-04-29 | MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living | Xi Chen et.al. | 2504.20505v1 | null |
2025-04-29 | Fane at SemEval-2025 Task 10: Zero-Shot Entity Framing with Large Language Models | Enfa Fane et.al. | 2504.20469v1 | link |
2025-04-29 | Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks | Konstantinos I. Roumeliotis et.al. | 2504.20419v1 | null |
2025-04-29 | FourierSpecNet: Neural Collision Operator Approximation Inspired by the Fourier Spectral Method for Solving the Boltzmann Equation | Jae Yong Lee et.al. | 2504.20408v1 | null |
2025-04-28 | AutoJudge: Judge Decoding Without Manual Annotation | Roman Garipov et.al. | 2504.20039v1 | null |
2025-04-28 | DeeCLIP: A Robust and Generalizable Transformer-Based Framework for Detecting AI-Generated Images | Mamadou Keita et.al. | 2504.19876v1 | link |
2025-04-28 | NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | Chia-Yu Hung et.al. | 2504.19854v1 | null |
2025-04-28 | Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration | Juhan Park et.al. | 2504.19847v1 | null |
2025-04-28 | EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia | Valerie Zermatten et.al. | 2504.19742v1 | null |
2025-04-28 | Interactive Discovery and Exploration of Visual Bias in Generative Text-to-Image Models | Johannes Eschner et.al. | 2504.19703v1 | null |
2025-04-28 | SynergyAmodal: Deocclude Anything with Text Control | Xinyang Li et.al. | 2504.19506v1 | null |
2025-04-28 | Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | Yan Wang et.al. | 2504.19500v1 | null |
2025-04-28 | EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation | Zhe Dong et.al. | 2504.19432v1 | null |
2025-04-27 | From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering | Syed Tauhid Ullah Shah et.al. | 2504.19384v1 | link |
2025-04-25 | RSFR: A Coarse-to-Fine Reconstruction Framework for Diffusion Tensor Cardiac MRI with Semantic-Aware Refinement | Jiahao Huang et.al. | 2504.18520v1 | null |
2025-04-25 | Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional | Sanjeev Raja et.al. | 2504.18506v1 | null |
2025-04-25 | Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Kesen Zhao et.al. | 2504.18397v1 | link |
2025-04-25 | Leveraging Decoder Architectures for Learned Sparse Retrieval | Jingfen Qiao et.al. | 2504.18151v1 | null |
2025-04-25 | PropRAG: Guiding Retrieval with Beam Search over Proposition Paths | Jingjin Wang et.al. | 2504.18070v1 | null |
2025-04-25 | From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval | Yabing Wang et.al. | 2504.17990v1 | null |
2025-04-24 | Optimism, Expectation, or Sarcasm? Multi-Class Hope Speech Detection in Spanish and English | Sabur Butt et.al. | 2504.17974v1 | null |
2025-04-24 | The Fourth Monocular Depth Estimation Challenge | Anton Obukhov et.al. | 2504.17787v1 | null |
2025-04-24 | Beyond Labels: Zero-Shot Diabetic Foot Ulcer Wound Segmentation with Self-attention Diffusion Models and the Potential for Text-Guided Customization | Abderrachid Hamrani et.al. | 2504.17628v1 | null |
2025-04-24 | StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies | Xu Wang et.al. | 2504.17401v1 | null |
2025-04-24 | Physics-based super-resolved simulation of 3D elastic wave propagation adopting scalable Diffusion Transformer | Hugo Gabrielidis et.al. | 2504.17308v1 | null |
2025-04-24 | Demonstrating Berkeley Humanoid Lite: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot | Yufeng Chi et.al. | 2504.17249v1 | null |
2025-04-24 | Visual and textual prompts for enhancing emotion recognition in video | Zhifeng Wang et.al. | 2504.17224v1 | null |
2025-04-23 | Tokenization Matters: Improving Zero-Shot NER for Indic Languages | Priyaranjan Pattnayak et.al. | 2504.16977v1 | null |
2025-04-23 | Procedural Dataset Generation for Zero-Shot Stereo Matching | David Yan et.al. | 2504.16930v1 | null |
2025-04-23 | Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms | Hsin-Jung Yang et.al. | 2504.16916v1 | null |
2025-04-23 | Exploring zero-shot structure-based protein fitness prediction | Arnav Sharma et.al. | 2504.16886v1 | null |
2025-04-23 | Improving Significant Wave Height Prediction Using Chronos Models | Yilin Zhai et.al. | 2504.16834v1 | null |
2025-04-23 | Decoupled Global-Local Alignment for Improving Compositional Understanding | Xiaoxing Hu et.al. | 2504.16801v1 | null |
2025-04-23 | FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing | Hariseetharam Gunduboina et.al. | 2504.16433v1 | null |
2025-04-24 | Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark | Hanlei Zhang et.al. | 2504.16427v2 | link |
2025-04-23 | Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation | Jiahao Yuan et.al. | 2504.16408v1 | link |
2025-04-22 | CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents | Francisco Valentini et.al. | 2504.16264v1 | null |
2025-04-22 | W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models | Shang Wang et.al. | 2504.15983v1 | link |
2025-04-22 | FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation | Zebin Yao et.al. | 2504.15958v1 | link |
2025-04-23 | Language Models to Support Multi-Label Classification of Industrial Data | Waleed Abdeen et.al. | 2504.15922v2 | null |
2025-04-22 | Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion Models | Dasol Jeong et.al. | 2504.15723v1 | null |
2025-04-22 | ZeroSlide: Is Zero-Shot Classification Adequate for Lifelong Learning in Whole-Slide Image Analysis in the Era of Pathology Vision-Language Foundation Models? | Doanh C. Bui et.al. | 2504.15627v1 | null |
2025-04-22 | Research on Navigation Methods Based on LLMs | Anlong Zhang et.al. | 2504.15600v1 | null |
2025-04-22 | LLM-based Semantic Augmentation for Harmful Content Detection | Elyas Meguellati et.al. | 2504.15548v1 | null |
2025-04-21 | From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System | Rohan Surana et.al. | 2504.15476v1 | null |
2025-04-21 | Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Jonathan Brokman et.al. | 2504.15470v1 | link |
2025-04-21 | Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection | Myrthe Reuver et.al. | 2504.15392v1 | link |
2025-04-21 | Leveraging Language Models for Automated Patient Record Linkage | Mohammad Beheshti et.al. | 2504.15261v1 | null |
2025-04-21 | Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning | Yassir Benhammou et.al. | 2504.15199v1 | null |
2025-04-21 | Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL | Simone Papicchio et.al. | 2504.15077v1 | null |
2025-04-22 | Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision | Shilin Zhang et.al. | 2504.15046v2 | null |
2025-04-21 | GenCLIP: Generalizing CLIP Prompts for Zero-shot Anomaly Detection | Donghyeong Kim et.al. | 2504.14919v1 | null |
2025-04-21 | Aligning Beam with Imbalanced Multi-modality: A Generative Federated Learning Approach | Jiahui Liang et.al. | 2504.14835v1 | null |
2025-04-20 | Med-2D SegNet: A Light Weight Deep Neural Network for Medical 2D Image Segmentation | Md. Sanaullah Chowdhury et.al. | 2504.14715v1 |