A repository for recent 3D Assets.
🫨 Working hard on collecting ...
An incomplete collection of 3D tasks can be found here.
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
CVPR 2024 | ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding | Objaverse, ShapeNet | (800K real-world 3D shapes), (52.5K 3D shapes with 55 annotated categories) | 3D point clouds, images, and language | zero-shot 3D classification, standard 3D classification with fine-tuning, and 3D captioning (3D-to-language generation) |
CVPR 2024 Workshop | 3DCoMPaT Challenge | 3DCoMPaT++ dataset | | 3D objects, 3D renderings | recognize and ground compositions of materials on parts of 3D objects |
CVPR 2024 | LASO: Language-guided Affordance Segmentation on 3D Object | 3D-AffordanceNet | 19,751 point-question pairs, covering 8,434 object shapes and 870 expert-crafted questions | Point Cloud, Text | Language-guided Affordance Segmentation on 3D Object |
NeurIPS 2023 | OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding | ShapeNetCore, 3D-FUTURE, ABO, Objaverse | 876 K | Text-image-3D point cloud | point cloud captioning, point-cloud conditioned image generation |
NeurIPS 2023 | Real3D-AD: A Dataset of Point Cloud Anomaly Detection | Raw | 1,254 high-resolution 3D items (from forty thousand to millions of points for each item) | Point-cloud | 3D industrial anomaly detection |
CVPR 2023 | RealImpact: A Dataset of Impact Sound Fields for Real Objects | Raw | 150,000 recordings of impact sounds of 50 everyday objects, 5 distinct impact positions | impact locations, microphone locations, contact force profiles, material labels, and RGBD images | listener location classification and visual acoustic matching |
CVPR 2023 | OMNI3D: A Large Benchmark and Model for 3D Object Detection in the Wild | New annotation (SUN RGB-D, ARKitScenes, Hypersim, Objectron, KITTI and nuScenes) | 234k images annotated with more than 3 million instances and 98 3D box categories | single-image, 3D cuboids | 3D object detection |
CVPR 2023 | OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation | Raw | 6,000 scanned objects (190 daily categories) | Mesh, Point-cloud, multi-view, videos | 3D perception, novel-view synthesis, neural surface reconstruction, 3D object generation |
CVPR 2022 | Self-supervised Neural Articulated Shape and Appearance Models | No dataset contribution | - | image, 3D shape | few-shot reconstruction, the generation of novel articulations, and novel view-synthesis |
CVPR 2022 | ABO: Dataset and Benchmarks for Real-World 3D Object Understanding | Raw | 7,953 3D meshes; 8,222 multi-view images | Mesh, Multi-view; attribute | single-view 3D reconstruction, material estimation, and cross-domain multi-view object retrieval |
NeurIPS 2022 Dataset and Benchmark | MBW: Multi-view Bootstrapping in the Wild | Raw Dataset | Multi-view (2~4 cameras) tigers, fish, colobus monkeys, gorillas, chimpanzees, and flamingos from a zoo dataset, each with 2 synchronized videos | Multi-view, 2D landmark of articulated objects | Labeling articulated objects |
CVPR 2021 | 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding | PartNet | 23 K with 18 affordance classes | 3D point cloud with affordance annotations | Affordance Reasoning |
ICCV 2021 | Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction | Raw (Annotated on MS-COCO) | 1.5 M multi-view (19K objects) <=>(annotation)cameras and 3D point clouds | Multi-view; Point-cloud | new-view-synthesis and category-centric 3D reconstruction |
ECCV 2020 | ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes | ScanNet | two datasets (synthetic dataset of referential utterances (Sr3D) and natural referential utterances (Nr3D)): 5878 (Scene target, distractors) tuples | scene, target objects, relationships between target objects and surrounding object (anchor) | language-assisted 3D point clouds |
ECCV 2016 | ObjectNet3D: A Large Scale Database for 3D Object Recognition | Raw | 100 categories, 90,127 images, 201,888 objects, 44,147 3D shapes | Images, 3D shape (not mesh or pc) | 3D pose / 3D shape recognition |
CVPR 2015 | 3D ShapeNets: A Deep Representation for Volumetric Shapes | Raw (3D CAD models downloaded from 3D Warehouse and Yobi3D) | 151,128 3D CAD models <=> 660 unique object categories | 3D CAD, categories | object recognition and shape completion |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
NeurIPS 2023 | A Dataset of Relighted 3D Interacting Hands | Raw | 1.5M images, 10 subjects | image; MANO & Mask | track two-hand 3D poses |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
WACV 2024 | IKEA Ego 3D Dataset: Understanding Furniture Assembly Actions from Ego-view 3D Point Clouds | Raw | 493k frames | Point cloud, ego RGBD | action recognition |
T-RO 2024 | Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset | Raw | 58 min (1.3M 3D bounding boxes); 53 categories | Point-cloud, RGB-D, 9 DoF inertial measurements | 3D object detection |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
CVPR 2024 | Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships | 3DSSG | 1482 scene graphs with 48k object nodes and 544k edges; 93 different attributes on 21k object instances; relationship and affordance w/o exact number | RGB-D | 3D object classification and inter-object relationships prediction |
CVPR 2024 | Multi-Attribute Interactions Matter for 3D Visual Grounding | ScanRefer, ReferIt3D(Sr3D/Nr3D) | (51K descriptions; 11K objects; 800 ScanNet scenes),(41K human-annotated descriptions/83K simple machine-generated descriptions; 707 scenes with object mask) | RGB-D, language | 3D visual grounding |
CVPR 2024 | LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset | ARKitScenes | 10,412 CAD models aligned with 920 scenes across 17 categories, scanned from ARKitScenes | Point cloud, Multi-view | indoor instance-level scene reconstruction |
CVPR 2024 | DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision | Raw | 10K videos, 51M frames, with POI annotation | Multi-view RGB | novel view synthesis |
ICLR 2023 | SQA3D: Situated Question Answering in 3D Scenes | ScanNet (New Situation Question Answer) | 6.8K Situation <=> 20.4K description <=> 33.4K Reasoning Answer | 3D scan, egocentric video, bird-eye view <=> situation <=> question | 3D Situation Question Answer |
ICCV 2023 | ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes | Raw | 460 scenes, 280000 DSLR images, 3.7M iPhone RGBD | Point cloud, Mesh, RGBD | novel view synthesis and 3D semantic scene understanding |
ICCV 2023 | Multi3DRefer: Grounding Text Description to Multiple 3D Objects | ScanRefer | 61,926 descriptions of 11,609 objects | | 3D visual grounding |
CVPR 2022 | ScanQA: 3D Question Answering for Spatial Scene Understanding | ScanNet (New Question-Answer Pairs) | 41 K question-answer pairs (800 indoor scenes) | RGB-D | 3D Question-Answer (spatial understanding) |
ECCV 2022 | Language-Grounded Indoor 3D Semantic Segmentation in the Wild (ScanNet200) | ScanNet | 200 categories in ScanNet | - | 3D instance segmentation |
CVPR 2021 | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | ScanRefer (New Task) | 51,583 descriptions for 11,046 objects in 800 ScanNet scenes | RGB-D <=> bounding box <=> descriptions | dense captioning in 3D scenes |
ECCV 2020 | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ScanNet (New Task) | 51,583 descriptions <=> 11,046 objects | RGB-D 3D scenes; text | Object localization with text descriptions |
ICCV 2019 | RIO: 3D Object Instance Re-Localization in Changing Indoor Environments | Raw Data | 1,482 RGB-D scans of 478 environments | RGB-D <=> object instance <=> 6 DoF mapping among scenes | 3D object instance re-localization (RIO) |
ArXiv 2019 | The Replica Dataset: A Digital Replica of Indoor Spaces | Raw | 18 highly photo-realistic 3D indoor scenes | dense mesh; high-resolution high-dynamic range textures <=> semantic class and instance information <=> planar mirror, glass reflectors | 3D instance segmentation; navigation; question-answering; instruction following |
CVPR 2017 | ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes | Raw Data | 2.5M in Multi-views in 1513 scene annotations | RGB-D <=> 3D camera pose <=> surface reconstruction <=> semantic segmentation | 3D object classification, semantic voxel labeling, and CAD model retrieval |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
CVPR 2024 | Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering | ||||
CVPR 2023 | GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts | Raw: GAPartNet | 8,489 part instances on 1,166 objects | Point-cloud | part segmentation, part pose estimation, and part-based object manipulation |
NeurIPS 2023 | 3D-Aware Visual Question Answering about Parts, Poses and Occlusions | Super-CLEVR-3D | 5 categories with sub-types and attributes (12; color, material, size). | RGB-D | part questions, 3D pose questions, and occlusion questions |
ArXiv 2212 | GeoCode: Interpretable Shape Programs | - | train: 9,570 chairs, 9,330 vases, and 6,270 tables; validation and test: 957 chairs, 933 vases, and 627 tables | Mesh, Point-Cloud, sketch | 3D geometry edit |
NeurIPS 2022 Datasets and Benchmarks | Breaking Bad: A Dataset for Geometric Fracture and Reassembly | Thingi10K, PartNet | 10,474 shapes, 1,047,400 breakdown patterns | Point cloud | geometry measurements; shape assembly |
CVPR 2022 | Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction | Raw Data: poorly-designed 3D physical objects (point videos of 3D objects) with choices to fix them | 5K | Point cloud | fixing 3D object shapes based on functionality |
ACCV 2022 | The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
CoRL 2022 | Leveraging Language for Accelerated Learning of Tool Manipulation | - | 36 objects | images | tool use |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
2024 | 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | Raw | 40,087 household scenes paired with 6.2 million densely-grounded scene-language instructions | Point-cloud, text | 3D semantic segmentation |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
NeurIPS 2022 | PeRFception: Perception using Radiance Fields | CO3D, ScanNet | CO3D (18,669 annotated videos with a total of 1.5 million camera-annotated frames), ScanNet (1.5 K indoor scenes captured with commercial RGB-D sensors) | Multi-view, reconstructed Point-cloud | 2D image classification, 3D object classification, 3D semantic segmentation |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
2023.8 | HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions | videos | 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories 3D reconstruction | reconstructed mesh | pose estimation and affordance prediction |
time | paper | Sources | Data Scale | Modality | Task |
---|---|---|---|---|---|
CVPR 2024 | SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes | ||||
2024.1 | SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding | ScanNet , ARKitScenes, HM3D, 3RScan, MultiScan, Structured3D, ProcTHOR | 68K scenes and 2.5M scene-language pairs | Point cloud, scan | 3D QA |
ECCV 2022 | OPD: Single-view 3D Openable Part Detection |
! Pending
Type \ Modality | Mesh | Point-Cloud | Multi-view | Scene-Graph |
---|---|---|---|---|
Real-Object | ||||
Real-Scene | ||||
Synthetic-Object | ||||
Synthetic-Scene | ||||
- (ICLR 2024) SyncDreamer: Generating Multiview-consistent Images from a Single-view Image [Paper]
- (ECCV 2024) DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation [Paper] [Project], coarse-to-fine scheme
- (2022.12) Point·E: A System for Generating 3D Point Clouds from Complex Prompts [Paper]
- (2022.10) CommonSim-1: Generating 3D Worlds [Project], text-to-3D dynamic environment.
- (CVPR 2024) Wonder3D: Single Image to 3D using Cross-Domain Diffusion [Paper]
- (2024.9) MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis [Paper], leveraging the power of MLLM.
- (CVPR 2024) Splatter Image: Ultra-Fast Single-View 3D Reconstruction [Paper]
- (ACM MM 2024) Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [Paper]
- (RSS 2024 Workshop) Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks [Paper]
- (NeurIPS 2023) One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization [Paper]
- (CVPR 2023) PC^2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [Paper]
- (2023.4) Anything-3D: Towards Single-view Anything Reconstruction in the Wild [Paper]
- (ICCV 2023) Zero-1-to-3: Zero-shot One Image to 3D Object [Paper]
- (CVPR 2024) pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction [Paper]
- (ICCV 2023) NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction [Paper]
- (CVPR 2024) Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [Paper]
- (CVPR 2023) Self-positioning Point-based Transformer for Point Cloud Understanding [Paper]
- (CVPR 2017) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [Paper] [Code]
- (Proc. SGP 2023) Cross-Shape Attention for Part Segmentation of 3D Point Clouds [Paper] [Code]
- (ICLR 2024 Spotlight) Uni3D: Exploring Unified 3D Representation at Scale [Paper]
- (2024.9) GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction [Paper] [Project]
- (ICCV 2021) Vision Transformers for Dense Prediction [Paper]