🔥Awesome-3D-Assets

A repository for recent 3D Assets.

🫨 Working hard on collecting ...

An incomplete collection of 3D tasks can be found here.

3D Datasets

Real Data

Object

time	paper	Sources	Data Scale	Modality	Task
CVPR 2024	ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding	Objaverse, ShapeNet	(800K real-world 3D shapes), (52.5K 3D shapes with 55 annotated categories)	3D point clouds, images, and language	zero-shot 3D classification, standard 3D classification with fine-tuning, and 3D captioning (3D-to-language generation)
CVPR 2024 Workshop	3DCoMPaT Challenge	3DCoMPaT++ dataset		3D objects, 3D renderings	recognize and ground compositions of materials on parts of 3D objects
CVPR 2024	LASO: Language-guided Affordance Segmentation on 3D Object	3D-AffordanceNet	19,751 point-question pairs, covering 8,434 object shapes and 870 expert-crafted questions	Point Cloud, Text	Language-guided Affordance Segmentation on 3D Object
NeurIPS 2023	OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding	ShapeNetCore, 3D-FUTURE, ABO, Objaverse	876K	Text-image-3D point cloud	point cloud captioning, point-cloud conditioned image generation
NeurIPS 2023	Real3D-AD: A Dataset of Point Cloud Anomaly Detection	Raw	1,254 high-resolution 3D items (from forty thousand to millions of points for each item)	Point-cloud	3D industrial anomaly detection
CVPR 2023	RealImpact: A Dataset of Impact Sound Fields for Real Objects	Raw	150,000 recordings of impact sounds of 50 everyday objects, 5 distinct impact positions	impact locations, microphone locations, contact force profiles, material labels, and RGBD images	listener location classification and visual acoustic matching
CVPR 2023	OMNI3D: A Large Benchmark and Model for 3D Object Detection in the Wild	New annotation (SUN RGB-D, ARKitScenes, Hypersim, Objectron, KITTI and nuScenes)	234k images annotated with more than 3 million instances and 98 categories of 3D boxes	single-image, 3D cuboids	3D object detection
CVPR 2023	OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation	Raw	6,000 scanned objects (190 daily categories)	Mesh, Point-cloud, multi-view, videos	3D perception, novel-view synthesis, neural surface reconstruction, 3D object generation
CVPR 2022	Self-supervised Neural Articulated Shape and Appearance Models	No dataset contribution	-	image, 3D shape	few-shot reconstruction, the generation of novel articulations, and novel view-synthesis
CVPR 2022	ABO: Dataset and Benchmarks for Real-World 3D Object Understanding	Raw	7,953 3D meshes; 8,222 multi-view images	Mesh, Multi-view; attribute	single-view 3D reconstruction, material estimation, and cross-domain multi-view object retrieval
NeurIPS 2022 Datasets and Benchmarks	MBW: Multi-view Bootstrapping in the Wild	Raw Dataset	Multi-view (2~4 cameras) tigers, fish, colobus monkeys, gorillas, chimpanzees, and flamingos from a zoo dataset, each with 2 synchronized videos	Multi-view, 2D landmark of articulated objects	Labeling articulated objects
CVPR 2021	3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding	PartNet	23K with 18 affordance classes	3D point cloud with affordance annotations	Affordance Reasoning
ICCV 2021	Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction	Raw (Annotated on MS-COCO)	1.5M multi-view frames (19K objects) <=> (annotation) cameras and 3D point clouds	Multi-view; Point-cloud	new-view synthesis and category-centric 3D reconstruction
ECCV 2020	ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes	ScanNet	two datasets (synthetic dataset of referential utterances (Sr3D) and natural referential utterances (Nr3D)): 5,878 (Scene target, distractors) tuples	scene, target objects, relationships between target objects and surrounding objects (anchors)	language-assisted 3D object identification in point clouds
ECCV 2016	ObjectNet3D: A Large Scale Database for 3D Object Recognition	Raw	100 categories, 90,127 images, 201,888 objects, 44,147 3D shapes	Images, 3D shape (not mesh or pc)	3D pose / 3D shape recognition
CVPR 2015	3D ShapeNets: A Deep Representation for Volumetric Shapes	Raw (3D CAD models downloaded from 3D Warehouse and Yobi3D)	151,128 3D CAD models <=> 660 unique object categories	3D CAD, categories	object recognition and shape completion

Hands

time	paper	Sources	Data Scale	Modality	Task
NeurIPS 2023	A Dataset of Relighted 3D Interacting Hands	Raw	1.5M Images, 10 subjects	image; MANO & Mask	track two-hand 3D poses

Ego

time	paper	Sources	Data Scale	Modality	Task
WACV 2024	IKEA Ego 3D Dataset: Understanding Furniture Assembly Actions from Ego-view 3D Point Clouds	Raw	493k frames	Point cloud, ego RGBD	action recognition
T-RO 2024	Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset	Raw	58 min (1.3M 3D bounding boxes); 53 categories	Point-cloud, RGB-D, 9 DoF inertial measurements	3D object detection

Scene

time	paper	Sources	Data Scale	Modality	Task
CVPR 2024	Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships	3DSSG	1482 scene graphs with 48k object nodes and 544k edges; 93 different attributes on 21k object instances; relationship and affordance w/o exact number	RGB-D	3D object classification and inter-object relationships prediction
CVPR 2024	Multi-Attribute Interactions Matter for 3D Visual Grounding	ScanRefer, ReferIt3D (Sr3D/Nr3D)	(51K descriptions; 11K objects; 800 ScanNet scenes), (41K human-annotated descriptions/83K simple machine-generated descriptions; 707 scenes with object mask)	RGB-D, language	3D visual grounding
CVPR 2024	LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset	ARKitScenes	10,412 CAD annotations aligned with 920 scenes across 17 categories, scanned from ARKitScenes	Point cloud, Multi-view	indoor instance-level scene reconstruction
CVPR 2024	DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision	Raw	10K videos, 51M frames, with POI annotation	Multi-view RGB	novel view synthesis
ICLR 2023	SQA3D: Situated Question Answering in 3D Scenes	ScanNet (New Situated Question Answering)	6.8K situations <=> 20.4K descriptions <=> 33.4K reasoning questions	3D scan, egocentric video, bird's-eye view <=> situation <=> question	3D situated question answering
ICCV 2023	ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes	Raw	460 scenes, 280,000 DSLR images, 3.7M iPhone RGBD frames	Point cloud, Mesh, RGBD	novel view synthesis and 3D semantic scene understanding
ICCV 2023	Multi3DRefer: Grounding Text Description to Multiple 3D Objects	ScanRefer	61,926 descriptions of 11,609 objects		3D Visual Grounding
CVPR 2022	ScanQA: 3D Question Answering for Spatial Scene Understanding	ScanNet (New Question-Answer Pairs)	41K question-answer pairs (800 indoor scenes)	RGB-D	3D Question-Answer (spatial understanding)
ECCV 2022	Language-Grounded Indoor 3D Semantic Segmentation in the Wild (ScanNet200)	ScanNet	200 categories in ScanNet	-	3D instance segmentation
CVPR 2021	Scan2Cap: Context-aware Dense Captioning in RGB-D Scans	ScanRefer (New Task)	51,583 descriptions for 11,046 objects in 800 ScanNet scenes	RGB-D <=> bounding box <=> descriptions	dense captioning in 3D scans
ECCV 2020	ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	ScanNet (New Task)	51,583 descriptions <=> 11,046 objects	RGB-D 3D scenes; text	object localization with text descriptions
ICCV 2019	RIO: 3D Object Instance Re-Localization in Changing Indoor Environments	Raw Data	1,482 RGB-D scans of 478 environments	RGB-D <=> object instance <=> 6 DoF mapping among scenes	3D object instance re-localization (RIO)
ArXiv 2019	The Replica Dataset: A Digital Replica of Indoor Spaces	Raw	18 highly photo-realistic 3D indoor scenes	dense mesh; high-resolution high-dynamic range textures <=> semantic class and instance information <=> planar mirror, glass reflectors	3D instance segmentation; navigation; question-answering; instruction following
CVPR 2017	ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes	Raw Data	2.5M views in 1,513 annotated scenes	RGB-D <=> 3D camera pose <=> surface reconstruction <=> semantic segmentation	3D object classification, semantic voxel labeling, and CAD model retrieval

Synthetic

Object

time	paper	Sources	Data Scale	Modality	Task
CVPR 2024	Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
CVPR 2023	GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts	Raw: GAPartNet	8,489 part instances on 1,166 objects	Point-cloud	part segmentation, part pose estimation, and part-based object manipulation
NeurIPS 2023	3D-Aware Visual Question Answering about Parts, Poses and Occlusions	Super-CLEVR-3D	5 categories with sub-types and attributes (12; color, material, size).	RGB-D	part questions, 3D pose questions, and occlusion questions
ArXiv 2022.12	GeoCode: Interpretable Shape Programs	-	train: 9,570 chairs, 9,330 vases, and 6,270 tables; validation and test: 957 chairs, 933 vases, and 627 tables	Mesh, Point-Cloud, sketch	3D geometry editing
NeurIPS 2022 Datasets and Benchmarks	Breaking Bad: A Dataset for Geometric Fracture and Reassembly	Thingi10K, PartNet	10,474 shapes, 1,047,400 breakdown patterns	Point cloud	geometry measurements; shape assembly
CVPR 2022	Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction	Raw Data: poorly-designed 3D physical objects (point videos of 3D objects) with choices to fix them	5K	Point cloud	fixing 3D object shapes based on functionality
ACCV 2022	The Eyecandies Dataset for Unsupervised Multimodal Anomaly Detection and Localization

Manipulation

time	paper	Sources	Data Scale	Modality	Task
CoRL 2022	Leveraging Language for Accelerated Learning of Tool Manipulation	-	36 objects	images	tool use

Scene

time	paper	Sources	Data Scale	Modality	Task
2024	3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination	Raw	40,087 household scenes paired with 6.2 million densely-grounded scene-language instructions	Point-cloud, text	3D semantic segmentation

Object and Scene

time	paper	Sources	Data Scale	Modality	Task
NeurIPS 2022	PeRFception: Perception using Radiance Fields	CO3D, ScanNet	CO3D (18,669 annotated videos with a total of 1.5 million camera-annotated frames), ScanNet (1.5K indoor scenes captured with commercial RGB-D sensors)	Multi-view, reconstructed Point-cloud	2D image classification, 3D object classification, 3D semantic segmentation

Real and Synthetic

Object

time	paper	Sources	Data Scale	Modality	Task
2023.8	HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions	videos	308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories, with 3D reconstructions	reconstructed mesh	pose estimation and affordance prediction

Scene

time	paper	Sources	Data Scale	Modality	Task
CVPR 2024	SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
2024.1	SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding	ScanNet, ARKitScenes, HM3D, 3RScan, MultiScan, Structured3D, ProcTHOR	68K scenes and 2.5M scene-language pairs	Point cloud, scan	3D QA
ECCV 2022	OPD: Single-view 3D Openable Part Detection

Scene Datasets

time	paper	size	modality	type	source
2025	TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes	850 scenes	pts, Multi-view RGB	outdoor	nuScenes
2024	SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding	68K indoor scenes	pts	indoor	ScanNet, ARKitScenes, HM3D, 3RScan, MultiScan, RIO
2024	EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI	5k ego-centric RGB-D scenes	RGB-D	indoor	3RScan, ScanNet, Matterport3D
2024	RELLIS-3D Dataset: Data, Benchmarks and Analysis	13,556 LiDAR scans and 6,235 images	RGB, pts	outdoor	-
2024	SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes	710 high-fidelity 3D scenes	RGB-D(reconstructed), pts	indoor	-
2023	Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning	116 scenes?	pts, RGB-D	outdoor	-
2023	ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes	460 scenes	RGB-D, pts	indoor	-
2023	DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision	140+ RGB-D scenes; 10K+ videos	Multi-view RGB; video	indoor & outdoor	-
2021	Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting	20,000 sequences of unlabeled lidar point clouds and map-aligned pose	pts	outdoor	-
2021	ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data	5,048 RGB-D sequences	RGB-D, pts	indoor	-
2020	Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions	1482 scenes	RGB-D	indoor	-
2020	ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	800 ScanNet scenes	RGB-D	indoor	-
2020	ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes	707 scenes with object mask	pts	indoor	-
2020	nuScenes: A multimodal dataset for autonomous driving	1000 scenes	RGB, pts	outdoor	-
2020	Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene	937.1M points	pts	outdoor	-
2012	Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite	389 stereo and optical flow image pairs, stereo visual odometry sequences	pts	outdoor	-

Statistics

! Pending

Type \ Modality	Mesh	Point-Cloud	Multi-view	Scene-Graph
Real-Object
Real-Scene
Synthetic-Object
Synthetic-Scene

Generative Models or Tools

Object

Image 2 Multi-view

(ICLR 2024) SyncDreamer: Generating Multiview-consistent Images from a Single-view Image [Paper]

Text 2 3D

(ECCV 2024) DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation [Paper] [Project], coarse-to-fine scheme
(2022.12) Point·E: A System for Generating 3D Point Clouds from Complex Prompts [Paper]
(2022.10) CommonSim-1: Generating 3D Worlds [Project], text-to-3D dynamic environment.

Single-View 2 3D

(CVPR 2024) Wonder3D: Single Image to 3D using Cross-Domain Diffusion [Paper]
(2024.9) MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis [Paper], leveraging the power of MLLM.
(CVPR 2024) Splatter Image: Ultra-Fast Single-View 3D Reconstruction [Paper]
(ACM MM 2024) Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [Paper]
(RSS 2024 Workshop) Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks [Paper]
(NeurIPS 2023) One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization [Paper]
(CVPR 2023) PC^2: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [Paper]
(2023.4) Anything-3D: Towards Single-view Anything Reconstruction in the Wild [Paper]
(ICCV 2023) Zero-1-to-3: Zero-shot One Image to 3D Object [Paper]

PairedImg 2 3D

(CVPR 2024) pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction [Paper]

Multi-View 2 3D

(ICCV 2023) NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction [Paper]

Text 2 4D

(CVPR 2024) Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [Paper]

3D part segmentation

(CVPR 2023) Self-positioning Point-based Transformer for Point Cloud Understanding [Paper]
(CVPR 2017) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [Paper] [Code]
(Proc. SGP 2023) Cross-Shape Attention for Part Segmentation of 3D Point Clouds [Paper] [Code]

3D general model

(ICLR 2024 Spotlight) Uni3D: Exploring Unified 3D Representation at Scale [Paper]

Scene

(2024.9) GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction [Paper] [Project]

Images

RGB 2 Depth

(ICCV 2021) Vision Transformers for Dense Prediction [Paper]
