A Study on the Image Classification Performance of Data Augmentation Techniques Using NeRF*(3D Reconstruction Model)
2024 FALL AI Intensive Class1 (SCE3319, F135-1) Project
- The project aims to use 3D reconstruction and generation approaches such as NeRF, 3D Gaussian Splatting, or mesh-based models to recover occluded or unseen parts of objects, such as their back sides, and thereby address viewpoint variation in image classification.
- We started from the hypothesis that these models can improve image classification accuracy by generating images from diverse viewpoints.
Dept | Icon | Name | Github |
---|---|---|---|
software | | Minchang Kim | |
software | | Jongho Baik | |
- Click the image to see the datasets.
/Drive
├── 1. train_label_baseline.csv
├── 2. train_label_3d.csv
├── 3. test_label.csv
├── 4. best_model_baseline.pth
├── 5. best_model_3d.pth
└── datasets
    ├── 3D images
    │   └── dataset6_batch*_output.zip
    └── dataset*.zip
- The Google Drive link will expire soon because of storage limits and the ShapeNetCore dataset license.
/AI-1
├── 1. 3D-ResNet18.ipynb
├── 2. 3D_ObjectGen.py
├── 3. Baseline-ResNet18.ipynb
├── 4. Capture_Image_from_Object.py
├── 5. dataloaders_ShapeNetCore.ipynb
└── README.md
- `train_label_baseline.csv`: Training labels for the baseline model.
- `train_label_3d.csv`: Training labels including images generated using InstantMesh.
- `test_label.csv`: Test labels.
- `best_model_baseline.pth`: Weights of the best baseline model.
- `best_model_3d.pth`: Weights of the best model trained with images generated using InstantMesh.
- `dataset6_batch*_output.zip`: Images generated using InstantMesh. The source image is point of view 6.
- `dataset*.zip`: Original datasets, including training and test datasets. Each number corresponds to a point of view.
- `3D-ResNet18.ipynb`
- `3D_ObjectGen.py`: Generates 3D object files using InstantMesh.
- `Capture_Image_from_Object.py`: Captures 5 views of each object from the output files of `3D_ObjectGen.py` (.obj, .mtl, .png).
- `Baseline-ResNet18.ipynb`
- `dataloaders_ShapeNetCore.ipynb`: Generates images from the ShapeNetCore dataset.
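Capturing several views of a generated mesh amounts to placing a camera at different points on a sphere around the object. The sketch below is not the code in `Capture_Image_from_Object.py`; it only illustrates the geometry, assuming a y-up coordinate system, a camera that looks at the origin, and five hypothetical (azimuth, elevation) pairs.

```python
import math


def camera_position(azimuth_deg, elevation_deg, distance=2.0):
    """Return the (x, y, z) camera position on a sphere around the origin.

    Assumed convention (hypothetical, not taken from the project code):
    y is up, azimuth 0 views the object from +z, and azimuth grows toward +x.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.sin(az)
    y = distance * math.sin(el)
    z = distance * math.cos(el) * math.cos(az)
    return (x, y, z)


# Five illustrative views; the angles the project actually uses are an assumption here.
EXAMPLE_VIEWS = {
    "front": (0, 0),
    "back": (180, 0),
    "left": (-90, 0),
    "right": (90, 0),
    "top": (0, 90),
}

if __name__ == "__main__":
    for name, (az, el) in EXAMPLE_VIEWS.items():
        x, y, z = camera_position(az, el)
        print(f"{name:>5}: ({x:+.2f}, {y:+.2f}, {z:+.2f})")
```

A renderer would then point the camera from each position toward the origin and save one image per view.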
- InstantMesh
- Goals:
- Fast generation from a single image.
- Applicable to various categories, not just for car and chair categories.
- In this project, we use 14 points of view (POVs), as follows.
Filename | Perspective |
---|---|
filename_0.png | Front |
filename_1.png | Back |
filename_2.png | Left Side |
filename_3.png | Right Side |
filename_4.png | Top |
filename_5.png | Top Left |
filename_6.png | Top Right |
filename_7.png | (Back) Top Right |
filename_8.png | (Back) Top Left |
filename_9.png | Bottom |
filename_10.png | Bottom Left |
filename_11.png | Bottom Right |
filename_12.png | (Back) Bottom Right |
filename_13.png | (Back) Bottom Left |
- When using InstantMesh, we use `filename_6.png` (Top Right perspective) because it shows at least three planes of the object, providing more detailed information about it.
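The filename convention in the table can be decoded with a small helper. This is a minimal sketch, assuming every image name ends in `_<index>.png`; the function names `perspective_of` and `is_instantmesh_input` are hypothetical and not part of the repository.

```python
import re

# Perspective names in table order; the list index equals the filename suffix.
PERSPECTIVES = [
    "Front", "Back", "Left Side", "Right Side", "Top",
    "Top Left", "Top Right", "(Back) Top Right", "(Back) Top Left",
    "Bottom", "Bottom Left", "Bottom Right",
    "(Back) Bottom Right", "(Back) Bottom Left",
]

_SUFFIX = re.compile(r"_(\d+)\.png$")


def perspective_of(filename):
    """Map a name such as 'chair_6.png' to its perspective label."""
    match = _SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"no POV suffix in {filename!r}")
    index = int(match.group(1))
    if index >= len(PERSPECTIVES):
        raise ValueError(f"POV index {index} is outside 0..{len(PERSPECTIVES) - 1}")
    return PERSPECTIVES[index]


def is_instantmesh_input(filename):
    """True for the Top Right view (suffix 6), the one fed to InstantMesh."""
    return perspective_of(filename) == "Top Right"
```

For example, `perspective_of("chair_6.png")` returns `"Top Right"`, so that file would be selected as an InstantMesh input.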
- The original ShapeNetCore dataset categories are identified by `synset_id`. We have mapped these to custom-defined indices as shown below:
synset_id | category | index |
---|---|---|
2691156 | airplane | 0 |
2747177 | trash bin | 1 |
2773838 | bag | 2 |
2801938 | basket | 3 |
2808440 | bathtub | 4 |
2818832 | bed | 5 |
2828884 | bench | 6 |
2843684 | birdhouse | 7 |
2871439 | bookshelf | 8 |
2876657 | bottle | 9 |
2880940 | bowl | 10 |
2924116 | bus | 11 |
2933112 | cabinet | 12 |
2942699 | camera | 13 |
2946921 | can | 14 |
2954340 | cap | 15 |
2958343 | car | 16 |
2992529 | cellphone | 17 |
3001627 | chair | 18 |
3046257 | clock | 19 |
3085013 | keyboard | 20 |
3207941 | dishwasher | 21 |
3211117 | display | 22 |
3261776 | earphone | 23 |
3325088 | faucet | 24 |
3337140 | file cabinet | 25 |
3467517 | guitar | 26 |
3513137 | helmet | 27 |
3593526 | jar | 28 |
3624134 | knife | 29 |
3636649 | lamp | 30 |
3642806 | laptop | 31 |
3691459 | loudspeaker | 32 |
3710193 | mailbox | 33 |
3759954 | microphone | 34 |
3761084 | microwave | 35 |
3790512 | motorbike | 36 |
3797390 | mug | 37 |
3928116 | piano | 38 |
3938244 | pillow | 39 |
3948459 | pistol | 40 |
3991062 | pot | 41 |
4004475 | printer | 42 |
4074963 | remote | 43 |
4090263 | rifle | 44 |
4099429 | rocket | 45 |
4225987 | skateboard | 46 |
4256520 | sofa | 47 |
4330267 | stove | 48 |
4379243 | table | 49 |
4401088 | telephone | 50 |
4460130 | tower | 51 |
4468005 | train | 52 |
4530566 | watercraft | 53 |
4554684 | washer | 54 |
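Because the indices in the table follow ascending `synset_id` order, the mapping can be rebuilt by sorting the IDs. The sketch below assumes that ordering is intentional; `synset_to_index` is a hypothetical helper, and it also accepts the zero-padded 8-digit strings (for example `"02691156"`) that ShapeNetCore uses as directory names.

```python
# The 55 synset IDs from the table, written as integers (leading zero dropped).
SYNSET_IDS = [
    2691156, 2747177, 2773838, 2801938, 2808440, 2818832, 2828884, 2843684,
    2871439, 2876657, 2880940, 2924116, 2933112, 2942699, 2946921, 2954340,
    2958343, 2992529, 3001627, 3046257, 3085013, 3207941, 3211117, 3261776,
    3325088, 3337140, 3467517, 3513137, 3593526, 3624134, 3636649, 3642806,
    3691459, 3710193, 3759954, 3761084, 3790512, 3797390, 3928116, 3938244,
    3948459, 3991062, 4004475, 4074963, 4090263, 4099429, 4225987, 4256520,
    4330267, 4379243, 4401088, 4460130, 4468005, 4530566, 4554684,
]

# Index = rank of the ID in ascending order, which reproduces the table.
SYNSET_TO_INDEX = {sid: i for i, sid in enumerate(sorted(SYNSET_IDS))}


def synset_to_index(synset_id):
    """Accept an int (2691156) or a zero-padded string ('02691156')."""
    return SYNSET_TO_INDEX[int(synset_id)]
```

With this mapping, `synset_to_index("03001627")` gives 18 (chair), matching the table.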
Metric | Baseline Model - ResNet18 | Baseline + 3D generated Images |
---|---|---|
Accuracy | 77.2% | 78.9% |
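To put the two accuracies in perspective, the gap can be expressed both as an absolute gain and as a relative reduction in error rate. The snippet below uses only the two reported numbers; the function names are illustrative, and since the test set size is not stated here, it says nothing about statistical significance.

```python
def absolute_gain(baseline_acc, augmented_acc):
    """Accuracy gain in percentage points."""
    return augmented_acc - baseline_acc


def relative_error_reduction(baseline_acc, augmented_acc):
    """Fraction of the baseline error removed by the augmented model."""
    baseline_err = 100.0 - baseline_acc
    augmented_err = 100.0 - augmented_acc
    return (baseline_err - augmented_err) / baseline_err


if __name__ == "__main__":
    gain = absolute_gain(77.2, 78.9)
    reduction = relative_error_reduction(77.2, 78.9)
    print(f"gain: {gain:.1f} percentage points")
    print(f"relative error reduction: {reduction:.1%}")
```

The gain is 1.7 percentage points, which corresponds to removing roughly 7.5% of the baseline model's errors.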
- As GitHub issues in PyTorch3D (facebookresearch/pytorch3d#666, facebookresearch/pytorch3d#313, ...) show, the ShapeNetCore dataset is far from high quality.
- It seems that each object needs fine-tuning (lighting, texture, etc.).
- However, ShapeNetCore contains more than 50,000 objects, so tuning each one by hand is impractical.
- The quality of the input image itself almost certainly affects the 3D objects that InstantMesh generates.
- We used 12 Colab sessions simultaneously for this project.
- The biggest weakness might be the limited number of experiments.