- Install pip.
- Install CUDA Toolkit 12.1.
setup.bat
To be done.
On Windows, you can run the following command to start the augmentation process:
run.bat
Before running the script, edit the run.bat file to specify the augmentation parameters:
- data_images_path: path to the folder with images.
- data_masks_path: path to the folder with masks (masks are single-channel images with pixel value 255 for objects and 0 for background).
- output_path: path to the output folder.
- number_of_inpainted_images_per_image_required: number of augmented images required per input image.
- main_canny_weight: weight of the Canny ControlNet for the main model.
- main_depth_weight: weight of the depth ControlNet for the main model.
- main_soft_edge_weight: weight of the soft edge ControlNet for the main model.
- main_usual_ipadapter_weight: weight of the IPAdapter for general features of neighboring images for the main model.
- main_plus_ipadapter_weight: weight of the IPAdapter (Plus) for input image features for the main model.
- main_neg_plus_ipadapter_weight: weight of the IPAdapter (Plus) for negative object features of neighboring images for the main model.
- dataset_name: name of the dataset for CLIP features storage.
- positive_prompt: positive generation prompt.
- negative_prompt: negative generation prompt.
- seed: random generation seed.
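The mask convention described for data_masks_path (255 for objects, 0 for background) can be checked before a run. The following is a minimal sketch in plain Python. The helper names `binarize_mask` and `is_valid_mask` and the threshold of 127 are illustrative assumptions, not part of this project. Masks are represented as lists of rows so the example needs no image library.

```python
def binarize_mask(pixels, threshold=127):
    """Map a single-channel mask to the 0/255 convention.

    `pixels` is a list of rows of integer intensities in 0..255.
    Values above `threshold` (an assumed cut-off) become 255 (object),
    all other values become 0 (background).
    """
    return [[255 if value > threshold else 0 for value in row] for row in pixels]


def is_valid_mask(pixels):
    """Return True if every pixel is exactly 0 or 255."""
    return all(value in (0, 255) for row in pixels for value in row)


# A tiny 2x3 grayscale mask with intermediate values.
raw = [
    [0, 12, 200],
    [130, 255, 90],
]
mask = binarize_mask(raw)
print(mask)                                  # [[0, 0, 255], [255, 255, 0]]
print(is_valid_mask(raw), is_valid_mask(mask))  # False True
```

With real data, the rows would come from an image file loaded through whichever imaging library you already use; the check itself stays the same.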
Alternatively, you can run the augmentation process via Python script:
from src.aug_loop import run_augmentation
run_augmentation(
...
)
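As a rough illustration, the parameters listed for run.bat can be collected in a dictionary and sanity-checked before calling the function. This sketch assumes that `run_augmentation` accepts the same names as keyword arguments, which is not confirmed here. All values are placeholders, and the `check_params` helper and its rules (non-negative weights, at least one image requested) are our own assumptions rather than constraints stated by the project.

```python
from pathlib import Path

# Parameter names follow the run.bat list; every value is a placeholder.
params = {
    "data_images_path": Path("data/images"),
    "data_masks_path": Path("data/masks"),
    "output_path": Path("output"),
    "number_of_inpainted_images_per_image_required": 2,
    "main_canny_weight": 0.5,
    "main_depth_weight": 0.5,
    "main_soft_edge_weight": 0.5,
    "main_usual_ipadapter_weight": 0.5,
    "main_plus_ipadapter_weight": 0.5,
    "main_neg_plus_ipadapter_weight": 0.5,
    "dataset_name": "potholes",
    "positive_prompt": "a pothole on an asphalt road",
    "negative_prompt": "blurry, low quality",
    "seed": 42,
}

# All ControlNet and IPAdapter weights share the "_weight" suffix.
WEIGHT_KEYS = [key for key in params if key.endswith("_weight")]


def check_params(p):
    """Return a list of human-readable problems (empty if none found).

    These checks are assumptions made for this sketch only.
    """
    problems = []
    for key in WEIGHT_KEYS:
        if p[key] < 0:
            problems.append(f"{key} must be non-negative")
    if p["number_of_inpainted_images_per_image_required"] < 1:
        problems.append("at least one augmented image per input is required")
    return problems


print(check_params(params))

# Assumption: run_augmentation takes these names as keyword arguments.
# from src.aug_loop import run_augmentation
# run_augmentation(**params)
```

Keeping the parameters in one dictionary makes it easy to log the exact configuration next to the generated images.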
Generation examples on the Potholes dataset:
Generation examples on the Rooftops dataset:
For all experiments, we used a pretrained YOLOv8n model with its default augmentations.
Detection results for the Potholes dataset:
Data | Precision | Recall | mAP50-95 |
---|---|---|---|
without our augmentation | 0.647 ± 0.020 | 0.572 ± 0.010 | 0.304 ± 0.004 |
Diff-Aug (prev) | 0.666 ± 0.019 | 0.552 ± 0.015 | 0.330 ± 0.003 |
Diff-Aug | 0.665 ± 0.012 | 0.565 ± 0.018 | 0.330 ± 0.004 |
Segmentation results for the Potholes dataset:
Data | Precision | Recall | mAP50-95 |
---|---|---|---|
without our augmentation | 0.674 ± 0.012 | 0.556 ± 0.014 | 0.282 ± 0.004 |
Diff-Aug (prev) | 0.666 ± 0.023 | 0.548 ± 0.013 | 0.294 ± 0.003 |
Diff-Aug | 0.660 ± 0.017 | 0.571 ± 0.021 | 0.297 ± 0.004 |
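To put the mAP50-95 columns in perspective, the relative change between the baseline row and the Diff-Aug row can be computed directly from the mean values in the two tables. The snippet below is only a worked calculation on those reported means. It ignores the reported spreads, and the helper name `relative_change` is our own.

```python
def relative_change(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Mean mAP50-95 values copied from the Potholes tables.
map_scores = {
    "detection": {"baseline": 0.304, "diff_aug": 0.330},
    "segmentation": {"baseline": 0.282, "diff_aug": 0.297},
}

gains = {
    task: round(relative_change(scores["baseline"], scores["diff_aug"]), 1)
    for task, scores in map_scores.items()
}
print(gains)  # {'detection': 8.6, 'segmentation': 5.3}
```

On these means, Diff-Aug corresponds to a relative mAP50-95 gain of about 8.6% for detection and about 5.3% for segmentation.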
This research is financially supported by the Foundation for National Technology Initiative's Projects Support as a part of the roadmap implementation for the development of the high-tech field of Artificial Intelligence for the period up to 2030 (agreement 70-2021-00187).
Diff-Aug: Image augmentation for detection and segmentation tasks based on diffusion neural networks