[UnderReview] DiCLIP: Diffusion Model Enhances CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation
- If you find this work helpful, please give us a 🌟 to receive updates!
- Mar. 28th, 2025: DiCLIP is submitted.
- Code will be available once DiCLIP is accepted... 🔥🔥🔥
We propose DiCLIP, a novel WSSS framework that leverages a generative diffusion model to enhance CLIP's dense knowledge across the vision and text modalities.
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
The augmented annotations come from the SBD dataset and can be downloaded from DropBox. After downloading SegmentationClassAug.zip, unzip it and move the extracted folder to VOCdevkit/VOC2012/.
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
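The layout above can be checked before training with a few lines of Python. This is only a minimal sketch: the helper name `missing_voc_dirs` is hypothetical and not part of this repo, and it assumes the dataset root is a folder named VOCdevkit in the current directory.

```python
from pathlib import Path

# Sub-folders listed in the directory tree above.
EXPECTED_VOC_DIRS = [
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationClassAug",
    "SegmentationObject",
]


def missing_voc_dirs(voc_root):
    """Return the expected sub-folders that are absent under <voc_root>/VOC2012.

    Hypothetical helper for illustration; not shipped with DiCLIP.
    """
    base = Path(voc_root) / "VOC2012"
    return [name for name in EXPECTED_VOC_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the dataset was extracted into ./VOCdevkit
    missing = missing_voc_dirs("VOCdevkit")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("VOC2012 layout looks complete.")
```

If SegmentationClassAug shows up as missing, the augmented annotations were most likely unzipped to a different location.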
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
To generate VOC-style segmentation labels for COCO, you can use the scripts provided at this repo, or simply download the generated masks from Google Drive.
COCO/
├── JPEGImages
│   ├── train2014
│   └── val2014
└── SegmentationClass
    ├── train2014
    └── val2014
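Once the images and masks are in place, it can help to confirm that they pair up. The sketch below lists images that have no corresponding mask. It rests on assumptions not stated in this README: that images are .jpg files, that masks are .png files sharing the image's file stem, and that the dataset root is a folder named COCO. The helper name `images_without_masks` is hypothetical.

```python
from pathlib import Path


def images_without_masks(coco_root, split):
    """List image stems in JPEGImages/<split> with no mask in SegmentationClass/<split>.

    Assumption: masks are PNG files named after the image stem
    (e.g. some_image.jpg -> some_image.png). Adjust if your masks differ.
    """
    root = Path(coco_root)
    image_stems = {p.stem for p in (root / "JPEGImages" / split).glob("*.jpg")}
    mask_stems = {p.stem for p in (root / "SegmentationClass" / split).glob("*.png")}
    return sorted(image_stems - mask_stems)


if __name__ == "__main__":
    # Assumption: the dataset lives in ./COCO
    for split in ("train2014", "val2014"):
        unmatched = images_without_masks("COCO", split)
        print(f"{split}: {len(unmatched)} images without a mask")
```

A non-zero count is not necessarily an error, since the mask generation step may skip some images, but a very large count usually points to a wrong path or an incomplete download.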
Please refer to requirements.txt for the required packages.
### train voc
bash run_train.sh scripts/train_voc.py [gpu_device] [gpu_number] [master_port] train_voc
### train coco
bash run_train.sh scripts/train_coco.py [gpu_devices] [gpu_numbers] [master_port] train_coco
### eval voc seg and LAM
bash run_evaluate_voc.sh tools/infer_lam.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]
### eval coco seg
bash run_evaluate_seg_coco.sh tools/infer_seg_coco.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]
- Quantitative Results
Semantic segmentation performance on VOC and COCO. Logs are available now; checkpoints will be available soon.
Dataset | Backbone | Val | Test | Log | Weight |
---|---|---|---|---|---|
PASCAL VOC | ViT-B | 78.8 | 78.9 | log | Checkpoint |
MS COCO | ViT-B | 48.7 | - | log | Checkpoint |
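The table does not name its metric. Assuming the Val and Test columns report mean Intersection-over-Union (mIoU) in percent, as is usual for semantic segmentation benchmarks, the following sketch shows how such a score is computed. It is an illustration in plain Python, not the evaluation code of this repo; the function name `mean_iou` is hypothetical, and the ignore label 255 follows the PASCAL VOC convention for unlabeled boundary pixels.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes, given flat sequences of per-pixel class ids.

    Illustrative sketch only. Classes that appear in neither the
    prediction nor the ground truth are left out of the average.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels marked as "ignore" in the ground truth
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 255]  # last pixel is ignored
    print(f"mIoU: {100 * mean_iou(pred, target, num_classes=2):.1f}")
```

For a dataset-level score, the per-class intersection and union counts are accumulated over all images before dividing, which is equivalent to passing the concatenated pixels of every image to this function.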
- Qualitative Results
- CAM Comparison
- VOC Segmentation
- COCO Segmentation
Please cite our work if you find it helpful to your research. 💕