[UnderReview] DiCLIP: Diffusion Model Enhances CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation

zwyang6/DiCLIP

News

  • If you find this work helpful, please give us a 🌟 to receive updates!
  • Mar. 28th, 2025: DiCLIP was submitted.
  • Code will be available once DiCLIP is accepted... 🔥🔥🔥

Overview

We propose DiCLIP, a novel weakly supervised semantic segmentation (WSSS) framework that leverages a generative diffusion model to enhance CLIP's dense knowledge across the vision and text modalities.

[Figure: DiCLIP pipeline]

Data Preparation

PASCAL VOC 2012

1. Download

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar

2. Segmentation Labels

The augmented annotations come from the SBD dataset and can be downloaded from DropBox. After downloading SegmentationClassAug.zip, unzip it and move the extracted folder to VOCdevkit/VOC2012/. The resulting layout should be:

VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
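
Before training, it can help to confirm that the extracted dataset matches the tree above. The following is a minimal sketch, not part of the DiCLIP code base: the helper name `missing_voc_dirs` and the constant `VOC_SUBDIRS` are hypothetical, and the only thing it assumes is the folder layout shown above.

```python
from pathlib import Path

# Sub-folders listed in the VOCdevkit/VOC2012 tree above.
VOC_SUBDIRS = (
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationClassAug",
    "SegmentationObject",
)


def missing_voc_dirs(root):
    """Return the expected sub-folders that are absent under <root>/VOC2012.

    `root` is the path to the VOCdevkit folder. This is a hypothetical helper
    for illustration; it only checks that the directories exist.
    """
    base = Path(root) / "VOC2012"
    return [name for name in VOC_SUBDIRS if not (base / name).is_dir()]
```

Calling `missing_voc_dirs("VOCdevkit")` returns an empty list when every folder is in place; a typical non-empty result is `["SegmentationClassAug"]`, which would indicate that the augmented annotations have not been moved yet.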

MSCOCO 2014

1. Download

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip

2. Segmentation Labels

To generate VOC-style segmentation labels for COCO, you can use the scripts provided at this repo, or simply download the generated masks from Google Drive. The resulting layout should be:

COCO/
├── JPEGImages
│    ├── train2014
│    └── val2014
└── SegmentationClass
     ├── train2014
     └── val2014
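
Once the images and masks are in place, a quick pairing check can reveal an incomplete download. The sketch below is an illustration only: the helper name `unpaired_images` is hypothetical, and it assumes that images are stored as `.jpg` files and that each mask is a `.png` file sharing the image's file stem, which the README does not state explicitly.

```python
from pathlib import Path


def unpaired_images(root, split):
    """Return stems of images in JPEGImages/<split> with no mask in SegmentationClass/<split>.

    Assumptions (not confirmed by the README): images end in .jpg, masks end
    in .png, and a mask has the same file stem as its image.
    """
    root = Path(root)
    images = {p.stem for p in (root / "JPEGImages" / split).glob("*.jpg")}
    masks = {p.stem for p in (root / "SegmentationClass" / split).glob("*.png")}
    return sorted(images - masks)
```

For example, `unpaired_images("COCO", "val2014")` lists the validation images that lack a mask. A non-empty result is not necessarily an error, since the mask-generation script may skip some images, but a very long list suggests that the masks were extracted to the wrong folder.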

Requirements

Please refer to requirements.txt.

Train DiCLIP

### train voc
bash run_train.sh scripts/train_voc.py [gpu_device] [gpu_number] [master_port] train_voc

### train coco
bash run_train.sh scripts/train_coco.py [gpu_device] [gpu_number] [master_port] train_coco
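
To make the positional arguments concrete, the sketch below assembles the same command line from Python. It is an illustration under stated assumptions, not part of the repository: the helper `build_train_command` is hypothetical, and the README does not document the argument semantics, so the sketch assumes that the GPU device argument is a comma-separated list of GPU ids, that the GPU number is the count of those ids, and that the master port is a free TCP port for distributed training.

```python
def build_train_command(dataset, gpu_devices, master_port):
    """Build the argument list for the training commands shown above.

    Hypothetical helper. Assumed semantics (not documented in the README):
    `gpu_devices` is a comma-separated string of GPU ids such as "0,1", and
    the GPU number passed to the script is how many ids that string contains.
    """
    if dataset not in ("voc", "coco"):
        raise ValueError("dataset must be 'voc' or 'coco'")
    gpu_number = len([g for g in gpu_devices.split(",") if g.strip()])
    return [
        "bash",
        "run_train.sh",
        f"scripts/train_{dataset}.py",
        gpu_devices,
        str(gpu_number),
        str(master_port),
        f"train_{dataset}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root. Building the list explicitly, instead of formatting a single shell string, avoids quoting problems when the device string contains commas.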

Evaluate DiCLIP

### eval voc seg and LAM
bash run_evaluate_voc.sh tools/infer_lam.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]

### eval coco seg
bash run_evaluate_seg_coco.sh tools/infer_seg_coco.py [gpu_device] [gpu_number] [infer_set] [checkpoint_path]

Main Results

  • Quantitative Results

Semantic segmentation performance on VOC and COCO. Logs are available now; checkpoints will be available soon.

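| Dataset    | Backbone | Val  | Test | Log | Weight     |
|------------|----------|------|------|-----|------------|
| PASCAL VOC | ViT-B    | 78.8 | 78.9 | log | Checkpoint |
| MS COCO    | ViT-B    | 48.7 | -    | log | Checkpoint |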
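
The README does not name the metric behind the Val and Test columns. Assuming they report mean intersection-over-union (mIoU) in percent, the usual metric on these benchmarks, the computation can be sketched as follows. The function `mean_iou` is a hypothetical illustration and not the repository's evaluation code; it takes flattened label sequences and treats 255 as the ignore label, which is the VOC convention for void pixels.

```python
def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    `pred` and `gt` are equally long sequences of integer class labels, with
    predictions in the range [0, num_classes). Pixels whose ground-truth label
    equals `ignore_index` are skipped. Returns a fraction in [0, 1]; multiply
    by 100 to compare with percentage-style numbers.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # true positives + false positives + false negatives
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[g] += 1  # false negative for the ground-truth class
            union[p] += 1  # false positive for the predicted class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

For instance, with `pred = [0, 0, 1, 1]` and `gt = [0, 1, 1, 255]`, each of the two classes has one correct pixel out of a union of two, so `mean_iou(pred, gt, num_classes=2)` returns 0.5.
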
  • Qualitative Results

  1. CAM Comparison

[Figure: DiCLIP results]

  2. VOC Segmentation

[Figure: DiCLIP results]

  3. COCO Segmentation

[Figure: DiCLIP results]

Citation

Please cite our work if you find it helpful to your research. 💕
