VCIP-RGBD/DFormer-Jittor

DFormer for RGBD Semantic Segmentation (Jittor Implementation)

This is the Jittor implementation of DFormer and DFormerv2 for RGBD semantic segmentation. It is built on the Jittor deep learning framework and provides scripts for training, evaluation, and inference.

This repository contains the official Jittor implementation of the following papers:

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou*
ICLR 2024. Paper Link | Homepage | PyTorch Version

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou*
CVPR 2025. Paper Link | Chinese Version | PyTorch Version


✨ About the Jittor Framework: An Architectural Deep Dive ✨

This project is built upon Jittor, a cutting-edge deep learning framework that pioneers a design centered around Just-In-Time (JIT) compilation and meta-operators. This architecture provides a unique combination of high performance and exceptional flexibility. Instead of relying on static, pre-compiled libraries, Jittor operates as a dynamic, programmable system that compiles itself and the user's code on the fly.

The Core Philosophy: From Static Library to Dynamic Compiler

Jittor's design philosophy is to treat the deep learning framework not as a fixed set of tools, but as a domain-specific compiler. The high-level Python code written by the user serves as a directive to this compiler, which then generates highly optimized, hardware-specific machine code at runtime. This approach unlocks a level of performance and flexibility that is difficult to achieve with traditional frameworks.

Key Innovations of Jittor

  • A Truly Just-in-Time (JIT) Compiled Framework:

    Jittor's most significant innovation is that the entire framework is JIT compiled. This goes beyond merely compiling a static computation graph. When a Jittor program runs, the Python code, including the core framework logic and the user's model, is first parsed into an intermediate representation. The Jittor compiler then performs a series of advanced optimizations (such as operator fusion, memory layout optimization, and dead code elimination) before generating and executing native C++ or CUDA code. This "whole-program" compilation approach means that the framework can adapt to the specific logic of your model, enabling optimizations that are impossible when linking against a static, pre-compiled library.

  • Meta-Operators and Dynamic Kernel Fusion:

    At the heart of Jittor lies the concept of meta-operators. These are not monolithic, pre-written kernels (like in other frameworks), but rather elementary building blocks defined in Python. For instance, a complex operation like Conv2d followed by ReLU is not two separate kernel calls. Instead, Jittor composes them from meta-operators, and its JIT compiler fuses them into a single, efficient CUDA kernel at runtime. This kernel fusion is critical for performance on modern accelerators like GPUs, as it drastically reduces the time spent on high-latency memory I/O and kernel launch overhead, which are often the primary bottlenecks.

  • The Unified Computation Graph: Flexibility Meets Performance:

    Jittor elegantly resolves the classic trade-off between the flexibility of dynamic graphs (like PyTorch) and the performance of static graphs (like TensorFlow 1.x). You can write your model using all the native features of Python, including complex control flow like if/else statements and data-dependent for loops. Jittor's compiler traces these dynamic execution paths and still constructs a graph representation that it can optimize globally. It achieves this by JIT-compiling different graph versions for different execution paths, thus preserving Python's expressiveness without sacrificing optimization potential.

  • Decoupling of Frontend Logic and Backend Optimization:

    Jittor champions a clean separation that empowers researchers. You focus on the "what" (the mathematical logic of your model) using a clean, high-level Python API. Jittor's backend automatically handles the "how": the complex task of writing high-performance, hardware-specific code. This frees researchers who are experts in their domain (e.g., computer vision) from needing to become experts in low-level GPU programming, thus accelerating the pace of innovation.
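
To make the kernel-fusion idea above more concrete, here is a toy sketch in plain Python. It does not use Jittor's API: the names `scale`, `relu`, `run_unfused`, `fuse`, and `run_fused` are hypothetical, and each full list traversal merely stands in for one GPU kernel launch with its own memory round trip.

```python
from functools import reduce
from typing import Callable, List, Sequence

ElementOp = Callable[[float], float]


def scale(x: float) -> float:
    """Stand-in for a linear op (hypothetical: multiplies by 2)."""
    return 2.0 * x


def relu(x: float) -> float:
    """Element-wise ReLU."""
    return x if x > 0.0 else 0.0


def run_unfused(ops: Sequence[ElementOp], data: List[float]) -> List[float]:
    """Apply each op in its own pass, building an intermediate list per op."""
    for op in ops:
        data = [op(x) for x in data]  # one full traversal per op
    return data


def fuse(ops: Sequence[ElementOp]) -> ElementOp:
    """Compose element-wise ops into a single function."""
    return lambda x: reduce(lambda acc, op: op(acc), ops, x)


def run_fused(ops: Sequence[ElementOp], data: List[float]) -> List[float]:
    """Apply the composed op in one traversal, with no intermediate lists."""
    fused = fuse(ops)
    return [fused(x) for x in data]


values = [-1.5, 0.0, 2.0]
unfused_result = run_unfused([scale, relu], values)
fused_result = run_fused([scale, relu], values)
```

Both paths compute the same values; the fused one reads and writes the data once instead of once per operator, which is the saving the text attributes to fusion on GPUs.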


🚩 Performance


Chart 1: Comparison of mIoU changes between the Jittor and PyTorch implementations of DFormer-Large.


Chart 2: Comparison of latency between the Jittor and PyTorch implementations.

🚀 Getting Started

Environment Setup

# Create a conda environment
conda create -n dformer_jittor python=3.8 -y
conda activate dformer_jittor

# Install Jittor
pip install jittor

# Install other dependencies
pip install opencv-python pillow numpy scipy tqdm tensorboardX tabulate easydict
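
After installing, it can help to confirm that every dependency is visible to the interpreter. The following is a minimal sketch that is not part of the repository: `REQUIRED_MODULES` and `missing_packages` are hypothetical names, and the mapping from pip package names to import names (for example `opencv-python` to `cv2`) is filled in here as an assumption about the packages listed above.

```python
import importlib.util
from typing import Dict, List

# pip package name -> importable module name (they differ for some packages).
REQUIRED_MODULES: Dict[str, str] = {
    "jittor": "jittor",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "numpy": "numpy",
    "scipy": "scipy",
    "tqdm": "tqdm",
    "tensorboardX": "tensorboardX",
    "tabulate": "tabulate",
    "easydict": "easydict",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names whose module cannot be located (nothing is imported)."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```

Using `find_spec` rather than a real import keeps the check fast and avoids triggering Jittor's own compilation step just to test for presence.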

Dataset Preparation

Supported datasets:

  • NYUDepthv2: An indoor RGBD semantic segmentation dataset.
  • SUNRGBD: A large-scale dataset for indoor scene understanding.

Download links:

Dataset GoogleDrive OneDrive BaiduNetdisk

Pre-trained Models

| Model | Dataset | mIoU | Download Link |
| --- | --- | --- | --- |
| DFormer-Small | NYUDepthv2 | 52.3 | BaiduNetdisk |
| DFormer-Base | NYUDepthv2 | 54.1 | BaiduNetdisk |
| DFormer-Large | NYUDepthv2 | 55.8 | BaiduNetdisk |
| DFormerv2-Small | NYUDepthv2 | 53.7 | BaiduNetdisk |
| DFormerv2-Base | NYUDepthv2 | 55.3 | BaiduNetdisk |
| DFormerv2-Large | NYUDepthv2 | 57.1 | BaiduNetdisk |

Directory Structure

DFormer-Jittor/
├── checkpoints/              # Directory for pre-trained models
│   ├── pretrained/          # ImageNet pre-trained models
│   └── trained/             # Trained models
├── datasets/                # Directory for datasets
│   ├── NYUDepthv2/         # NYU dataset
│   └── SUNRGBD/            # SUNRGBD dataset
├── local_configs/          # Configuration files
├── models/                 # Model definitions
├── utils/                  # Utility functions
├── train.sh               # Training script
├── eval.sh                # Evaluation script
└── infer.sh               # Inference script
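
The data and checkpoint folders in this layout can be created up front. This is a small sketch, not a script shipped with the repository: `EXPECTED_DIRS` and `ensure_layout` are hypothetical names, and the directory list is copied from the tree above.

```python
from pathlib import Path
from typing import List

# Directories taken from the tree above, relative to the repository root.
EXPECTED_DIRS = [
    "checkpoints/pretrained",
    "checkpoints/trained",
    "datasets/NYUDepthv2",
    "datasets/SUNRGBD",
]


def ensure_layout(root: Path) -> List[Path]:
    """Create any missing data/checkpoint directories and return those created."""
    created = []
    for relative in EXPECTED_DIRS:
        target = root / relative
        if not target.is_dir():
            target.mkdir(parents=True)
            created.append(target)
    return created
```

Calling `ensure_layout(Path("."))` from the repository root is idempotent: a second call finds everything in place and returns an empty list.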

📖 Usage

Training

Use the provided training script:

bash train.sh

Or use the Python command directly:

python utils/train.py --config local_configs.NYUDepthv2.DFormer_Base
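
The `--config` value is a dotted module name rather than a file path. How `utils/train.py` resolves it is not shown here, so the following is only a sketch under the assumption that the config is imported as a regular Python module exposing a class named `C`; `config_module_to_path` and `load_config` are hypothetical helpers.

```python
import importlib
from pathlib import PurePosixPath


def config_module_to_path(dotted: str) -> str:
    """Map a dotted module name to the relative .py file it refers to."""
    return str(PurePosixPath(*dotted.split(".")).with_suffix(".py"))


def load_config(dotted: str, attribute: str = "C"):
    """Import the config module and return one attribute from it.

    Assumption: the module exposes its settings as `C`; pass a different
    `attribute` if the real config files use another name.
    """
    module = importlib.import_module(dotted)
    return getattr(module, attribute)
```

For example, `config_module_to_path("local_configs.NYUDepthv2.DFormer_Base")` gives `local_configs/NYUDepthv2/DFormer_Base.py`. Importing by dotted name only works when the command is run from the repository root (or the root is otherwise on `sys.path`).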

Evaluation

bash eval.sh

Alternatively:

python utils/eval.py --config local_configs.NYUDepthv2.DFormer_Base --checkpoint checkpoints/trained/NYUDepthv2/DFormer_Base/best.pkl

Inference/Visualization

bash infer.sh

🎯 Performance

NYUDepthv2 Dataset

| Method | Backbone | mIoU (%) | Params | FLOPs |
| --- | --- | --- | --- | --- |
| DFormer-T | DFormer-Tiny | 48.5 | 5.0M | 15.2G |
| DFormer-S | DFormer-Small | 52.3 | 13.1M | 28.4G |
| DFormer-B | DFormer-Base | 54.1 | 35.4M | 75.0G |
| DFormer-L | DFormer-Large | 55.8 | 62.3M | 132.8G |
| DFormerv2-S | DFormerv2-Small | 53.7 | 13.1M | 28.4G |
| DFormerv2-B | DFormerv2-Base | 55.3 | 35.4M | 75.0G |
| DFormerv2-L | DFormerv2-Large | 57.1 | 62.3M | 132.8G |

SUNRGBD Dataset

| Method | Backbone | mIoU (%) | Params | FLOPs |
| --- | --- | --- | --- | --- |
| DFormer-T | DFormer-Tiny | 46.2 | 5.0M | 15.2G |
| DFormer-S | DFormer-Small | 49.8 | 13.1M | 28.4G |
| DFormer-B | DFormer-Base | 51.6 | 35.4M | 75.0G |
| DFormer-L | DFormer-Large | 53.4 | 62.3M | 132.8G |
| DFormerv2-S | DFormerv2-Small | 51.2 | 13.1M | 28.4G |
| DFormerv2-B | DFormerv2-Base | 52.8 | 35.4M | 75.0G |
| DFormerv2-L | DFormerv2-Large | 54.5 | 62.3M | 132.8G |
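
For readers unfamiliar with the mIoU column, the sketch below shows the standard definition (per-class intersection over union, averaged over classes) on a tiny hand-made example. It is not the repository's evaluation code: the function names are hypothetical, and details such as ignored labels or multi-scale testing are not modelled.

```python
from typing import List, Sequence


def confusion_matrix(
    targets: Sequence[int], preds: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count (true class, predicted class) pairs over flattened pixel labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, preds):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp   # predicted c, true class otherwise
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six "pixels" with three classes.
targets = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
miou = mean_iou(confusion_matrix(targets, preds, num_classes=3))
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so `miou` is 0.5; the tables report the same quantity as a percentage.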

🔧 Configuration

The project uses Python configuration files located in the local_configs/ directory:

# local_configs/NYUDepthv2/DFormer_Base.py
class C:
    # Dataset configuration
    dataset_name = "NYUDepthv2"
    dataset_dir = "datasets/NYUDepthv2"
    num_classes = 40
    
    # Model configuration
    backbone = "DFormer_Base"
    pretrained_model = "checkpoints/pretrained/DFormer_Base.pth"
    
    # Training configuration
    batch_size = 8
    nepochs = 500
    lr = 0.01
    momentum = 0.9
    weight_decay = 0.0001
    
    # Other configurations
    log_dir = "logs"
    checkpoint_dir = "checkpoints"
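
Because the configuration is an ordinary Python class, its fields can be inspected or overridden with plain Python. The sketch below is an illustration only: it uses a shortened copy of `C` with three of the fields above, and `config_to_dict` and `with_overrides` are hypothetical helpers, not functions from this repository.

```python
from typing import Any, Dict


class C:
    # Three values copied from the example config above.
    batch_size = 8
    nepochs = 500
    lr = 0.01


def config_to_dict(config: type) -> Dict[str, Any]:
    """Collect public, non-callable attributes, including inherited ones."""
    return {
        name: getattr(config, name)
        for name in dir(config)
        if not name.startswith("_") and not callable(getattr(config, name))
    }


def with_overrides(config: type, **overrides: Any) -> type:
    """Return a subclass with some fields replaced; the original is untouched."""
    unknown = set(overrides) - set(config_to_dict(config))
    if unknown:
        raise AttributeError(f"Unknown config fields: {sorted(unknown)}")
    return type(f"{config.__name__}Override", (config,), overrides)


# A quick smoke-test configuration: small batch, one epoch.
debug_config = with_overrides(C, batch_size=2, nepochs=1)
```

Rejecting unknown field names catches typos such as `batchsize=2` early, instead of silently training with the default value.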

📊 Benchmarking

FLOPs and Parameters

python benchmark.py --config local_configs.NYUDepthv2.DFormer_Base

Inference Speed

python utils/latency.py --config local_configs.NYUDepthv2.DFormer_Base
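
The internals of `utils/latency.py` are not shown here. As a rough sketch of how inference latency is commonly measured, the hypothetical helper below discards a few warm-up calls and then times repeated runs. When timing a GPU model, `fn` would also need to block until the device has finished its work, which this plain-Python version does not handle.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    fn: Callable[[], object], warmup: int = 10, runs: int = 50
) -> Dict[str, float]:
    """Time `fn` in milliseconds after discarding warm-up iterations."""
    for _ in range(warmup):
        fn()  # first calls may include JIT compilation or cache fills
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples),
        # Guard against a zero reading on very coarse timers.
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }
```

Warm-up matters more than usual with a JIT-compiled framework, since the first forward passes include compilation time that would otherwise dominate the average.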

⚠️ Note

Root Cause of the Issue

What is CUTLASS?
CUTLASS (CUDA Templates for Linear Algebra Subroutines) is a high-performance CUDA matrix operation template library launched by NVIDIA, primarily used for efficiently implementing core operators like GEMM/Conv on Tensor Cores. It is utilized by many frameworks (Jittor, PyTorch XLA, TVM, etc.) for custom operators or as a low-level acceleration for Auto-Tuning.

Why does Jittor pull CUTLASS in cuDNN unit tests?
When Jittor loads/compiles external CUDA libraries, it automatically compiles several custom operators from CUTLASS (setup_cutlass()). If the local cache is missing, it will call install_cutlass() to download and extract a cutlass.zip.

Direct Cause of the Crash

The install_cutlass() function in version 1.3.9.14 uses a download link that has become invalid (confirmed by community Issue #642).
After the download fails, a partial ~/.cache/jittor/cutlass directory is left behind; when running the function again, it attempts to execute shutil.rmtree('.../cutlass/cutlass'), but this subdirectory does not exist, triggering a FileNotFoundError and ultimately causing the main process to core dump.
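
The failure mode described above can be reproduced with the standard library alone. The sketch below is not Jittor's code: it uses a temporary directory in place of `~/.cache/jittor/cutlass`, and `safe_reset_dir` is a hypothetical helper showing one defensive alternative.

```python
import shutil
import tempfile
from pathlib import Path


def safe_reset_dir(path: Path) -> None:
    """Remove `path` if it exists, then recreate it empty."""
    # ignore_errors=True turns rmtree into a no-op for a missing directory
    # instead of raising FileNotFoundError.
    shutil.rmtree(path, ignore_errors=True)
    path.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    partial_cache = Path(tmp) / "cutlass"  # stands in for ~/.cache/jittor/cutlass
    partial_cache.mkdir()                  # download failed: no inner "cutlass" dir
    missing = partial_cache / "cutlass"

    try:
        shutil.rmtree(missing)             # the failing pattern described above
        raised = False
    except FileNotFoundError:
        raised = True

    safe_reset_dir(missing)                # the defensive variant succeeds
    recovered = missing.is_dir()
```

The plain `rmtree` call raises because the inner directory was never created, while the guarded version simply continues.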

Solutions (choose one, listed in recommended order)

Option 1: Temporarily skip CUTLASS
Use this when you only need the cuDNN unit test to pass and do not need the CUTLASS operators.

# Applies to the current shell only
export use_cutlass=0
python3.8 -m jittor.test.test_cudnn_op

Option 2: Install CUTLASS manually
Use this when you still want the CUTLASS-related operators.

# Remove the leftover partial download
rm -rf ~/.cache/jittor/cutlass

# Clone the latest version manually
mkdir -p ~/.cache/jittor/cutlass && \
cd ~/.cache/jittor/cutlass && \
git clone --depth 1 https://github.com/NVIDIA/cutlass.git cutlass

# Run the test again
python3.8 -m jittor.test.test_cudnn_op

Option 3: Upgrade Jittor to a fixed version
Use this when you are free to upgrade the environment and want CUTLASS to be managed automatically afterwards. Community releases 1.3.9.15 and later point the broken link to a mirror, so the download is retried automatically after the upgrade.

pip install -U jittor jittor-utils
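
To decide whether the upgrade is needed, the installed version can be compared against 1.3.9.15. The helper below is a minimal sketch with hypothetical names (`FIXED_VERSION`, `version_tuple`, `needs_upgrade`); it assumes purely numeric dotted versions like the ones quoted in this note.

```python
from typing import Tuple

FIXED_VERSION = "1.3.9.15"  # first release reported in this note to use the mirror


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a numeric dotted version such as '1.3.9.14' to a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, fixed: str = FIXED_VERSION) -> bool:
    """True if `installed` is older than the release that fixed the download."""
    return version_tuple(installed) < version_tuple(fixed)
```

The installed version string can be obtained with `importlib.metadata.version("jittor")` on Python 3.8 or later. Comparing integer tuples rather than raw strings keeps `1.3.10.0` correctly ordered after `1.3.9.15`.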

🤝 Contributing

We welcome all forms of contributions:

  1. Bug Reports: Report issues in GitHub Issues.
  2. Feature Requests: Suggest new features.
  3. Code Contributions: Submit Pull Requests.
  4. Documentation Improvements: Improve README and code comments.

📞 Contact

If you have any questions about our work, feel free to contact us:

📚 Citation

If you use our work in your research, please cite the following papers:

@inproceedings{yin2024dformer,
  title={DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation},
  author={Yin, Bowen and Zhang, Xuying and Li, Zhong-Yu and Liu, Li and Cheng, Ming-Ming and Hou, Qibin},
  booktitle={ICLR},
  year={2024}
}

@inproceedings{yin2025dformerv2,
  title={DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation},
  author={Yin, Bo-Wen and Cao, Jiao-Long and Cheng, Ming-Ming and Hou, Qibin},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19345--19355},
  year={2025}
}

🙏 Acknowledgements

Our implementation is mainly based on the following open-source projects:

Thanks to all the contributors for their efforts!

📄 License

This project is for non-commercial use only. See the LICENSE file for details.

