This is the Jittor implementation of DFormer and DFormerv2 for RGBD semantic segmentation. Built on the Jittor deep learning framework, it provides efficient training and inference pipelines.
This repository contains the official Jittor implementation of the following papers:
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou*
ICLR 2024. Paper Link | Homepage | PyTorch Version
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou*
CVPR 2025. Paper Link | Chinese Version | PyTorch Version
This project is built upon Jittor, a cutting-edge deep learning framework that pioneers a design centered around Just-In-Time (JIT) compilation and meta-operators. This architecture provides a unique combination of high performance and exceptional flexibility. Instead of relying on static, pre-compiled libraries, Jittor operates as a dynamic, programmable system that compiles itself and the user's code on the fly.
Jittor's design philosophy is to treat the deep learning framework not as a fixed set of tools, but as a domain-specific compiler. The high-level Python code written by the user serves as a directive to this compiler, which then generates highly optimized, hardware-specific machine code at runtime. This approach unlocks a level of performance and flexibility that is difficult to achieve with traditional frameworks.
- A Truly Just-in-Time (JIT) Compiled Framework: Jittor's most significant innovation is that the entire framework is JIT compiled. This goes beyond merely compiling a static computation graph. When a Jittor program runs, the Python code, including the core framework logic and the user's model, is first parsed into an intermediate representation. The Jittor compiler then performs a series of advanced optimizations, such as operator fusion, memory layout optimization, and dead code elimination, before generating and executing native C++ or CUDA code. This "whole-program" compilation approach means that the framework can adapt to the specific logic of your model, enabling optimizations that are impossible when linking against a static, pre-compiled library.
- Meta-Operators and Dynamic Kernel Fusion: At the heart of Jittor lies the concept of meta-operators. These are not monolithic, pre-written kernels (as in other frameworks), but rather elementary building blocks defined in Python. For instance, a complex operation like `Conv2d` followed by `ReLU` is not two separate kernel calls. Instead, Jittor composes them from meta-operators, and its JIT compiler fuses them into a single, efficient CUDA kernel at runtime. This kernel fusion is critical for performance on modern accelerators like GPUs, as it drastically reduces the time spent on high-latency memory I/O and kernel launch overhead, which are often the primary bottlenecks (see the sketch after this list).
- The Unified Computation Graph: Flexibility Meets Performance: Jittor elegantly resolves the classic trade-off between the flexibility of dynamic graphs (like PyTorch) and the performance of static graphs (like TensorFlow 1.x). You can write your model using all the native features of Python, including complex control flow like `if/else` statements and data-dependent `for` loops. Jittor's compiler traces these dynamic execution paths and still constructs a graph representation that it can optimize globally. It achieves this by JIT-compiling different graph versions for different execution paths, thus preserving Python's expressiveness without sacrificing optimization potential (also illustrated in the sketch after this list).
- Decoupling of Frontend Logic and Backend Optimization: Jittor champions a clean separation that empowers researchers. You focus on the "what" (the mathematical logic of your model) using a clean, high-level Python API, and Jittor's backend automatically handles the "how": the complex task of writing high-performance, hardware-specific code. This frees researchers who are experts in their domain (e.g., computer vision) from needing to become experts in low-level GPU programming, thus accelerating the pace of innovation.
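To make the fusion and control-flow points concrete, here is a minimal, hypothetical sketch (the module name, shapes, and layer choices are illustrative and not part of DFormer): the `Conv2d` followed by `ReLU` is composed from meta-operators that the JIT compiler may fuse into one kernel, and the data-dependent loop is ordinary Python that Jittor traces and compiles per execution path.

```python
import jittor as jt
from jittor import nn

jt.flags.use_cuda = jt.has_cuda   # use the GPU when one is available

class TinyRefiner(nn.Module):     # hypothetical module, for illustration only
    def __init__(self, channels=16):
        self.conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def execute(self, x, steps):
        # Conv2d followed by ReLU: composed from meta-operators and eligible
        # for fusion into a single kernel by the JIT compiler.
        x = nn.relu(self.conv(x))
        # Data-dependent Python control flow: Jittor traces the path actually
        # taken and compiles an optimized graph for it.
        for _ in range(steps):
            x = nn.relu(self.proj(x))
        if x.mean().item() > 0:   # branch on a runtime value
            x = x * 2.0
        return x

y = TinyRefiner()(jt.random((1, 3, 64, 64)), steps=3)
y.sync()                          # force the lazy compilation and execution
print(y.shape)
```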
Chart 1: Comparison of mIoU curves between the Jittor and PyTorch implementations of DFormer-Large.
Chart 2: Comparison of latency between the Jittor and PyTorch implementations.
# Create a conda environment
conda create -n dformer_jittor python=3.8 -y
conda activate dformer_jittor
# Install Jittor
pip install jittor
# Install other dependencies
pip install opencv-python pillow numpy scipy tqdm tensorboardX tabulate easydict
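After installation, a quick sanity check can confirm that Jittor imports correctly and whether CUDA is usable. This is a minimal sketch, not part of this repository:

```python
# Minimal post-install sanity check (sketch).
import jittor as jt

print("Jittor version:", jt.__version__)
jt.flags.use_cuda = jt.has_cuda   # falls back to CPU when no GPU is present
print("CUDA enabled:", bool(jt.flags.use_cuda))
```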
Supported datasets:
- NYUDepthv2: An indoor RGBD semantic segmentation dataset.
- SUNRGBD: A large-scale dataset for indoor scene understanding.
Download links:
Dataset | GoogleDrive | OneDrive | BaiduNetdisk |
---|---|---|---|
Model | Dataset | mIoU | Download Link |
---|---|---|---|
DFormer-Small | NYUDepthv2 | 52.3 | BaiduNetdisk |
DFormer-Base | NYUDepthv2 | 54.1 | BaiduNetdisk |
DFormer-Large | NYUDepthv2 | 55.8 | BaiduNetdisk |
DFormerv2-Small | NYUDepthv2 | 53.7 | BaiduNetdisk |
DFormerv2-Base | NYUDepthv2 | 55.3 | BaiduNetdisk |
DFormerv2-Large | NYUDepthv2 | 57.1 | BaiduNetdisk |
DFormer-Jittor/
├── checkpoints/        # Directory for pre-trained models
│   ├── pretrained/     # ImageNet pre-trained models
│   └── trained/        # Trained models
├── datasets/           # Directory for datasets
│   ├── NYUDepthv2/     # NYU dataset
│   └── SUNRGBD/        # SUNRGBD dataset
├── local_configs/      # Configuration files
├── models/             # Model definitions
├── utils/              # Utility functions
├── train.sh            # Training script
├── eval.sh             # Evaluation script
└── infer.sh            # Inference script
Use the provided training script:
bash train.sh
Or use the Python command directly:
python utils/train.py --config local_configs.NYUDepthv2.DFormer_Base
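For reference, a Jittor training step differs from PyTorch mainly in that the optimizer consumes the loss directly. The following is a minimal sketch with hypothetical stand-ins for the model and data; the actual training entry point is utils/train.py driven by the config module:

```python
import jittor as jt
from jittor import nn

jt.flags.use_cuda = jt.has_cuda

# Hypothetical stand-ins for the real DFormer model and data loader.
model = nn.Sequential(nn.Conv2d(3, 40, kernel_size=1))       # 40 classes (NYUDepthv2)
optimizer = jt.optim.SGD(model.parameters(), lr=0.01,
                         momentum=0.9, weight_decay=1e-4)

images = jt.random((2, 3, 120, 160))                          # toy RGB batch
labels = jt.randint(0, 40, (2, 120, 160))                     # toy per-pixel labels

logits = model(images)                                        # (N, C, H, W)
n, c, h, w = logits.shape
loss = nn.cross_entropy_loss(
    logits.transpose((0, 2, 3, 1)).reshape(-1, c),            # (N*H*W, C)
    labels.reshape(-1))                                       # (N*H*W,)
optimizer.step(loss)          # Jittor: backward + parameter update in one call
print(loss.item())
```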
To evaluate a trained model, use the evaluation script:
bash eval.sh
Alternatively:
python utils/eval.py --config local_configs.NYUDepthv2.DFormer_Base --checkpoint checkpoints/trained/NYUDepthv2/DFormer_Base/best.pkl
To run inference, use the inference script:
bash infer.sh
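The core Jittor idioms involved in inference look roughly like this. This is a minimal sketch with a stand-in model; the real pipeline is driven by infer.sh, the config system, and a trained checkpoint:

```python
import jittor as jt
from jittor import nn

jt.flags.use_cuda = jt.has_cuda

# Stand-in for a trained DFormer model; in practice the network is built from
# the config and its weights are restored from a checkpoint (e.g. best.pkl).
model = nn.Sequential(nn.Conv2d(3, 40, kernel_size=1))
model.eval()                                  # inference mode

with jt.no_grad():                            # no gradients needed
    rgb = jt.random((1, 3, 480, 640))         # one RGB frame (NYUDepthv2 resolution)
    logits = model(rgb)                       # (1, 40, 480, 640)
    pred, _ = logits.argmax(dim=1)            # Jittor's argmax returns (indices, values)

print(pred.shape)                             # per-pixel class map, (1, 480, 640)
```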
Results on NYUDepthv2:
Method | Backbone | mIoU | Params | FLOPs |
---|---|---|---|---|
DFormer-T | DFormer-Tiny | 48.5 | 5.0M | 15.2G |
DFormer-S | DFormer-Small | 52.3 | 13.1M | 28.4G |
DFormer-B | DFormer-Base | 54.1 | 35.4M | 75.0G |
DFormer-L | DFormer-Large | 55.8 | 62.3M | 132.8G |
DFormerv2-S | DFormerv2-Small | 53.7 | 13.1M | 28.4G |
DFormerv2-B | DFormerv2-Base | 55.3 | 35.4M | 75.0G |
DFormerv2-L | DFormerv2-Large | 57.1 | 62.3M | 132.8G |
Results on SUNRGBD:
Method | Backbone | mIoU | Params | FLOPs |
---|---|---|---|---|
DFormer-T | DFormer-Tiny | 46.2 | 5.0M | 15.2G |
DFormer-S | DFormer-Small | 49.8 | 13.1M | 28.4G |
DFormer-B | DFormer-Base | 51.6 | 35.4M | 75.0G |
DFormer-L | DFormer-Large | 53.4 | 62.3M | 132.8G |
DFormerv2-S | DFormerv2-Small | 51.2 | 13.1M | 28.4G |
DFormerv2-B | DFormerv2-Base | 52.8 | 35.4M | 75.0G |
DFormerv2-L | DFormerv2-Large | 54.5 | 62.3M | 132.8G |
The project uses Python configuration files located in the `local_configs/` directory:
# local_configs/NYUDepthv2/DFormer_Base.py
class C:
    # Dataset configuration
    dataset_name = "NYUDepthv2"
    dataset_dir = "datasets/NYUDepthv2"
    num_classes = 40

    # Model configuration
    backbone = "DFormer_Base"
    pretrained_model = "checkpoints/pretrained/DFormer_Base.pth"

    # Training configuration
    batch_size = 8
    nepochs = 500
    lr = 0.01
    momentum = 0.9
    weight_decay = 0.0001

    # Other configurations
    log_dir = "logs"
    checkpoint_dir = "checkpoints"
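Because a config is an ordinary Python module referenced by its dotted path (e.g. `local_configs.NYUDepthv2.DFormer_Base`), it can be loaded programmatically. A minimal sketch, assuming the module exposes a class `C` as above (the `load_config` helper is hypothetical, not part of this repository):

```python
import importlib

# Hypothetical helper mirroring how a dotted --config path can be resolved.
def load_config(module_path: str):
    return importlib.import_module(module_path).C

cfg = load_config("local_configs.NYUDepthv2.DFormer_Base")
print(cfg.dataset_name, cfg.num_classes, cfg.batch_size, cfg.lr)
```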
To benchmark the model and measure its inference latency:
python benchmark.py --config local_configs.NYUDepthv2.DFormer_Base
python utils/latency.py --config local_configs.NYUDepthv2.DFormer_Base
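Under the hood, timing GPU code in Jittor requires synchronizing before reading the clock, and warming up so JIT compilation is not counted. A minimal sketch of such a latency measurement follows (hypothetical model and input size, not the repository's `utils/latency.py`):

```python
import time
import jittor as jt
from jittor import nn

jt.flags.use_cuda = jt.has_cuda

model = nn.Sequential(nn.Conv2d(3, 40, 3, padding=1))   # stand-in for DFormer
model.eval()
x = jt.random((1, 3, 480, 640))

# Warm up so that JIT compilation and kernel fusion are not counted.
for _ in range(10):
    model(x).sync()

jt.sync_all(True)                    # wait for all queued GPU work
start = time.time()
runs = 50
for _ in range(runs):
    model(x).sync()
jt.sync_all(True)
print(f"avg latency: {(time.time() - start) / runs * 1000:.2f} ms")
```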
What is CUTLASS?
CUTLASS (CUDA Templates for Linear Algebra Subroutines) is NVIDIA's high-performance CUDA template library for matrix operations, used primarily to implement core operators such as GEMM and convolution efficiently on Tensor Cores. Many frameworks (Jittor, PyTorch XLA, TVM, etc.) rely on it for custom operators or as a low-level backend for auto-tuning.
Why does Jittor pull CUTLASS in cuDNN unit tests?
When Jittor loads/compiles external CUDA libraries, it also compiles several custom operators that depend on CUTLASS (`setup_cutlass()`). If the local cache is missing, it calls `install_cutlass()` to download and extract a `cutlass.zip` archive.
In version 1.3.9.14, `install_cutlass()` uses a download link that has become invalid (confirmed by community Issue #642).
After the download fails, a partial `~/.cache/jittor/cutlass` directory is left behind; on the next run, the function attempts `shutil.rmtree('.../cutlass/cutlass')`, but that subdirectory does not exist, which raises a `FileNotFoundError` and ultimately makes the main process core dump.
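A minimal Python sketch of a guarded cleanup for this failure mode (equivalent to the first half of option 2️⃣ in the table below; the path is Jittor's default cache location and may differ if your cache directory is customized):

```python
import os
import shutil

# The failed download leaves ~/.cache/jittor/cutlass without the inner
# cutlass/ checkout, which is exactly what the retry then tries to rmtree.
cache = os.path.expanduser("~/.cache/jittor/cutlass")
inner = os.path.join(cache, "cutlass")

if os.path.isdir(cache) and not os.path.isdir(inner):
    shutil.rmtree(cache)   # clear the partial cache so the next attempt starts clean
    print("removed partial CUTLASS cache:", cache)
```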
The following workarounds address this issue:
Solution | Steps | When to use |
---|---|---|
1️⃣ Temporarily skip CUTLASS | `export use_cutlass=0` (affects the current shell only)<br>`python3.8 -m jittor.test.test_cudnn_op` | You only want to run the cuDNN tests and do not need the CUTLASS operators |
2️⃣ Install CUTLASS manually | `rm -rf ~/.cache/jittor/cutlass` (remove the leftover cache)<br>`mkdir -p ~/.cache/jittor/cutlass && cd ~/.cache/jittor/cutlass && git clone --depth 1 https://github.com/NVIDIA/cutlass.git cutlass`<br>`python3.8 -m jittor.test.test_cudnn_op` (run again) | You still want the CUTLASS-based operators |
3️⃣ Upgrade Jittor to a fixed version | `pip install -U jittor jittor-utils` (releases 1.3.9.15+ fix the invalid download link, so the archive is fetched automatically again) | You can upgrade the environment and prefer it to be handled automatically |
We welcome all forms of contributions:
- Bug Reports: Report issues in GitHub Issues.
- Feature Requests: Suggest new features.
- Code Contributions: Submit Pull Requests.
- Documentation Improvements: Improve README and code comments.
If you have any questions about our work, feel free to contact us:
- Email: bowenyin@mail.nankai.edu.cn, caojiaolong@mail.nankai.edu.cn
- GitHub Issues: Submit an issue
If you use our work in your research, please cite the following papers:
@inproceedings{yin2024dformer,
title={DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation},
author={Yin, Bowen and Zhang, Xuying and Li, Zhong-Yu and Liu, Li and Cheng, Ming-Ming and Hou, Qibin},
booktitle={ICLR},
year={2024}
}
@inproceedings{yin2025dformerv2,
title={DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation},
author={Yin, Bo-Wen and Cao, Jiao-Long and Cheng, Ming-Ming and Hou, Qibin},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={19345--19355},
year={2025}
}
Our implementation is mainly based on the following open-source projects:
- Jittor: A deep learning framework.
- DFormer: The original PyTorch implementation.
- mmsegmentation: A semantic segmentation toolbox.
Thanks to all the contributors for their efforts!
This project is for non-commercial use only. See the LICENSE file for details.