TL;DR: HCNQA improves 3D Visual Question Answering by hierarchically supervising the model's intermediate reasoning phases (BoI, OoI, OoT), suppressing shortcut reasoning and yielding better accuracy and robustness.
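For intuition only, below is a minimal sketch of what hierarchical supervision over successively narrower concentration phases can look like: one relevance head per phase, each supervised against a progressively smaller set of target objects, with the phase losses added to the usual answer loss. Every name, shape, and weight here is an illustrative assumption, not the implementation in this repository.

```python
import torch
import torch.nn.functional as F

def hierarchical_supervision_loss(phase_logits, phase_targets, weights=(1.0, 1.0, 1.0)):
    """Sum of per-phase binary relevance losses (BoI -> OoI -> OoT).

    phase_logits:  three [B, N] tensors of per-object logits, one per phase.
    phase_targets: three {0,1} tensors of the same shape; each phase's targets
                   are a subset of the previous phase's (the "narrowing").
    Illustrative sketch only -- see the model and config code for the real heads.
    """
    loss = torch.zeros(())
    for logits, target, w in zip(phase_logits, phase_targets, weights):
        loss = loss + w * F.binary_cross_entropy_with_logits(logits, target.float())
    return loss

# Toy usage: a batch of 2 scenes with 8 candidate objects each.
logits = [torch.randn(2, 8) for _ in range(3)]
targets = [torch.randint(0, 2, (2, 8)) for _ in range(3)]
print(hierarchical_supervision_loss(logits, targets))
```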
conda create -n hcnqa python=3.9 -y
conda activate hcnqa
pip install -r requirements.txt
cd model/vision/pointnet2
pip install .
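After the install, a quick sanity check is to confirm that PyTorch sees your GPU and that the compiled PointNet++ ops import. The import name below is an assumption; check model/vision/pointnet2/setup.py for the actual package name.

```python
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    import pointnet2  # assumed import name; see setup.py for the real one
    print("PointNet++ ops found at", pointnet2.__file__)
except ImportError as err:
    print("PointNet++ ops not importable:", err)
```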
- Following the 3D-VisTA data-preparation tutorial, download and extract the ScanQA dataset.
- Download the annotations used by our hierarchical supervision: coarse_ground_train_5.json and coarse_ground_val_5.json.
- Download the checkpoint for the language encoder (bert-base-uncased).
- Download our pretrained model checkpoint, eqa_235_5x5_ft5_2389.pth. (A quick check that these files load correctly is sketched after this list.)
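To verify the downloads before training, you can load the annotation file and the checkpoint directly. The relative paths below are placeholders; substitute wherever you saved the files.

```python
import json
import torch

# Hierarchical supervision annotations (placeholder path).
with open("coarse_ground_train_5.json") as f:
    ann = json.load(f)
print("annotation entries:", len(ann))

# Pretrained model weights (placeholder path).
ckpt = torch.load("eqa_235_5x5_ft5_2389.pth", map_location="cpu")
print("checkpoint top-level keys:", list(ckpt)[:5] if isinstance(ckpt, dict) else type(ckpt))
```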
python run.py --config project/vista/scanqa_train.yml
python run.py --config project/vista/scanqa_eval.yml
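Before launching a long run, it can also help to confirm that the config file parses and to glance at its top-level sections (this assumes PyYAML is available in the environment; `pip install pyyaml` if it is not).

```python
import yaml

with open("project/vista/scanqa_train.yml") as f:
    cfg = yaml.safe_load(f)
print("top-level config sections:", list(cfg))
```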
This repository is released under the MIT License (see LICENSE).
This codebase builds heavily on the open-source 3D-VisTA project.
We gratefully acknowledge their clean design and utilities.
If you find this project useful in your research, please consider citing:
@misc{zhou2025hcnqaenhancing3dvqa,
title={HCNQA: Enhancing 3D VQA with Hierarchical Concentration Narrowing Supervision},
author={Shengli Zhou and Jianuo Zhu and Qilin Huang and Fangjing Wang and Yanfu Zhang and Feng Zheng},
year={2025},
eprint={2507.01800},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.01800},
}