- Training code updates
- Simulation Envs
Run the commands below inside the project directory:

- Set up the conda environment:
  ```
  conda env create -f train/train_environment.yml
  ```
- Activate the conda environment:
  ```
  conda activate navidiffusor
  ```
- Install the `vint_train` package:
  ```
  pip install -e train/
  ```
- Install the `diffusion_policy` package from its repo:
  ```
  git clone git@github.com:real-stanford/diffusion_policy.git
  pip install -e diffusion_policy/
  ```
- Install the `depth_anything_v2` dependencies from its repo:
  ```
  git clone https://github.com/DepthAnything/Depth-Anything-V2.git
  pip install -r Depth-Anything-V2/requirements.txt
  ```
We recommend downloading these datasets (and any others you may want to train on) and running the processing steps below.

We provide sample scripts to process these datasets, either directly from a rosbag or from a custom format like HDF5:

- Run `process_bags.py` with the relevant args, or `process_recon.py` for processing RECON HDF5s. You can also manually add your own dataset by following the structure below.
- Run `data_split.py` on your dataset folder with the relevant args.
- Expected structure:
```
├── <dataset_name>
│   ├── <name_of_traj1>
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── ...
│   │   ├── T_1.jpg
│   │   └── traj_data.pkl
│   ├── <name_of_traj2>
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── ...
│   │   ├── T_2.jpg
│   │   └── traj_data.pkl
│   ...
└── └── <name_of_trajN>
        ├── 0.jpg
        ├── 1.jpg
        ├── ...
        ├── T_N.jpg
        └── traj_data.pkl
```
Each `*.jpg` file contains a forward-facing RGB observation from the robot, and the files are temporally labeled. The `traj_data.pkl` file contains the odometry data for the trajectory. It is a pickled dictionary with the keys:

- `"position"`: an `np.ndarray` of shape `[T, 2]` with the xy-coordinates of the robot at each image observation.
- `"yaw"`: an `np.ndarray` of shape `[T,]` with the yaw of the robot at each image observation.
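As a minimal sketch of reading and writing this odometry format (the helper names below are ours, not part of the repo; only the key names and array shapes come from the description above):

```python
import pickle

import numpy as np


def save_traj_data(path, positions, yaws):
    """Write odometry in the expected traj_data.pkl format:
    'position' is a [T, 2] array of xy-coordinates and 'yaw'
    is a [T,] array of headings, one entry per image frame."""
    positions = np.asarray(positions)
    yaws = np.asarray(yaws)
    assert positions.shape == (len(yaws), 2), "position must be [T, 2]"
    with open(path, "wb") as f:
        pickle.dump({"position": positions, "yaw": yaws}, f)


def load_traj_data(path):
    """Load a traj_data.pkl file and return (position, yaw) arrays."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["position"], data["yaw"]
```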
After step 2 of data processing, the processed data split should have the following structure inside `/train/vint_train/data/data_splits/`:

```
├── <dataset_name>
│   ├── train
│   │   └── traj_names.txt
└── └── test
        └── traj_names.txt
```
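A minimal sketch of what a train/test split step does (the function name, split ratio, and seed below are illustrative assumptions, not the actual `data_split.py` interface):

```python
import os
import random


def split_dataset(dataset_dir, split_dir, train_frac=0.8, seed=0):
    """Shuffle trajectory folder names and write train/test
    traj_names.txt files in the data_splits layout shown above."""
    trajs = sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )
    random.Random(seed).shuffle(trajs)
    n_train = int(len(trajs) * train_frac)
    splits = {"train": trajs[:n_train], "test": trajs[n_train:]}
    for name, names in splits.items():
        out_dir = os.path.join(split_dir, name)
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, "traj_names.txt"), "w") as f:
            f.write("\n".join(names))
    return splits
```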
```
cd train
python train.py -c <path_of_train_config_file>
```

The config yaml files are in the `train/config` directory.
🚀 Our method is designed to provide guidance for any diffusion-based navigation model at inference time, improving path-generation quality for both PointGoal and ImageGoal tasks. Here we use NoMaD as an example; an adaptable implementation in `guide.py` is provided for integrating with your own diffusion model.
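Conceptually, cost-guided sampling inserts a gradient step into each denoising iteration: after the model's standard update, the path is nudged down the gradient of a differentiable cost over the waypoints. The sketch below is our own minimal illustration of that idea, not the actual `guide.py` interface:

```python
import numpy as np


def cost_guided_step(x, denoise_fn, cost_grad_fn, guidance_scale=0.1):
    """One cost-guided denoising iteration (illustrative):
    apply the model's denoising update, then descend the gradient
    of a cost defined over the candidate waypoints."""
    x = denoise_fn(x)                         # standard diffusion update
    x = x - guidance_scale * cost_grad_fn(x)  # cost-guidance correction
    return x
```

With an identity denoiser and the gradient of a quadratic cost `||x||^2`, each step shrinks the waypoints toward the low-cost region, which is the qualitative behavior the guidance is meant to produce.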
```
cd deployment/src/
sh ./navigate.sh --model <model_name> --dir <topomap_dir> --point-goal False  # set --point-goal True for PointGoal navigation, False for ImageGoal
```
The `<model_name>` is the name of the model in the `/deployment/config/models.yaml` file. In this file, you specify the following parameters for each model (defaults are used otherwise):

- `config_path` (str): path of the `*.yaml` file in `/train/config/` used to train the model
- `ckpt_path` (str): path of the `*.pth` file in `/deployment/model_weights/`

Make sure these configurations match what you used to train the model. The configurations for the models whose weights we provide are listed in the yaml file for your reference.
The `<topomap_dir>` is the name of the directory in `/deployment/topomaps/images` that contains the images corresponding to the nodes in the topological map. The images are ordered by name from 0 to N.
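Because the node images are ordered by numeric filename, loading them with a plain lexicographic sort would place `10.jpg` before `2.jpg`. A sketch of numeric-order loading (the helper name is ours):

```python
import os


def topomap_image_paths(topomap_dir):
    """Return node image paths sorted by integer filename, so the
    sequence follows node order 0..N rather than lexicographic order."""
    names = [
        f for f in os.listdir(topomap_dir)
        if f.endswith((".jpg", ".png"))
    ]
    names.sort(key=lambda f: int(os.path.splitext(f)[0]))
    return [os.path.join(topomap_dir, f) for f in names]
```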
This command opens up 4 windows:

- `roslaunch vint_locobot.launch`: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and several nodes for the robot's mobile base.
- `python navigate.py --model <model_name> --dir <topomap_dir>`: This python script starts a node that reads image observations from the `/usb_cam/image_raw` topic, feeds the observations and the map into the model, and publishes actions to the `/waypoint` topic.
- `python joy_teleop.py`: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot's base.
- `python pd_controller.py`: This python script starts a node that reads messages from the `/waypoint` topic (waypoints from the model) and outputs velocities to navigate the robot's base.
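The waypoint-to-velocity step can be sketched as a simple proportional controller: drive toward the waypoint while turning to face it. The gains, limits, and function name below are illustrative assumptions, not the deployed script's actual values:

```python
import math


def waypoint_to_velocity(dx, dy, kp_v=0.5, kp_w=1.0, v_max=0.5, w_max=1.0):
    """Map a waypoint (dx, dy) in the robot's frame to a clipped
    (linear, angular) velocity command: linear speed proportional
    to distance, angular speed proportional to heading error."""
    dist = math.hypot(dx, dy)
    heading_err = math.atan2(dy, dx)
    v = max(-v_max, min(v_max, kp_v * dist))
    w = max(-w_max, min(w_max, kp_w * heading_err))
    return v, w
```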
When the robot is finished navigating, kill the `pd_controller.py` script, and then kill the tmux session. If you want to take control of the robot while it is navigating, the `joy_teleop.py` script allows you to do so with the joystick.
```
@article{zeng2025navidiffusor,
  title={NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation},
  author={Zeng, Yiming and Ren, Hao and Wang, Shuhang and Huang, Junlong and Cheng, Hui},
  journal={arXiv preprint arXiv:2504.10003},
  year={2025}
}
```
NaviDiffusor is inspired by the contributions of the following works to the open-source community: NoMaD, Depth-Anything-V2, and ViPlanner. We thank the authors for sharing their outstanding work.