- Training code updates
- Simulation Envs
Run the commands below inside the project directory:

- Set up the conda environment:
  ```
  conda env create -f train/train_environment.yml
  ```
- Activate the conda environment:
  ```
  conda activate navidiffusor
  ```
- Install the `vint_train` package:
  ```
  pip install -e train/
  ```
- Install the `diffusion_policy` package from its repo:
  ```
  git clone git@github.com:real-stanford/diffusion_policy.git
  pip install -e diffusion_policy/
  ```
- Install the `depth_anything_v2` dependencies from its repo:
  ```
  git clone https://github.com/DepthAnything/Depth-Anything-V2.git
  pip install -r Depth-Anything-V2/requirements.txt
  ```
We recommend downloading these datasets (and any others you may want to train on) and running the processing steps below.

We provide sample scripts to process these datasets, either directly from a rosbag or from a custom format like HDF5:

- Run `process_bags.py` with the relevant args, or `process_recon.py` for processing RECON HDF5s. You can also manually add your own dataset by following the structure below.
- Run `data_split.py` on your dataset folder with the relevant args.
- Expected structure:
```
├── <dataset_name>
│   ├── <name_of_traj1>
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── ...
│   │   ├── T_1.jpg
│   │   └── traj_data.pkl
│   ├── <name_of_traj2>
│   │   ├── 0.jpg
│   │   ├── 1.jpg
│   │   ├── ...
│   │   ├── T_2.jpg
│   │   └── traj_data.pkl
│   ...
└── └── <name_of_trajN>
        ├── 0.jpg
        ├── 1.jpg
        ├── ...
        ├── T_N.jpg
        └── traj_data.pkl
```
Each `*.jpg` file contains a forward-facing RGB observation from the robot, and the files are temporally labeled. The `traj_data.pkl` file contains the odometry data for the trajectory. It is a pickled dictionary with the keys:

- `"position"`: an `np.ndarray` of shape `[T, 2]` with the xy-coordinates of the robot at each image observation.
- `"yaw"`: an `np.ndarray` of shape `[T,]` with the yaw of the robot at each image observation.
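As a minimal sketch of reading and writing this odometry format (the helper names below are ours, not part of the repo; only the key names and array shapes come from the description above):

```python
import pickle

import numpy as np


def save_traj_data(path, positions, yaws):
    """Write odometry in the expected traj_data.pkl format:
    'position' is a [T, 2] array of xy-coordinates and 'yaw'
    is a [T,] array of headings, one entry per image frame."""
    positions = np.asarray(positions)
    yaws = np.asarray(yaws)
    assert positions.shape == (len(yaws), 2), "position must be [T, 2]"
    with open(path, "wb") as f:
        pickle.dump({"position": positions, "yaw": yaws}, f)


def load_traj_data(path):
    """Load a traj_data.pkl file and return (position, yaw) arrays."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["position"], data["yaw"]
```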
After step 2 of data processing, the processed data split should have the following structure inside `/train/vint_train/data/data_splits/`:

```
├── <dataset_name>
│   ├── train
│   │   └── traj_names.txt
└── └── test
        └── traj_names.txt
```
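A minimal sketch of what a train/test split step does (the function name, split ratio, and seed below are illustrative assumptions, not the actual `data_split.py` interface):

```python
import os
import random


def split_dataset(dataset_dir, split_dir, train_frac=0.8, seed=0):
    """Shuffle trajectory folder names and write train/test
    traj_names.txt files in the data_splits layout shown above."""
    trajs = sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )
    random.Random(seed).shuffle(trajs)
    n_train = int(len(trajs) * train_frac)
    splits = {"train": trajs[:n_train], "test": trajs[n_train:]}
    for name, names in splits.items():
        out_dir = os.path.join(split_dir, name)
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, "traj_names.txt"), "w") as f:
            f.write("\n".join(names))
    return splits
```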
```
cd train
python train.py -c <path_of_train_config_file>
```

The config yaml files are in the `train/config` directory.
🚀 Our method is designed to provide guidance for any diffusion-based navigation model at inference time, improving path-generation quality for both PointGoal and ImageGoal tasks. Here we use NoMaD as an example; an adaptable implementation in `guide.py` is provided for integrating with your own diffusion model.
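Conceptually, cost-guided sampling inserts a gradient step into each denoising iteration: after the model's standard update, the path is nudged down the gradient of a differentiable cost over the waypoints. The sketch below is our own minimal illustration of that idea, not the actual `guide.py` interface:

```python
import numpy as np


def cost_guided_step(x, denoise_fn, cost_grad_fn, guidance_scale=0.1):
    """One cost-guided denoising iteration (illustrative):
    apply the model's denoising update, then descend the gradient
    of a cost defined over the candidate waypoints."""
    x = denoise_fn(x)                         # standard diffusion update
    x = x - guidance_scale * cost_grad_fn(x)  # cost-guidance correction
    return x
```

With an identity denoiser and the gradient of a quadratic cost `||x||^2`, each step shrinks the waypoints toward the low-cost region, which is the qualitative behavior the guidance is meant to produce.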
```
cd deployment/src/
sh ./navigate.sh --model <model_name> --dir <topomap_dir> --point-goal False  # set --point-goal True for PointGoal navigation, False for ImageGoal
```
The `<model_name>` is the name of the model in the `/deployment/config/models.yaml` file. In this file, you specify the following parameters for each model (defaults are used otherwise):

- `config_path` (str): path of the `*.yaml` file in `/train/config/` used to train the model
- `ckpt_path` (str): path of the `*.pth` file in `/deployment/model_weights/`

Make sure these configurations match what you used to train the model. The configurations for the models whose weights we provide are listed in the yaml file for your reference.
The `<topomap_dir>` is the name of the directory in `/deployment/topomaps/images` that contains the images corresponding to the nodes in the topological map. The images are ordered by name from 0 to N.
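Because the node images are ordered by numeric filename, loading them with a plain lexicographic sort would place `10.jpg` before `2.jpg`. A sketch of numeric-order loading (the helper name is ours):

```python
import os


def topomap_image_paths(topomap_dir):
    """Return node image paths sorted by integer filename, so the
    sequence follows node order 0..N rather than lexicographic order."""
    names = [
        f for f in os.listdir(topomap_dir)
        if f.endswith((".jpg", ".png"))
    ]
    names.sort(key=lambda f: int(os.path.splitext(f)[0]))
    return [os.path.join(topomap_dir, f) for f in names]
```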
This command opens up 4 windows:

- `roslaunch vint_locobot.launch`: This launch file opens the usb_cam node for the camera, the joy node for the joystick, and several nodes for the robot's mobile base.
- `python navigate.py --model <model_name> --dir <topomap_dir>`: This python script starts a node that reads image observations from the `/usb_cam/image_raw` topic, feeds the observations and the map into the model, and publishes actions to the `/waypoint` topic.
- `python joy_teleop.py`: This python script starts a node that reads inputs from the joy topic and outputs them on topics that teleoperate the robot's base.
- `python pd_controller.py`: This python script starts a node that reads messages from the `/waypoint` topic (waypoints from the model) and outputs velocities to navigate the robot's base.
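The waypoint-to-velocity step can be sketched as a simple proportional controller: drive toward the waypoint while turning to face it. The gains, limits, and function name below are illustrative assumptions, not the deployed script's actual values:

```python
import math


def waypoint_to_velocity(dx, dy, kp_v=0.5, kp_w=1.0, v_max=0.5, w_max=1.0):
    """Map a waypoint (dx, dy) in the robot's frame to a clipped
    (linear, angular) velocity command: linear speed proportional
    to distance, angular speed proportional to heading error."""
    dist = math.hypot(dx, dy)
    heading_err = math.atan2(dy, dx)
    v = max(-v_max, min(v_max, kp_v * dist))
    w = max(-w_max, min(w_max, kp_w * heading_err))
    return v, w
```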
When the robot is finished navigating, kill the `pd_controller.py` script, and then kill the tmux session. If you want to take control of the robot while it is navigating, the `joy_teleop.py` script allows you to do so with the joystick.
```
@article{zeng2025navidiffusor,
  title={NaviDiffusor: Cost-Guided Diffusion Model for Visual Navigation},
  author={Zeng, Yiming and Ren, Hao and Wang, Shuhang and Huang, Junlong and Cheng, Hui},
  journal={arXiv preprint arXiv:2504.10003},
  year={2025}
}
```
NaviDiffusor is inspired by the contributions of the following works to the open-source community: NoMaD, Depth-Anything-V2, and ViPlanner. We thank the authors for sharing their outstanding work.