Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann
Paper (Nature, 2025) | Project Website | Tutorial | Explainer | Dataset
[TL;DR] Neural Jacobian Fields are a general-purpose representation of robotic systems that can be learned from perception.
- [2025-09-23] Added FAQ about training time and supervision types.
- [2025-08-29] Released the Allegro-Hand-Only Dataset, a lighter version containing only the Allegro Hand, making it much faster to download.
- [2025-06-25] Our paper is now published in Nature.
- [2025-04-20] Dataset now live on HuggingFace: Link.
- [2025-03-23] Major tutorial updates for training in 2D simulations.
We provide software implementations of:
- 3D Jacobian Field: `project/neural_jacobian_field`
- 2D Jacobian Field: `project/jacobian`
- Custom simulator: `mujoco-phys-sim`
```bash
conda create --name neural-jacobian-field python=3.10.8
conda activate neural-jacobian-field
bash install.sh
```
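After installation, a quick sanity check (a minimal sketch, assuming `install.sh` installs PyTorch for the training code):

```python
# Verify the environment imports and can see your GPUs.
import torch

print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available(), "| visible GPUs:", torch.cuda.device_count())
```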
Download the pretrained checkpoints from Google Drive and place them under:
- `notebooks/inference_demo_data/real_world_pretrained_ckpts`
- `notebooks/tutorial/tutorial_pretrained_ckpts`
Tutorial Notebooks (2D, ~30 mins each)
Ready-to-Run Demos
We provide two datasets depending on your needs:
Allegro-Hand-Only Dataset (recommended): lightweight, faster to download and work with.
Command to download:
```bash
huggingface-cli download --resume-download --repo-type dataset sizhe-lester-li/neural-jacobian-field-allegro-only
```
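If you prefer downloading from Python, the `huggingface_hub` API offers an equivalent; a minimal sketch (the `local_dir` path is just an example):

```python
# Download the Allegro-only dataset with the huggingface_hub API.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="sizhe-lester-li/neural-jacobian-field-allegro-only",
    repo_type="dataset",
    local_dir="data/allegro_only",  # example destination, adjust as needed
)
```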
Full dataset: a comprehensive multiview video-action dataset with camera poses, containing:
- Pneumatic robot hand (mounted on a robot arm)
- Allegro robot hand
- Handed Shearing Auxetics platform
- Poppy robot arm
Command to download:
```bash
huggingface-cli download --resume-download --repo-type dataset sizhe-lester-li/neural-jacobian-field
```
On a server with 4 × A8000s, perception training takes about 1 day, and Jacobian training takes 12 hours to 1 day.
```bash
python3 -m neural_jacobian_field.train dataset=dataset_allegro model=model_allegro dataset.mode=perception
```
Replace the checkpoint flag with a checkpoint from your own wandb run (following the `wandb://entity/project/artifact:version` pattern shown below), then start training:
```bash
python3 -m neural_jacobian_field.train dataset=dataset_allegro model=model_allegro dataset.mode=action checkpoint.load=wandb://entity/project/usoftylr:v5
```
- Extrinsics: OpenCV-style camera-to-world matrices (+Z = look vector, +X = right, -Y = up)
- Intrinsics: normalized (row 1 ÷ width, row 2 ÷ height)
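As a concrete illustration of the intrinsics convention, a minimal sketch (the function and example values are ours, not the repo's):

```python
import numpy as np

def normalize_intrinsics(K: np.ndarray, width: int, height: int) -> np.ndarray:
    """Normalize a 3x3 OpenCV-style intrinsics matrix to a resolution-independent form."""
    K = K.astype(np.float64).copy()
    K[0, :] /= width   # row 1 (fx, 0, cx) divided by image width
    K[1, :] /= height  # row 2 (0, fy, cy) divided by image height
    return K

# Example: a 640x480 camera with fx = fy = 500 and a centered principal point.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
K_norm = normalize_intrinsics(K, width=640, height=480)
# K_norm row 1 -> [0.78125, 0.0, 0.5]; row 2 -> [0.0, ~1.0417, 0.5]
```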
Yes, everything is fine! The number of training steps in the default config (50 million) is somewhat arbitrary. In practice, you can stop once you see good 3D reconstruction results during stage 1 (PixelNeRF), and then move on to fitting Jacobian fields. You usually don't need to run the full 50M steps.
We tested training on:
- 4 × A8000s
- 4 × A100s
For testing on a local robot-ready PC after training, we used a single RTX 4090.
Yes. The training script supports multi-GPU setups and uses all available GPUs by default; set CUDA_VISIBLE_DEVICES to select specific GPUs. We recommend multi-GPU for large-scale training, especially with the full dataset.
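For example, you can prefix the launch command with the variable (`CUDA_VISIBLE_DEVICES=0,1 python3 -m neural_jacobian_field.train ...`), or set it from Python before any CUDA-using library initializes; a minimal sketch:

```python
# Restrict this process to GPUs 0 and 1. Must run before torch (or any
# other CUDA-using library) creates a GPU context.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```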
This usually happens if `action_supervision_type` is set to `tracks`.
- With track supervision, `rays_per_batch` is ignored.
- Instead, the number of rays is determined by `num_positive_samples + num_negative_samples`. If both values are `null` (the default), the dataloader uses all tracks (often ~10,000 rays), which easily causes OOM.

For the Allegro hand dataset, we use optical flow supervision (via RAFT) by default, not track supervision. Both supervision types have been tested and work well.
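In pseudocode, the batching behaviour described above looks roughly like this (the function is our illustration; only the config-key names come from the repo):

```python
def rays_in_batch(action_supervision_type: str,
                  rays_per_batch: int,
                  num_positive_samples: int | None,
                  num_negative_samples: int | None,
                  num_tracks: int) -> int:
    """Illustrative sketch of how the number of supervised rays is chosen."""
    if action_supervision_type == "tracks":
        # rays_per_batch is ignored under track supervision.
        if num_positive_samples is None and num_negative_samples is None:
            return num_tracks  # all tracks, often ~10,000 rays -> likely OOM
        return num_positive_samples + num_negative_samples
    # Flow supervision (the Allegro default) respects rays_per_batch.
    return rays_per_batch
```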
If you find our work useful, please consider citing us:
@Article{Li2025,
author={Li, Sizhe Lester
and Zhang, Annan
and Chen, Boyuan
and Matusik, Hanna
and Liu, Chao
and Rus, Daniela
and Sitzmann, Vincent},
title={Controlling diverse robots by inferring Jacobian fields with deep networks},
journal={Nature},
year={2025},
month={Jun},
day={25},
issn={1476-4687},
doi={10.1038/s41586-025-09170-0},
url={https://doi.org/10.1038/s41586-025-09170-0}
}
The authors thank Hyung Ju Terry Suh for his writing suggestions (system dynamics) and Tao Chen and Pulkit Agrawal for their hardware support on the Allegro hand. V.S. acknowledges support from the Solomon Buchsbaum Research Fund through MIT's Research Support Committee. S.L.L. was supported through an MIT Presidential Fellowship. A.Z., H.M., C.L., and D.R. acknowledge support from the National Science Foundation EFRI grant 1830901 and the Gwangju Institute of Science and Technology.
