Our model requires only multi-view video of the robot executing random commands to learn both the robot's 3D morphology and how to control it, via the Neural Jacobian Field. The learned model can then be used to plan control commands that achieve desired motions.
First, we sample random control commands to be executed on the robot. Using a setup of 12 RGB-D cameras, we record multi-view captures both before each command is executed and after the robot has settled into a steady state, forming our dataset.
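To make the collection protocol concrete, here is a minimal sketch of the loop. The `robot.execute`, `cam.capture_rgbd`, and actuation-limit attributes (`cmd_low`, `cmd_high`) are hypothetical interfaces, and the sample count and settle time are illustrative values, not ones taken from the paper.

```python
import time
import numpy as np

NUM_SAMPLES = 1000    # illustrative, not the paper's value
SETTLE_TIME_S = 2.0   # wait for the robot to reach steady state (assumed)

def collect_dataset(robot, cameras, command_dim):
    """Record (images_before, command, images_after) tuples from random actions."""
    dataset = []
    for _ in range(NUM_SAMPLES):
        # Multi-view capture before the command is executed.
        before = [cam.capture_rgbd() for cam in cameras]
        # Sample a random command within the robot's actuation limits.
        command = np.random.uniform(robot.cmd_low, robot.cmd_high, size=command_dim)
        robot.execute(command)
        time.sleep(SETTLE_TIME_S)  # let the system settle into steady state
        # Multi-view capture after the motion has settled.
        after = [cam.capture_rgbd() for cam in cameras]
        dataset.append({"before": before, "command": command, "after": after})
    return dataset
```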
From this dataset, our method learns a mapping from a single RGB image to a neural scene representation. This representation combines a Radiance Field, which encodes the scene's visual and geometric information, with a Jacobian Field, which encodes its kinematic information. We refer to the combined representation as the Neural Jacobian Field.
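As an illustration of what such a combined representation might look like, below is a minimal PyTorch sketch of a per-point query module. The architecture, layer widths, and conditioning scheme are assumptions made for illustration; the actual model conditions on features extracted from the single input image.

```python
import torch
import torch.nn as nn

class NeuralJacobianField(nn.Module):
    """Sketch: maps a 3D point (conditioned on image features) to radiance
    quantities (density, color) plus a Jacobian relating that point's motion
    to changes in the robot's control commands."""

    def __init__(self, feature_dim=128, command_dim=8):
        super().__init__()
        self.command_dim = command_dim
        self.backbone = nn.Sequential(
            nn.Linear(3 + feature_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.density_head = nn.Linear(256, 1)   # geometry
        self.color_head = nn.Linear(256, 3)     # appearance
        # Jacobian head: one 3 x command_dim matrix per 3D point.
        self.jacobian_head = nn.Linear(256, 3 * command_dim)

    def forward(self, points, features):
        h = self.backbone(torch.cat([points, features], dim=-1))
        density = self.density_head(h)
        color = torch.sigmoid(self.color_head(h))
        jacobian = self.jacobian_head(h).view(*points.shape[:-1], 3, self.command_dim)
        return density, color, jacobian
```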
The Neural Jacobian Field can be queried for the kinematic Jacobian at every coordinate in the 3D scene with respect to the robot's control commands. This capability allows us to visually identify the kinematic chain and to plan a sequence of control commands for a desired motion using gradient-based optimization.
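The gradient-based planning step can be sketched as follows: under the Jacobian field's linearized motion model, a command change Δu moves a point p by approximately J(p)Δu, so we can optimize Δu to drive selected points toward goal positions. Function and variable names here are hypothetical, and the optimizer settings are illustrative.

```python
import torch

def plan_command(jacobians, points, goals, steps=200, lr=1e-2):
    """Gradient-based planning sketch: find a command change that moves
    selected 3D points toward goal positions under the linear motion model
    predicted by the Jacobian field.

    jacobians: (N, 3, C) per-point Jacobians queried from the NJF
    points:    (N, 3) current 3D point positions
    goals:     (N, 3) desired 3D point positions
    """
    delta_u = torch.zeros(jacobians.shape[-1], requires_grad=True)
    opt = torch.optim.Adam([delta_u], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Predicted point positions: p + J(p) @ delta_u for each point.
        predicted = points + torch.einsum("nij,j->ni", jacobians, delta_u)
        loss = ((predicted - goals) ** 2).sum()
        loss.backward()
        opt.step()
    return delta_u.detach()
```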
With the Neural Jacobian Field, we can visualize the robot's geometry and kinematics as inferred from a single image. The figure below shows the predicted depth and the colorized kinematic chain derived from the Neural Jacobian Field. The colorization is learned entirely from data, with no human labeling.
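One plausible way to obtain such a colorization, offered here as an assumption rather than the paper's exact scheme, is to project each point's Jacobian to RGB via PCA, so that points responding similarly to the same actuators receive similar colors.

```python
import numpy as np

def colorize_jacobians(jacobians):
    """Hypothetical visualization: project per-point Jacobians to RGB via PCA.
    (The paper's exact colorization scheme may differ; this is illustrative.)

    jacobians: (N, 3, C) array of per-point Jacobians from the NJF
    returns:   (N, 3) RGB colors in [0, 1]
    """
    flat = jacobians.reshape(len(jacobians), -1)
    flat = flat - flat.mean(axis=0)
    # Top-3 principal directions of the flattened Jacobian entries.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    rgb = flat @ vt[:3].T
    # Normalize each channel to [0, 1] for display.
    span = rgb.max(axis=0) - rgb.min(axis=0)
    return (rgb - rgb.min(axis=0)) / (span + 1e-8)
```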
Our method can perform closed-loop control of diverse robots from vision alone, including soft robots, which are traditionally hard to model, and a low-cost ($220) 3D-printed robot arm.
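A closed-loop controller built from these pieces might look like the following sketch, which reuses the `plan_command` sketch above. The `robot` and `camera` interfaces and the feature-lookup call are hypothetical, and a real system would re-estimate point positions from new observations rather than rolling the linearized model forward.

```python
import torch

def closed_loop_control(robot, camera, njf, points, goals, iters=20):
    """Closed-loop control sketch: repeatedly re-image the robot, query the
    Jacobian field at tracked points, and apply the planned command update."""
    for _ in range(iters):
        image = camera.capture_rgb()
        # Hypothetical: look up image features for each tracked 3D point.
        features = njf.encode(image, points)
        _, _, jacobians = njf(points, features)
        delta_u = plan_command(jacobians, points, goals)
        robot.execute(robot.current_command + delta_u)
        # Roll points forward under the linearized model for the next step.
        points = points + torch.einsum("nij,j->ni", jacobians, delta_u)
    return robot.current_command
```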
Quantitatively, our controller effectively reduces the distance-to-goal for motions specified as target point movements, offering a viable way to control robots that were previously difficult or impossible to control because their kinematic structure is hard to estimate.
@misc{li2024unifying3drepresentationcontrol,
title={Unifying 3D Representation and Control of Diverse Robots with a Single Camera},
author={Sizhe Lester Li and Annan Zhang and Boyuan Chen and Hanna Matusik and Chao Liu and Daniela Rus and Vincent Sitzmann},
year={2024},
eprint={2407.08722},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2407.08722},
}