Unifying 3D Representation and Control of Diverse Robots with a Single Camera



[TL;DR] Neural Jacobian Fields are a kinematic representation of robots learned from vision. They can learn to control any robot from a single camera, without any other sensors.


Abstract

Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an open challenge to model and control bio-inspired robots that are often multi-material or soft, lack sensing capabilities, and may change their material properties with use. Here, we introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone. Our approach makes no assumptions about the robot's materials, actuation, or sensing, requires only a single camera for control, and learns to control the robot without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators, varying in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. By enabling robot control with a generic camera as the only sensor, we anticipate our work will dramatically broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.

Controlling robots from vision alone

Our model requires only multi-view video of the robot performing random actions to learn both its 3D morphology and control through the Neural Jacobian Field. The learned model can then be used to plan control commands for desired motions.


Neural Jacobian Field Data.

First, we sample random control commands to be executed on the robot. Using a setup of 12 RGB-D cameras, we record multi-view captures both before each command is executed and after each command has settled into a steady state, forming our dataset.
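The collection loop described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: uniform sampling of commands, the command range, and the helper names `sample_random_commands`, `collect_dataset`, `fake_capture`, and `fake_execute` are all assumptions made here, and the fake hooks stand in for real cameras and actuators so the sketch runs on its own.

```python
import numpy as np

def sample_random_commands(num_samples, num_channels, low=-1.0, high=1.0, seed=0):
    """Draw command vectors uniformly per channel (the distribution is an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=(num_samples, num_channels))

def collect_dataset(commands, capture_fn, execute_fn):
    """For each command: capture, execute and wait for steady state, capture again."""
    dataset = []
    for u in commands:
        before = capture_fn()   # multi-view frames before the command
        execute_fn(u)           # assumed to block until the robot has settled
        after = capture_fn()    # multi-view frames after settling
        dataset.append({"command": u, "before": before, "after": after})
    return dataset

# Hypothetical stand-ins for hardware, so the sketch runs without a robot.
NUM_CAMERAS, H, W = 12, 4, 4
state = {"step": 0}

def fake_capture():
    # One RGB-D frame per camera: 4 channels per pixel (RGB + depth).
    return np.full((NUM_CAMERAS, H, W, 4), state["step"], dtype=np.float32)

def fake_execute(u):
    state["step"] += 1

commands = sample_random_commands(num_samples=3, num_channels=2)
dataset = collect_dataset(commands, fake_capture, fake_execute)
```

Each record pairs one command with the views before and after it, which is the supervision signal the training stage consumes.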

Neural Jacobian Field Training.

Using the above dataset, our method learns a mapping from a single RGB image to a neural scene representation. This scene representation combines a Radiance Field, which encodes the scene's appearance and geometry, with a Jacobian Field, which encodes its kinematics. We refer to this combined representation as the Neural Jacobian Field.
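One way to write down what the Jacobian Field stores, assuming a command vector with m channels: each 3D point is assigned a matrix describing how that point moves under small changes in the command. The symbols below (x for a 3D point, u for the command vector, m for the number of command channels, J for the field) are notation chosen here for illustration and may differ from the paper's.

```latex
\[
\mathbf{J}(\mathbf{x}) = \frac{\partial \mathbf{x}}{\partial \mathbf{u}} \in \mathbb{R}^{3 \times m},
\qquad
\delta \mathbf{x} \approx \mathbf{J}(\mathbf{x})\, \delta \mathbf{u}
\]
```

Under this first-order reading, a small command change is mapped linearly to a predicted 3D displacement at every point in the scene.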


Neural Jacobian Field.

The Neural Jacobian Field can be queried for the kinematic Jacobian at every coordinate in the 3D scene with respect to the robot's control commands. This capability allows us to visually identify the kinematic chain and to plan a sequence of control commands for a desired motion using gradient-based optimization.
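As a rough illustration of planning with queried Jacobians, the sketch below runs plain gradient descent on a squared error between predicted and desired point displacements. The objective, the step-size rule, and the function name `plan_command` are assumptions for this example, and the Jacobians are random placeholders rather than outputs of a trained model.

```python
import numpy as np

def plan_command(jacobians, desired_motion, steps=500, lr=None):
    """Gradient descent on 0.5 * sum_i ||J_i @ du - dx_i||^2 over du.

    jacobians:      (N, 3, m) Jacobian queried at each of N points
    desired_motion: (N, 3)    desired 3D displacement of each point
    """
    n, _, m = jacobians.shape
    A = jacobians.reshape(n * 3, m)   # stack the per-point Jacobians row-wise
    b = desired_motion.reshape(n * 3)
    if lr is None:
        # Step size 1/L, where L is the largest eigenvalue of A^T A.
        lr = 1.0 / np.linalg.norm(A, 2) ** 2
    du = np.zeros(m)
    for _ in range(steps):
        grad = A.T @ (A @ du - b)     # gradient of the squared error
        du -= lr * grad
    return du

# Placeholder Jacobians and a known command change to recover.
rng = np.random.default_rng(0)
J = rng.normal(size=(5, 3, 2))
true_du = np.array([0.3, -0.2])
dx = J @ true_du                      # (5, 3) displacements implied by true_du
du = plan_command(J, dx)
```

Because this toy objective is a linear least-squares problem, a direct solver would give the same answer; gradient descent is used here only to mirror the gradient-based optimization mentioned in the text.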


Results

With the Neural Jacobian Field, we can visualize the robot's geometry and kinematics inferred from a single image. The figure below shows the predicted depth and the colorized kinematic chain derived from the Neural Jacobian Field. The colorization is learned entirely from data, with no human labeling.
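One plausible way to turn per-point Jacobians into such a colorization is to give each command channel a color and blend the colors by how strongly each point responds to each channel. The source does not spell out the exact scheme, so the weighting below and the name `colorize_by_channel` are assumptions.

```python
import numpy as np

def colorize_by_channel(jacobians, channel_colors, eps=1e-8):
    """Blend one color per command channel, weighted by per-point sensitivity.

    jacobians:      (N, 3, m) Jacobian at each of N points
    channel_colors: (m, 3)    RGB color in [0, 1] for each command channel
    returns:        (N, 3)    RGB color for each point
    """
    # Sensitivity of each point to each channel: norm of that Jacobian column.
    sensitivity = np.linalg.norm(jacobians, axis=1)                    # (N, m)
    weights = sensitivity / (sensitivity.sum(axis=1, keepdims=True) + eps)
    return weights @ channel_colors

J = np.zeros((2, 3, 2))
J[0, :, 0] = [1.0, 0.0, 0.0]   # point 0 moves only with channel 0
J[1, :, 1] = [0.0, 2.0, 0.0]   # point 1 moves only with channel 1
palette = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # red, blue
colors = colorize_by_channel(J, palette)
```

Points driven by the same channel end up with the same color, so segments of the kinematic chain separate visually without any labels.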


Our method can perform closed-loop control of diverse robots from vision, including soft robots, which are traditionally hard to model, and a janky $220 3D-printed robot arm.

Quantitatively, our controller reduces the distance to goal for motions specified as point movements, providing a viable way to control robots that have traditionally been very hard to control because their kinematic structure is difficult to estimate.
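To make the distance-to-goal measure concrete, here is a small closed-loop sketch on a toy linear "robot". Everything in it is illustrative: the toy model replaces the camera and the learned field, the mean Euclidean distance is one reasonable reading of the metric, a least-squares solve is swapped in for the per-step optimization, and the names `distance_to_goal` and `closed_loop_control` are hypothetical.

```python
import numpy as np

def distance_to_goal(points, goals):
    """Mean Euclidean distance between tracked points and their goals."""
    return float(np.linalg.norm(points - goals, axis=-1).mean())

def closed_loop_control(observe, jacobian_at, apply_delta, goals, iters=20, gain=0.5):
    """Observe, solve for a command update, apply a damped step, and log the error."""
    history = []
    for _ in range(iters):
        points = observe()                            # (N, 3) current positions
        history.append(distance_to_goal(points, goals))
        J = jacobian_at(points)                       # (N, 3, m)
        A = J.reshape(-1, J.shape[-1])
        b = (goals - points).reshape(-1)
        du, *_ = np.linalg.lstsq(A, b, rcond=None)    # least-squares update
        apply_delta(gain * du)
    history.append(distance_to_goal(observe(), goals))
    return history

# Toy linear robot: points = rest + J_true @ u (a stand-in for real hardware).
rng = np.random.default_rng(0)
J_true = rng.normal(size=(4, 3, 2))
rest = rng.normal(size=(4, 3))
robot = {"u": np.zeros(2)}

def observe():
    return rest + J_true @ robot["u"]

def jacobian_at(points):
    return J_true

def apply_delta(du):
    robot["u"] = robot["u"] + du

goals = rest + J_true @ np.array([0.4, -0.3])         # a reachable target
history = closed_loop_control(observe, jacobian_at, apply_delta, goals)
```

On this toy system the logged distance shrinks at every iteration; on a real robot the Jacobians would be re-queried from the current camera image at each step.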

BibTeX


@misc{li2024unifying3drepresentationcontrol,
    title={Unifying 3D Representation and Control of Diverse Robots with a Single Camera}, 
    author={Sizhe Lester Li and Annan Zhang and Boyuan Chen and Hanna Matusik and Chao Liu and Daniela Rus and Vincent Sitzmann},
    year={2024},
    eprint={2407.08722},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2407.08722}, 
}