Our model requires only multi-view video of the robot executing random commands to learn both the robot's 3D morphology and how to control it, via the Neural Jacobian Field. The learned model can then be used to plan control commands that achieve desired motions.
First, we sample random control commands to be executed on the robot. Using a setup of 12 RGB-D cameras, we record multi-view captures both before each command is executed and after the robot has settled into a steady state, forming our dataset.
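To make the collection protocol concrete, here is a minimal sketch of the loop. The `robot.execute`, `cam.capture_rgbd`, and actuation-limit attributes (`cmd_low`, `cmd_high`) are hypothetical interfaces, and the sample count and settle time are illustrative values, not ones taken from the paper.

```python
import time
import numpy as np

NUM_SAMPLES = 1000    # illustrative, not the paper's value
SETTLE_TIME_S = 2.0   # wait for the robot to reach steady state (assumed)

def collect_dataset(robot, cameras, command_dim):
    """Record (images_before, command, images_after) tuples from random actions."""
    dataset = []
    for _ in range(NUM_SAMPLES):
        # Multi-view capture before the command is executed.
        before = [cam.capture_rgbd() for cam in cameras]
        # Sample a random command within the robot's actuation limits.
        command = np.random.uniform(robot.cmd_low, robot.cmd_high, size=command_dim)
        robot.execute(command)
        time.sleep(SETTLE_TIME_S)  # let the system settle into steady state
        # Multi-view capture after the motion has settled.
        after = [cam.capture_rgbd() for cam in cameras]
        dataset.append({"before": before, "command": command, "after": after})
    return dataset
```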
From this dataset, our method learns a mapping from a single RGB image to a neural scene representation. This representation combines a Radiance Field, which encodes the scene's visual and geometric information, with a Jacobian Field, which encodes its kinematic information. We refer to the combined representation as the Neural Jacobian Field.
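As an illustration of what such a combined representation might look like, below is a minimal PyTorch sketch of a per-point query module. The architecture, layer widths, and conditioning scheme are assumptions made for illustration; the actual model conditions on features extracted from the single input image.

```python
import torch
import torch.nn as nn

class NeuralJacobianField(nn.Module):
    """Sketch: maps a 3D point (conditioned on image features) to radiance
    quantities (density, color) plus a Jacobian relating that point's motion
    to changes in the robot's control commands."""

    def __init__(self, feature_dim=128, command_dim=8):
        super().__init__()
        self.command_dim = command_dim
        self.backbone = nn.Sequential(
            nn.Linear(3 + feature_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.density_head = nn.Linear(256, 1)   # geometry
        self.color_head = nn.Linear(256, 3)     # appearance
        # Jacobian head: one 3 x command_dim matrix per 3D point.
        self.jacobian_head = nn.Linear(256, 3 * command_dim)

    def forward(self, points, features):
        h = self.backbone(torch.cat([points, features], dim=-1))
        density = self.density_head(h)
        color = torch.sigmoid(self.color_head(h))
        jacobian = self.jacobian_head(h).view(*points.shape[:-1], 3, self.command_dim)
        return density, color, jacobian
```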
The Neural Jacobian Field can be queried for the kinematic Jacobian at every coordinate in the 3D scene with respect to the robot's control commands. This capability allows us to visually identify the kinematic chain and to plan a sequence of control commands for a desired motion using gradient-based optimization.
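The gradient-based planning step can be sketched as follows: under the Jacobian field's linearized motion model, a command change Δu moves a point p by approximately J(p)Δu, so we can optimize Δu to drive selected points toward goal positions. Function and variable names here are hypothetical, and the optimizer settings are illustrative.

```python
import torch

def plan_command(jacobians, points, goals, steps=200, lr=1e-2):
    """Gradient-based planning sketch: find a command change that moves
    selected 3D points toward goal positions under the linear motion model
    predicted by the Jacobian field.

    jacobians: (N, 3, C) per-point Jacobians queried from the NJF
    points:    (N, 3) current 3D point positions
    goals:     (N, 3) desired 3D point positions
    """
    delta_u = torch.zeros(jacobians.shape[-1], requires_grad=True)
    opt = torch.optim.Adam([delta_u], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Predicted point positions: p + J(p) @ delta_u for each point.
        predicted = points + torch.einsum("nij,j->ni", jacobians, delta_u)
        loss = ((predicted - goals) ** 2).sum()
        loss.backward()
        opt.step()
    return delta_u.detach()
```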
With the Neural Jacobian Field, we can visualize the robot's geometry and kinematics as inferred from a single image. The figure below shows the predicted depth and the colorized kinematic chain derived from the Neural Jacobian Field. The colorization is learned entirely from data, with no human labeling.
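One plausible way to obtain such a colorization, offered here as an assumption rather than the paper's exact scheme, is to project each point's Jacobian to RGB via PCA, so that points responding similarly to the same actuators receive similar colors.

```python
import numpy as np

def colorize_jacobians(jacobians):
    """Hypothetical visualization: project per-point Jacobians to RGB via PCA.
    (The paper's exact colorization scheme may differ; this is illustrative.)

    jacobians: (N, 3, C) array of per-point Jacobians from the NJF
    returns:   (N, 3) RGB colors in [0, 1]
    """
    flat = jacobians.reshape(len(jacobians), -1)
    flat = flat - flat.mean(axis=0)
    # Top-3 principal directions of the flattened Jacobian entries.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    rgb = flat @ vt[:3].T
    # Normalize each channel to [0, 1] for display.
    span = rgb.max(axis=0) - rgb.min(axis=0)
    return (rgb - rgb.min(axis=0)) / (span + 1e-8)
```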
Our method can perform closed-loop control of diverse robots from vision alone, including soft robots, which are traditionally hard to model, and a low-cost ($220) 3D-printed robot arm.
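A closed-loop controller built from these pieces might look like the following sketch, which reuses the `plan_command` sketch above. The `robot` and `camera` interfaces and the feature-lookup call are hypothetical, and a real system would re-estimate point positions from new observations rather than rolling the linearized model forward.

```python
import torch

def closed_loop_control(robot, camera, njf, points, goals, iters=20):
    """Closed-loop control sketch: repeatedly re-image the robot, query the
    Jacobian field at tracked points, and apply the planned command update."""
    for _ in range(iters):
        image = camera.capture_rgb()
        # Hypothetical: look up image features for each tracked 3D point.
        features = njf.encode(image, points)
        _, _, jacobians = njf(points, features)
        delta_u = plan_command(jacobians, points, goals)
        robot.execute(robot.current_command + delta_u)
        # Roll points forward under the linearized model for the next step.
        points = points + torch.einsum("nij,j->ni", jacobians, delta_u)
    return robot.current_command
```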
Quantitatively, our controller effectively reduces the distance-to-goal for motions specified as target point movements, offering a viable way to control robots that were previously difficult or impossible to control because their kinematic structure is hard to estimate.
@misc{li2024unifying3drepresentationcontrol,
title={Unifying 3D Representation and Control of Diverse Robots with a Single Camera},
author={Sizhe Lester Li and Annan Zhang and Boyuan Chen and Hanna Matusik and Chao Liu and Daniela Rus and Vincent Sitzmann},
year={2024},
eprint={2407.08722},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2407.08722},
}