
Hierarchical World Models as
Visual Whole-Body Humanoid Controllers

Nicklas Hansen1,  Jyothir S V2,  Vlad Sobal2,  Yann LeCun2,3, 
Xiaolong Wang1*,  Hao Su1*

1UC San Diego,  2NYU,  3Meta AI
*Equal advising

Visual whole-body control for humanoids. We present Puppeteer, a hierarchical world model for whole-body humanoid control with visual observations. Our method produces natural and human-like motions without any reward design or skill primitives, and traverses challenging terrain.

Abstract

Whole-body control for humanoids is challenging due to the high-dimensional nature of the problem, coupled with the inherent instability of a bipedal morphology. Learning from visual observations further exacerbates this difficulty. In this work, we explore highly data-driven approaches to visual whole-body humanoid control based on reinforcement learning, without any simplifying assumptions, reward design, or skill primitives. Specifically, we propose a hierarchical world model in which a high-level agent generates commands based on visual observations for a low-level agent to execute, both of which are trained with rewards. Our approach produces highly performant control policies in 8 tasks with a simulated 56-DoF humanoid, while synthesizing motions that are broadly preferred by humans.
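The two-level design described above can be illustrated with a minimal control-loop sketch. Everything below is a placeholder written for this page: the class names, the command dimensionality, and the dummy policies are illustrative assumptions, not the actual Puppeteer implementation (which trains both levels as world-model agents with RL); only the 56-DoF action space comes from the abstract.

```python
import random

COMMAND_DIM = 8   # hypothetical size of the high-level command vector
ACTION_DIM = 56   # matches the 56-DoF humanoid described in the abstract

class HighLevelAgent:
    """Maps a visual observation to an abstract command (placeholder)."""
    def act(self, visual_obs):
        # A real agent would run a learned world model over image features;
        # here we just emit a bounded random command.
        return [random.uniform(-1.0, 1.0) for _ in range(COMMAND_DIM)]

class LowLevelAgent:
    """Tracks a command using proprioceptive state (placeholder)."""
    def act(self, proprio_obs, command):
        # A real tracking policy conditions on both inputs; here we just
        # emit a zero action of the right dimensionality.
        return [0.0] * ACTION_DIM

def control_step(high, low, visual_obs, proprio_obs):
    command = high.act(visual_obs)           # slow, vision-conditioned level
    action = low.act(proprio_obs, command)   # fast, command-tracking level
    return command, action

command, action = control_step(HighLevelAgent(), LowLevelAgent(),
                               visual_obs=[0.1, 0.2], proprio_obs=[0.0] * 10)
print(len(command), len(action))  # prints: 8 56
```

The key point the sketch captures is the interface: only the high level sees vision, and the low level receives just its proprioceptive state plus the command, so each level can be trained separately.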

Qualitative results

Our method produces natural and human-like motions across a variety of visual whole-body humanoid control tasks. A strong baseline, TD-MPC2, achieves comparable performance in terms of reward, but produces unnatural behaviors.

[Side-by-side videos: Ours vs. TD-MPC2]

Zero-shot generalization

We evaluate agents trained on gap lengths of 0.1m to 0.4m on unseen gap lengths of up to 1.2m. The videos below are generated by a single Puppeteer agent. Our method achieves non-trivial performance across all unseen gap lengths.

Benchmarking

Our method learns highly performant RL policies across 8 visual whole-body control tasks, while producing natural and human-like motions that are broadly preferred by humans. SAC and DreamerV3 do not achieve meaningful performance on these tasks. TD-MPC2 achieves comparable performance in terms of reward, but produces unnatural behaviors.

Human preference in humanoid motions

Aggregate results from a user study (n=46) in which participants are shown pairs of motions generated by TD-MPC2 and our method and asked which they prefer.

Paper

Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nicklas Hansen, Jyothir S V, Vlad Sobal, Yann LeCun, Xiaolong Wang, Hao Su

arXiv preprint


Citation

If you find our work useful, please consider citing the paper as follows:

@misc{hansen2024hierarchical,
  title={Hierarchical World Models as Visual Whole-Body Humanoid Controllers},
  author={Nicklas Hansen and Jyothir S V and Vlad Sobal and Yann LeCun and Xiaolong Wang and Hao Su},
  eprint={2405.18418},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  year={2024}
}