Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

Authors: Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, Xiaolong Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We transfer our learned policy from simulation to a real robot by running it indoors and in the wild with unseen obstacles and terrain."
Researcher Affiliation | Academia | Ruihan Yang (UC San Diego), Minghao Zhang (Tsinghua University), Nicklas Hansen (UC San Diego), Huazhe Xu (UC Berkeley), Xiaolong Wang (UC San Diego)
Pseudocode | No | The paper describes the model architecture and methods in text and figures (Figure 2) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Our project page with videos is at https://rchalyang.github.io/LocoTransformer/." "We have released the code, environment and videos on our project page: https://rchalyang.github.io/LocoTransformer/."
Open Datasets | No | The paper states: "We evaluate our method in simulation and the real world. In the simulation, we simulate a quadruped robot in a set of challenging and diverse environments." It describes custom-designed simulated environments and real-world robot deployment rather than the use of, or access to, a publicly available dataset in the conventional sense (e.g., ImageNet, COCO).
Dataset Splits | No | The paper describes training policies for a given number of samples and then evaluating them, but it does not specify explicit training/validation/test splits (as percentages or sample counts), as is typical for fixed datasets in supervised learning.
Hardware Specification | No | The paper mentions using a "Unitree A1 Robot" and an "Intel RealSense camera" for real-world experiments, and states "All computations are running with on-board resources." However, it does not specify the hardware used for training (e.g., GPU models, CPU models, memory).
Software Dependencies | No | The paper mentions using PPO (Schulman et al., 2017) and standard network components such as ReLU, but it does not list software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Implementation Details: "For the proprioceptive encoder and the projection head, we use a 2-layer MLP with hidden dimensions (256, 256). Our visual encoder encodes visual inputs into 4×4 spatial feature maps with 128 channels, following the architecture in Mnih et al. (2015b). Our shared Transformer consists of 2 Transformer encoder layers, each with a hidden feature dimension of 256." Reported PPO hyperparameters:

Hyperparameter | Value
Horizon | 1000
Non-linearity | ReLU
Policy initialization | Standard Gaussian
# of samples per iteration | 8192
Discount factor | 0.99
Batch size | 256
Optimization epochs | 3
Clip parameter | 0.2
Policy network learning rate | 1e-4
Value network learning rate | 1e-4
Optimizer | Adam
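
To make the reported architecture easier to follow, below is a minimal PyTorch sketch of the cross-modal Transformer policy implied by the implementation details above. The layer sizes quoted in the paper are used directly (2-layer (256, 256) MLPs, a 4×4 visual feature map with 128 channels, 2 Transformer encoder layers with hidden dimension 256); everything else, including the class and variable names, the 64×64 depth-image resolution, the CNN channel counts, the proprioceptive/action dimensions, and the number of attention heads, is an illustrative assumption rather than the authors' released implementation.

```python
# Hedged sketch of a cross-modal Transformer policy; not the authors' code.
import torch
import torch.nn as nn


class LocoTransformerSketch(nn.Module):
    def __init__(self, proprio_dim=93, action_dim=12):  # dims are assumed
        super().__init__()
        # Proprioceptive encoder: 2-layer MLP with hidden dims (256, 256).
        self.proprio_encoder = nn.Sequential(
            nn.Linear(proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Visual encoder in the style of Mnih et al. (2015); with a 64x64
        # depth input (assumed) it yields a 4x4 map with 128 channels.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.visual_proj = nn.Linear(128, 256)  # project channels to token dim
        # Shared Transformer: 2 encoder layers, hidden feature dimension 256.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Projection head: 2-layer MLP with hidden dims (256, 256).
        self.head = nn.Sequential(
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, proprio, depth):
        # proprio: (B, proprio_dim); depth: (B, 1, 64, 64)
        proprio_token = self.proprio_encoder(proprio).unsqueeze(1)   # (B, 1, 256)
        feat = self.visual_encoder(depth)                            # (B, 128, 4, 4)
        visual_tokens = self.visual_proj(feat.flatten(2).transpose(1, 2))  # (B, 16, 256)
        tokens = torch.cat([proprio_token, visual_tokens], dim=1)    # (B, 17, 256)
        fused = self.transformer(tokens)
        # Pool the fused tokens and predict the action mean (the policy is
        # initialized as a standard Gaussian per the table above).
        return self.head(fused.mean(dim=1))
```

Under this sketch, the network would be trained with PPO using the hyperparameters in the table above: clip parameter 0.2, 8192 samples per iteration, batch size 256, 3 optimization epochs, and Adam with a learning rate of 1e-4 for both the policy and value networks.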