Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers
Authors: Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, Xiaolong Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We transfer our learned policy from simulation to a real robot by running it indoors and in the wild with unseen obstacles and terrain. |
| Researcher Affiliation | Academia | Ruihan Yang (UC San Diego); Minghao Zhang (Tsinghua University); Nicklas Hansen (UC San Diego); Huazhe Xu (UC Berkeley); Xiaolong Wang (UC San Diego) |
| Pseudocode | No | The paper describes the model architecture and methods in text and figures (Figure 2) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our project page with videos is at https://rchalyang.github.io/LocoTransformer/. We have released the code, environment and videos on our project page: https://rchalyang.github.io/LocoTransformer/. |
| Open Datasets | No | The paper states 'We evaluate our method in simulation and the real world. In the simulation, we simulate a quadruped robot in a set of challenging and diverse environments.' It describes custom-designed simulated environments and real-world robot deployment rather than explicitly using or providing access to a publicly available dataset in the common sense (e.g., ImageNet, COCO). |
| Dataset Splits | No | The paper describes training policies for a given number of samples and evaluating them, but it does not specify explicit training/validation/test dataset splits in terms of percentages or sample counts, which is typical for fixed datasets in supervised learning. |
| Hardware Specification | No | The paper mentions using a 'Unitree A1 Robot' and an 'Intel Real Sense camera' for real-world experiments, and states 'All computations are running with on-board resources.' However, it does not specify the exact hardware specifications (e.g., GPU models, CPU models, memory) used for training the models. |
| Software Dependencies | No | The paper mentions using PPO (Schulman et al., 2017) and certain network components like ReLU, but it does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Implementation Details. For the proprioceptive encoder and the projection head, we use a 2-layer MLP with hidden dimensions (256, 256). Our visual encoder encodes visual inputs into 4 × 4 spatial feature maps with 128 channels, following the architecture in Mnih et al. (2015b). Our shared Transformer consists of 2 Transformer encoder layers, each with a hidden feature dimension of 256. (...) Hyperparameters: Horizon 1000; Non-linearity ReLU; Policy initialization Standard Gaussian; # of samples per iteration 8192; Discount factor 0.99; Batch size 256; Optimization epochs 3; Clip parameter 0.2; Policy network learning rate 1e-4; Value network learning rate 1e-4; Optimizer Adam. |
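
The architecture details and PPO hyperparameters quoted in the Experiment Setup row translate fairly directly into code. Below is a minimal PyTorch sketch of that setup: only the quoted numbers (the 2-layer (256, 256) MLP, the 4 × 4 × 128 visual feature map, the 2 Transformer encoder layers with a hidden dimension of 256, and the PPO settings) come from the paper, while the class names, convolution kernels and strides, number of attention heads, token pooling, input resolution, and state/action dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the quoted setup. Quoted values: (256, 256) MLPs, 4x4x128 visual
# feature map, 2 Transformer encoder layers with hidden dim 256, PPO hyperparameters.
# Everything else (class names, conv kernels/strides, nhead, pooling, 64x64 input,
# placeholder state/action sizes) is an assumption for illustration.
import torch
import torch.nn as nn

# PPO hyperparameters exactly as listed in the table cell above.
PPO_CONFIG = dict(
    horizon=1000, samples_per_iteration=8192, discount=0.99, batch_size=256,
    optimization_epochs=3, clip_parameter=0.2,
    policy_lr=1e-4, value_lr=1e-4, optimizer="Adam",
)


class ProprioceptiveEncoder(nn.Module):
    """2-layer MLP with hidden dimensions (256, 256), as quoted from the paper."""

    def __init__(self, proprio_dim, embed_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim), nn.ReLU(),
        )

    def forward(self, proprio):
        return self.mlp(proprio)  # (B, embed_dim)


class VisualEncoder(nn.Module):
    """Conv stack in the spirit of Mnih et al. (2015); kernels/strides are assumptions
    chosen so a 64x64 depth input yields a 4x4 feature map with 128 channels."""

    def __init__(self, in_channels=1, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, depth):
        feat = self.conv(depth)                   # (B, 128, 4, 4) for 64x64 input
        tokens = feat.flatten(2).transpose(1, 2)  # (B, 16, 128): one token per cell
        return self.proj(tokens)                  # (B, 16, embed_dim)


class SharedTransformerPolicy(nn.Module):
    """Shared Transformer over the proprioceptive token and the visual tokens:
    2 encoder layers, hidden feature dimension 256 (nhead=4 is an assumption)."""

    def __init__(self, proprio_dim, action_dim, embed_dim=256):
        super().__init__()
        self.proprio_enc = ProprioceptiveEncoder(proprio_dim, embed_dim)
        self.visual_enc = VisualEncoder(embed_dim=embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # Projection head: 2-layer MLP, mirroring the quoted (256, 256) sizes.
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, proprio, depth):
        p_tok = self.proprio_enc(proprio).unsqueeze(1)         # (B, 1, 256)
        v_tok = self.visual_enc(depth)                         # (B, 16, 256)
        fused = self.transformer(torch.cat([p_tok, v_tok], 1))
        return self.head(fused.mean(dim=1))                    # pooled -> action mean


# Shape check with placeholder state/action dimensions (not taken from the paper).
policy = SharedTransformerPolicy(proprio_dim=93, action_dim=12)
print(policy(torch.randn(2, 93), torch.randn(2, 1, 64, 64)).shape)  # torch.Size([2, 12])
```

Under these assumptions a 64 × 64 depth image produces the quoted 4 × 4 × 128 feature map, which is flattened into 16 visual tokens and fused with the single proprioceptive token by the shared Transformer before pooling into the policy output.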