On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
Authors: Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach across a variety of task domains, algorithm classes, and evaluation metrics. Specifically, we examine 4 task domains (Adroit (Rajeswaran et al., 2018), DMControl (Tassa et al., 2018), PixMC (Xiao et al., 2022), and a real robot setup), 3 algorithm classes: imitation learning (behavior cloning), on-policy RL (PPO (Schulman et al., 2017)), and off-policy RL (DrQ-v2 (Yarats et al., 2021)), and multiple evaluation metrics including sample-efficiency, asymptotic performance, visual robustness, and computational cost. |
| Researcher Affiliation | Collaboration | University of California San Diego; Tsinghua University; Shanghai Jiao Tong University; Meta AI; Shanghai Qi Zhi Institute. |
| Pseudocode | No | The paper provides architectural definitions for neural networks in a code-like format (e.g., 'Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))'), but it does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm'. A hedged PyTorch reconstruction of such an encoder is sketched below the table. |
| Open Source Code | Yes | Our code is available at https://github.com/gemcollector/learning-from-scratch. |
| Open Datasets | Yes | These works train a visual representation using large out-of-domain vision datasets like ImageNet (Russakovsky et al., 2015) and Ego4D (Grauman et al., 2022)... We consider two simulation domains Adroit and DMControl used in PVR... We reproduce the results of MVP on their proposed PixMC robotic manipulation benchmark. |
| Dataset Splits | No | The paper mentions evaluating policies every two epochs over 100 epochs and reporting average performance over the 3 best epochs, but it does not explicitly define training/validation/test dataset splits for reproduction. A sketch of this evaluation protocol appears below the table. |
| Hardware Specification | No | The paper describes the computational models and the robotic platforms used (e.g., '7-DoF xArm7 robot'), but it does not provide specific details about the compute hardware (e.g., GPU models, CPU types, or memory specifications) used to run the experiments. |
| Software Dependencies | No | The paper describes neural network architectures and references algorithms (e.g., PPO, DrQ-v2) and PyTorch modules (Conv2d, BatchNorm2d, ReLU, Linear, Flatten, LayerNorm, Tanh), implying a PyTorch implementation, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Importantly, we do not propose a new benchmark for pre-trained representations, but rather base our experiments on the public implementations of PVR, MVP, and Dr Q-v2, and meticulously follow their respective experimental setups. We make no changes to hyperparameters. This strict experimental setup ensures that pre-trained representations are evaluated in favorable conditions (for which they were originally proposed). |
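The architecture strings quoted in the Pseudocode row map directly onto PyTorch modules. The sketch below is a hypothetical reconstruction of a learning-from-scratch image encoder in that style: only the first Conv2d line is taken verbatim from the paper's text, while the remaining layers, channel widths, input resolution, and latent dimension are illustrative assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction of a shallow ConvNet encoder in the
# code-like format quoted by the paper. Only the first Conv2d layer is
# taken verbatim from the text; the remaining layers, channel widths,
# and latent dimension (50) are illustrative assumptions.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),  # quoted layer
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 21 * 21, 50),  # assumes 84x84 inputs -> 21x21 feature maps
    nn.LayerNorm(50),
    nn.Tanh(),
)

# Smoke test with a batch of 84x84 RGB observations (a common size in DMControl).
obs = torch.randn(8, 3, 84, 84)
print(encoder(obs).shape)  # torch.Size([8, 50])
```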
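The Dataset Splits row notes that the paper reports the average over the 3 best evaluations, with policies evaluated every two epochs over 100 epochs, rather than using a held-out split. A minimal sketch of that aggregation follows; `train_one_epoch` and `evaluate_policy` are hypothetical stand-ins, not functions from the paper's codebase.

```python
import random

def train_one_epoch() -> None:
    """Placeholder for one epoch of behavior-cloning / RL training."""
    pass

def evaluate_policy() -> float:
    """Placeholder rollout returning an episode score; random here."""
    return random.random()

def best_k_mean(scores: list[float], k: int = 3) -> float:
    """Mean of the k highest evaluation scores."""
    return sum(sorted(scores, reverse=True)[:k]) / k

eval_scores = []
for epoch in range(1, 101):      # 100 training epochs
    train_one_epoch()
    if epoch % 2 == 0:           # evaluate every two epochs -> 50 scores
        eval_scores.append(evaluate_policy())

print(f"Reported score: {best_k_mean(eval_scores):.3f}")
```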