On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

Authors: Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach across a variety of task domains, algorithm classes, and evaluation metrics. Specifically, we examine 4 task domains (Adroit (Rajeswaran et al., 2018), DMControl (Tassa et al., 2018), PixMC (Xiao et al., 2022), and a real robot setup), 3 algorithm classes (imitation learning (behavior cloning), on-policy RL (PPO (Schulman et al., 2017)), and off-policy RL (DrQ-v2 (Yarats et al., 2021))), and multiple evaluation metrics including sample-efficiency, asymptotic performance, visual robustness, and computational cost.
Researcher Affiliation | Collaboration | 1University of California San Diego, 2Tsinghua University, 3Shanghai Jiao Tong University, 4Meta AI, 5Shanghai Qi Zhi Institute
Pseudocode | No | The paper provides architectural definitions for neural networks in a code-like format (e.g., 'Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))'), but it does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm'. (See the encoder sketch after the table.)
Open Source Code | Yes | Our code is available at https://github.com/gemcollector/learning-from-scratch.
Open Datasets | Yes | These works train a visual representation using large out-of-domain vision datasets like ImageNet (Russakovsky et al., 2015) and Ego4D (Grauman et al., 2022)... We consider two simulation domains, Adroit and DMControl, used in PVR... We reproduce the results of MVP on their proposed PixMC robotic manipulation benchmark.
Dataset Splits | No | The paper mentions evaluating policies every two epochs over 100 epochs and reporting the average performance over the 3 best epochs, but it does not explicitly define training/validation/test dataset splits for reproduction. (See the evaluation-protocol sketch after the table.)
Hardware Specification | No | The paper describes the computational models and the robotic platforms used (e.g., '7-DoF xArm 7 robot'), but it does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory specifications) used to run the experiments.
Software Dependencies | No | The paper describes neural network architectures and references algorithms (e.g., PPO, DrQ-v2) and layers such as Conv2d, BatchNorm2d, ReLU, Linear, Flatten, LayerNorm, and Tanh (implying PyTorch), but it does not provide specific version numbers for any software dependencies. (See the version-recording sketch after the table.)
Experiment Setup | Yes | Importantly, we do not propose a new benchmark for pre-trained representations, but rather base our experiments on the public implementations of PVR, MVP, and DrQ-v2, and meticulously follow their respective experimental setups. We make no changes to hyperparameters. This strict experimental setup ensures that pre-trained representations are evaluated in favorable conditions (for which they were originally proposed).
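
The architectural notation quoted in the Pseudocode row corresponds to standard PyTorch modules. Below is a minimal sketch of a shallow learning-from-scratch visual encoder built around that layer; only the first Conv2d line is quoted from the paper, so the depth, the BatchNorm/ReLU ordering, and the 50-dimensional output are illustrative assumptions rather than the authors' exact architecture.

import torch
import torch.nn as nn

# Sketch of a shallow from-scratch visual encoder. The first Conv2d is the
# layer quoted in the paper; everything after it is an illustrative assumption.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(50),   # project to a low-dimensional feature vector
    nn.LayerNorm(50),
    nn.Tanh(),
)

# Example: encode a batch of two 84x84 RGB observations.
obs = torch.randn(2, 3, 84, 84)
features = encoder(obs)
print(features.shape)  # torch.Size([2, 50])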
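
The Dataset Splits row notes that the paper reports the average of the 3 best evaluation epochs, evaluating every two epochs over 100 epochs, rather than using a held-out split. A minimal sketch of that protocol follows; train_one_epoch and evaluate_policy are hypothetical stand-ins for the benchmark-specific training and rollout code.

import heapq
import random

# Hypothetical placeholders for benchmark-specific code; in the actual
# setups these would run BC/PPO/DrQ-v2 updates and evaluation rollouts.
def train_one_epoch(policy):
    pass

def evaluate_policy(policy):
    return random.random()  # placeholder score (e.g., success rate)

def report_best3(scores):
    """Average of the 3 best evaluation scores, per the paper's protocol."""
    return sum(heapq.nlargest(3, scores)) / 3

policy = None  # placeholder policy object
eval_scores = []
for epoch in range(1, 101):
    train_one_epoch(policy)
    if epoch % 2 == 0:  # evaluate every two epochs
        eval_scores.append(evaluate_policy(policy))
print("reported score:", report_best3(eval_scores))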
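
Since the Software Dependencies row flags missing version numbers, anyone reproducing the experiments may want to record their own environment. A minimal sketch using importlib.metadata is given below; the package names are assumptions inferred from the libraries the paper implies, not a list the authors provide.

from importlib import metadata

# Record versions of the (implied, unversioned) dependencies. The package
# list is an assumption based on the layers/algorithms the paper references.
for pkg in ("torch", "numpy", "gym"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")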