Reinforcement Learning with Action-Free Pre-Training from Videos
Authors: Younggyo Seo, Kimin Lee, Stephen L James, Pieter Abbeel
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We designed our experiments to investigate the following: Can APV improve the sample-efficiency of vision-based RL in robotic manipulation tasks by performing action-free pre-training on videos from different domains? Can representations pre-trained on videos from manipulation tasks transfer to locomotion tasks? How does APV compare to a naïve fine-tuning scheme? What is the contribution of each of the proposed techniques in APV? How do pre-trained representations qualitatively differ from randomly initialized representations? How does APV perform when additional in-domain videos or real-world natural videos are available? Figure 5 shows the learning curves of APV pre-trained using the RLBench videos on six robotic manipulation tasks from Meta-world. We find that APV consistently outperforms DreamerV2 in terms of sample-efficiency in all considered tasks. |
| Researcher Affiliation | Collaboration | ¹KAIST ²Work done while visiting UC Berkeley ³UC Berkeley ⁴Now at Google Research. Correspondence to: Younggyo Seo <younggyo.seo@kaist.ac.kr>. |
| Pseudocode | No | The paper includes mathematical formulations and architectural diagrams (e.g., Figure 2 and 3) but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/younggyoseo/apv. |
| Open Datasets | Yes | We first evaluate APV on various vision-based robotic manipulation tasks from Meta-world (Yu et al., 2020). We use videos collected in robotic manipulation tasks from RLBench (James et al., 2020) as pre-training data. We also consider widely used robotic locomotion tasks from DeepMind Control Suite (Tassa et al., 2020). Specifically, for real-world videos, we utilize the Something-Something-V2 dataset (Goyal et al., 2017). |
| Dataset Splits | No | The paper discusses training and testing, and mentions 'validation' only in the context of the DreamerV2 behavior-learning scheme and architectural components ('validation' RSSM), but it does not explicitly describe a distinct validation dataset split (e.g., percentages, sample counts, or a partitioning methodology) needed to reproduce the experiments. |
| Hardware Specification | Yes | We use a single Nvidia RTX3090 GPU and 10 CPU cores for each training run. |
| Software Dependencies | No | The paper states: 'We build our framework on top of the official implementation of DreamerV2, which is based on TensorFlow (Abadi et al., 2016).' However, it does not provide a specific version number for TensorFlow or any other key software dependencies required for replication. |
| Experiment Setup | Yes | For newly introduced hyperparameters, we use βz = 1.0 for pre-training, and βz = 0, β = 1.0 for fine-tuning. We use τ = 5 for computing the intrinsic bonus. To make the scale of the intrinsic bonus approximately 10% of the extrinsic reward, we normalize the intrinsic reward and use λ = 0.1, 1.0 for manipulation and locomotion tasks, respectively. We find that increasing the hidden size of dense layers and the model state dimension from 200 to 1024 improves the performance of both APV and DreamerV2. We use T = 25, 50 for manipulation and locomotion tasks, respectively, during pre-training. Unless otherwise specified, we use the default hyperparameters of DreamerV2. |
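
For easier scanning, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is an illustrative summary only, written as a plain Python dictionary: the key names (e.g., `beta_z_pretrain`, `intrinsic_tau`) are our own shorthand and do not come from the official APV/DreamerV2 codebase, and the values are simply transcribed from the quote above.

```python
# Illustrative summary of the APV hyperparameters quoted above.
# Key names are shorthand, not identifiers from the official codebase.
apv_hparams = {
    # Loss coefficients reported in the paper
    "beta_z_pretrain": 1.0,   # beta_z used during action-free pre-training
    "beta_z_finetune": 0.0,   # beta_z used during fine-tuning
    "beta_finetune": 1.0,     # beta used during fine-tuning

    # Intrinsic bonus settings
    "intrinsic_tau": 5,       # tau used when computing the intrinsic bonus
    "intrinsic_lambda": {     # lambda, chosen so the bonus is ~10% of the extrinsic reward
        "manipulation": 0.1,
        "locomotion": 1.0,
    },

    # Model capacity (increased from the DreamerV2 default of 200)
    "dense_hidden_size": 1024,
    "model_state_dim": 1024,

    # Pre-training sequence length T, per task suite
    "pretrain_seq_len": {
        "manipulation": 25,
        "locomotion": 50,
    },
}

if __name__ == "__main__":
    # Print the settings; all other hyperparameters follow DreamerV2 defaults.
    for name, value in apv_hparams.items():
        print(f"{name}: {value}")
```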