Skill-based Meta-Reinforcement Learning

Authors: Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J Lim

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the usage of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks.
Researcher Affiliation | Collaboration | Korea Advanced Institute of Science and Technology, University of Southern California, AITRICS, Naver AI Lab
Pseudocode | No | The paper describes its method in detail with text and diagrams (e.g., Figure 2, 'Method Overview'), but it does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | Project page: https://namsan96.github.io/SiMPL
Open Datasets | Yes | Following Fu et al. (2020), we collect a task-agnostic offline dataset by randomly sampling start-goal locations in the maze and using a planner to generate a trajectory that reaches from start to goal. We leverage a dataset of 600 human-teleoperated manipulation sequences of Gupta et al. (2019) for offline pre-training. (See the data-preparation sketch after the table.)
Dataset Splits | Yes | To generate a set of meta-training and target tasks, we fix the agent's initial position in the center of the maze and sample 40 random goal locations for meta-training and another set of 10 goals for target tasks. (See the data-preparation sketch after the table.)
Hardware Specification | No | The paper does not describe the hardware used for the experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or server/cluster configurations. It refers generally to 'deep reinforcement learning methods', implying substantial computational resources, but gives no specifics.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' (Kingma & Ba, 2015) but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python 3.x).
Experiment Setup | Yes | We employed 4-layer MLPs with 256 hidden units for Maze Navigation, and 6-layer MLPs with 128 hidden units for the Kitchen Manipulation experiment. For all the network updates, we used the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-4, β1 = 0.9, and β2 = 0.999. We train our models for 10000, 18000, and 16000 episodes for the Maze Navigation experiments with 10, 20, 40 meta-training tasks, respectively, and 3450 episodes for Kitchen Manipulation. (See the configuration sketch after the table.)
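
The Open Datasets and Dataset Splits rows describe a concrete data-preparation recipe for the maze: collect a task-agnostic offline dataset by planning between random start-goal pairs, then split tasks into 40 meta-training goals and 10 target goals with the start fixed at the maze center. The minimal sketch below illustrates that recipe under stated assumptions; the grid-based maze representation, the `sample_free_cell` helper, and the `planner` callable are placeholders for illustration, not details taken from the paper.

```python
import random


def sample_free_cell(maze, rng):
    """Illustrative helper: draw a random non-wall cell from a 0/1 grid (0 = free)."""
    free = [(r, c) for r, row in enumerate(maze) for c, v in enumerate(row) if v == 0]
    return rng.choice(free)


def collect_offline_dataset(maze, planner, num_trajectories, seed=0):
    """Task-agnostic offline data: random start-goal pairs, trajectories from a planner."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_trajectories):
        start = sample_free_cell(maze, rng)
        goal = sample_free_cell(maze, rng)
        dataset.append(planner(maze, start, goal))  # planner returns a start-to-goal trajectory
    return dataset


def make_task_split(maze, num_meta_train=40, num_target=10, seed=0):
    """Fix the start at the maze center; sample random goal locations for each task set."""
    rng = random.Random(seed)
    start = (len(maze) // 2, len(maze[0]) // 2)  # fixed initial position at the center
    meta_train_goals = [sample_free_cell(maze, rng) for _ in range(num_meta_train)]
    target_goals = [sample_free_cell(maze, rng) for _ in range(num_target)]
    return start, meta_train_goals, target_goals
```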
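
The Experiment Setup row pins down the network sizes and optimizer hyperparameters. The sketch below is a minimal PyTorch rendering of that configuration, not the authors' implementation; the ReLU activations, the layer-count convention (total linear layers), and the input/output dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn


def build_mlp(in_dim, out_dim, num_layers, hidden_dim):
    """Stack `num_layers` linear layers with ReLU in between (activation choice assumed)."""
    layers, dim = [], in_dim
    for _ in range(num_layers - 1):
        layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)


# Maze Navigation: 4-layer MLPs with 256 hidden units (in/out dims are placeholders).
maze_net = build_mlp(in_dim=4, out_dim=8, num_layers=4, hidden_dim=256)

# Kitchen Manipulation: 6-layer MLPs with 128 hidden units (in/out dims are placeholders).
kitchen_net = build_mlp(in_dim=60, out_dim=9, num_layers=6, hidden_dim=128)

# Adam with the reported hyperparameters: lr = 3e-4, beta1 = 0.9, beta2 = 0.999.
optimizer = torch.optim.Adam(maze_net.parameters(), lr=3e-4, betas=(0.9, 0.999))
```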