Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Skill-based Meta-Reinforcement Learning

Authors: Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J Lim

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the usage of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks.
Researcher Affiliation | Collaboration | Korea Advanced Institute of Science and Technology, University of Southern California, AITRICS, Naver AI Lab
Pseudocode | No | The paper describes its method in detail with text and diagrams (e.g., Figure 2, "Method Overview"), but it does not include formal pseudocode blocks or sections explicitly labeled "Algorithm" or "Pseudocode".
Open Source Code | Yes | Project page: https://namsan96.github.io/SiMPL
Open Datasets | Yes | Following Fu et al. (2020), we collect a task-agnostic offline dataset by randomly sampling start-goal locations in the maze and using a planner to generate a trajectory that reaches from start to goal. We leverage a dataset of 600 human-teleoperated manipulation sequences of Gupta et al. (2019) for offline pre-training.
Dataset Splits | Yes | To generate a set of meta-training and target tasks, we fix the agent's initial position in the center of the maze and sample 40 random goal locations for meta-training and another set of 10 goals for target tasks.
Hardware Specification | No | The paper does not describe the hardware used for its experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or server/cluster configurations. It refers generally to "deep reinforcement learning methods", implying computational resources were required, but gives no specifics.
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2015) but does not provide version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python 3.x).
Experiment Setup | Yes | We employed 4-layer MLPs with 256 hidden units for Maze Navigation, and 6-layer MLPs with 128 hidden units for the Kitchen Manipulation experiment. For all the network updates, we used the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-4, β1 = 0.9, and β2 = 0.999. We train our models for 10000, 18000, and 16000 episodes for the Maze Navigation experiments with 10, 20, and 40 meta-training tasks, respectively, and 3450 episodes for Kitchen Manipulation.
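The architecture and optimizer settings quoted in the Experiment Setup row can be sketched as a small configuration helper. This is a hypothetical illustration only: the paper does not state the deep-learning framework or the input/output dimensions, so the helper is framework-agnostic Python and `in_dim`/`out_dim` are placeholders.

```python
# Hypothetical sketch of the settings reported under "Experiment Setup".
# The paper does not specify a framework, so this is plain Python.

def mlp_layer_sizes(n_layers, hidden, in_dim, out_dim):
    """Per-layer (in, out) sizes for an MLP with n_layers hidden layers."""
    dims = [in_dim] + [hidden] * n_layers + [out_dim]
    return list(zip(dims[:-1], dims[1:]))

# Settings quoted in the table above:
MAZE_MLP = {"n_layers": 4, "hidden": 256}     # Maze Navigation
KITCHEN_MLP = {"n_layers": 6, "hidden": 128}  # Kitchen Manipulation
ADAM = {"lr": 3e-4, "betas": (0.9, 0.999)}    # Adam (Kingma & Ba, 2015)
```

For example, `mlp_layer_sizes(4, 256, in_dim=10, out_dim=2)` yields five weight matrices: one input projection, three 256-to-256 hidden layers, and one output projection.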