Skill-based Meta-Reinforcement Learning

Authors: Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J Lim

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the usage of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks.
Researcher Affiliation | Collaboration | Korea Advanced Institute of Science and Technology, University of Southern California, AITRICS, Naver AI Lab
Pseudocode | No | The paper describes its method in detail with text and diagrams (e.g., Figure 2, 'Method Overview'), but it does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | Project page: https://namsan96.github.io/SiMPL
Open Datasets | Yes | Following Fu et al. (2020), we collect a task-agnostic offline dataset by randomly sampling start-goal locations in the maze and using a planner to generate a trajectory that reaches from start to goal. We leverage a dataset of 600 human-teleoperated manipulation sequences of Gupta et al. (2019) for offline pre-training. (See the data-preparation sketch after the table.)
Dataset Splits | Yes | To generate a set of meta-training and target tasks, we fix the agent's initial position in the center of the maze and sample 40 random goal locations for meta-training and another set of 10 goals for target tasks. (See the data-preparation sketch after the table.)
Hardware Specification | No | The paper does not describe the hardware used for the experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or server/cluster configurations. It refers generally to 'deep reinforcement learning methods', implying substantial computational resources, but gives no specifics.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' (Kingma & Ba, 2015) but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python 3.x).
Experiment Setup | Yes | We employed 4-layer MLPs with 256 hidden units for Maze Navigation, and 6-layer MLPs with 128 hidden units for the Kitchen Manipulation experiment. For all the network updates, we used the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-4, β1 = 0.9, and β2 = 0.999. We train our models for 10000, 18000, and 16000 episodes for the Maze Navigation experiments with 10, 20, 40 meta-training tasks, respectively, and 3450 episodes for Kitchen Manipulation. (See the configuration sketch after the table.)
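
The Open Datasets and Dataset Splits rows describe a concrete data-preparation recipe for the maze: collect a task-agnostic offline dataset by planning between random start-goal pairs, then split tasks into 40 meta-training goals and 10 target goals with the start fixed at the maze center. The minimal sketch below illustrates that recipe under stated assumptions; the grid-based maze representation, the `sample_free_cell` helper, and the `planner` callable are placeholders for illustration, not details taken from the paper.

```python
import random


def sample_free_cell(maze, rng):
    """Illustrative helper: draw a random non-wall cell from a 0/1 grid (0 = free)."""
    free = [(r, c) for r, row in enumerate(maze) for c, v in enumerate(row) if v == 0]
    return rng.choice(free)


def collect_offline_dataset(maze, planner, num_trajectories, seed=0):
    """Task-agnostic offline data: random start-goal pairs, trajectories from a planner."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_trajectories):
        start = sample_free_cell(maze, rng)
        goal = sample_free_cell(maze, rng)
        dataset.append(planner(maze, start, goal))  # planner returns a start-to-goal trajectory
    return dataset


def make_task_split(maze, num_meta_train=40, num_target=10, seed=0):
    """Fix the start at the maze center; sample random goal locations for each task set."""
    rng = random.Random(seed)
    start = (len(maze) // 2, len(maze[0]) // 2)  # fixed initial position at the center
    meta_train_goals = [sample_free_cell(maze, rng) for _ in range(num_meta_train)]
    target_goals = [sample_free_cell(maze, rng) for _ in range(num_target)]
    return start, meta_train_goals, target_goals
```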
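
The Experiment Setup row pins down the network sizes and optimizer hyperparameters. The sketch below is a minimal PyTorch rendering of that configuration, not the authors' implementation; the ReLU activations, the layer-count convention (total linear layers), and the input/output dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn


def build_mlp(in_dim, out_dim, num_layers, hidden_dim):
    """Stack `num_layers` linear layers with ReLU in between (activation choice assumed)."""
    layers, dim = [], in_dim
    for _ in range(num_layers - 1):
        layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)


# Maze Navigation: 4-layer MLPs with 256 hidden units (in/out dims are placeholders).
maze_net = build_mlp(in_dim=4, out_dim=8, num_layers=4, hidden_dim=256)

# Kitchen Manipulation: 6-layer MLPs with 128 hidden units (in/out dims are placeholders).
kitchen_net = build_mlp(in_dim=60, out_dim=9, num_layers=6, hidden_dim=128)

# Adam with the reported hyperparameters: lr = 3e-4, beta1 = 0.9, beta2 = 0.999.
optimizer = torch.optim.Adam(maze_net.parameters(), lr=3e-4, betas=(0.9, 0.999))
```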