Skill-based Meta-Reinforcement Learning
Authors: Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J Lim
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on continuous control tasks in navigation and manipulation demonstrate that the proposed method can efficiently solve long-horizon novel target tasks by combining the strengths of meta-learning and the usage of offline datasets, while prior approaches in RL, meta-RL, and multi-task RL require substantially more environment interactions to solve the tasks. |
| Researcher Affiliation | Collaboration | Korea Advanced Institute of Science and Technology, University of Southern California, AITRICS, Naver AI Lab |
| Pseudocode | No | The paper describes its method in detail with text and diagrams (e.g., Figure 2 'Method Overview'), but it does not include formal pseudocode blocks or sections explicitly labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Project page: https://namsan96.github.io/SiMPL |
| Open Datasets | Yes | Following Fu et al. (2020), we collect a task-agnostic offline dataset by randomly sampling start-goal locations in the maze and using a planner to generate a trajectory that reaches from start to goal. We leverage a dataset of 600 human-teleoperated manipulation sequences of Gupta et al. (2019) for offline pre-training. |
| Dataset Splits | Yes | To generate a set of meta-training and target tasks, we fix the agent's initial position in the center of the maze and sample 40 random goal locations for meta-training and another set of 10 goals for target tasks. (A goal-sampling sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running experiments, such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or detailed server/cluster configurations. It generally refers to 'deep reinforcement learning methods' implying computational resources but without specific details. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' (Kingma & Ba, 2015) but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python 3.x). |
| Experiment Setup | Yes | We employed 4-layer MLPs with 256 hidden units for Maze Navigation, and 6-layer MLPs with 128 hidden units for the Kitchen Manipulation experiment. For all the network updates, we used the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 3e-4, β1 = 0.9, and β2 = 0.999. We train our models for 10000, 18000, and 16000 episodes for the Maze Navigation experiments with 10, 20, and 40 meta-training tasks, respectively, and 3450 episodes for Kitchen Manipulation. (A configuration sketch also follows the table.) |
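The Dataset Splits row describes a fixed start state with 40 randomly sampled meta-training goals and 10 held-out target-task goals. The snippet below is a minimal sketch of that split; the maze extent, start coordinates, seed, and variable names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the maze goal split: 40 meta-training goals, 10 target goals.
# Maze extent, start position, and seed are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(seed=0)                     # seed assumed
start = np.array([20.0, 20.0])                          # fixed start in the maze center (coordinates assumed)
goals = rng.uniform(low=0.0, high=40.0, size=(50, 2))   # 40x40 maze extent assumed

meta_train_goals = goals[:40]                           # 40 goals for meta-training tasks
target_goals = goals[40:]                               # 10 held-out goals for target tasks
```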
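The Experiment Setup row reports the architecture and optimizer hyperparameters. Below is a minimal PyTorch sketch of those settings, assuming "N-layer" means N hidden layers; only the layer counts, hidden widths, and Adam hyperparameters come from the paper, while the input/output sizes and names are placeholders.

```python
# Hypothetical PyTorch sketch of the reported settings. Only the layer counts,
# hidden widths, and Adam hyperparameters come from the paper; input/output
# sizes and the reading of "N-layer" as N hidden layers are assumptions.
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim, hidden_dim, n_layers):
    """Build an MLP with `n_layers` hidden layers of width `hidden_dim`."""
    layers, dim = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

# Maze Navigation: 4-layer MLPs with 256 hidden units (I/O sizes assumed).
maze_net = make_mlp(in_dim=4, out_dim=2, hidden_dim=256, n_layers=4)
# Kitchen Manipulation: 6-layer MLPs with 128 hidden units (I/O sizes assumed).
kitchen_net = make_mlp(in_dim=60, out_dim=9, hidden_dim=128, n_layers=6)

# Adam with lr = 3e-4, beta1 = 0.9, beta2 = 0.999, used for all network updates.
optimizer = torch.optim.Adam(maze_net.parameters(), lr=3e-4, betas=(0.9, 0.999))
```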