Parameterizing Non-Parametric Meta-Reinforcement Learning Tasks via Subtask Decomposition

Authors: Suyoung Lee, Myungsik Cho, Youngchul Sung

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on the Meta-World ML-10 and ML-45 benchmarks [71], widely used meta-RL benchmarks comprising diverse non-parametric robotic manipulation tasks. We empirically demonstrate that our method successfully meta-learns the shareable subtask decomposition. With the help of the subtask decomposition and virtual training, our method, without any offline demonstration or test-time gradient updates, achieves test success rates of 33.4% on ML-10 and 31.2% on ML-45, improving on the previous state of the art by factors of approximately 1.7 and 1.3, respectively.
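The headline numbers are benchmark-level test success rates. As a rough illustration of how such a figure is commonly aggregated, the sketch below assumes each evaluation episode yields a boolean success flag (Meta-World environments typically report one as `info["success"]`) and macro-averages: a success rate per test task, then the mean across tasks. The function names and example data are hypothetical, and SDVT's exact evaluation protocol (for instance, which rollouts within a meta-episode are counted) is not described in this summary.

```python
from statistics import mean


def task_success_rate(flags: list[bool]) -> float:
    """Fraction of evaluation episodes on one task that ended in success."""
    if not flags:
        raise ValueError("need at least one episode per task")
    return sum(flags) / len(flags)  # True counts as 1, False as 0


def benchmark_success_rate(outcomes: dict[str, list[bool]]) -> float:
    """Macro-average in percent: mean of the per-task success rates."""
    if not outcomes:
        raise ValueError("need at least one test task")
    return 100.0 * mean(task_success_rate(f) for f in outcomes.values())


# Hypothetical outcomes for two test tasks (not data from the paper).
example = {
    "task_a": [True, False, False, True],   # 50% success
    "task_b": [False, False, False, True],  # 25% success
}
print(benchmark_success_rate(example))  # 37.5
```

Macro-averaging weights every test task equally regardless of how many episodes were run on it; a micro-average over all episodes would differ whenever tasks have unequal episode counts.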
Researcher Affiliation | Academia | Suyoung Lee, Myungsik Cho, Youngchul Sung; School of Electrical Engineering, KAIST, Daejeon 34141, Republic of Korea; {suyoung.l, ms.cho, ycsung}@kaist.ac.kr
Pseudocode | Yes | Appendix A (Pseudocode): Algorithm 1, Subtask Decomposition and Virtual Training (SDVT).
Open Source Code | Yes | Our implementation is available at https://github.com/suyoung-lee/SDVT.
Open Datasets | Yes | Meta-World benchmark: The Meta-World V2 benchmark [71] stands as the most prominent, if not the only, established benchmark for assessing meta-RL algorithms with non-parametric task variability. [71] T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. arXiv preprint arXiv:1910.10897, 2021.
Dataset Splits | No | The paper describes the meta-training and meta-testing tasks and the structure of meta-episodes, but it does not specify explicit data splits within a given task (e.g., percentages or counts for training, validation, and test subsets) of the kind that would be required for reproduction.
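In meta-RL benchmarks such as Meta-World, the held-out split is made at the level of task types rather than by partitioning data within a task. A minimal sketch of that task-level split, using the counts defined by Meta-World (ML-10: 10 meta-training and 5 meta-test task types; ML-45: 45 and 5); the dictionary layout and the helper `held_out_fraction` are illustrative only and are not part of the `metaworld` package API.

```python
# Task-level meta-train / meta-test split sizes of the Meta-World benchmarks.
# Layout and names are illustrative, not the metaworld package API.
META_WORLD_SPLITS = {
    "ML-10": {"meta_train_tasks": 10, "meta_test_tasks": 5},
    "ML-45": {"meta_train_tasks": 45, "meta_test_tasks": 5},
}


def held_out_fraction(benchmark: str) -> float:
    """Fraction of task types never seen during meta-training."""
    split = META_WORLD_SPLITS[benchmark]
    total = split["meta_train_tasks"] + split["meta_test_tasks"]
    return split["meta_test_tasks"] / total


for name in META_WORLD_SPLITS:
    print(name, round(held_out_fraction(name), 3))
# ML-10 0.333
# ML-45 0.1
```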
Hardware Specification | Yes | Our experiments were conducted using an Nvidia TITAN Xp.
Software Dependencies | No | The paper mentions software such as the Garage repository [15] and algorithms such as PPO [49], and it specifies an 'exact version' of the Garage repository by referencing a pull-request URL. However, it gives no explicit version numbers for general dependencies such as Python, PyTorch, or TensorFlow, nor a numbered release of the Garage framework itself.
Experiment Setup | Yes | Table 3: Hyperparameters of SDVT and SD. Hyperparameters of SDVT used for Meta-World ML-10 and ML-45, along with the notation used in the manuscript and the argument names in the source code.