Parameterizing Non-Parametric Meta-Reinforcement Learning Tasks via Subtask Decomposition
Authors: Suyoung Lee, Myungsik Cho, Youngchul Sung
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the Meta-World ML-10 and ML-45 benchmarks [71], widely used meta-RL benchmarks comprising diverse non-parametric robotic manipulation tasks. We empirically demonstrate that our method successfully meta-learns the shareable subtask decomposition. With the help of the subtask decomposition and virtual training, our method, without any offline demonstration or test-time gradient updates, achieves test success rates of 33.4% on ML-10 and 31.2% on ML-45, which improves the previous state-of-the-art by approximately 1.7 times and 1.3 times, respectively. |
| Researcher Affiliation | Academia | Suyoung Lee, Myungsik Cho, Youngchul Sung; School of Electrical Engineering, KAIST, Daejeon 34141, Republic of Korea; {suyoung.l, ms.cho, ycsung}@kaist.ac.kr |
| Pseudocode | Yes | A. Pseudocode. Algorithm 1: Subtask Decomposition and Virtual Training (SDVT) |
| Open Source Code | Yes | Our implementation is available at https://github.com/suyoung-lee/SDVT. |
| Open Datasets | Yes | Meta-World benchmark: The Meta-World V2 benchmark [71] stands as the most prominent, if not the only, established benchmark for assessing meta-RL algorithms featuring non-parametric task variability. [71] T. Yu, D. Quillen, Z. He, R. Julian, A. Narayan, H. Shively, A. Bellathur, K. Hausman, C. Finn, and S. Levine. Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. arXiv preprint arXiv:1910.10897, 2021. |
| Dataset Splits | No | The paper describes meta-training and meta-testing tasks, and the structure of meta-episodes, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test data subsets) within a given task that would be required for reproduction. |
| Hardware Specification | Yes | Our experiments were conducted using an Nvidia TITAN Xp. |
| Software Dependencies | No | The paper mentions software like 'Garage repository [15]' and algorithms like 'PPO [49]' for implementation, and specifies using an 'exact version' of the Garage repository (referencing a pull request URL), but it does not provide explicit version numbers for general software dependencies such as Python, PyTorch, or TensorFlow, nor a specific numbered release for the Garage framework itself. |
| Experiment Setup | Yes | Table 3: Hyperparameters of SDVT and SD. Hyperparameters of SDVT used for Meta-World ML-10 and ML-45 along with the notations in the manuscript and the argument names in the source code. |
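
The Research Type row quotes test success rates together with approximate improvement factors over the previous state of the art. As a rough sanity check, the sketch below divides each quoted rate by its quoted factor to get the prior success rate those figures imply. The helper name `implied_baseline` is hypothetical, and because the factors are stated only approximately, the outputs are back-of-the-envelope estimates, not numbers reported in the paper.

```python
def implied_baseline(success_rate_pct: float, improvement_factor: float) -> float:
    """Return the prior success rate implied by a quoted rate and improvement factor."""
    if improvement_factor <= 0:
        raise ValueError("improvement_factor must be positive")
    return success_rate_pct / improvement_factor


# (test success rate in %, approximate improvement factor), as quoted in the table.
reported = {"ML-10": (33.4, 1.7), "ML-45": (31.2, 1.3)}

# Implied previous best per benchmark; an estimate only, since the factors are rounded.
implied = {
    name: implied_baseline(rate, factor)
    for name, (rate, factor) in reported.items()
}

for name, value in implied.items():
    print(f"{name}: implied previous best is roughly {value:.1f}%")
```

Under these assumptions the implied previous best is about 19.6% on ML-10 and about 24.0% on ML-45.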