Outcome-directed Reinforcement Learning by Uncertainty \& Temporal Distance-Aware Curriculum Goal Generation
Authors: Daesol Cho, Seungjae Lee, H. Jin Kim
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our algorithm significantly outperforms these prior methods in a variety of challenging navigation tasks and robotic manipulation tasks in a quantitative and qualitative way. ... 5 EXPERIMENTS We include 6 environments to validate our proposed method ... 5.1 EXPERIMENTAL RESULTS ... 5.2 ABLATION STUDY |
| Researcher Affiliation | Academia | Daesol Cho, Seungjae Lee, H. Jin Kim. Seoul National University, Automation and Systems Research Institute (ASRI), Artificial Intelligence Institute of Seoul National University (AIIS). {dscho1234, ysz0301, hjinkim}@snu.ac.kr |
| Pseudocode | Yes | The overall training process is summarized in Algorithm 1 in Appendix B. ... Algorithm 2 Meta-NML (Li et al., 2021) |
| Open Source Code | Yes | Code is available: https://github.com/jayLEE0301/outpace_official |
| Open Datasets | Yes | We referred to the Meta-World (Yu et al., 2020) and EARL (Sharma et al., 2021) environments. |
| Dataset Splits | No | The paper provides 'Meta-learner train sample size' and 'Meta-learner test sample size' for a component of the method, but it does not specify explicit training/validation/test splits (e.g., percentages or counts) for the overall dataset used in the RL experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify a version number for PyTorch or any other software dependencies with their versions. |
| Experiment Setup | Yes | Table 2: Hyperparameters for OUTPACE ... Table 3: Env-specific hyperparameters for OUTPACE |