Imitating Graph-Based Planning with Goal-Conditioned Policies
Authors: Junsu Kim, Younggyo Seo, Sungsoo Ahn, Kyunghwan Son, Jinwoo Shin
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods under various long-horizon control tasks.¹ ... In this section, we design our experiments to answer the following questions: Can PIG improve the sample-efficiency on long-horizon continuous control tasks over baselines (Figure 4)? Can a policy trained by PIG perform well even without a planner at the test time (Figure 5)? How does PIG compare to another self-imitation strategy (Figure 6)? Is the subgoal skipping effective for sample-efficiency (Figure 7)? How does the balancing coefficient λ affect performance (Figure 8)? |
| Researcher Affiliation | Academia | Junsu Kim¹, Younggyo Seo¹, Sungsoo Ahn², Kyunghwan Son¹, Jinwoo Shin¹; ¹Korea Advanced Institute of Science and Technology (KAIST), ²Pohang University of Science and Technology (POSTECH) |
| Pseudocode | Yes | We provide the overall pipeline in Algorithm 1 in Supplemental material A, colored as black. ... We describe our subgoal skipping procedure in Algorithm 2 of Supplemental material A. We provide algorithm tables that represent PIG in Algorithm 1 and 2. |
| Open Source Code | Yes | ¹Code is available at https://github.com/junsu-kim97/PIG ... Reproducibility Statement: We provide the implementation details of our method in Section 5 and Supplemental material D. We also open-source our codebase. |
| Open Datasets | No | The paper uses environments from the MuJoCo simulator (Todorov et al., 2012), such as 2DReach, Reacher, Pusher, and Ant Maze. These are simulated environments that generate data during training, not pre-existing, publicly available datasets with specific download links or DOIs. |
| Dataset Splits | No | The paper mentions training and testing procedures but does not explicitly describe a separate validation dataset split or its purpose. |
| Hardware Specification | Yes | All of the experiments were processed using a single GPU (NVIDIA TITAN Xp) and 8 CPU cores (Intel Xeon E5-2630 v4). |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer and the MuJoCo simulator, but it does not specify version numbers for any software libraries, frameworks, or the simulator itself. |
| Experiment Setup | Yes | We list hyperparameters used for PIG across all environments in Table 1 and 2. For the baselines, we used the best hyperparameters reported in their source codes for shared environments: 2DReach for MSS and HER, Reacher and Pusher for HIGL, and Ant Mazes for MSS, L3P, HER, and HIGL (all). |