Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Skill Discovery for Exploration and Planning using Deep Skill Graphs
Authors: Akhil Bagaria, Jason K Senthil, George Konidaris
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our algorithm on four maze-navigation tasks in MuJoCo (Todorov et al., 2012), where it outperforms flat model-free RL (Andrychowicz et al., 2017), model-based RL (Nagabandi et al., 2018) and state-of-the-art skill-discovery algorithms (Levy et al., 2019; Sharma et al., 2020). |
| Researcher Affiliation | Academia | Akhil Bagaria 1 Jason Senthil 1 George Konidaris 1 1Department of Computer Science, Brown University, Providence, RI, USA. |
| Pseudocode | No | The paper describes the algorithm narratively and with figures, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Video and code can be found on our website. |
| Open Datasets | Yes | We tested DSG on the continuous control tasks shown in Figure 6. These tasks are adapted from the Datasets for RL benchmark (Fu et al., 2020) and are challenging for non-hierarchical methods, which make little-to-no learning progress (Duan et al., 2016). (See the environment-loading sketch below the table.) |
| Dataset Splits | No | The paper describes the duration of the training phase in terms of episodes (e.g., 'unsupervised training phase (which lasts for 1000 episodes in Reacher and U-Maze, 1500 episodes in Medium Maze and 2000 episodes in the Large-Maze)') and the test setup involves generating random start-goal pairs and running trials. However, it does not specify a distinct validation dataset split or its size. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software components like MuJoCo, TD3, and MPC in the context of their use, but it does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The agent discovers skills during an unsupervised training phase (which lasts for 1000 episodes in Reacher and U-Maze, 1500 episodes in Medium Maze and 2000 episodes in the Large-Maze). During this period, its start-state is fixed to be at the bottom-left of every maze. At test time, we generate 20 random start-goal pairs from the maze and record the average success rate (Andrychowicz et al., 2017) of the agent over 50 trials per start-goal pair (Sharma et al., 2020). All competing methods are tested on the same set of states. (...) We use the mazes from this suite, not the demonstration data. We use the dense reward version of these tasks, i.e., R(s, g) = -\|\|s - g\|\|. (A hedged sketch of this evaluation protocol appears below the table.) |
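
For the Open Datasets row, the mazes come from the D4RL benchmark (Fu et al., 2020). Below is a minimal, hedged sketch of loading those environments via the standard `d4rl`/`gym` registration flow; the specific environment IDs are illustrative assumptions, since the paper adapts the mazes rather than naming exact IDs, and the bundled offline trajectories are not used.

```python
# Hedged sketch: instantiating D4RL maze environments (Fu et al., 2020).
# The environment IDs are illustrative assumptions; only the environments
# themselves are used here, not the offline demonstration data.
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the maze envs with gym

ENV_IDS = [
    "maze2d-umaze-v1",   # U-Maze
    "maze2d-medium-v1",  # Medium Maze
    "maze2d-large-v1",   # Large Maze
]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    print(env_id, "observation shape:", obs.shape)
```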
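
For the Experiment Setup row, the quoted protocol evaluates 20 random start-goal pairs with 50 trials per pair and reports the average success rate, using the dense reward R(s, g) = -||s - g||. The sketch below illustrates that protocol under stated assumptions: `sample_state`, `reset_to`, `success_radius`, and `max_steps` are hypothetical stand-ins for environment-specific details the section does not specify.

```python
import numpy as np


def dense_reward(s: np.ndarray, g: np.ndarray) -> float:
    """Dense goal-conditioned reward as quoted above: R(s, g) = -||s - g||.

    Used for learning, not for the success-rate metric computed below.
    """
    return -float(np.linalg.norm(s - g))


def evaluate(env, policy, num_pairs=20, trials_per_pair=50,
             max_steps=1000, success_radius=0.5):
    """Average success rate over random start-goal pairs.

    `env.sample_state`, `env.reset_to`, `success_radius`, and `max_steps`
    are hypothetical; the paper states only the pair/trial counts and the
    success-rate metric.
    """
    successes = 0
    pairs = [(env.sample_state(), env.sample_state()) for _ in range(num_pairs)]
    for start, goal in pairs:
        for _ in range(trials_per_pair):
            s = env.reset_to(start)  # hypothetical reset-to-state helper
            for _ in range(max_steps):
                a = policy(s, goal)
                s, _, done, _ = env.step(a)
                # Assumes the first two state dimensions are (x, y) position.
                if np.linalg.norm(s[:2] - goal[:2]) < success_radius:
                    successes += 1
                    break
                if done:
                    break
    return successes / (num_pairs * trials_per_pair)
```

Per the quoted setup, all competing methods would be evaluated on the same cached set of start-goal pairs rather than resampling pairs per method.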