Skill Discovery for Exploration and Planning using Deep Skill Graphs

Authors: Akhil Bagaria, Jason K Senthil, George Konidaris

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test our algorithm on four maze-navigation tasks in MuJoCo (Todorov et al., 2012), where it outperforms flat model-free RL (Andrychowicz et al., 2017), model-based RL (Nagabandi et al., 2018) and state-of-the-art skill-discovery algorithms (Levy et al., 2019; Sharma et al., 2020).
Researcher Affiliation | Academia | Akhil Bagaria¹, Jason Senthil¹, George Konidaris¹. ¹Department of Computer Science, Brown University, Providence, RI, USA.
Pseudocode | No | The paper describes the algorithm narratively and with figures, but it does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Video and code can be found on our website.
Open Datasets | Yes | We tested DSG on the continuous control tasks shown in Figure 6. These tasks are adapted from the Datasets for RL benchmark (Fu et al., 2020) and are challenging for non-hierarchical methods, which make little-to-no learning progress (Duan et al., 2016). (A hedged environment-loading sketch follows the table.)
Dataset Splits | No | The paper describes the duration of the training phase in terms of episodes (e.g., 'unsupervised training phase (which lasts for 1000 episodes in Reacher and U-Maze, 1500 episodes in Medium Maze and 2000 episodes in the Large-Maze)') and the test setup involves generating random start-goal pairs and running trials. However, it does not specify a distinct validation dataset split or its size.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions software components like MuJoCo, TD3, and MPC in the context of their use, but it does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | The agent discovers skills during an unsupervised training phase (which lasts for 1000 episodes in Reacher and U-Maze, 1500 episodes in Medium Maze and 2000 episodes in the Large-Maze). During this period, its start-state is fixed to be at the bottom-left of every maze. At test time, we generate 20 random start-goal pairs from the maze and record the average success rate (Andrychowicz et al., 2017) of the agent over 50 trials per start-goal pair (Sharma et al., 2020). All competing methods are tested on the same set of states. (...) We use the mazes from this suite, not the demonstration data. We use the dense reward version of these tasks, i.e., R(s, g) = -||s - g||. (A hedged sketch of this evaluation protocol follows the table.)
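The 'Open Datasets' row above states that the tasks are adapted from the Datasets for RL (D4RL) benchmark and that only the maze layouts, not the demonstration data, are used. Below is a minimal, hypothetical sketch of how such environments could be instantiated; the environment IDs are assumptions and may not match the exact task variants used in the paper.

```python
# Hypothetical sketch: instantiating maze environments from the D4RL suite.
# The IDs below are assumptions, not taken from the paper.
import gym
import d4rl  # noqa: F401  (importing d4rl registers the maze2d-* environments)

MAZE_ENV_IDS = [
    "maze2d-umaze-v1",   # assumed counterpart of the U-Maze
    "maze2d-medium-v1",  # assumed counterpart of the Medium Maze
    "maze2d-large-v1",   # assumed counterpart of the Large Maze
]

envs = {env_id: gym.make(env_id) for env_id in MAZE_ENV_IDS}

# env.get_dataset() would load the offline demonstration data, which the paper
# explicitly does not use; only the maze layouts themselves are reused.
```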
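The 'Experiment Setup' row quotes the training budget, the dense reward R(s, g) = -||s - g||, and the test protocol of 20 random start-goal pairs with 50 trials each. The following is a minimal sketch of that evaluation loop under assumed interfaces; `agent.act`, `env.sample_state`, `env.reset_to`, and the success radius are hypothetical names introduced here, not the authors' API.

```python
import numpy as np

# Episode budgets for the unsupervised training phase, as quoted above.
TRAINING_EPISODES = {
    "reacher": 1000,
    "u-maze": 1000,
    "medium-maze": 1500,
    "large-maze": 2000,
}

def dense_reward(state_xy, goal_xy):
    """Dense reward from the quote: negative Euclidean distance to the goal."""
    return -np.linalg.norm(np.asarray(state_xy) - np.asarray(goal_xy))

def evaluate(agent, env, n_pairs=20, n_trials=50, success_radius=0.6):
    """Average success rate over random start-goal pairs (hypothetical API)."""
    pair_success_rates = []
    for _ in range(n_pairs):
        start, goal = env.sample_state(), env.sample_state()  # assumed helper
        successes = 0
        for _ in range(n_trials):
            state = env.reset_to(start)  # assumed helper: reset to a chosen state
            done = False
            while not done:
                action = agent.act(state, goal)  # goal-conditioned policy (assumed)
                state, _, done, _ = env.step(action)
                # Count a success when the agent comes within `success_radius` of
                # the goal; the first two state dimensions are assumed to be (x, y).
                if -dense_reward(state[:2], goal[:2]) < success_radius:
                    successes += 1
                    break
        pair_success_rates.append(successes / n_trials)
    return float(np.mean(pair_success_rates))
```

Per the quoted setup, the same sampled start-goal pairs would be reused when evaluating all competing methods.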