Option Discovery using Deep Skill Chaining

Authors: Akhil Bagaria, George Konidaris

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through a series of experiments on five challenging domains in the MuJoCo physics simulator (Todorov et al., 2012), we show that deep skill chaining can solve tasks that otherwise cannot be solved by nonhierarchical agents in a reasonable amount of time. Furthermore, the new algorithm outperforms state-of-the-art deep skill discovery algorithms (Bacon et al., 2017; Levy et al., 2019) in these tasks."
Researcher Affiliation | Academia | Akhil Bagaria, Department of Computer Science, Brown University, Providence, RI, USA (akhil_bagaria@brown.edu); George Konidaris, Department of Computer Science, Brown University, Providence, RI, USA (gdk@brown.edu)
Pseudocode | Yes | "Readers may also refer to Figures 4 & 7 and the pseudo-code in Appendix A.5 to gain greater intuition about our algorithm."
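The authors' actual pseudo-code is in Appendix A.5 of the paper. As a rough orientation only, the backward skill-chaining idea (grow a chain of options from the goal toward the start, with each new option's subgoal being the previous option's initiation set) can be sketched as below; all class and function names are illustrative assumptions, and the trajectory and initiation logic are toy stand-ins for the learned policies and classifiers in the paper:

```python
# Hypothetical, heavily simplified sketch of backward skill chaining.
# Not the authors' implementation; see Appendix A.5 for the real pseudo-code.

class Option:
    def __init__(self, subgoal_test):
        self.subgoal_test = subgoal_test  # termination condition for this option
        self.initiation_states = []       # states seen just before reaching the subgoal

    def can_initiate(self, state):
        # The paper learns an initiation-set classifier; here, a toy proximity test.
        return any(abs(s - state) < 1.0 for s in self.initiation_states)

def chain_skills(start_state, goal_test, episodes):
    """Grow a chain of options backward from the goal toward the start state."""
    chain = [Option(goal_test)]  # first option targets the task goal itself
    for _ in range(episodes):
        newest = chain[-1]
        # Stand-in for executing a trajectory that reaches the current subgoal:
        trajectory = [start_state + i for i in range(5)]
        # States shortly before the subgoal seed the new option's initiation set.
        newest.initiation_states.extend(trajectory[-3:])
        # The learned initiation set becomes the subgoal of the next option.
        chain.append(Option(newest.can_initiate))
    return chain
```

The key structural point, which the sketch preserves, is that options are discovered in reverse order of execution: the option that reaches the goal is learned first, and each subsequent option only needs to reach the region from which an already-learned option can take over.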
Open Source Code | Yes | "Code: https://github.com/deep-skill-chaining/deep-skill-chaining"
Open Datasets | Yes | "We test our algorithm in five tasks that exhibit a strong hierarchical structure: (1) Point-Maze (Duan et al., 2016), (2) Four Rooms with Lock and Key, (3) Reacher (Brockman et al., 2016), (4) Point E-Maze and (5) Ant-Maze (Duan et al., 2016; Brockman et al., 2016). Since tasks 1, 3 and 5 appear frequently in the literature, details of their setup can be found in Appendix A.3."
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits or refer to a standard validation split. It mentions 'test rollouts' but no explicit validation set or its proportion.
Hardware Specification | Yes | "We used 1 NVIDIA GeForce 2080 Ti, 2 NVIDIA GeForce 2070 Ti and 2 Tesla K80s on the Google Cloud compute infrastructure to perform all experiments reported in this paper."
Software Dependencies | No | The paper mentions using 'scikit-learn' (Pedregosa et al., 2011) and the 'MuJoCo physics simulator' (Todorov et al., 2012), along with 'OpenAI Gym' (Brockman et al., 2016), but does not provide specific version numbers for these software dependencies in the text.
Experiment Setup | Yes | "We divide the full set of hyperparameters that our algorithm depends on into two groups: those that are common to all algorithms that use DDPG (Table 2), and those that are specific to skill chaining (Table 3)." Table 2: DDPG hyperparameters (e.g., replay buffer size 1e6, batch size 64, γ = 0.99). Table 3: Deep Skill Chaining hyperparameters (e.g., Gestation Period (N) = 5, Initiation Period = 1, Buffer Length (K) = 20, Option Max Time Steps (T) = 100 for Point-Maze).
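For quick reference, the hyperparameter values quoted from Tables 2 and 3 can be collected into plain configuration dictionaries. Only the values listed above come from the paper; the key names are my own shorthand, and values the excerpt does not list are omitted rather than guessed:

```python
# Hyperparameters quoted from Tables 2 and 3 of the paper.
# Key names are illustrative; unlisted entries are intentionally omitted.

DDPG_HYPERPARAMS = {
    "replay_buffer_size": int(1e6),  # Table 2: replay buffer size
    "batch_size": 64,                # Table 2: batch size
    "gamma": 0.99,                   # Table 2: discount factor γ
}

SKILL_CHAINING_HYPERPARAMS = {
    "gestation_period_N": 5,         # Table 3: Gestation Period (N)
    "initiation_period": 1,          # Table 3: Initiation Period
    "buffer_length_K": 20,           # Table 3: Buffer Length (K)
    # Table 3 gives this per task; only the Point-Maze value is quoted.
    "option_max_time_steps_T": {"Point-Maze": 100},
}
```

Grouping the DDPG-level and chaining-level settings separately mirrors the paper's own split between hyperparameters shared with any DDPG-based agent and those specific to skill chaining.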