Option Discovery using Deep Skill Chaining

Authors: Akhil Bagaria, George Konidaris

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through a series of experiments on five challenging domains in the MuJoCo physics simulator (Todorov et al., 2012), we show that deep skill chaining can solve tasks that otherwise cannot be solved by nonhierarchical agents in a reasonable amount of time. Furthermore, the new algorithm outperforms state-of-the-art deep skill discovery algorithms (Bacon et al., 2017; Levy et al., 2019) in these tasks."
Researcher Affiliation | Academia | Akhil Bagaria, Department of Computer Science, Brown University, Providence, RI, USA (akhil_bagaria@brown.edu); George Konidaris, Department of Computer Science, Brown University, Providence, RI, USA (gdk@brown.edu)
Pseudocode | Yes | "Readers may also refer to Figures 4 & 7 and the pseudo-code in Appendix A.5 to gain greater intuition about our algorithm."
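The authors' actual pseudo-code is in Appendix A.5 of the paper. As a rough orientation only, the backward skill-chaining idea (grow a chain of options from the goal toward the start, with each new option's subgoal being the previous option's initiation set) can be sketched as below; all class and function names are illustrative assumptions, and the trajectory and initiation logic are toy stand-ins for the learned policies and classifiers in the paper:

```python
# Hypothetical, heavily simplified sketch of backward skill chaining.
# Not the authors' implementation; see Appendix A.5 for the real pseudo-code.

class Option:
    def __init__(self, subgoal_test):
        self.subgoal_test = subgoal_test  # termination condition for this option
        self.initiation_states = []       # states seen just before reaching the subgoal

    def can_initiate(self, state):
        # The paper learns an initiation-set classifier; here, a toy proximity test.
        return any(abs(s - state) < 1.0 for s in self.initiation_states)

def chain_skills(start_state, goal_test, episodes):
    """Grow a chain of options backward from the goal toward the start state."""
    chain = [Option(goal_test)]  # first option targets the task goal itself
    for _ in range(episodes):
        newest = chain[-1]
        # Stand-in for executing a trajectory that reaches the current subgoal:
        trajectory = [start_state + i for i in range(5)]
        # States shortly before the subgoal seed the new option's initiation set.
        newest.initiation_states.extend(trajectory[-3:])
        # The learned initiation set becomes the subgoal of the next option.
        chain.append(Option(newest.can_initiate))
    return chain
```

The key structural point, which the sketch preserves, is that options are discovered in reverse order of execution: the option that reaches the goal is learned first, and each subsequent option only needs to reach the region from which an already-learned option can take over.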
Open Source Code | Yes | "Code: https://github.com/deep-skill-chaining/deep-skill-chaining"
Open Datasets | Yes | "We test our algorithm in five tasks that exhibit a strong hierarchical structure: (1) Point-Maze (Duan et al., 2016), (2) Four Rooms with Lock and Key, (3) Reacher (Brockman et al., 2016), (4) Point E-Maze and (5) Ant-Maze (Duan et al., 2016; Brockman et al., 2016). Since tasks 1, 3 and 5 appear frequently in the literature, details of their setup can be found in Appendix A.3."
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits or refer to a standard validation split. It mentions 'test rollouts' but no explicit validation set or its proportion.
Hardware Specification | Yes | "We used 1 NVIDIA GeForce 2080 Ti, 2 NVIDIA GeForce 2070 Ti and 2 Tesla K80s on the Google Cloud compute infrastructure to perform all experiments reported in this paper."
Software Dependencies | No | The paper mentions using 'scikit-learn' (Pedregosa et al., 2011) and the 'MuJoCo physics simulator' (Todorov et al., 2012), along with 'OpenAI Gym' (Brockman et al., 2016), but does not provide specific version numbers for these software dependencies in the text.
Experiment Setup | Yes | "We divide the full set of hyperparameters that our algorithm depends on into two groups: those that are common to all algorithms that use DDPG (Table 2), and those that are specific to skill chaining (Table 3)." Table 2: DDPG hyperparameters (e.g., replay buffer size 1e6, batch size 64, γ = 0.99). Table 3: Deep Skill Chaining hyperparameters (e.g., Gestation Period (N) = 5, Initiation Period = 1, Buffer Length (K) = 20, Option Max Time Steps (T) = 100 for Point-Maze).
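For quick reference, the hyperparameter values quoted from Tables 2 and 3 can be collected into plain configuration dictionaries. Only the values listed above come from the paper; the key names are my own shorthand, and values the excerpt does not list are omitted rather than guessed:

```python
# Hyperparameters quoted from Tables 2 and 3 of the paper.
# Key names are illustrative; unlisted entries are intentionally omitted.

DDPG_HYPERPARAMS = {
    "replay_buffer_size": int(1e6),  # Table 2: replay buffer size
    "batch_size": 64,                # Table 2: batch size
    "gamma": 0.99,                   # Table 2: discount factor γ
}

SKILL_CHAINING_HYPERPARAMS = {
    "gestation_period_N": 5,         # Table 3: Gestation Period (N)
    "initiation_period": 1,          # Table 3: Initiation Period
    "buffer_length_K": 20,           # Table 3: Buffer Length (K)
    # Table 3 gives this per task; only the Point-Maze value is quoted.
    "option_max_time_steps_T": {"Point-Maze": 100},
}
```

Grouping the DDPG-level and chaining-level settings separately mirrors the paper's own split between hyperparameters shared with any DDPG-based agent and those specific to skill chaining.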