Option Discovery using Deep Skill Chaining
Authors: Akhil Bagaria, George Konidaris
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a series of experiments on five challenging domains in the MuJoCo physics simulator (Todorov et al., 2012), we show that deep skill chaining can solve tasks that otherwise cannot be solved by non-hierarchical agents in a reasonable amount of time. Furthermore, the new algorithm outperforms state-of-the-art deep skill discovery algorithms (Bacon et al., 2017; Levy et al., 2019) in these tasks. |
| Researcher Affiliation | Academia | Akhil Bagaria, Department of Computer Science, Brown University, Providence, RI, USA, akhil_bagaria@brown.edu; George Konidaris, Department of Computer Science, Brown University, Providence, RI, USA, gdk@brown.edu |
| Pseudocode | Yes | Readers may also refer to Figures 4 & 7 and the pseudo-code in Appendix A.5 to gain greater intuition about our algorithm. |
| Open Source Code | Yes | Code: https://github.com/deep-skill-chaining/deep-skill-chaining |
| Open Datasets | Yes | We test our algorithm in five tasks that exhibit a strong hierarchical structure: (1) Point-Maze (Duan et al., 2016), (2) Four Rooms with Lock and Key, (3) Reacher (Brockman et al., 2016), (4) Point E-Maze and (5) Ant-Maze (Duan et al., 2016; Brockman et al., 2016). Since tasks 1, 3 and 5 appear frequently in the literature, details of their setup can be found in Appendix A.3. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits or refer to a standard validation split. It mentions 'test rollouts' but specifies no validation set or split proportions. |
| Hardware Specification | Yes | We used 1 NVIDIA GeForce 2080 Ti, 2 NVIDIA GeForce 2070 Ti and 2 Tesla K80s on the Google Cloud compute infrastructure to perform all experiments reported in this paper. |
| Software Dependencies | No | The paper mentions using 'scikit-learn' (Pedregosa et al., 2011) and the 'MuJoCo physics simulator' (Todorov et al., 2012), along with 'OpenAI Gym' (Brockman et al., 2016), but does not provide specific version numbers for these software dependencies in the text. |
| Experiment Setup | Yes | We divide the full set of hyperparameters that our algorithm depends on into two groups: those that are common to all algorithms that use DDPG (Table 2), and those that are specific to skill chaining (Table 3). ... Table 2: DDPG Hyperparameters (e.g., Replay buffer size 1e6, Batch size 64, γ 0.99, etc.). Table 3: Deep Skill Chaining Hyperparameters (e.g., Gestation Period (N) 5, Initiation Period 1, Buffer Length (K) 20, Option Max Time Steps (T) 100 for Point Maze, etc.). |
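For a reader attempting a reproduction, the hyperparameters quoted from Tables 2 and 3 can be gathered into a single configuration sketch. This is an illustrative reconstruction, not code from the authors' repository; the dictionary names and any structure beyond the quoted values are assumptions.

```python
# Hedged sketch: the DDPG and skill-chaining hyperparameters quoted
# from Tables 2 and 3, collected into one place. Key names are
# illustrative guesses, not identifiers from the authors' code.

DDPG_HYPERPARAMS = {
    "replay_buffer_size": int(1e6),  # Table 2
    "batch_size": 64,                # Table 2
    "gamma": 0.99,                   # discount factor, Table 2
}

SKILL_CHAINING_HYPERPARAMS = {
    "gestation_period": 5,           # N, Table 3
    "initiation_period": 1,          # Table 3
    "buffer_length": 20,             # K, Table 3
    "option_max_time_steps": 100,    # T, Point-Maze value, Table 3
}
```

A reproduction script would pass the first group to every DDPG-based baseline and reserve the second group for the skill-chaining agent, per-task values such as T varying by domain as the paper's Table 3 indicates.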