On the Role of Weight Sharing During Deep Option Learning
Authors: Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro
AAAI 2020, pp. 5519-5526
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in challenging RL environments with high dimensional state spaces such as Atari demonstrate the benefits of OCPG and HOCPG when using typical strategies for weight sharing across option model components. |
| Researcher Affiliation | Collaboration | Matthew Riemer (1), Ignacio Cases (2), Clemens Rosenbaum (3), Miao Liu (1), Gerald Tesauro (1); (1) IBM Research, Yorktown Heights, NY; (2) Linguistics Department and Stanford NLP Group, AI Lab, Stanford University; (3) College of Information and Computer Sciences, University of Massachusetts Amherst |
| Pseudocode | Yes | We implement option-critic policy gradients (OCPG) using the variant of A2OC outlined in algorithm 1 of the appendix and implement the hierarchical option-critic policy gradients (HOCPG) following algorithm 2 of the appendix. |
| Open Source Code | No | The paper does not provide explicit statements or links for its own open-source code. It mentions 'a popular PyTorch repository', but this refers to a third-party resource. |
| Open Datasets | Yes | We consider the Atari games (Bellemare et al. 2013). |
| Dataset Splits | No | The paper mentions running 'evaluation episodes' and 'one thread for evaluation' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper describes the neural network architecture but does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or cloud instance specifications). |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym environments' and extending 'A3C from a popular PyTorch repository' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | All of our models use 8 options following past work, and a learning rate of 1e-4. Following (Harb et al. 2017) we run 16 parallel asynchronously updating threads for each game. [...] At the beginning we set η = 0.0 for 15 million steps before setting η = 0.01. Next, we set η = 0.1 at 30 million steps and η = 1.0 at 45 million steps. |
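The η curriculum quoted in the Experiment Setup row can be written as a simple step-indexed schedule. The sketch below is illustrative only: the function name `eta_schedule`, the `CONFIG` dictionary, and its key names are ours, not the authors'; it assumes η switches exactly at the quoted step counts.

```python
# Minimal sketch (not the authors' code) of the eta curriculum quoted above:
# eta = 0.0 for the first 15 million steps, 0.01 thereafter,
# 0.1 from 30 million steps, and 1.0 from 45 million steps.

def eta_schedule(step: int) -> float:
    """Return eta for a given global environment step, per the quoted schedule."""
    if step < 15_000_000:
        return 0.0
    elif step < 30_000_000:
        return 0.01
    elif step < 45_000_000:
        return 0.1
    return 1.0

# Other hyperparameters quoted in the Experiment Setup row, gathered for reference
# (dictionary and key names are illustrative, not from the authors' implementation):
CONFIG = {
    "num_options": 8,       # "All of our models use 8 options following past work"
    "learning_rate": 1e-4,  # "a learning rate of 1e-4"
    "num_threads": 16,      # "16 parallel asynchronously updating threads for each game"
}
```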