On the Role of Weight Sharing During Deep Option Learning

Authors: Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro

AAAI 2020, pp. 5519-5526 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments in challenging RL environments with high dimensional state spaces such as Atari demonstrate the benefits of OCPG and HOCPG when using typical strategies for weight sharing across option model components.
Researcher Affiliation | Collaboration | Matthew Riemer (1), Ignacio Cases (2), Clemens Rosenbaum (3), Miao Liu (1), Gerald Tesauro (1); affiliations: (1) IBM Research, Yorktown Heights, NY; (2) Linguistics Department and Stanford NLP Group, AI Lab, Stanford University; (3) College of Information and Computer Sciences, University of Massachusetts Amherst
Pseudocode | Yes | We implement option-critic policy gradients (OCPG) using the variant of A2OC outlined in algorithm 1 of the appendix and implement the hierarchical option-critic policy gradients (HOCPG) following algorithm 2 of the appendix. (A generic option-critic interaction loop is sketched after this table.)
Open Source Code | No | The paper does not provide explicit statements or links for its own open-source code. It mentions 'a popular PyTorch repository', but this refers to a third-party resource.
Open Datasets | Yes | We consider the Atari games (Bellemare et al. 2013).
Dataset Splits | No | The paper mentions running 'evaluation episodes' and 'one thread for evaluation' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper describes the neural network architecture but does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions using 'OpenAI Gym environments' and extending 'A3C from a popular PyTorch repository' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | All of our models use 8 options following past work, and a learning rate of 1e-4. Following (Harb et al. 2017) we run 16 parallel asynchronously updating threads for each game. [...] At the beginning we set η = 0.0 for 15 million steps before setting η = 0.01. Next, we set η = 0.1 at 30 million steps and η = 1.0 at 45 million steps. (This η schedule is written out in code after this table.)
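The Pseudocode row refers to A2OC-style option-critic policy gradients. As context only, the sketch below shows the generic option-critic interaction loop that such methods build on; it is not the paper's Algorithm 1 or 2, and `policy_over_options`, `intra_option_policies`, and `terminations` are hypothetical placeholders rather than the authors' API.

```python
# Minimal sketch of a generic option-critic interaction loop. This is NOT the
# paper's Algorithm 1 (the A2OC variant) or Algorithm 2 (HOCPG); every function
# name below is a hypothetical placeholder used only for illustration.
import random


def run_episode(env, policy_over_options, intra_option_policies, terminations):
    """Collect one episode of (state, option, action, reward, next_state) tuples."""
    state = env.reset()
    option = policy_over_options(state)                 # choose an initial option
    transitions = []                                    # buffer for later actor-critic updates
    done = False
    while not done:
        action = intra_option_policies[option](state)   # act with the current option's policy
        next_state, reward, done, _ = env.step(action)  # classic Gym-style step signature
        transitions.append((state, option, action, reward, next_state))
        # Sample a termination decision for the current option in the new state;
        # if it terminates, the policy over options selects a new option.
        if random.random() < terminations[option](next_state):
            option = policy_over_options(next_state)
        state = next_state
    return transitions
```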
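The Experiment Setup row quotes a staged schedule for the coefficient η. A minimal sketch of that schedule as a step-count lookup, assuming η is read from the global environment-step counter (the function name and lookup convention are assumptions, not the authors' implementation):

```python
# Hedged sketch of the eta schedule quoted in the Experiment Setup row.
# The step thresholds come directly from the quoted text; the function itself
# is an illustrative assumption, not the authors' code.
def eta_schedule(total_env_steps: int) -> float:
    if total_env_steps < 15_000_000:
        return 0.0   # eta = 0.0 for the first 15 million steps
    if total_env_steps < 30_000_000:
        return 0.01  # eta = 0.01 from 15 million to 30 million steps
    if total_env_steps < 45_000_000:
        return 0.1   # eta = 0.1 from 30 million to 45 million steps
    return 1.0       # eta = 1.0 after 45 million steps
```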