On the Role of Weight Sharing During Deep Option Learning

Authors: Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro

AAAI 2020, pp. 5519-5526 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments in challenging RL environments with high dimensional state spaces such as Atari demonstrate the benefits of OCPG and HOCPG when using typical strategies for weight sharing across option model components.
Researcher Affiliation | Collaboration | Matthew Riemer (1), Ignacio Cases (2), Clemens Rosenbaum (3), Miao Liu (1), Gerald Tesauro (1); affiliations: (1) IBM Research, Yorktown Heights, NY; (2) Linguistics Department and Stanford NLP Group, AI Lab, Stanford University; (3) College of Information and Computer Sciences, University of Massachusetts Amherst
Pseudocode | Yes | We implement option-critic policy gradients (OCPG) using the variant of A2OC outlined in algorithm 1 of the appendix and implement the hierarchical option-critic policy gradients (HOCPG) following algorithm 2 of the appendix. (A generic option-critic interaction loop is sketched after this table.)
Open Source Code | No | The paper does not provide explicit statements or links for its own open-source code. It mentions 'a popular PyTorch repository', but this refers to a third-party resource.
Open Datasets | Yes | We consider the Atari games (Bellemare et al. 2013).
Dataset Splits | No | The paper mentions running 'evaluation episodes' and 'one thread for evaluation' but does not specify explicit train/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper describes the neural network architecture but does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies | No | The paper mentions using 'OpenAI Gym environments' and extending 'A3C from a popular PyTorch repository' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | All of our models use 8 options following past work, and a learning rate of 1e-4. Following (Harb et al. 2017) we run 16 parallel asynchronously updating threads for each game. [...] At the beginning we set η = 0.0 for 15 million steps before setting η = 0.01. Next, we set η = 0.1 at 30 million steps and η = 1.0 at 45 million steps. (This η schedule is written out in code after this table.)
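The Pseudocode row refers to A2OC-style option-critic policy gradients. As context only, the sketch below shows the generic option-critic interaction loop that such methods build on; it is not the paper's Algorithm 1 or 2, and `policy_over_options`, `intra_option_policies`, and `terminations` are hypothetical placeholders rather than the authors' API.

```python
# Minimal sketch of a generic option-critic interaction loop. This is NOT the
# paper's Algorithm 1 (the A2OC variant) or Algorithm 2 (HOCPG); every function
# name below is a hypothetical placeholder used only for illustration.
import random


def run_episode(env, policy_over_options, intra_option_policies, terminations):
    """Collect one episode of (state, option, action, reward, next_state) tuples."""
    state = env.reset()
    option = policy_over_options(state)                 # choose an initial option
    transitions = []                                    # buffer for later actor-critic updates
    done = False
    while not done:
        action = intra_option_policies[option](state)   # act with the current option's policy
        next_state, reward, done, _ = env.step(action)  # classic Gym-style step signature
        transitions.append((state, option, action, reward, next_state))
        # Sample a termination decision for the current option in the new state;
        # if it terminates, the policy over options selects a new option.
        if random.random() < terminations[option](next_state):
            option = policy_over_options(next_state)
        state = next_state
    return transitions
```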
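The Experiment Setup row quotes a staged schedule for the coefficient η. A minimal sketch of that schedule as a step-count lookup, assuming η is read from the global environment-step counter (the function name and lookup convention are assumptions, not the authors' implementation):

```python
# Hedged sketch of the eta schedule quoted in the Experiment Setup row.
# The step thresholds come directly from the quoted text; the function itself
# is an illustrative assumption, not the authors' code.
def eta_schedule(total_env_steps: int) -> float:
    if total_env_steps < 15_000_000:
        return 0.0   # eta = 0.0 for the first 15 million steps
    if total_env_steps < 30_000_000:
        return 0.01  # eta = 0.01 from 15 million to 30 million steps
    if total_env_steps < 45_000_000:
        return 0.1   # eta = 0.1 from 30 million to 45 million steps
    return 1.0       # eta = 1.0 after 45 million steps
```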