QUOTA: The Quantile Option Architecture for Reinforcement Learning
Authors: Shangtong Zhang, Hengshuai Yao
AAAI 2019, pp. 5797-5804 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators. |
| Researcher Affiliation | Collaboration | Shangtong Zhang (Department of Computing Science, University of Alberta); Hengshuai Yao (Reinforcement Learning for Autonomous Driving Lab, Noah's Ark Lab, Huawei). shangtong.zhang@ualberta.ca, hengshuai.yao@huawei.com |
| Pseudocode | Yes | The pseudo code of QUOTA is provided in Supplementary Material. |
| Open Source Code | Yes | All the implementations are made publicly available: https://github.com/ShangtongZhang/DeepRL |
| Open Datasets | Yes | We evaluated QUOTA in both Arcade Learning Environment (ALE) (Bellemare et al. 2013) and Roboschool. |
| Dataset Splits | No | The paper mentions training steps and performance evaluation, but it does not describe explicit train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU/CPU models used to run the experiments; it only mentions that experiments were run. |
| Software Dependencies | No | The paper mentions using an RMSProp optimizer and the Huber loss, but it does not specify version numbers for any software dependencies, libraries, or frameworks used (e.g., TensorFlow or PyTorch); a sketch of the quantile Huber loss follows at the end of this section. |
| Experiment Setup | Yes | We used 16 synchronous workers, and the rollout length is 5, resulting in a batch size of 80. We trained each agent for 40M steps with frameskip 4, resulting in 160M frames in total. We used an RMSProp optimizer with an initial learning rate of 10⁻⁴. The discount factor is 0.99. The ϵ for action selection was linearly decayed from 1.0 to 0.05 in the first 4M training steps and remained 0.05 afterwards. We used 200 quantiles to approximate the distribution and set the Huber loss parameter κ to 1. We used 10 options in QUOTA (M = 10), and ϵ_Ω was linearly decayed from 1.0 to 0 during the 40M training steps. β was fixed at 0.01. These settings are collected in the configuration sketch just below this table. |
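
For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration, with the two ϵ schedules written as linear anneals. The sketch below is our own reconstruction in Python under those reported values; the names (`CONFIG`, `linear_decay`) and the step-indexed interface are assumptions, not taken from the authors' repository.

```python
# Hedged sketch: the paper's reported hyperparameters gathered in one place.
# Names and structure are illustrative, not taken from the authors' code.
CONFIG = {
    "num_workers": 16,          # synchronous workers
    "rollout_length": 5,        # => batch size 16 * 5 = 80
    "max_steps": 40_000_000,    # 40M agent steps (160M frames at frameskip 4)
    "frameskip": 4,
    "learning_rate": 1e-4,      # RMSProp initial learning rate
    "discount": 0.99,
    "num_quantiles": 200,
    "huber_kappa": 1.0,
    "num_options": 10,          # M in the paper
    "beta": 0.01,
}

def linear_decay(step, start, end, decay_steps):
    """Linearly anneal from `start` to `end` over `decay_steps`, then hold."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# Action-selection ϵ: 1.0 -> 0.05 over the first 4M steps, then fixed.
eps = lambda step: linear_decay(step, 1.0, 0.05, 4_000_000)
# Option-selection ϵ_Ω: 1.0 -> 0 over the full 40M training steps.
eps_omega = lambda step: linear_decay(step, 1.0, 0.0, 40_000_000)
```

At step 0 both schedules return 1.0; by step 4M the action-selection ϵ has reached its floor of 0.05, while ϵ_Ω keeps decaying until the end of training.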
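
The table's distributional settings (200 quantiles, Huber parameter κ = 1) match the quantile Huber loss of QR-DQN (Dabney et al. 2018), on which QUOTA builds. The following PyTorch sketch implements that standard loss under those settings; it is a reconstruction from the published formulation, not the authors' implementation.

```python
import torch

def quantile_huber_loss(pred, target, kappa=1.0):
    """Quantile Huber loss in the style of QR-DQN (Dabney et al. 2018).

    pred:   (batch, N) predicted quantile values
    target: (batch, N) target quantile samples (already detached)
    """
    n = pred.shape[1]
    # Quantile midpoints tau_hat_i = (2i + 1) / (2N), one per predicted quantile.
    tau = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors u[b, i, j] = target[b, j] - pred[b, i].
    u = target.unsqueeze(1) - pred.unsqueeze(2)
    # Huber loss with threshold kappa (kappa = 1 in the paper).
    huber = torch.where(u.abs() <= kappa,
                        0.5 * u.pow(2),
                        kappa * (u.abs() - 0.5 * kappa))
    # Asymmetric quantile weight |tau_i - 1{u < 0}|.
    weight = (tau.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    # Sum over predicted quantiles i, average over target samples j and batch.
    return (weight * huber).sum(dim=1).mean()
```

Some write-ups divide the Huber term by κ; with κ = 1, as reported here, the two variants coincide.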