QUOTA: The Quantile Option Architecture for Reinforcement Learning

Authors: Shangtong Zhang, Hengshuai Yao

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the performance advantage of QUOTA in both challenging video games and physical robot simulators.
Researcher Affiliation | Collaboration | Shangtong Zhang (1), Hengshuai Yao (2); (1) Department of Computing Science, University of Alberta; (2) Reinforcement Learning for Autonomous Driving Lab, Noah's Ark Lab, Huawei. shangtong.zhang@ualberta.ca, hengshuai.yao@huawei.com
Pseudocode | Yes | The pseudo code of QUOTA is provided in Supplementary Material.
Open Source Code | Yes | All the implementations are made publicly available: https://github.com/ShangtongZhang/DeepRL
Open Datasets | Yes | We evaluated QUOTA in both Arcade Learning Environment (ALE) (Bellemare et al. 2013) and Roboschool.
Dataset Splits | No | The paper mentions training steps and performing evaluations, but it does not define explicit train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU/CPU models used for the experiments; it only mentions that experiments were run.
Software Dependencies | No | The paper mentions using an RMSProp optimizer and the Huber loss, but it does not specify version numbers for any software dependencies, libraries, or frameworks (e.g., a TensorFlow or PyTorch version).
Experiment Setup | Yes | We used 16 synchronous workers, and the rollout length is 5, resulting in a batch size 80. We trained each agent for 40M steps with frameskip 4, resulting in 160M frames in total. We used an RMSProp optimizer with an initial learning rate 10^-4. The discount factor is 0.99. The ϵ for action selection was linearly decayed from 1.0 to 0.05 in the first 4M training steps and remained 0.05 afterwards. We used 200 quantiles to approximate the distribution and set the Huber loss parameter κ to 1. We used 10 options in QUOTA (M = 10), and ϵ_Ω was linearly decayed from 1.0 to 0 during the 40M training steps. β was fixed at 0.01.
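
To make the reported settings concrete, below is a minimal sketch of the hyperparameter configuration and the annealing and loss components it implies, assuming a PyTorch-style implementation. The names CONFIG, linear_decay, eps_action, eps_option, and quantile_huber_loss are illustrative placeholders and are not taken from the authors' released code.

```python
import torch

# Hyperparameters as reported in the table above; variable names are
# illustrative choices, not identifiers from the authors' repository.
CONFIG = dict(
    num_workers=16,           # synchronous workers
    rollout_length=5,         # 16 workers * 5 steps -> batch size 80
    total_steps=40_000_000,   # 40M agent steps; frameskip 4 -> 160M frames
    learning_rate=1e-4,       # initial RMSProp learning rate
    discount=0.99,
    num_quantiles=200,
    huber_kappa=1.0,
    num_options=10,           # M = 10
    beta=0.01,
)

def linear_decay(step, start, end, duration):
    """Linearly anneal a value from `start` to `end` over `duration` steps."""
    frac = min(step / duration, 1.0)
    return start + frac * (end - start)

# epsilon for primitive-action selection: 1.0 -> 0.05 over the first 4M steps.
def eps_action(step):
    return linear_decay(step, 1.0, 0.05, 4_000_000)

# epsilon_Omega for option selection: 1.0 -> 0.0 over all 40M steps.
def eps_option(step):
    return linear_decay(step, 1.0, 0.0, CONFIG["total_steps"])

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile-regression Huber loss in the style of QR-DQN (a sketch).

    pred:   (batch, N)  predicted quantile values
    target: (batch, N') target quantile samples (assumed detached)
    taus:   (N,)        quantile midpoints in (0, 1)
    """
    td = target.unsqueeze(-1) - pred.unsqueeze(1)          # (batch, N', N)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    weight = (taus.view(1, 1, -1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(-1).mean()
```

In this sketch, such a loss would be minimized with an optimizer like torch.optim.RMSprop(params, lr=CONFIG["learning_rate"]); the quoted setup states only the initial learning rate, not the remaining RMSProp hyperparameters.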