Frequency-based Search-control in Dyna
Authors: Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that a high frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states from high frequency regions of the value function to query the model to acquire more samples. We develop a simple strategy to locally measure the frequency of a function by gradient and Hessian norms, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments to show its property and effectiveness on benchmark domains. *(See the frequency-measure sketch after this table.)* |
| Researcher Affiliation | Academia | Yangchen Pan & Jincheng Mei Department of Computing Science University of Alberta Edmonton, AB, Canada {pan6,jmei2}@ualberta.ca Amir-massoud Farahmand Vector Institute & University of Toronto Toronto, ON, Canada farahmand@vectorinstitute.ai |
| Pseudocode | Yes | Algorithm 1 Dyna architecture with Frequency-based search-control ... Algorithm 4 Dyna architecture with Frequency-based search-control with additional details |
| Open Source Code | No | The paper does not provide a statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | The Mountain Car (Brockman et al., 2016) domain is well-studied... Then we illustrate the utility of our algorithm on a challenging self-designed Maze Grid World domain... Hopper-v2 and Walker2d-v2 from Mujoco (Todorov et al., 2012) |
| Dataset Splits | No | The paper does not specify training, validation, and test dataset splits with percentages, sample counts, or references to predefined splits for its main reinforcement learning experiments. |
| Hardware Specification | No | The paper does not specify the hardware used for running experiments, such as particular GPU or CPU models. |
| Software Dependencies | Yes | All of our implementations are based on tensorflow with version 1.13.0 (Abadi et al., 2015). For DQN update, we use Adam optimizer (Kingma & Ba, 2014). |
| Experiment Setup | Yes | For DQN update, we use Adam optimizer (Kingma & Ba, 2014). We use mini-batch size b = 32 except on the supervised learning experiment where we use 128. For reinforcement learning experiment, we use buffer size 100k. All activation functions are tanh except the output layer of the Q-value is linear. Except the output layer parameters which were initialized from a uniform distribution [-0.003, 0.003], all other parameters are initialized using Xavier initialization (Glorot & Bengio, 2010). For model learning, we use a 64 × 64 relu units neural network to predict s' − s given a state-action pair with mini-batch size 128 and learning rate 0.0001. *(See the configuration sketch after this table.)* |
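
The "Research Type" row quotes the paper's strategy of measuring local frequency via gradient and Hessian norms of the learned value function and hill-climbing on that measure to collect states for Dyna's search-control queue. Below is a minimal sketch, assuming a small PyTorch value network (the paper's implementation uses TensorFlow 1.13) and a random-perturbation ascent in place of the paper's gradient-based hill climbing; `value_net`, `frequency_score`, and `hill_climb` are illustrative names, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical 2-D state space and a small value network (illustrative only;
# the paper's architectures and TensorFlow implementation differ).
value_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

def frequency_score(state):
    """Local 'frequency' proxy: ||grad_s V(s)|| + ||Hess_s V(s)||_F."""
    s = state.clone().requires_grad_(True)
    v = value_net(s).squeeze()
    grad = torch.autograd.grad(v, s, create_graph=True)[0]
    # Build the Hessian row by row by differentiating each gradient component.
    hess_rows = [torch.autograd.grad(grad[i], s, retain_graph=True)[0]
                 for i in range(s.shape[0])]
    hess = torch.stack(hess_rows)
    return (grad.norm() + hess.norm()).item()

def hill_climb(start_state, steps=20, noise=0.05):
    """Collect states from high-frequency regions by stochastic ascent on the
    frequency score (a simplification of the paper's hill climbing)."""
    s, best = start_state.clone(), frequency_score(start_state)
    visited = [s.clone()]
    for _ in range(steps):
        candidate = s + noise * torch.randn_like(s)
        score = frequency_score(candidate)
        if score > best:
            s, best = candidate, score
            visited.append(s.clone())
    return visited  # candidate states for the Dyna search-control queue
```

In the Dyna loop described by Algorithm 1, states gathered this way would be paired with actions and passed to the learned model to generate additional planning updates.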
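
The "Experiment Setup" row reports tanh hidden units with a linear Q-value output, Xavier initialization except a uniform [-0.003, 0.003] output layer, Adam updates with mini-batch size 32, and a 64 × 64 ReLU model network trained to predict s' − s with mini-batch size 128 and learning rate 0.0001. The following is a minimal sketch of that configuration, again in PyTorch rather than the paper's TensorFlow 1.13; the Q-network hidden widths and the helper names are assumptions.

```python
import torch
import torch.nn as nn

def make_q_network(state_dim, num_actions, hidden=64):
    """Q-network per the reported setup: tanh hidden units, linear output,
    Xavier init for hidden layers, uniform [-0.003, 0.003] output layer.
    Hidden widths here are placeholders; the paper's exact sizes vary by domain."""
    net = nn.Sequential(
        nn.Linear(state_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, num_actions),  # linear output for Q-values
    )
    for layer in net[:-1]:
        if isinstance(layer, nn.Linear):
            nn.init.xavier_uniform_(layer.weight)
            nn.init.zeros_(layer.bias)
    nn.init.uniform_(net[-1].weight, -0.003, 0.003)
    nn.init.uniform_(net[-1].bias, -0.003, 0.003)
    return net

def make_model_network(state_dim, action_dim):
    """Environment model: 64 x 64 ReLU network predicting s' - s from (s, a),
    trained with Adam at learning rate 1e-4 and mini-batch size 128 (as reported)."""
    net = nn.Sequential(
        nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, state_dim),
    )
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
    return net, optimizer
```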