Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Hill Climbing on Value Estimates for Search-control in Dyna
Authors: Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White
IJCAI 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an empirical demonstration on four classical domains that our algorithm, HC-Dyna, can obtain significant sample efficiency improvements. We conduct experiments showing improved performance in four benchmark domains. |
| Researcher Affiliation | Collaboration | Yangchen Pan^1, Hengshuai Yao^2, Amir-massoud Farahmand^3,4 and Martha White^1. ^1Department of Computing Science, University of Alberta, Canada; ^2Huawei HiSilicon, Canada; ^3Vector Institute, Canada; ^4Department of Computer Science, University of Toronto, Canada. |
| Pseudocode | Yes | Algorithm 1 HC-Dyna |
| Open Source Code | No | The paper does not provide any explicit statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | In this section, we present empirical results on four classic domains: the Grid World (Figure 1(a)), Mountain Car, Cart Pole and Acrobot. We test on a simplified Tabular Grid World domain of size 20 × 20. |
| Dataset Splits | No | The paper discusses training and evaluating models within reinforcement learning environments, but it does not specify explicit train/validation/test dataset splits in the conventional sense of data partitioning for supervised learning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using DQN and DDPG algorithms and a two-layer NN, but does not specify any software libraries (e.g., TensorFlow, PyTorch) or their version numbers. |
| Experiment Setup | Yes | Input: budget k for the number of gradient ascent steps (e.g., k = 100), stochasticity η for gradient ascent (e.g., η = 0.1), percentage ρ of updates from the SC queue (e.g., ρ = 0.5), and d the number of state variables, i.e., S ⊆ R^d. The agents all use a two-layer NN with ReLU activations and 32 nodes in each layer. We set the step size to α = 0.1/||Σ̂_s g|| across all results in this work. In all further experiments in this paper, we set ρ = 0.5. The continuous-state setting uses NNs ... with a minibatch size of 32. For the tabular setting, the mini-batch size is 1. We further include multiple planning steps n, where for each real environment step, the agent does n updates with a mini-batch of size 32. |
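The setup row above names the two moving parts of HC-Dyna's search control: noisy gradient ascent on the value estimate to populate a search-control (SC) queue, and planning minibatches mixed from that queue and the experience-replay buffer in ratio ρ. The following is a minimal NumPy sketch of those two pieces only, not the paper's full Algorithm 1: the function names are hypothetical, a toy quadratic with a known gradient stands in for the learned value estimate v̂, and the paper's step-size normalization is replaced by a plain fixed step.

```python
import numpy as np

def hill_climb_states(v_grad, s0, k=100, eta=0.1, alpha=0.1, seed=0):
    """Generate states for the SC queue by noisy gradient ascent on v-hat.

    v_grad : callable returning the gradient of the value estimate at a state.
    k      : budget of ascent steps; eta: noise scale (stochasticity).
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    queue = []
    for _ in range(k):
        # ascend the value surface, with Gaussian perturbation for stochasticity
        s = s + alpha * v_grad(s) + eta * rng.standard_normal(s.shape)
        queue.append(s.copy())
    return queue

def mixed_minibatch(sc_queue, er_buffer, batch=32, rho=0.5, seed=0):
    """Draw a rho fraction of the minibatch from the SC queue, rest from ER."""
    rng = np.random.default_rng(seed)
    n_sc = int(rho * batch)
    sc = [sc_queue[i] for i in rng.integers(0, len(sc_queue), n_sc)]
    er = [er_buffer[i] for i in rng.integers(0, len(er_buffer), batch - n_sc)]
    return sc + er

# Toy value estimate v(s) = -||s - s_star||^2, so grad v(s) = -2(s - s_star);
# ascent therefore drives states toward the high-value point s_star.
s_star = np.array([1.0, -2.0])
v_grad = lambda s: -2.0 * (s - s_star)
sc_queue = hill_climb_states(v_grad, np.zeros(2), k=100, eta=0.1)
minibatch = mixed_minibatch(sc_queue, sc_queue, batch=32, rho=0.5)
```

With ρ = 0.5 and a minibatch of 32, as in the quoted setup, 16 states come from the SC queue and 16 from the replay buffer on each planning update.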