Harnessing Structures for Value-Based Planning and Reinforcement Learning

Authors: Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on control tasks and Atari games confirm the efficacy of our approach.
Researcher Affiliation | Academia | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology. {yuzhe, guozhang, zhixu, dk}@mit.edu
Pseudocode | Yes | In Appendix A, we provide the pseudo-code and additionally, a short discussion on the technical difficulty for theoretical analysis.
Open Source Code | Yes | Code is available at: https://github.com/YyzHarry/SV-RL
Open Datasets | No | The paper mentions using control tasks and Atari games as evaluation environments, but does not cite an openly available dataset.
Dataset Splits | No | No specific percentages or counts for training/validation/test splits were found.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) were mentioned for running the experiments.
Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2014)' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | In all experiments, we set the hyper-parameters as follows: learning rate α = 1e-5, discount coefficient γ = 0.99, and a minibatch size of 32. The number of steps between target network updates is set to 10,000. We use a simple exploration policy as the ϵ-greedy policy with the ϵ decreasing linearly from 1 to 0.01 over 3e5 steps.
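For convenience, the hyper-parameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal illustration assuming a standard DQN-style training loop; the names Hyperparams and epsilon_at are hypothetical and are not taken from the authors' released code.

```python
from dataclasses import dataclass

@dataclass
class Hyperparams:
    # Values quoted from the paper's reported experiment setup.
    learning_rate: float = 1e-5        # Adam step size (alpha)
    gamma: float = 0.99                # discount coefficient
    batch_size: int = 32               # minibatch size
    target_update_steps: int = 10_000  # steps between target network updates
    eps_start: float = 1.0             # initial epsilon for epsilon-greedy exploration
    eps_end: float = 0.01              # final epsilon
    eps_decay_steps: int = 300_000     # linear decay horizon (3e5 steps)

def epsilon_at(step: int, hp: Hyperparams) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over eps_decay_steps."""
    frac = min(step / hp.eps_decay_steps, 1.0)
    return hp.eps_start + frac * (hp.eps_end - hp.eps_start)
```

As a quick check, epsilon_at(0, Hyperparams()) returns 1.0 and epsilon_at(300_000, Hyperparams()) returns 0.01, matching the linear schedule described in the row above.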