reproducibilityindex.ai

Harnessing Structures for Value-Based Planning and Reinforcement Learning

Authors: Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on control tasks and Atari games confirm the efficacy of our approach.
Researcher Affiliation	Academia	Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology {yuzhe, guozhang, zhixu, dk}@mit.edu
Pseudocode	Yes	In Appendix A, we provide the pseudo-code and additionally, a short discussion on the technical difficulty for theoretical analysis.
Open Source Code	Yes	Code is available at: https://github.com/YyzHarry/SV-RL
Open Datasets	No	The paper mentions using
Dataset Splits	No	No specific percentages or counts for training/validation/test splits were found. The paper mentions
Hardware Specification	No	No specific hardware details (GPU/CPU models, memory, etc.) were mentioned for running the experiments.
Software Dependencies	No	The paper mentions using 'Adam optimizer (Kingma & Ba, 2014)' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup	Yes	In all experiments, we set the hyper-parameters as follows: learning rate α = 1e-5, discount coefficient γ = 0.99, and a minibatch size of 32. The number of steps between target network updates is set to 10,000. We use a simple exploration policy as the ϵ-greedy policy with the ϵ decreasing linearly from 1 to 0.01 over 3e5 steps.
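
The quoted setup can be captured as a small configuration object with a linear ϵ schedule. The following is a minimal sketch, not the authors' code: the names `ExperimentConfig` and `epsilon_at` are hypothetical, and holding ϵ at 0.01 once the 3e5 steps have elapsed is an assumption, since the excerpt only describes the decay itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyper-parameters as quoted in the Experiment Setup row (field names are hypothetical)."""
    learning_rate: float = 1e-5           # alpha
    discount: float = 0.99                # gamma
    batch_size: int = 32                  # minibatch size
    target_update_interval: int = 10_000  # steps between target network updates
    eps_start: float = 1.0                # initial exploration rate
    eps_end: float = 0.01                 # final exploration rate
    eps_decay_steps: int = 300_000        # 3e5 steps of linear decay


def epsilon_at(step: int, cfg: ExperimentConfig = ExperimentConfig()) -> float:
    """Linearly interpolate epsilon from eps_start to eps_end over eps_decay_steps.

    Assumption: after the decay window, epsilon stays at eps_end.
    """
    fraction = min(max(step, 0) / cfg.eps_decay_steps, 1.0)
    return cfg.eps_start + fraction * (cfg.eps_end - cfg.eps_start)
```

Under these assumptions, `epsilon_at(0)` gives 1.0, the midpoint `epsilon_at(150_000)` is approximately 0.505, and any step at or beyond 300,000 gives approximately 0.01.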