Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Harnessing Structures for Value-Based Planning and Reinforcement Learning

Authors: Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on control tasks and Atari games confirm the efficacy of our approach.
Researcher Affiliation | Academia | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi; Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology; EMAIL
Pseudocode | Yes | In Appendix A, we provide the pseudo-code and additionally, a short discussion on the technical difficulty for theoretical analysis.
Open Source Code | Yes | Code is available at: https://github.com/YyzHarry/SV-RL
Open Datasets | No | The paper mentions using
Dataset Splits | No | No specific percentages or counts for training/validation/test splits were found. The paper mentions
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) were mentioned for running the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma & Ba, 2014) but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | In all experiments, we set the hyper-parameters as follows: learning rate α = 1e-5, discount coefficient γ = 0.99, and a minibatch size of 32. The number of steps between target network updates is set to 10,000. We use a simple exploration policy as the ϵ-greedy policy with the ϵ decreasing linearly from 1 to 0.01 over 3e5 steps.
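The Experiment Setup row lists standard DQN-style settings. As a minimal sketch, assuming a Python training loop, they could be collected into one config object together with the linear ϵ schedule and an ϵ-greedy action choice. The names `DQNConfig`, `epsilon_at`, `should_sync_target`, and `select_action` are hypothetical and are not taken from the SV-RL repository. Counting "steps" as environment steps and holding ϵ at 0.01 after 3e5 steps are also assumptions. The sketch covers only these generic settings, not the structured value estimation that SV-RL adds on top.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DQNConfig:
    # Values from the Experiment Setup row; the field names are hypothetical.
    learning_rate: float = 1e-5           # alpha, e.g. handed to an Adam optimizer
    gamma: float = 0.99                   # discount coefficient
    batch_size: int = 32                  # minibatch size
    target_update_interval: int = 10_000  # steps between target network updates
    eps_start: float = 1.0
    eps_end: float = 0.01
    eps_decay_steps: int = 300_000        # 3e5 steps of linear annealing


def epsilon_at(step: int, cfg: DQNConfig) -> float:
    """Linearly anneal epsilon from eps_start to eps_end, then hold at eps_end (assumed)."""
    frac = min(max(step, 0) / cfg.eps_decay_steps, 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)


def should_sync_target(step: int, cfg: DQNConfig) -> bool:
    """True on the steps where online weights would be copied into the target network."""
    return step > 0 and step % cfg.target_update_interval == 0


def select_action(q_values: Sequence[float], step: int, cfg: DQNConfig,
                  rng: random.Random) -> int:
    """Epsilon-greedy: a uniformly random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon_at(step, cfg):
        return rng.randrange(len(q_values))
    # Ties resolve to the lowest index because max() keeps the first maximum it sees.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    cfg = DQNConfig()
    rng = random.Random(0)
    for step in (0, 150_000, 300_000, 1_000_000):
        action = select_action([0.1, 0.9, 0.3], step, cfg, rng)
        print(step, round(epsilon_at(step, cfg), 4), action)
```

For example, `epsilon_at(150_000, DQNConfig())` returns roughly 0.505, halfway between the start value 1 and the end value 0.01. Keeping the schedule as a pure function of the step count makes it easy to log and to test independently of the network code.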