Harnessing Structures for Value-Based Planning and Reinforcement Learning
Authors: Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on control tasks and Atari games confirm the efficacy of our approach. |
| Researcher Affiliation | Academia | Yuzhe Yang, Guo Zhang, Zhi Xu, Dina Katabi, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, {yuzhe, guozhang, zhixu, dk}@mit.edu |
| Pseudocode | Yes | In Appendix A, we provide the pseudo-code and additionally, a short discussion on the technical difficulty for theoretical analysis. |
| Open Source Code | Yes | Code is available at: https://github.com/YyzHarry/SV-RL |
| Open Datasets | No | The paper mentions using control tasks and Atari games, but does not explicitly identify or link a publicly released dataset. |
| Dataset Splits | No | No specific percentages or counts for training/validation/test splits were found. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, etc.) were mentioned for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2014)' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | In all experiments, we set the hyper-parameters as follows: learning rate α = 1e-5, discount coefficient γ = 0.99, and a minibatch size of 32. The number of steps between target network updates is set to 10,000. We use a simple exploration policy as the ϵ-greedy policy with the ϵ decreasing linearly from 1 to 0.01 over 3e5 steps. |
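
The experiment-setup row above fully specifies the reported hyper-parameters and the ε-greedy schedule. Below is a minimal sketch of how those values could be wired into a DQN-style training loop; the `HYPERPARAMS` dictionary and the `epsilon_by_step` helper are illustrative names, not taken from the paper's released code.

```python
# Hyper-parameters quoted in the paper's experiment setup; the container name
# and the schedule helper below are assumptions for illustration only.
HYPERPARAMS = {
    "learning_rate": 1e-5,          # alpha, used with the Adam optimizer
    "discount": 0.99,               # gamma
    "batch_size": 32,               # minibatch size
    "target_update_steps": 10_000,  # steps between target-network updates
}

# Epsilon-greedy exploration: decrease linearly from 1.0 to 0.01 over 3e5 steps.
EPS_START, EPS_END, EPS_DECAY_STEPS = 1.0, 0.01, 300_000


def epsilon_by_step(step: int) -> float:
    """Return the linearly annealed epsilon for a given environment step."""
    frac = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


if __name__ == "__main__":
    # Example: epsilon at a few points of training.
    for s in (0, 150_000, 300_000, 1_000_000):
        print(s, round(epsilon_by_step(s), 3))
```

After 3e5 steps the schedule stays fixed at 0.01, matching the reported setup; all remaining training choices (network architecture, replay-buffer size, etc.) are not restated in this section.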