Rethinking Value Function Learning for Generalization in Reinforcement Learning
Authors: Seungyong Moon, JunYeong Lee, Hyun Oh Song
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed algorithms significantly improve observational generalization performance and sample efficiency on the Procgen Benchmark. |
| Researcher Affiliation | Collaboration | Seoul National University, Neural Processing Research Center, Deep Metrics |
| Pseudocode | Yes | Algorithm 1 Dynamics-aware Delayed-Critic Policy Gradient (DDCPG) |
| Open Source Code | Yes | The code can be found at https://github.com/snu-mllab/DCPG. |
| Open Datasets | Yes | In this paper, we utilize the Procgen benchmark as a testbed for observational generalization [10]. |
| Dataset Splits | No | The paper describes training and testing on the Procgen benchmark but does not explicitly mention a validation dataset or split details for it. |
| Hardware Specification | No | The paper states, 'We describe the computational resource in the supplementary material.' This implies that these details are not in the main body of the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers in the main text. |
| Experiment Setup | No | For the implementation details and hyperparameters, please refer to Appendix D. |