Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Authors: Shenao Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Empirical results also validate the exploration efficiency of CDPO." (Abstract); see also Section 6, "Empirical Evaluation" |
| Researcher Affiliation | Academia | Shenao Zhang Georgia Institute of Technology Atlanta, GA 30332 shenao@gatech.edu |
| Pseudocode | Yes | Algorithm 1 Practical CDPO Algorithm |
| Open Source Code | Yes | Our code can be found in the supplemental material. |
| Open Datasets | No | The paper uses standard RL environments (MuJoCo tasks, N-Chain MDPs) for experimentation, but does not provide concrete access information or citations for specific datasets in the way a supervised learning paper would. |
| Dataset Splits | No | The paper does not explicitly mention training, validation, or test dataset splits, nor does it provide specific percentages or sample counts for these splits in the provided text. |
| Hardware Specification | No | The provided text does not specify any hardware details such as GPU models, CPU types, or cloud computing instances used for running the experiments. |
| Software Dependencies | No | The paper mentions using Dyna- and MPC-style solvers, neural network ensembles, and specific optimization methods such as Adam (via citation), but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | Implementation details and hyperparameters are provided in Appendix F.1. |