Bridging the Gap Between Value and Policy Based Reinforcement Learning
Authors: Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental evaluation demonstrates that PCL significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks. We evaluate the proposed algorithms, namely PCL & Unified PCL, across several different tasks and compare them to an A3C implementation, based on [21], and an implementation of double Q-learning with prioritized experience replay, based on [30]. |
| Researcher Affiliation | Collaboration | Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans; {ofirnachum,mnorouzi,kelvinxx}@google.com, daes@ualberta.ca; Google Brain |
| Pseudocode | Yes | Pseudocode of PCL is provided in the Appendix. |
| Open Source Code | Yes | An implementation of PCL can be found at https://github.com/tensorflow/models/tree/master/research/pcl_rl |
| Open Datasets | No | The paper references several tasks/environments (e.g., Synthetic Tree, Reversed Addition) but does not provide concrete access information (links, citations) for publicly available datasets used for training. |
| Dataset Splits | No | The paper mentions training runs and hyperparameter tuning but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models or other system specifications. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, required to replicate the experiment. |
| Experiment Setup | No | The paper states that 'The details of the tasks and the experimental setup are provided in the Appendix' and 'After finding the best hyperparameters (see the Supplementary Material)', indicating that these details are not present in the main text of the paper. |