Bridging the Gap Between Value and Policy Based Reinforcement Learning

Authors: Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'The experimental evaluation demonstrates that PCL significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.' and 'We evaluate the proposed algorithms, namely PCL & Unified PCL, across several different tasks and compare them to an A3C implementation, based on [21], and an implementation of double Q-learning with prioritized experience replay, based on [30].'
Researcher Affiliation | Collaboration | Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans; {ofirnachum,mnorouzi,kelvinxx}@google.com, daes@ualberta.ca; Google Brain
Pseudocode | Yes | Pseudocode of PCL is provided in the Appendix.
Open Source Code | Yes | An implementation of PCL can be found at https://github.com/tensorflow/models/tree/master/research/pcl_rl
Open Datasets | No | The paper references several tasks/environments (e.g., Synthetic Tree, Reversed Addition) but does not provide concrete access information (links, citations) for publicly available datasets used for training.
Dataset Splits | No | The paper mentions training runs and hyperparameter tuning but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit splitting methodology).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models or other system specifications.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, required to replicate the experiment.
Experiment Setup | No | The paper states that 'The details of the tasks and the experimental setup are provided in the Appendix' and 'After finding the best hyperparameters (see the Supplementary Material)', indicating that these details are not present in the main text of the paper.