Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Bridging the Gap Between Value and Policy Based Reinforcement Learning

Authors: Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

NeurIPS 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	'The experimental evaluation demonstrates that PCL significantly outperforms strong actor-critic and Q-learning baselines across several benchmarks.' and 'We evaluate the proposed algorithms, namely PCL & Unified PCL, across several different tasks and compare them to an A3C implementation, based on [21], and an implementation of double Q-learning with prioritized experience replay, based on [30].'
Researcher Affiliation	Collaboration	Ofir Nachum1 Mohammad Norouzi Kelvin Xu1 Dale Schuurmans EMAIL, EMAIL Google Brain
Pseudocode	Yes	Pseudocode of PCL is provided in the Appendix.
Open Source Code	Yes	An implementation of PCL can be found at https://github.com/tensorflow/models/tree/master/research/pcl_rl
Open Datasets	No	The paper references several tasks/environments (e.g., Synthetic Tree, Reversed Addition) but does not provide concrete access information (links, citations) for publicly available datasets used for training.
Dataset Splits	No	The paper mentions training runs and hyperparameter tuning but does not provide specific train/validation/test dataset splits (percentages, counts, or explicit splitting methodology).
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU/CPU models or other system specifications.
Software Dependencies	No	The paper does not provide specific software dependency details, such as library names with version numbers, required to replicate the experiment.
Experiment Setup	No	The paper states that 'The details of the tasks and the experimental setup are provided in the Appendix' and 'After finding the best hyperparameters (see the Supplementary Material)', indicating that these details are not present in the main text of the paper.