Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Better Exploration with Optimistic Actor Critic
Authors: Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate Optimistic Actor Critic in several challenging continuous control tasks and achieve state-of-the-art sample efficiency. |
| Researcher Affiliation | Collaboration | Kamil Ciosek Microsoft Research Cambridge, UK EMAIL Quan Vuong University of California San Diego EMAIL Robert Loftin Microsoft Research Cambridge, UK EMAIL Katja Hofmann Microsoft Research Cambridge, UK EMAIL |
| Pseudocode | Yes | Algorithm 1 Optimistic Actor-Critic (OAC). |
| Open Source Code | No | The paper does not provide any statements about making code open source or providing links to a repository for the described methodology. |
| Open Datasets | Yes | We test OAC on the MuJoCo [45] continuous control benchmarks. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions TensorFlow [1] in the references, but does not specify a version number or list other software dependencies with version numbers needed for replication. |
| Experiment Setup | Yes | OAC uses 3 hyper-parameters related to exploration. The parameters β_UB and β_LB control the amount of uncertainty used to compute the upper and lower bound respectively. The parameter δ controls the maximal allowed divergence between the exploration policy and the target policy. We provide the values of all hyper-parameters and details of the hyper-parameter tuning in Appendix D. |
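
The role of the three exploration hyper-parameters in the Experiment Setup row can be illustrated with a minimal sketch. It assumes, beyond what the table states, that the bounds are formed as the mean of two critic estimates minus or plus a multiple of their spread. The names `ExplorationConfig` and `q_bounds` and all default values are hypothetical placeholders, not the tuned values from Appendix D. The divergence limit δ is stored in the config but not modeled here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExplorationConfig:
    # Hypothetical placeholder values; the paper's tuned values are in its Appendix D.
    beta_ub: float = 1.0  # scales the uncertainty added for the upper bound
    beta_lb: float = 1.0  # scales the uncertainty subtracted for the lower bound
    delta: float = 1.0    # maximal allowed divergence (not used in this sketch)


def q_bounds(q1: float, q2: float, cfg: ExplorationConfig) -> Tuple[float, float]:
    """Return (lower, upper) value bounds from two critic estimates.

    Assumption: uncertainty is the half-distance between the two estimates,
    and each bound shifts the mean by beta times that uncertainty.
    """
    mean = 0.5 * (q1 + q2)
    spread = 0.5 * abs(q1 - q2)
    lower = mean - cfg.beta_lb * spread
    upper = mean + cfg.beta_ub * spread
    return lower, upper


if __name__ == "__main__":
    cfg = ExplorationConfig(beta_ub=2.0, beta_lb=1.0, delta=0.1)
    print(q_bounds(1.0, 3.0, cfg))
```

Under this assumed construction, setting `beta_lb` to 1 makes the lower bound equal to the smaller of the two critic estimates, while a larger `beta_ub` makes the upper bound more optimistic.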