Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Better Exploration with Optimistic Actor Critic

Authors: Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

NeurIPS 2019 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, we evaluate Optimistic Actor Critic in several challenging continuous control tasks and achieve state-of-the-art sample efficiency." |
| Researcher Affiliation | Collaboration | Kamil Ciosek, Microsoft Research, Cambridge, UK; Quan Vuong, University of California San Diego; Robert Loftin, Microsoft Research, Cambridge, UK; Katja Hofmann, Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | "Algorithm 1 Optimistic Actor-Critic (OAC)." |
| Open Source Code | No | The paper does not provide any statements about making code open source or any links to a repository for the described methodology. |
| Open Datasets | Yes | "We test OAC on the MuJoCo [45] continuous control benchmarks." |
| Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions TensorFlow [1] in its references, but does not specify a version number or list other software dependencies with version numbers needed for replication. |
| Experiment Setup | Yes | "OAC uses 3 hyper-parameters related to exploration. The parameters βUB and βLB control the amount of uncertainty used to compute the upper and lower bound respectively. The parameter δ controls the maximal allowed divergence between the exploration policy and the target policy. We provide the values of all hyper-parameters and details of the hyper-parameter tuning in Appendix D." |
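The δ hyper-parameter quoted above caps the KL divergence between the exploration policy and the target policy. A minimal NumPy sketch of this idea: given a Gaussian target policy with diagonal covariance and an (assumed precomputed) gradient of the upper-bound critic at the policy mean, shift the mean along that gradient just far enough that the KL divergence equals δ. The function name `oac_exploration_mean` and the toy gradient are illustrative, not from the paper.

```python
import numpy as np

def oac_exploration_mean(mu, sigma, grad_q_ub, delta):
    """Shift a diagonal-Gaussian policy mean along the upper-bound
    Q-gradient so that KL(exploration || target) = delta.
    Sketch only; names and interface are hypothetical."""
    Sigma_g = sigma**2 * grad_q_ub            # diagonal covariance times gradient
    norm = np.sqrt(grad_q_ub @ Sigma_g)       # gradient norm under the covariance metric
    if norm < 1e-12:                          # zero gradient: no informed shift
        return mu.copy()
    return mu + np.sqrt(2.0 * delta) * Sigma_g / norm

# Toy example with a made-up upper-bound gradient.
mu = np.zeros(2)
sigma = np.array([0.5, 1.0])
g = np.array([1.0, -2.0])
delta = 0.1

mu_e = oac_exploration_mean(mu, sigma, g, delta)
# KL between two Gaussians sharing a diagonal covariance reduces to
# 0.5 * sum((mu_e - mu)^2 / sigma^2); by construction it equals delta.
kl = 0.5 * np.sum((mu_e - mu) ** 2 / sigma**2)
```

Because the covariance is shared, the closed-form KL depends only on the mean shift, which is why the constraint can be satisfied exactly with a single rescaling rather than an iterative search.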