Better Exploration with Optimistic Actor Critic
Authors: Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate Optimistic Actor Critic in several challenging continuous control tasks and achieve state-of-the-art sample efficiency. |
| Researcher Affiliation | Collaboration | Kamil Ciosek, Microsoft Research Cambridge, UK (kamil.ciosek@microsoft.com); Quan Vuong, University of California San Diego (qvuong@ucsd.edu); Robert Loftin, Microsoft Research Cambridge, UK (t-roloft@microsoft.com); Katja Hofmann, Microsoft Research Cambridge, UK (katja.hofmann@microsoft.com) |
| Pseudocode | Yes | Algorithm 1 Optimistic Actor-Critic (OAC). |
| Open Source Code | No | The paper does not provide any statements about making code open source or providing links to a repository for the described methodology. |
| Open Datasets | Yes | We test OAC on the MuJoCo [45] continuous control benchmarks. (A minimal environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions TensorFlow [1] in the references, but does not specify a version number or list other software dependencies with version numbers needed for replication. |
| Experiment Setup | Yes | OAC uses 3 hyper-parameters related to exploration. The parameters β_UB and β_LB control the amount of uncertainty used to compute the upper and lower bound, respectively. The parameter δ controls the maximal allowed divergence between the exploration policy and the target policy. We provide the values of all hyper-parameters and details of the hyper-parameter tuning in Appendix D. (A sketch of the exploration step follows the table.) |
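For context on the "Open Datasets" row: the MuJoCo benchmarks are simulated continuous control environments rather than a fixed dataset, which is why no train/validation/test split applies. The snippet below is a minimal, hypothetical sketch of how such an environment is typically instantiated through OpenAI Gym; the task name and API version are assumptions, not details taken from the paper.

```python
import gym  # requires the MuJoCo bindings (e.g., gym[mujoco] / mujoco-py)

# Hypothetical example: instantiate one of the MuJoCo continuous control
# benchmarks. The exact task names and versions used by the paper are not
# listed in this table, so "Humanoid-v2" is a placeholder.
env = gym.make("Humanoid-v2")

obs = env.reset()
action = env.action_space.sample()          # random action in the Box space
obs, reward, done, info = env.step(action)  # classic 4-tuple Gym step API
env.close()
```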
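For the "Experiment Setup" row: the three exploration hyper-parameters enter OAC through the approximate bounds on the critic and the KL-constrained shift of the exploration policy. The sketch below is a hypothetical PyTorch illustration of that step, assuming a diagonal-Gaussian target policy and two bootstrapped critics; the function and argument names (`oac_exploration_mean`, `critic1`, `critic2`) and the default hyper-parameter values are placeholders, and the code is a reading of Algorithm 1, not the authors' implementation (no code is released).

```python
import math
import torch

def oac_exploration_mean(obs, mu_T, sigma_T, critic1, critic2,
                         beta_ub=3.0, delta=10.0):
    """Hypothetical sketch of OAC's optimistic exploration shift.

    mu_T, sigma_T : mean and diagonal std of the Gaussian target policy.
    critic1/2     : two bootstrapped Q-networks, Q_i(obs, action).
    beta_ub, delta: exploration hyper-parameters (placeholder defaults;
                    the tuned values are given in Appendix D of the paper).
    """
    action = mu_T.clone().detach().requires_grad_(True)

    q1, q2 = critic1(obs, action), critic2(obs, action)
    mean_q = (q1 + q2) / 2.0
    std_q = torch.abs(q1 - q2) / 2.0        # epistemic-uncertainty proxy
    q_ub = mean_q + beta_ub * std_q         # approximate upper bound on Q

    # Gradient of the upper bound w.r.t. the action, evaluated at mu_T.
    grad = torch.autograd.grad(q_ub.sum(), action)[0]

    # KL(pi_E || pi_T) <= delta with a shared covariance gives a closed-form
    # shift of the mean along Sigma_T * grad (diagonal covariance assumed).
    sigma_sq = sigma_T ** 2
    grad_norm = torch.sqrt((grad ** 2 * sigma_sq).sum(-1, keepdim=True)) + 1e-8
    mu_E = mu_T.detach() + math.sqrt(2.0 * delta) * sigma_sq * grad / grad_norm

    # Exploration actions are then sampled from N(mu_E, diag(sigma_T**2));
    # the lower bound (controlled by beta_LB) is used for critic targets.
    return mu_E
```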