Context-dependent upper-confidence bounds for directed exploration
Authors: Raksha Kumaraswamy, Matthew Schlegel, Adam White, Martha White
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our algorithm can converge more quickly than other incremental exploration strategies using confidence estimates on action-values. We demonstrate in several simulated domains that UCLS outperforms DGPQ, UCBootstrap, and RLSVI. Our experiments were intentionally conducted in small though carefully selected simulation domains so that we could conduct extensive parameter sweeps, hundreds of runs for averaging, and compare numerous state-of-the-art exploration algorithms. |
| Researcher Affiliation | Collaboration | Raksha Kumaraswamy¹, Matthew Schlegel¹, Adam White¹,², Martha White¹; ¹Department of Computing Science, University of Alberta; ²DeepMind |
| Pseudocode | Yes | The complete pseudocode for UCLS is given in the Appendix (Algorithm 2). |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is released or provide a link to it. |
| Open Datasets | Yes | Sparse Mountain Car is a version of the classic mountain car problem of Sutton and Barto [40]... River Swim is a standard continuing exploration benchmark [42]... |
| Dataset Splits | No | The paper describes the environments and experimental budget (e.g., 50,000 steps, episode cutoff), but does not specify dataset splits (e.g., train/validation/test percentages or counts) as typically found in supervised learning tasks. |
| Hardware Specification | No | The paper mentions "Calcul Québec (www.calculquebec.ca) and Compute Canada (www.computecanada.ca) for the computing resources used in this work," but does not specify any particular hardware models (e.g., GPU/CPU types, memory). |
| Software Dependencies | No | The paper mentions DGPQ uses a kernel-based representation and refers to algorithms like Sarsa, but does not provide specific version numbers for any software libraries, frameworks, or languages used. |
| Experiment Setup | Yes | Our primary concern is early learning performance, thus each experiment is restricted to 50,000 steps, with an episode cutoff (in Sparse Mountain Car and Puddle World) at 10,000 steps. For all the algorithms that utilize eligibility traces we set λ to be 0.9. For algorithms which use exponential averaging, β is set to 0.001, and the regularizer is set to be 0.0001. The parameters for UCLS are fixed. All the algorithms except DGPQ use the same representation: (1) Sparse Mountain Car 8 tilings of 8x8, hashed to a memory space of 512, (2) River Swim 4 tilings of granularity 32, hashed to a memory space of 128, and (3) Puddle World 5 tilings of granularity 5x5, hashed to a memory space of 128. |
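
For readers re-implementing this setup, the snippet below gathers the reported settings (step budget, episode cutoff, λ, β, regularizer, and the per-domain tile-coding representations) into one place. It is a minimal sketch: the dictionary layout and key names are hypothetical conveniences, and only the numeric values come from the experiment-setup description quoted above.

```python
# Hypothetical configuration sketch. Only the numeric values are taken from the
# paper's experiment-setup description; the structure and names are illustrative.

EXPERIMENT = {
    "max_steps": 50_000,       # total interaction budget per run
    "episode_cutoff": 10_000,  # applies to Sparse Mountain Car and Puddle World
    "lambda": 0.9,             # eligibility-trace decay for trace-based methods
    "beta": 0.001,             # exponential-averaging rate, where applicable
    "regularizer": 0.0001,
}

# Tile-coding representation shared by all algorithms except DGPQ.
REPRESENTATION = {
    "sparse_mountain_car": {"tilings": 8, "tiles_per_dim": (8, 8), "memory": 512},
    "river_swim":          {"tilings": 4, "tiles_per_dim": (32,),  "memory": 128},
    "puddle_world":        {"tilings": 5, "tiles_per_dim": (5, 5), "memory": 128},
}

if __name__ == "__main__":
    for domain, rep in REPRESENTATION.items():
        print(f"{domain}: representation={rep}, experiment={EXPERIMENT}")
```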