Generalization and Exploration via Randomized Value Functions

Authors: Ian Osband, Benjamin Van Roy, Zheng Wen

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We will present computational results comparing RLSVI to LSVI with action-dithering schemes... The results demonstrate that RLSVI enjoys dramatic efficiency gains. Further, we establish a bound on the expected regret for an episodic tabula rasa learning context...
Researcher Affiliation | Collaboration | Ian Osband (1,2) IOSBAND@STANFORD.EDU, Benjamin Van Roy (1) BVR@STANFORD.EDU, Zheng Wen (1,3) ZWEN@ADOBE.COM; 1 Stanford University, 2 Google Deepmind, 3 Adobe Research
Pseudocode | Yes | Algorithm 1 (Randomized Least-Squares Value Iteration) and Algorithm 2 (RLSVI with greedy action). A minimal sketch of these algorithms follows this table.
Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper describes custom or simulated environments for its experiments (didactic chain environments, the game of Tetris, a recommendation engine model) and does not provide concrete access information (links, DOIs, formal citations) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific details regarding train, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers or specific solver versions, needed to replicate the experiments.
Experiment Setup | Yes | Figure 2 presents the empirical regret for RLSVI with K=10, N=50, σ=0.1, λ=1 and an ε-greedy agent over 5 seeds. ... In Figure 7 we present learning curves for RLSVI with λ=1, σ=1 ... For our simulations we set βa = 0 ∀a and sample a random problem instance by sampling γan ∼ N(0, c²) independently for each a and n. ... We set N = 10, H = J = 5, c = 2 and L = 1200. ... The cumulative regret for both RLSVI (with λ = 0.2 and σ² = 10⁻³) and LSVI with Boltzmann exploration (with λ = 0.2 and a variety of temperature settings) are plotted in Figure 8. A sketch of the quoted problem-instance sampling also follows this table.
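
For orientation, below is a minimal NumPy sketch of the randomized least-squares value iteration update named in Algorithm 1, followed by the greedy action rule of Algorithm 2. The feature map phi, the buffer layout, and the treatment of the regularizer lam as a Gaussian prior precision are assumptions made for illustration; the paper does not release an implementation.

```python
# Minimal sketch of RLSVI (Algorithm 1) and greedy action selection (Algorithm 2).
# Assumptions for illustration: phi(s, a) returns a d-dimensional feature vector,
# buffer[h] holds (s, a, r, s_next) tuples observed at timestep h, and lam is
# treated as a Gaussian prior precision.
import numpy as np


def rlsvi_sample_weights(buffer, phi, d, num_actions, H, lam=1.0, sigma=0.1, rng=None):
    """Sample one randomized value-weight vector per timestep h = H-1, ..., 0."""
    rng = np.random.default_rng() if rng is None else rng
    theta = [np.zeros(d) for _ in range(H + 1)]  # values beyond the horizon are zero

    for h in reversed(range(H)):
        if len(buffer[h]) == 0:
            # No data yet at this timestep: sample from the prior.
            theta[h] = rng.multivariate_normal(np.zeros(d), np.eye(d) / lam)
            continue

        X, y = [], []
        for (s, a, r, s_next) in buffer[h]:
            X.append(phi(s, a))
            # Regression target: observed reward plus the max next-step value under
            # the weights already sampled for timestep h + 1 (theta[H] = 0 handles
            # the terminal step).
            q_next = max(phi(s_next, b) @ theta[h + 1] for b in range(num_actions))
            y.append(r + q_next)
        X, y = np.asarray(X), np.asarray(y)

        # Bayesian linear regression posterior with observation noise variance sigma^2.
        precision = X.T @ X / sigma**2 + lam * np.eye(d)
        cov = np.linalg.inv(precision)
        mean = cov @ X.T @ y / sigma**2

        # The defining RLSVI step: sample the value weights instead of using the mean.
        theta[h] = rng.multivariate_normal(mean, cov)
    return theta


def greedy_action(theta_h, phi, s, num_actions):
    """Act greedily with respect to the sampled value function (Algorithm 2)."""
    return int(np.argmax([phi(s, a) @ theta_h for a in range(num_actions)]))
```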
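
A small sketch of the random problem-instance sampling quoted in the Experiment Setup row (βa = 0 ∀a, γan ∼ N(0, c²), with N = 10 and c = 2) is given below. The array shapes and the num_actions argument are illustrative assumptions; the paper does not publish an implementation.

```python
# Sketch of the quoted problem-instance sampling: beta_a = 0 for every action a,
# and gamma_{a,n} drawn independently from N(0, c^2). The number of actions and
# the (num_actions, N) shape are assumptions made for illustration.
import numpy as np


def sample_problem_instance(num_actions, N=10, c=2.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    beta = np.zeros(num_actions)                       # beta_a = 0 for all a
    gamma = rng.normal(0.0, c, size=(num_actions, N))  # gamma_{a,n} ~ N(0, c^2)
    return beta, gamma
```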