Ensemble Sampling

Authors: Xiuyuan Lu, Benjamin Van Roy

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We establish a theoretical basis that supports the approach and present computational results that offer further insight. In this section, we present computational results that demonstrate viability of ensemble sampling.
Researcher Affiliation | Academia | Xiuyuan Lu, Stanford University, lxy@stanford.edu; Benjamin Van Roy, Stanford University, bvr@stanford.edu
Pseudocode | Yes | Algorithm 1 (Ensemble Sampling): 1: Sample θ_{0,1}, ..., θ_{0,M} ~ p_0; 2: for t = 0, ..., T−1 do; 3: Sample m ~ unif({1, ..., M}); 4: Act: A_t = argmax_{a ∈ A} R̂_{θ_{t,m}}(a); 5: Observe Y_{t+1}; 6: Update θ_{t+1,1}, ..., θ_{t+1,M}; 7: end for. (A minimal NumPy sketch of this loop appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement or link for the availability of its source code.
Open Datasets | No | The paper uses synthetic data generated according to specified distributions and parameters (e.g., Gaussian bandits and neural networks with specific noise and prior distributions) rather than relying on named, publicly available datasets with explicit access information.
Dataset Splits | No | The paper does not explicitly provide specific details about train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for reproducibility.
Hardware Specification | No | The paper does not specify any particular hardware components (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9').
Experiment Setup | Yes | We set the input dimension N = 100, number of actions K = 100, prior variance λ = 10, and noise variance σ²_z = 100. We used N = 100 for the input dimension, D = 50 for the dimension of the hidden layer, number of actions K = 100, prior variance λ = 1, and noise variance σ²_z = 100. We took 3 stochastic gradient steps with a minibatch size of 64 for each model update. We used a learning rate of 1e-1 for ϵ-greedy and ensemble sampling, and a learning rate of 1e-2, 1e-2, 2e-2, and 5e-2 for dropout with dropping probabilities 0.25, 0.5, 0.75, and 0.9, respectively. (Hedged code sketches based on these settings appear below.)
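
To make the Algorithm 1 loop from the Pseudocode row concrete, here is a minimal NumPy sketch of ensemble sampling on a linear Gaussian bandit, using the dimensions quoted in the Experiment Setup row (input dimension N = 100, K = 100 actions, prior variance λ = 10, noise variance σ²_z = 100). It is a reconstruction, not the authors' code: the ensemble size M, horizon T, action-feature distribution, and the perturbed least-squares form of the update step are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the Experiment Setup row; M, T, and the action
# distribution are assumptions for illustration.
N, K, M = 100, 100, 30           # input dim, number of actions, ensemble size (assumed)
lam, sigma2 = 10.0, 100.0        # prior variance lambda, noise variance sigma_z^2
T = 500                          # horizon (assumed)

actions = rng.normal(size=(K, N))                      # assumed i.i.d. Gaussian action features
theta_true = rng.normal(scale=np.sqrt(lam), size=N)    # true parameter drawn from the prior

# Per-member state for the assumed perturbed least-squares update:
#   theta_m = argmin_v  sum_s (y_s + z_{s,m} - v.a_s)^2 / sigma2 + ||v - prior_m||^2 / lam
prior_samples = rng.normal(scale=np.sqrt(lam), size=(M, N))  # theta_{0,m} ~ p_0 = N(0, lam I)
gram = np.eye(N) / lam            # shared Gram matrix:  I/lam + sum_s a_s a_s^T / sigma2
b = prior_samples / lam           # per-member vector:   prior_m/lam + sum_s a_s (y_s + z_{s,m}) / sigma2
thetas = prior_samples.copy()     # current ensemble of models

for t in range(T):
    m = rng.integers(M)                                # Sample m ~ unif({1,...,M})
    a_idx = int(np.argmax(actions @ thetas[m]))        # Act: greedy w.r.t. sampled model m
    a = actions[a_idx]
    y = a @ theta_true + rng.normal(scale=np.sqrt(sigma2))   # Observe Y_{t+1}

    # Update all M models on the new observation, each with its own
    # Gaussian perturbation of the observed reward.
    gram += np.outer(a, a) / sigma2
    z = rng.normal(scale=np.sqrt(sigma2), size=M)
    b += np.outer(y + z, a) / sigma2
    thetas = np.linalg.solve(gram, b.T).T              # refit all members
```

Re-solving the N×N system at every step is simply the most direct way to express the update rule; cheaper incremental updates are possible but omitted here for clarity.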
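
The neural-network settings quoted in the Experiment Setup row (hidden dimension D = 50, prior variance λ = 1, noise variance σ²_z = 100, 3 SGD steps per model update, minibatch size 64, learning rate 1e-1 for ensemble sampling) suggest a per-member update roughly like the sketch below. This is likewise a hedged reconstruction: the single-hidden-layer ReLU architecture, the squared-error loss on perturbed rewards, and the prior-anchoring regularizer are assumptions, and `sgd_update`, `buffer_x`, and `buffer_y_perturbed` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 100, 50                  # input and hidden dimensions from the setup row
BATCH, STEPS, LR = 64, 3, 1e-1  # minibatch size, SGD steps per update, learning rate

def sgd_update(params, prior, buffer_x, buffer_y_perturbed, lam=1.0, sigma2=100.0):
    """Take STEPS SGD steps for one ensemble member on its perturbed data.

    Assumed loss: 0.5 * mean squared error against the member's perturbed rewards,
    plus (sigma2 / (2 * lam * n)) * ||params - prior||^2 anchoring to a prior sample.
    """
    W1, w2 = params                     # W1: (D, N) hidden weights, w2: (D,) output weights
    P1, p2 = prior                      # the member's fixed prior sample
    n = len(buffer_x)
    reg = sigma2 / (lam * n)
    for _ in range(STEPS):
        idx = rng.integers(n, size=BATCH)
        X, y = buffer_x[idx], buffer_y_perturbed[idx]
        H = np.maximum(X @ W1.T, 0.0)               # ReLU hidden layer, shape (BATCH, D)
        err = H @ w2 - y                            # prediction error, shape (BATCH,)
        grad_w2 = H.T @ err / BATCH + reg * (w2 - p2)
        grad_H = np.outer(err, w2) * (H > 0.0)      # back-prop through the ReLU
        grad_W1 = grad_H.T @ X / BATCH + reg * (W1 - P1)
        W1 -= LR * grad_W1
        w2 -= LR * grad_w2
    return W1, w2

# Hypothetical usage for one member, with randomly generated stand-in data.
W1 = rng.normal(scale=0.1, size=(D, N))
w2 = rng.normal(scale=0.1, size=D)
prior = (W1.copy(), w2.copy())
xs, ys = rng.normal(size=(256, N)), rng.normal(size=256)
W1, w2 = sgd_update((W1, w2), prior, xs, ys)
```

The learning rates quoted for the dropout baselines (1e-2 to 5e-2 across dropping probabilities 0.25 to 0.9) are not reproduced in this sketch.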