Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Adaptive Variance Inflation in Thompson Sampling: Efficiency, Safety, Robustness, and Beyond

Authors: Feng Zhu, David Simchi-Levi

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We complement theoretical results with numerical simulations, demonstrating that TS-VI is efficient, safe, and robust across diverse environments. To handle unknown arm-specific variances in practice, we incorporate Gamma-Normal Bayesian updates into our design and show that this amendment preserves the stability and effectiveness of the policy. We conduct numerical experiments on the 2-armed bandit case to illustrate the benefits brought by our policy. For each policy considered below, we collect 10^4 trajectories.
Researcher Affiliation Academia Feng Zhu Institute for Data, Systems, and Society Massachusetts Institute of Technology Cambridge, MA 02139 EMAIL. David Simchi-Levi Institute for Data, Systems, and Society Massachusetts Institute of Technology Cambridge, MA 02139 EMAIL.
Pseudocode Yes
Algorithm 1 TS
Input: A = [K], σ_0^2.
Pull each arm once.
for t = K + 1, ... do
    For each arm k, draw a random sample X_{t,k} ~ N(µ̂_{t,k}, σ_0^2 / n_{t,k}).
    Take action a_t = arg max_k {X_{t,k}}.
    Collect reward r_{t,a_t} = µ_{a_t} + ϵ_{t,a_t}.
end for
Algorithm 2 TS-VI
Input: A = [K], σ_0^2.
Pull each arm once.
for t = K + 1, ... do
    For each arm k, draw a random sample X_{t,k} ~ N(µ̂_{t,k}, (t/K) σ_0^2 / n_{t,k}).
    Take action a_t = arg max_k {X_{t,k}}.
    Collect reward r_{t,a_t} = µ_{a_t} + ϵ_{t,a_t}.
end for
Open Source Code Yes Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Code provided in supplementary material.
Open Datasets No The paper describes numerical experiments in simulated environments (Gaussian and Exponential with Laplace noises) by setting specific parameters like the mean vector µ = (-δ, δ) and noise variances σ^2. It does not use pre-existing public datasets. The text refers to custom-generated environments for simulations.
Dataset Splits No The paper describes numerical simulations for a multi-armed bandit problem. It mentions collecting '10^4 trajectories' for each policy, which refers to simulation runs rather than traditional dataset splits for training, validation, or testing. No explicit dataset split percentages or sample counts are provided as it's a simulated environment.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the numerical experiments. The NeurIPS checklist for this question also states 'NA'.
Software Dependencies No The paper does not explicitly state the specific software dependencies, libraries, or their version numbers used for implementing the algorithms or running the simulations. While the NeurIPS checklist includes a general guideline about programming languages, no specific versions of libraries are mentioned in the paper's text.
Experiment Setup Yes We conduct numerical experiments on the 2-armed bandit case to illustrate the benefits brought by our policy. The mean vector is fixed as µ = (-δ, δ) with δ = 0.3. For each policy considered below, we collect 10^4 trajectories. For both TS and TS-VI, we assume the prior is N(0, 10^3). We consider the standard TS (σ0 = σ), a slightly under-specified TS (σ0 = 0.9σ), and a slightly over-specified TS (σ0 = 1.1σ). For TS-VI, we consider σ0 = 0.3σ, 0.4σ, 0.5σ. We first consider Gaussian environments where the noise variances are correctly specified. We consider σ^2 = 2 for both arms. We then consider Exponential environments with Laplacian noises; that is, the probability density function is (2b)^{-1} exp(-|x|/b). We consider b = 1 for both arms. Finally, we consider Exponential environments with noises following unknown Laplace distributions. We consider b1 = 1, b2 = 2.
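
The two sampling rules in the Pseudocode row and the environments in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' supplementary code. The function names, the horizon of 500 rounds, and the 200 trajectories are hypothetical choices (the page reports 10^4 trajectories and gives no horizon). The mean vector is assumed to be (-δ, δ), the TS-VI sampling variance is read as (t/K) σ_0^2 / n_{t,k}, and the N(0, 10^3) prior is ignored in favor of the sample-mean form shown in the pseudocode.

```python
import numpy as np


def sampling_variance(t, counts, sigma0, inflate):
    """Per-arm variance of the Gaussian sample drawn at round t.

    TS uses sigma0^2 / n_k; TS-VI (as read here) multiplies that by t / K.
    """
    n_arms = len(counts)
    factor = t / n_arms if inflate else 1.0
    return factor * sigma0**2 / counts


def run_trajectory(means, noise, sigma0, horizon, inflate, rng):
    """Simulate one trajectory and return its cumulative pseudo-regret."""
    means = np.asarray(means, dtype=float)
    n_arms = len(means)
    gaps = means.max() - means
    # Pull each arm once to initialise the empirical means.
    counts = np.ones(n_arms)
    sums = np.array([means[k] + noise(rng, k) for k in range(n_arms)])
    regret = float(gaps.sum())
    for t in range(n_arms + 1, horizon + 1):
        var = sampling_variance(t, counts, sigma0, inflate)
        # One Gaussian sample per arm, centred on its empirical mean.
        samples = rng.normal(sums / counts, np.sqrt(var))
        arm = int(np.argmax(samples))
        sums[arm] += means[arm] + noise(rng, arm)
        counts[arm] += 1
        regret += gaps[arm]
    return regret


def gaussian_noise(sigma):
    """Zero-mean Gaussian noise with standard deviation sigma for every arm."""
    return lambda rng, arm: rng.normal(0.0, sigma)


def laplace_noise(scales):
    """Zero-mean Laplace noise with arm-specific scale b."""
    return lambda rng, arm: rng.laplace(0.0, scales[arm])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta, sigma = 0.3, np.sqrt(2.0)
    means = (-delta, delta)          # assumed sign pattern
    horizon, n_runs = 500, 200       # hypothetical; not taken from the page
    policies = {
        "TS (sigma0 = sigma)": (sigma, False),
        "TS-VI (sigma0 = 0.4 sigma)": (0.4 * sigma, True),
    }
    environments = {
        "Gaussian, sigma^2 = 2": gaussian_noise(sigma),
        "Laplace, b = (1, 2)": laplace_noise((1.0, 2.0)),
    }
    for env_name, noise in environments.items():
        for name, (sigma0, inflate) in policies.items():
            regrets = [
                run_trajectory(means, noise, sigma0, horizon, inflate, rng)
                for _ in range(n_runs)
            ]
            print(env_name, "|", name, "| mean regret:", np.mean(regrets),
                  "| max regret:", np.max(regrets))
```

Keeping σ0 an explicit argument mirrors the setup, where the policy's variance parameter is deliberately chosen apart from the true noise level. Laplace noise with scale b has variance 2b^2, so b = 1 gives the same variance as the Gaussian case with σ^2 = 2.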