Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits

Authors: Xuheng Li, Quanquan Gu

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we examine our algorithm, FGTS-VA, against baselines (including Weighted OFUL+, FGTS, and SAVE) in experiments with synthetic data. The code can be found at https://github.com/xuheng-li99/FGTS-VA. ... Figure 1: Comparison of different algorithms. Error bands are plotted over 100 runs.
Researcher Affiliation Academia Xuheng Li Department of Computer Science University of California, Los Angeles California, 90095 EMAIL Quanquan Gu Department of Computer Science University of California, Los Angeles California, 90095 EMAIL
Pseudocode Yes Algorithm 1 FGTS-VA 1: Given hyperparameter $\alpha$ and $\gamma$. Initialize $S_0 = \emptyset$. 2: for $t = 1$ to $T$ do 3: Receive context $x_t$. 4: Set parameters $\{\eta_s\}_{s \in [t-1]}$ and $\lambda_t$ according to (4.2). 5: Sample $f_t \sim p_t(\cdot \mid S_{t-1})$, with the posterior distribution $p_t(f \mid S_{t-1})$ defined in (4.1). 6: Select $a_t = \arg\max_{a \in A_t} f_t(x_t, a)$. 7: Observe reward $r_t$; update $S_t = S_{t-1} \cup \{(x_t, a_t, r_t)\}$. 8: end for
Open Source Code Yes The code can be found at https://github.com/xuheng-li99/FGTS-VA.
Open Datasets No We focus on the setting of linear bandits with $d = 5$ and $X = \{x\}$, so we omit the context $x$ for simplicity. The action set is $A_t = A = \{\pm 1/\sqrt{d}\}^d$, and the ground truth parameter $\theta$ is sampled from the uniform distribution on the unit sphere. We consider two noise models with heterogeneous noise magnitudes. In both cases, the noise $\epsilon_t$ is sampled from $N(0, \sigma_t^2)$.
Dataset Splits No The paper uses synthetic data and runs experiments for
Hardware Specification No The paper does not provide specific hardware details for the experimental runs. The NeurIPS checklist states that "The experiments are runnable using a personal laptop within minutes"; however, this is not a specific hardware specification.
Software Dependencies No The paper describes algorithmic details like "Langevin dynamics" and "SGLD steps" but does not specify any software libraries or packages with version numbers used for implementation. The NeurIPS checklist notes code is on GitHub, but specific dependencies are not mentioned in the paper text.
Experiment Setup Yes In this section, we examine our algorithm, FGTS-VA, against baselines (including Weighted OFUL+, FGTS, and SAVE) in experiments with synthetic data. ... Implementation details. For FGTS-VA, in the linear bandit setting, we let the prior distribution be the Gaussian distribution $N(0, I_d/d)$. We use Langevin dynamics to sample from this distribution: ... We use $K = 20$ SGLD steps in our experiments, and initialize $\theta^{(0)}_{t+1} = \theta^{(K)}_t$. ... We first compare FGTS-VA with $c = 0.003$ against Weighted OFUL+ (Zhou and Gu, 2022), SAVE (Zhao et al., 2023), and FGTS (Zhang, 2022) with results in Figure 1. ... We then perform ablation studies of the algorithm with different choices of $c$. It is worth noting that $c$ is the only tunable parameter of FGTS-VA, and $c = \tilde{\Theta}(1)$ for linear bandits according to Theorem 5.4. The results are shown in Figure 2. For the case of sparse noise, we observe the advantage of choosing $c$ bounded away from 0, i.e., advantage of the feel-good exploration. For the case of dense noise, the optimal choice of $c$ is close to 0.
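The quoted setup (d = 5, a sign-vector action set, a ground truth parameter on the unit sphere, a Gaussian prior N(0, I_d/d), and K = 20 warm-started Langevin steps per round) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the posterior (4.1) and the parameter schedule (4.2) are not reproduced on this page, so the sketch substitutes a generic feel-good log-posterior with constant placeholder weights `eta` and `lam`, omits any cap on the feel-good term, uses full-gradient Langevin updates rather than stochastic ones, and invents a noise schedule and step-size rule purely for illustration. All function names are hypothetical.

```python
import itertools
import numpy as np


def make_action_set(d):
    """All 2^d sign vectors scaled to unit norm, i.e. {+1/sqrt(d), -1/sqrt(d)}^d."""
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    return signs / np.sqrt(d)


def sample_unit_sphere(d, rng):
    """Uniform draw from the unit sphere via a normalized Gaussian vector."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)


def grad_log_posterior(theta, X, r, actions, eta, lam, d):
    """Gradient of an ASSUMED feel-good log-posterior (a stand-in for (4.1))."""
    grad = -d * theta  # Gaussian prior N(0, I_d/d) has precision d
    if len(r) > 0:
        resid = X @ theta - r
        grad -= 2.0 * eta * (X.T @ resid)  # squared-loss term
        # Feel-good bonus lam * sum_s max_a <theta, a>; the action set is fixed,
        # so the sum is len(r) copies of the same subgradient.
        best = actions[np.argmax(actions @ theta)]
        grad += lam * len(r) * best
    return grad


def run_bandit(T=200, d=5, K=20, eta=1.0, lam=0.01, seed=0):
    """Run the sketch for T rounds; eta and lam are placeholder constants."""
    rng = np.random.default_rng(seed)
    actions = make_action_set(d)
    theta_star = sample_unit_sphere(d, rng)
    best_reward = np.max(actions @ theta_star)
    theta = rng.standard_normal(d) / np.sqrt(d)  # initial draw from the prior
    X = np.empty((0, d))
    r = np.empty(0)
    regret = 0.0
    for t in range(T):
        # Assumed step size: half the inverse of a curvature upper bound.
        step = 0.5 / (d + 2.0 * eta * len(r))
        # K Langevin steps, warm-started from the previous round's iterate.
        for _ in range(K):
            g = grad_log_posterior(theta, X, r, actions, eta, lam, d)
            theta = theta + step * g + np.sqrt(2.0 * step) * rng.standard_normal(d)
        a = actions[np.argmax(actions @ theta)]
        # Hypothetical heterogeneous noise: large on every 10th round, small otherwise.
        sigma_t = 1.0 if t % 10 == 0 else 0.1
        reward = a @ theta_star + sigma_t * rng.standard_normal()
        X = np.vstack([X, a])
        r = np.append(r, reward)
        regret += best_reward - a @ theta_star
    return regret, theta, theta_star


if __name__ == "__main__":
    total_regret, _, _ = run_bandit()
    print(f"cumulative regret over 200 rounds: {total_regret:.3f}")
```

Swapping the constant `eta` and `lam` for round-dependent, variance-aware values is where the schedule in (4.2) would enter; the released repository is the authoritative reference for those details.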