Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Infrequent Exploration in Linear Bandits

Authors: Harin Lee, Min-hwan Oh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical evaluations confirm our theoretical findings, showing state-of-the-art regret performance and runtime improvements over existing methods. Empirical results, provided in Section 5, substantiate our theoretical findings by demonstrating that, for suitable exploration schedules, INFEX outperforms both purely greedy and fully exploratory baselines in cumulative regret and computational efficiency. Figure 1 shows the total regret and computation time of each algorithm.
Researcher Affiliation Academia Harin Lee University of Washington Seattle, WA, USA EMAIL Min-hwan Oh Seoul National University Seoul, South Korea EMAIL
Pseudocode Yes The pseudocode describing the procedure is provided in Algorithm 1.
Open Source Code Yes Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We submit experiment code as supplemental material, which reproduces the experiment results.
Open Datasets No We randomly generate problem instances for given d and K as follows. We construct the arm set X by sampling K arms i.i.d. from a multivariate Gaussian distribution N(0d, 1/2d Id) and rescaling each vector to have a norm at most 1 when it exceeds 1. We sample θ uniformly from the unit sphere in Rd.
Dataset Splits No We randomly generate problem instances for given d and K as follows. We construct the arm set X by sampling K arms i.i.d. from a multivariate Gaussian distribution N(0d, 1/2d Id) and rescaling each vector to have a norm at most 1 when it exceeds 1. We repeat the process for 20 randomly generated instances and report the mean and standard deviation of the cumulative regret over T = 10000 time steps for each algorithm.
Hardware Specification Yes The experiments are conducted on a computing cluster with twenty Intel(R) Xeon(R) Silver 4214R CPUs, and three of them are used for the experiments.
Software Dependencies No The paper does not specify any software names with version numbers used for implementation, such as Python, PyTorch, or specific libraries.
Experiment Setup Yes We select Alg = Lin UCB and Alg = Lin TS as the base algorithms for exploration and use an exploration schedule Te = m N := {mn : n N}, meaning Alg executes every m steps. Specifically, we examine three choices of m: m = 5, m = 20, and m = 100, corresponding to 80%, 95%, and 99% greedy selections, respectively. For benchmarking, we also compare our framework against other policies: the purely greedy policy, a single-parameter version of OLSBandit [Goldenshluger and Zeevi, 2013], and an ε-greedy approach with εt = t^-1/3. We randomly generate problem instances for given d and K as follows. ... We repeat the process for 20 randomly generated instances and report the mean and standard deviation of the cumulative regret over T = 10000 time steps for each algorithm. All hyperparameters of the algorithms are set to their theoretical values. Both Lin UCB and Lin TS require the confidence radius βt. We explicitly compute the value of log det Vt / det V0 using rank-one update [Abbasi-Yadkori et al., 2011] instead of using its upper bound d log T, so that the base algorithms achieve the regret bounds of Theorem 5 in Abbasi-Yadkori et al. [2011] and Theorem 2.