Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Oracle-Efficient Combinatorial Semi-Bandits

Authors: Jung-hun Kim, Milan Vojnovic, Min-hwan Oh

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 Experiments. We compare our algorithms to benchmarks in terms of oracle efficiency and regret using synthetic datasets. We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. As shown in Figure 2 (a,b), our algorithms (AROQ-CMAB, SROQ-CMAB) achieve significantly lower oracle adaptivity and query complexities than CUCB [5], consistent with Theorems 1 and 2. Importantly, as shown in Figure 2 (d), our algorithms achieve faster runtime than the benchmark.
Researcher Affiliation | Academia | Jung-hun Kim (CREST, ENSAE, IP Paris; Fair Play joint team, France); Milan Vojnović (London School of Economics, United Kingdom); Min-hwan Oh (Seoul National University, South Korea)
Pseudocode | Yes | Algorithm 1: Adaptive Rare Oracle Queries for Combinatorial MAB (AROQ-CMAB). Initialize: $\tau_i = 1$ for all $i \in [d]$; for $t = 1, 2, \dots, T$ do
Open Source Code | Yes | Source Code: https://github.com/junghunkim7786/Oracle Efficient Combinatorial Bandits
Open Datasets | No | We compare our algorithms to benchmarks in terms of oracle efficiency and regret using synthetic datasets. We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round.
Dataset Splits | No | The paper uses synthetic datasets that are generated on the fly; it does not mention pre-existing datasets or training, validation, or test splits.
Hardware Specification | No | The paper does not provide specific hardware details for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) for its experiments.
Experiment Setup | Yes | We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. [...] The mean reward of each base arm is independently sampled from a uniform distribution over [0, 1], with d = 10 base arms and cardinality constraint m = 3. The reward noise is correlated according to a $d \times d$ positive semi-definite covariance matrix $\Sigma$, constructed as $AA^\top + I_d$ with normalization, where $A \in \mathbb{R}^{d \times d}$ is a randomly generated matrix. The stochastic rewards are then sampled from a multivariate Gaussian distribution with the specified mean vector and covariance matrix $\Sigma$.
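
The correlated-noise setup quoted under Experiment Setup can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the entries of A are drawn as standard Gaussians, "normalization" is taken to mean dividing Σ by its largest diagonal entry, and the helper names (make_covariance, cholesky, sample_rewards) are hypothetical. The quoted text specifies none of these details.

```python
import math
import random


def make_covariance(d, rng):
    """Build Sigma = A A^T + I_d and rescale it.

    Assumptions (not stated in the source): A has i.i.d. standard Gaussian
    entries, and "normalization" means dividing by the largest variance.
    """
    a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    sigma = [
        [sum(a[i][k] * a[j][k] for k in range(d)) + (1.0 if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    scale = max(sigma[i][i] for i in range(d))
    return [[v / scale for v in row] for row in sigma]


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma positive definite)."""
    d = len(sigma)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sigma[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_rewards(mean, low, rng):
    """Draw one reward vector from N(mean, L L^T) via mean + L z, z ~ N(0, I)."""
    d = len(mean)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]


rng = random.Random(0)                   # fixed seed so the sketch is repeatable
d = 10                                   # number of base arms in the quoted setup
mean = [rng.random() for _ in range(d)]  # base-arm means ~ Unif[0, 1]
sigma = make_covariance(d, rng)
low = cholesky(sigma)
rewards = sample_rewards(mean, low, rng)  # stochastic rewards for one round
```

Adding the identity before rescaling keeps Σ strictly positive definite, so the Cholesky factorization always succeeds. A learner restricted to the cardinality constraint m = 3 would observe only the entries of rewards for the base arms it selects.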