Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Oracle-Efficient Combinatorial Semi-Bandits

Authors: Jung-hun Kim, Milan Vojnovic, Min-hwan Oh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6 Experiments. We compare our algorithms to benchmarks in terms of oracle efficiency and regret using synthetic datasets. We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. As shown in Figure 2 (a,b), our algorithms (AROQ-CMAB, SROQ-CMAB) achieve significantly lower oracle adaptivity and query complexities than CUCB [5], consistent with Theorems 1 and 2. Importantly, as shown in Figure 2 (d), our algorithms achieve faster runtime than the benchmark.
Researcher Affiliation	Academia	Jung-hun Kim: CREST, ENSAE, IP Paris, Fair Play joint team, France, EMAIL; Milan Vojnović: London School of Economics, United Kingdom, EMAIL; Min-hwan Oh: Seoul National University, South Korea, EMAIL
Pseudocode	Yes	Algorithm 1: Adaptive Rare Oracle Queries for Combinatorial MAB (AROQ-CMAB). Initialize: τ_i = 1 for all i ∈ [d]; for t = 1, 2, ..., T do
Open Source Code	Yes	Source Code: https://github.com/junghunkim7786/Oracle Efficient Combinatorial Bandits
Open Datasets	No	We compare our algorithms to benchmarks in terms of oracle efficiency and regret using synthetic datasets. We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round.
Dataset Splits	No	The paper uses synthetic datasets which are generated on the fly; it does not mention pre-existing datasets or their splits for training, validation, or testing.
Hardware Specification	No	The paper does not provide specific hardware details for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) for its experiments.
Experiment Setup	Yes	We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. [...] The mean reward of each base arm is independently sampled from a uniform distribution over [0, 1], with d = 10 base arms and cardinality constraint m = 3. The reward noise is correlated according to a d × d positive semi-definite covariance matrix Σ, constructed as AAᵀ + I_d with normalization, where A ∈ ℝ^(d×d) is a randomly generated matrix. The stochastic rewards are then sampled from a multivariate Gaussian distribution with the specified mean vector and covariance matrix Σ.
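
The correlated-noise setup quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code. The quoted text does not say how A is drawn or what "normalization" means, so the sketch assumes standard Gaussian entries for A and rescales Σ so that its largest variance is 1. The function names (`make_instance`, `sample_rewards`, `best_action`) are hypothetical, and `best_action` further assumes a linear reward, that is, the sum of the selected arms' means.

```python
import numpy as np

D, M = 10, 3  # base arms and cardinality constraint, as quoted in the table


def make_instance(d=D, seed=0):
    """Draw a mean vector and a normalized covariance matrix."""
    rng = np.random.default_rng(seed)
    mean = rng.uniform(0.0, 1.0, size=d)   # means ~ Unif[0, 1]
    A = rng.standard_normal((d, d))        # assumption: Gaussian entries for A
    sigma = A @ A.T + np.eye(d)            # A A^T + I_d is positive definite
    sigma /= np.max(np.diag(sigma))        # assumption: scale so max variance is 1
    return mean, sigma, rng


def sample_rewards(mean, sigma, rng):
    """One round of correlated Gaussian rewards for all d base arms."""
    return rng.multivariate_normal(mean, sigma)


def best_action(mean, m=M):
    """Indices of the m largest means (optimal only under the assumed linear reward)."""
    return np.sort(np.argsort(mean)[-m:])


if __name__ == "__main__":
    mean, sigma, rng = make_instance()
    print("optimal arms:", best_action(mean))
    print("one reward draw:", sample_rewards(mean, sigma, rng))
```

Fixing the seed makes the generated instance repeatable, which partly offsets the missing dataset release noted in the Open Datasets row.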