Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Oracle-Efficient Combinatorial Semi-Bandits
Authors: Jung-hun Kim, Milan Vojnovic, Min-hwan Oh
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiments: We compare our algorithms to benchmarks in terms of oracle efficiency and regret using synthetic datasets. We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. As shown in Figure 2 (a,b), our algorithms (AROQ-CMAB, SROQ-CMAB) achieve significantly lower oracle adaptivity and query complexities than CUCB [5], consistent with Theorems 1 and 2. Importantly, as shown in Figure 2 (d), our algorithms achieve faster runtime than the benchmark. |
| Researcher Affiliation | Academia | Jung-hun Kim (CREST, ENSAE, IP Paris; Fair Play joint team, France); Milan Vojnovic (London School of Economics, United Kingdom); Min-hwan Oh (Seoul National University, South Korea) |
| Pseudocode | Yes | Algorithm 1: Adaptive Rare Oracle Queries for Combinatorial MAB (AROQ-CMAB). Initialize: τ_i = 1 for all i ∈ [d]; for t = 1, 2, ..., T do |
| Open Source Code | Yes | Source Code: https://github.com/junghunkim7786/Oracle Efficient Combinatorial Bandits |
| Open Datasets | No | We compare our algorithms to benchmarks in terms of oracle efficiency and regret using synthetic datasets. We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. |
| Dataset Splits | No | The paper uses synthetic datasets which are generated on the fly; it does not mention pre-existing datasets or their splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) for its experiments. |
| Experiment Setup | Yes | We begin with the linear reward setting, where the mean vector is sampled from Unif[0, 1] with d = 20 and m = 3, and stochastic rewards are uniformly generated around these means at each round. [...] The mean reward of each base arm is independently sampled from a uniform distribution over [0, 1], with d = 10 base arms and cardinality constraint m = 3. The reward noise is correlated according to a d × d positive semi-definite covariance matrix Σ, constructed as AA^T + I_d with normalization, where A ∈ R^{d×d} is a randomly generated matrix. The stochastic rewards are then sampled from a multivariate Gaussian distribution with the specified mean vector and covariance matrix Σ. |
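
The correlated-noise setup quoted under Experiment Setup can be sketched in a few lines of Python. This is an illustrative reading of the quoted description, not the authors' code: the function names are hypothetical, the entries of A are assumed to be standard normal, and "normalization" is assumed to mean dividing AA^T + I_d by its largest diagonal entry (the excerpt specifies neither). The cardinality constraint m = 3 restricts which arms can be played together and does not enter the data generation, so it is omitted here.

```python
import math
import random


def sample_means(d, rng):
    """Mean reward of each base arm, drawn independently from Unif[0, 1]."""
    return [rng.random() for _ in range(d)]


def make_noise_factor(d, rng):
    """Return a random d x d matrix A and a scalar c with Sigma = (A A^T + I_d) / c.

    ASSUMPTION: A has standard normal entries, and c is the largest diagonal
    entry of A A^T + I_d. The quoted text does not state either choice.
    """
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    # Diagonal entry i of A A^T + I_d is the squared norm of row i, plus 1.
    c = max(sum(a * a for a in row) + 1.0 for row in A)
    return A, c


def covariance(A, c):
    """Explicit covariance matrix Sigma = (A A^T + I_d) / c."""
    d = len(A)
    return [
        [
            (sum(A[i][k] * A[j][k] for k in range(d)) + (1.0 if i == j else 0.0)) / c
            for j in range(d)
        ]
        for i in range(d)
    ]


def sample_rewards(means, A, c, rng):
    """One draw from the multivariate Gaussian N(means, Sigma).

    For independent standard normal vectors z1 and z2, the vector A z1 + z2
    has covariance A A^T + I_d, so no matrix factorization is required.
    """
    d = len(means)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    scale = math.sqrt(c)
    return [
        means[i] + (sum(A[i][k] * z1[k] for k in range(d)) + z2[i]) / scale
        for i in range(d)
    ]


rng = random.Random(0)  # fixed seed, chosen here only for repeatability
d = 10                  # number of base arms, as in the quoted setup
means = sample_means(d, rng)
A, c = make_noise_factor(d, rng)
Sigma = covariance(A, c)
rewards = sample_rewards(means, A, c, rng)
```

Under the assumed normalization every base arm has noise variance at most 1, and the added identity term keeps Σ strictly positive definite. A different normalization (for example by trace or spectral norm) would change only the scalar `c`.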