Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Constrained Feedback Learning for Non-Stationary Multi-Armed Bandits
Authors: Shaoang Li, Jian Li
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | E Experiment Results. In this section, we compare different query allocation strategies. We base our comparison on the Rexp3B-Sample algorithm (Appendix B), which assumes a known variation budget VT, as its structure is more easily adapted to accommodate different allocation policies. Environments. We evaluate all algorithms across three distinct non-stationary environments. ... Figure 3 presents the cumulative regret of all algorithms across the three environments. |
| Researcher Affiliation | Academia | Shaoang Li Stony Brook University EMAIL Jian Li Stony Brook University EMAIL |
| Pseudocode | Yes | Algorithm 1 BAQUE: Baseline Query Allocation Algorithm 2 HYQUE: Hybrid Query Allocation Algorithm 3 Rexp3B: Rexp3 algorithm with Budgeted feedback Algorithm 4 Extended BAQUE for General Base Algorithm ALG Algorithm 5 Extended HYQUE |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The paper provides all the necessary details for anyone to reproduce the main experimental results, ensuring transparency and allowing others to validate the claims and conclusions independently. |
| Open Datasets | No | Environments. We evaluate all algorithms across three distinct non-stationary environments. In all settings, the time horizon is set to T = 200,000, the number of arms is K = 5, and all non-optimal arms provide a baseline reward of µbase = 0.5. First, we use a standard Piecewise Stationary environment, where the time horizon is divided into 40 equal epochs. In each epoch, a single optimal arm's reward is elevated, and this optimal arm cycles deterministically at each change point. This setting, with a query budget of B = 100,000 and total variation VT = 20. We also test on a Random Changepoints environment with randomly distributed change points, and a Low Query Budget environment where the budget is reduced to B = 20,000. |
| Dataset Splits | No | The paper describes simulated environments and a 'time horizon T = 200,000' but does not specify any training/test/validation splits for a fixed dataset, as the experiments are based on simulated dynamic processes over time rather than static datasets. |
| Hardware Specification | Yes | All runs are executed on a machine equipped with a 12th Gen Intel(R) Core(TM) i9-12900HX processor. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies with version numbers used for implementing the algorithms or conducting experiments. |
| Experiment Setup | Yes | Environments. We evaluate all algorithms across three distinct non-stationary environments. In all settings, the time horizon is set to T = 200,000, the number of arms is K = 5, and all non-optimal arms provide a baseline reward of µbase = 0.5. First, we use a standard Piecewise Stationary environment, where the time horizon is divided into 40 equal epochs. In each epoch, a single optimal arm's reward is elevated, and this optimal arm cycles deterministically at each change point. This setting, with a query budget of B = 100,000 and total variation VT = 20. We also test on a Random Changepoints environment with randomly distributed change points, and a Low Query Budget environment where the budget is reduced to B = 20,000. |
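
Since no code is released, the Piecewise Stationary environment quoted in the Experiment Setup row can be roughly sketched as follows. This is a minimal illustration, not the authors' implementation. The class name `PiecewiseStationaryEnv`, the gap parameter `delta` (the quoted text does not say how much the optimal arm's reward is elevated), and the round-robin cycling order are all assumptions. Total variation is computed here as the sum over time of the largest per-arm change in mean reward, which may differ from the paper's exact definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiecewiseStationaryEnv:
    """Hypothetical sketch of the quoted Piecewise Stationary setting."""
    T: int = 200_000       # time horizon (from the quoted setup)
    K: int = 5             # number of arms (from the quoted setup)
    n_epochs: int = 40     # equal-length epochs (from the quoted setup)
    mu_base: float = 0.5   # baseline reward of non-optimal arms (from the quoted setup)
    delta: float = 0.4     # ASSUMPTION: elevation of the optimal arm, not given in the quote

    def optimal_arm(self, t: int) -> int:
        # ASSUMPTION: "cycles deterministically" is read as round-robin over the arms.
        epoch_len = self.T // self.n_epochs
        epoch = min(t // epoch_len, self.n_epochs - 1)
        return epoch % self.K

    def mean(self, t: int, arm: int) -> float:
        # Mean reward of `arm` at step `t`: elevated only for the current optimal arm.
        if arm == self.optimal_arm(t):
            return self.mu_base + self.delta
        return self.mu_base

    def total_variation(self) -> float:
        # Sum over consecutive steps of the largest change in any arm's mean.
        total = 0.0
        for t in range(self.T - 1):
            total += max(
                abs(self.mean(t + 1, a) - self.mean(t, a)) for a in range(self.K)
            )
        return total
```

Under these assumptions the variation is `(n_epochs - 1) * delta`, so `delta` can be tuned to approach a target variation budget. Whether that matches how the paper arrives at VT = 20 is not something the quoted text confirms.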