Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Follow-the-Perturbed-Leader Nearly Achieves Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

Authors: Jingxin Zhan, Yuchen Xin, Chenjie Sun, Zhihua Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the empirical performance of FTPL and several benchmark algorithms on the m-set semi-bandit problem. We compare our method against five established baselines ... The resulting average pseudo-regret for each algorithm over time is presented in Figure 1.
Researcher Affiliation | Academia | Jingxin Zhan (School of Mathematical Sciences, Peking University, Beijing, China 100871); Yuchen Xin (School of Mathematical Sciences, Peking University, Beijing, China 100871); Chenjie Sun (School of Mathematical Sciences, Peking University, Beijing, China 100871); Zhihua Zhang (School of Mathematical Sciences, Peking University; Center for Intelligent Computing, Great Bay University, China)
Pseudocode | Yes | Algorithm 1: FTPL with geometric resampling for m-set semi-bandits
Open Source Code | Yes | We provide the code in supplemental material.
Open Datasets | No | Following [Zimmert et al., 2019], we run experiments on a specific instance of the m-set semi-bandit with parameters d = 10, m = 5, and n = 10^7. The loss for arm i at time t has mean ν_{t,i}, and the realized loss is 0 with probability 1 − ν_{t,i} and 1 with probability ν_{t,i}, independently across arms and time. In the stochastic environment, the losses are generated from a stationary distribution where the mean loss for arm i at time t is given by ν_{t,i} = 1/2 if i ≤ 5, and ν_{t,i} = 1/2 + ε otherwise, with ε = 0.1. In the adversarial environment, we employ the adversarial setting detailed in Zimmert et al. [2019] ... We sample a sequence of n loss vectors from the above setting and fix it as our adversarial environment.
Dataset Splits | No | Following [Zimmert et al., 2019], we run experiments on a specific instance of the m-set semi-bandit with parameters d = 10, m = 5, and n = 10^7. ... We sample a sequence of n loss vectors from the above setting and fix it as our adversarial environment, then run the algorithms to be compared on this fixed sequence. Across all experiments, we estimated the pseudo-regret using 20 repetitions.
Hardware Specification | Yes | Our experiments are conducted on a server with 4 NVIDIA RTX 4090 GPUs and Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz.
Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers. It mentions algorithms such as CombUCB, Thompson Sampling, EXP2, LogBarrier, and FTRL but does not specify software environments or versions.
Experiment Setup | Yes | In this section, we evaluate the empirical performance of FTPL and several benchmark algorithms on the m-set semi-bandit problem. ... Following [Zimmert et al., 2019], we run experiments on a specific instance of the m-set semi-bandit with parameters d = 10, m = 5, and n = 10^7. The loss for arm i at time t has mean ν_{t,i}, and the realized loss is 0 with probability 1 − ν_{t,i} and 1 with probability ν_{t,i}, independently across arms and time. In the stochastic environment, the losses are generated from a stationary distribution where the mean loss for arm i at time t is given by ν_{t,i} = 1/2 if i ≤ 5, and ν_{t,i} = 1/2 + ε otherwise, with ε = 0.1. ... Across all experiments, we estimated the pseudo-regret using 20 repetitions.
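The synthetic environment described in the Open Datasets and Experiment Setup rows (Bernoulli losses with means ν_{t,i}, d = 10, m = 5, ε = 0.1) can be sketched as follows. This is a minimal illustration, not the authors' supplemental code: the names `stochastic_means`, `sample_losses`, and `freeze_sequence` are hypothetical, arms are 0-indexed, and the threshold "i ≤ 5" is assumed to coincide with m. The adversarial mean schedule of Zimmert et al. [2019] is elided in the source, so `freeze_sequence` takes any caller-supplied schedule instead of reproducing it.

```python
import random


def stochastic_means(d=10, m=5, eps=0.1):
    """Mean losses nu_i: 1/2 for the first m arms, 1/2 + eps for the rest.

    Assumption: the text's threshold "i <= 5" (1-indexed) equals m = 5.
    """
    return [0.5 if i < m else 0.5 + eps for i in range(d)]


def sample_losses(means, rng):
    """One Bernoulli loss vector: loss_i = 1 with probability nu_i, else 0, independently."""
    return [1 if rng.random() < mu else 0 for mu in means]


def freeze_sequence(mean_fn, n, seed=0):
    """Sample n loss vectors once with a fixed seed and return them as a list.

    mean_fn(t) returns the list of means nu_{t,i} for round t (1-indexed).
    Reusing the returned list for every algorithm mirrors "fix it as our
    adversarial environment"; the actual adversarial schedule is not given here.
    """
    rng = random.Random(seed)
    return [sample_losses(mean_fn(t), rng) for t in range(1, n + 1)]


if __name__ == "__main__":
    means = stochastic_means()  # d = 10, m = 5, eps = 0.1
    seq = freeze_sequence(lambda t: means, n=1000, seed=42)
    print(len(seq), len(seq[0]))  # prints: 1000 10
```

A small n is used in the usage example; the paper's horizon of n = 10^7 would normally call for a vectorized implementation rather than pure Python lists.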
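Algorithm 1 (FTPL with geometric resampling) is named in the Pseudocode row but not reproduced here, so the following is only a generic sketch of how FTPL with geometric resampling can be instantiated for m-set semi-bandits. It assumes Fréchet perturbations with shape `alpha = 2`, a learning rate η_t = 1/√t, a resampling cap of 1000, and the rule "play the m arms minimizing L̂_i − r_i/η". These choices and all function names (`frechet`, `ftpl_action`, `resample_counts`, `run_ftpl_gr`) are assumptions, not the paper's stated settings.

```python
import math
import random


def frechet(rng, alpha=2.0):
    """Frechet(alpha) draw via inverse CDF: F(x) = exp(-x**(-alpha)) for x > 0."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined, so redraw
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / alpha)


def ftpl_action(cum_est, eta, m, rng, alpha=2.0):
    """FTPL step: play the m arms with the smallest perturbed score L_hat_i - r_i / eta."""
    scores = [L - frechet(rng, alpha) / eta for L in cum_est]
    return set(sorted(range(len(cum_est)), key=lambda i: scores[i])[:m])


def resample_counts(cum_est, eta, m, arms, rng, cap, alpha=2.0):
    """Geometric resampling: K_i = index of the first fresh FTPL draw that contains arm i.

    Uncapped, K_i is geometric with mean 1 / P(i is played), so the estimate
    loss_i * K_i (applied only when i is played) is unbiased for loss_i.
    The cap bounds the computation at the cost of a small bias.
    """
    counts = {}
    pending = set(arms)
    for k in range(1, cap + 1):
        if not pending:
            break
        hit = pending & ftpl_action(cum_est, eta, m, rng, alpha)
        for i in hit:
            counts[i] = k
        pending -= hit
    for i in pending:  # arms never re-drawn within the cap
        counts[i] = cap
    return counts


def run_ftpl_gr(losses_seq, d, m, seed=0, cap=1000, alpha=2.0):
    """Run the sketch on a fixed sequence of loss vectors; return the actions played."""
    rng = random.Random(seed)
    cum_est = [0.0] * d  # cumulative estimated losses L_hat
    actions = []
    for t, losses in enumerate(losses_seq, start=1):
        eta = 1.0 / math.sqrt(t)  # hypothetical learning-rate schedule
        action = ftpl_action(cum_est, eta, m, rng, alpha)
        observed = [i for i in action if losses[i] != 0]  # zero losses add nothing
        # Resample from the same distribution used to pick `action` (before updating).
        counts = resample_counts(cum_est, eta, m, observed, rng, cap, alpha)
        for i in observed:
            cum_est[i] += losses[i] * counts[i]
        actions.append(action)
    return actions
```

Skipping resampling for played arms whose observed loss is zero does not change the estimates, since ℓ_i·K_i is zero regardless of K_i; it only saves computation. Sharing one resampling sequence across all played arms keeps each K_i geometric with its own selection probability.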
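The average pseudo-regret curves referred to in the Research Type and Dataset Splits rows (estimated over 20 repetitions) could be computed along the lines below. The sketch assumes that the comparator at round t is the best fixed m-set over rounds 1..t, which for linear losses is the m arms with the smallest cumulative mean loss, and that the evaluator has access to the means ν_{t,i}. For the fixed adversarial sequence one would pass the sampled loss vectors instead, although the text does not state which quantity the authors use. `pseudo_regret_curve` and `average_curves` are hypothetical names.

```python
def pseudo_regret_curve(means_seq, actions, m):
    """Cumulative pseudo-regret after each round.

    means_seq[t][i] is the mean loss nu_{t,i}; actions[t] is the set of m arms played.
    The comparator at round t is the best fixed m-set for rounds 1..t.
    """
    d = len(means_seq[0])
    cum_means = [0.0] * d  # per-arm cumulative mean loss
    played = 0.0           # cumulative mean loss of the played sets
    curve = []
    for nu, action in zip(means_seq, actions):
        played += sum(nu[i] for i in action)
        for i in range(d):
            cum_means[i] += nu[i]
        best = sum(sorted(cum_means)[:m])  # best fixed m-set in hindsight
        curve.append(played - best)
    return curve


def average_curves(curves):
    """Pointwise mean of several regret curves, e.g. one per repetition."""
    reps = len(curves)
    return [sum(vals) / reps for vals in zip(*curves)]
```

For example, with means (0.5, 0.5, 0.6, 0.6), m = 2, and played sets {0, 1}, {0, 2}, {2, 3}, the curve is approximately 0, 0.1, 0.3: the second round plays one suboptimal arm (gap 0.1) and the third plays two.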