Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Algorithm for Max-Min Fair Bandit

Authors: Zilong Wang, Zhiyao Zhang, Shuai Li

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Additional numerical experiments also show the efficiency and improvement of our algorithms."
Researcher Affiliation | Academia | "1Shanghai Jiao Tong University, Shanghai, China; 2The Ohio State University, Columbus, Ohio, USA. Correspondence to: Shuai Li <EMAIL>."
Pseudocode | Yes | "In this section, we introduce the Decentralized Fair Elimination algorithm (Algorithm 1)." The paper provides Algorithm 1 (Decentralized Fair Elimination, for player i), Algorithm 2 (Assign Exploration), and Algorithm 3 (Communication phase in Decentralized Fair Elimination, for players i and j).
Open Source Code | No | The paper contains no statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | "The reward matrices for (4, 4) and (10, 10) are the same as those in Bistritz et al. (2020), shown in Appendix E. For (10, 15), the mean values are generated uniformly from [0, 1]."
Dataset Splits | No | The paper generates reward means or takes them from a cited work, as is typical for bandit problems; it does not use training, validation, or test splits in the sense of supervised learning.
Hardware Specification | No | "All experiments are conducted on CPU." (No CPU model, memory, or other hardware details are given.)
Software Dependencies | No | The paper does not specify software dependencies (e.g., libraries or frameworks) with version numbers that would be needed to replicate the experiments.
Experiment Setup | Yes | "Here, we take T = 500,000 and vary N and K across multiple experiments: (N, K) = (4, 4), (10, 10), (10, 15). The reward of each player-arm pair (i, k) follows a Gaussian distribution N(µ_{i,k}, σ²) with σ = 1. The reward means form a reward matrix U whose (i, k) entry is µ_{i,k}. The reward matrices for (4, 4) and (10, 10) are the same as those in Bistritz et al. (2020), shown in Appendix E; for (10, 15), the mean values are generated uniformly from [0, 1]. We compare three algorithms: DFE (Algorithm 1), Leshem (Leshem, 2025), and My Fair Bandit (Bistritz et al., 2020). Each experiment is repeated 20 times, and all plots show averages over the 20 trials with 95% confidence intervals."
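The reported setup can be sketched in a few lines. The snippet below is an illustrative assumption, not the authors' code: helper names such as make_reward_matrix and sample_reward are hypothetical. It generates a (10, 15) reward matrix with means drawn uniformly from [0, 1], samples Gaussian N(µ, 1) rewards for a player-arm pair, and averages 20 repeated trials with a normal-approximation 95% confidence interval.

```python
import random
import statistics

def make_reward_matrix(n_players, n_arms, rng):
    # Mean rewards mu[i][k] drawn uniformly from [0, 1],
    # as described for the (N, K) = (10, 15) setting.
    return [[rng.random() for _ in range(n_arms)] for _ in range(n_players)]

def sample_reward(mu, rng, sigma=1.0):
    # Reward of a player-arm pair: Gaussian N(mu, sigma^2) with sigma = 1.
    return rng.gauss(mu, sigma)

def mean_ci95(values):
    # Average over repeated trials with a normal-approximation
    # 95% confidence half-width (1.96 * stderr).
    avg = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return avg, half

rng = random.Random(0)
U = make_reward_matrix(10, 15, rng)                        # 10 players, 15 arms
trials = [sample_reward(U[0][0], rng) for _ in range(20)]  # 20 repetitions
avg, half = mean_ci95(trials)
print(f"mean reward estimate: {avg:.3f} +/- {half:.3f}")
```

A full replication would additionally run the compared algorithms (DFE, Leshem, My Fair Bandit) for T = 500,000 rounds per trial; this sketch only covers the reward-generation and averaging protocol quoted above.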