Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Best Arm Identification in Multi-Agent Multi-Armed Bandits

Authors: Filippo Vannella, Alexandre Proutiere, Jaeseong Jeong

ICML 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the performance of MF-TaS numerically using both synthetic and real-world experiments (e.g., to solve the antenna tilt optimization problem in radio communication networks).
Researcher Affiliation | Collaboration | (1) KTH Royal Institute of Technology, Stockholm, Sweden; (2) Ericsson, Stockholm, Sweden. Correspondence to: Filippo Vannella <EMAIL>.
Pseudocode | Yes | Algorithm 1 FCR, Algorithm 2 MF-TaS, Algorithm 3 VE, Algorithm 4 BUILD A0
Open Source Code | Yes | Additional experiments are reported in App. J, and the code is available at this link.
Open Datasets | No | We run our experiments in a proprietary mobile network simulator in an urban environment. The local expected rewards are selected at random as θ_i(a_i, a_{i+1}) ~ U(0, M), for all i ∈ [N] and for some M > 0.
Dataset Splits | No | The paper does not specify dataset splits like training, validation, or test sets; it mentions synthetic data generation and a proprietary simulator.
Hardware Specification | Yes | The experiments run on a MacBook Pro 2.6 GHz 6-Core Intel Core i7 processor. We use this setup in all of our experiments.
Software Dependencies | No | We implement the solver for the lower bound optimization problems using CVXPY (Diamond & Boyd, 2016), with a MOSEK solver.
Experiment Setup | Yes | The exploration threshold is selected as β(δ, t) = log((log(t) + 1)/δ). The elimination order for both VE and FCR is chosen as O = {N, N − 1, ..., 1}.
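
For readers who want to reproduce the quoted setup, the following is a minimal Python sketch of the quantities mentioned in the Experiment Setup and Open Datasets rows. It is not the authors' code. It reads the threshold as β(δ, t) = log((log(t) + 1)/δ), which is an assumption because the quoted formula has unbalanced parentheses. The function names, the seed, and the choice to store each agent's local rewards as a square table over pairs of actions are all hypothetical.

```python
import math
import random
from typing import List


def exploration_threshold(delta: float, t: int) -> float:
    """Assumed reading of the threshold: log((log(t) + 1) / delta)."""
    if t < 1 or not 0.0 < delta < 1.0:
        raise ValueError("expected t >= 1 and 0 < delta < 1")
    return math.log((math.log(t) + 1.0) / delta)


def elimination_order(n_agents: int) -> List[int]:
    """Elimination order O = {N, N - 1, ..., 1} with 1-based agent indices."""
    return list(range(n_agents, 0, -1))


def sample_local_rewards(
    n_agents: int, n_actions: int, max_reward: float, seed: int = 0
) -> List[List[List[float]]]:
    """Draw one U(0, M) local expected reward per agent and action pair.

    The (n_actions x n_actions) table per agent is a hypothetical layout
    chosen for illustration, not taken from the paper.
    """
    rng = random.Random(seed)
    return [
        [
            [rng.uniform(0.0, max_reward) for _ in range(n_actions)]
            for _ in range(n_actions)
        ]
        for _ in range(n_agents)
    ]
```

With these helpers, `elimination_order(4)` gives the agents in decreasing index order, and `exploration_threshold(0.1, 1)` reduces to `log(1/0.1)` because `log(1) = 0`.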