Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Incentivized Truthful Communication for Federated Bandits

Authors: Zhepei Wei, Chuanhao Li, Tianze Ren, Haifeng Xu, Hongning Wang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive numerical studies further validate the effectiveness of our proposed solution."
Researcher Affiliation | Academia | University of Virginia; University of Chicago
Pseudocode | Yes | "Algorithm 1: Truthful Incentive Search"
Open Source Code | No | The paper does not provide any statement about releasing code or a link to a code repository for the described methodology.
Open Datasets | No | The paper states "we create a simulated federated bandit learning environment" but provides no access information (link, DOI, or formal citation) for any publicly available or open dataset.
Dataset Splits | No | The paper uses a simulated federated bandit learning environment with a fixed time horizon T, but does not specify traditional train/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper describes experiments in a simulated federated bandit learning environment but does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for the simulations.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required for reproducibility.
Experiment Setup | Yes | "For demonstration purposes, we instantiate it as a combination of the client's weighted data-collection cost plus its intrinsic preference cost, i.e., f(V_{i,t}) = w · det(V_{i,t}) + C_i, where w = 10^-4, and each client i's intrinsic preference cost C_i is uniformly sampled from U(0, 100)." In the simulated environment (Section 5), the time horizon is T = 6250, the total number of clients is N = 25, and the context dimension is d = 5. The hyper-parameters are ϵ = 1.0 and β = 0.5 in Algorithms 1 and 3, and the tolerance factor in Algorithm 7 is γ = 1.0.
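The reported experiment setup can be sketched as a small simulation configuration. This is a hedged illustration under stated assumptions, not the authors' code (the paper releases none): the function and variable names (`cost`, `C`, `V`) are invented here, and the covariance-matrix initialization to the identity is a common bandit convention assumed for the example.

```python
import numpy as np

# Illustrative sketch of the reported simulated federated bandit setup.
# All identifiers are assumptions; only the numeric values come from the report.
rng = np.random.default_rng(0)

T = 6250      # time horizon
N = 25        # total number of clients
d = 5         # context dimension
w = 1e-4      # weight on the data-collection cost term
epsilon, beta, gamma = 1.0, 0.5, 1.0  # Algorithms 1, 3, and 7 hyper-parameters

# Each client i's intrinsic preference cost C_i is sampled from U(0, 100).
C = rng.uniform(0.0, 100.0, size=N)

def cost(i: int, V_i: np.ndarray) -> float:
    """Client i's cost: f(V_{i,t}) = w * det(V_{i,t}) + C_i."""
    return w * np.linalg.det(V_i) + C[i]

# A client's matrix V_{i,t} might start at the identity (assumed) and
# accumulate rank-one context updates x x^T over the horizon.
V = np.eye(d)
x = rng.normal(size=d)
V += np.outer(x, x)
client0_cost = cost(0, V)
```

The sketch only shows how the stated cost instantiation combines the determinant term with the sampled intrinsic cost; the truthful incentive search itself (Algorithm 1) is not reproduced here.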