Incentivized Truthful Communication for Federated Bandits

Authors: Zhepei Wei, Chuanhao Li, Tianze Ren, Haifeng Xu, Hongning Wang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive numerical studies further validate the effectiveness of our proposed solution.
Researcher Affiliation | Academia | University of Virginia; University of Chicago
Pseudocode | Yes | Algorithm 1: Truthful Incentive Search
Open Source Code | No | The paper does not provide any statements about releasing code or links to a code repository for the methodology described.
Open Datasets | No | The paper states 'we create a simulated federated bandit learning environment' but does not provide access information (link, DOI, or formal citation) for any publicly available or open dataset.
Dataset Splits | No | The paper uses a 'simulated federated bandit learning environment' and mentions a fixed time horizon T, but it does not specify traditional train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper mentions running experiments in a 'simulated federated bandit learning environment' but does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for these simulations.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) required for reproducibility.
Experiment Setup | Yes | For demonstration purposes, we instantiate it as a combination of the client's weighted data collection cost plus its intrinsic preference cost, i.e., f(V_{i,t}) = w * det(V_{i,t}) + C_i, where w = 10^{-4}, and each client i's intrinsic preference cost C_i is uniformly sampled from U(0, 100). In the simulated environment (Section 5), the time horizon is T = 6250, the total number of clients is N = 25, and the context dimension is d = 5. We set the hyper-parameters ε = 1.0 and β = 0.5 in Algorithm 1 and Algorithm 3. The tolerance factor in Algorithm 7 is γ = 1.0.
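For reference, the following is a minimal sketch (not the authors' code) of how the reported cost instantiation and simulation parameters could be set up, assuming V_{i,t} is client i's d x d Gram matrix of observed contexts and that an identity-matrix prior is used for regularization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulation parameters reported in the paper's experiment setup.
T, N, d = 6250, 25, 5              # time horizon, number of clients, context dimension
w = 1e-4                           # weight on the data-collection cost term (w = 10^{-4})
C = rng.uniform(0, 100, size=N)    # intrinsic preference cost C_i ~ U(0, 100) per client

def communication_cost(V_it: np.ndarray, i: int) -> float:
    """f(V_{i,t}) = w * det(V_{i,t}) + C_i, as instantiated in the experiments."""
    return w * np.linalg.det(V_it) + C[i]

# Hypothetical usage: a client that has observed 100 random d-dimensional contexts.
X = rng.normal(size=(100, d))
V = np.eye(d) + X.T @ X            # regularized Gram matrix (identity prior is an assumption)
print(communication_cost(V, i=0))
```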