Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization
Authors: Zhen Wang, Weirui Kuang, Ce Zhang, Bolin Ding, Yaliang Li
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments based on FEDHPO-BENCH to provide the community with more insights into FedHPO. |
| Researcher Affiliation | Collaboration | ¹Alibaba Group, ²ETH Zürich. Correspondence to: Yaliang Li <EMAIL>. |
| Pseudocode | Yes | Figure 6. A general algorithmic view for FedHPO methods: They are allowed to concurrently explore different client-side configurations in the same round of FL, but the clients are heterogeneous, i.e., corresponding to different functions $f^{(c)}_i(\cdot)$. Operators in brackets are optional. |
| Open Source Code | Yes | We open-sourced FEDHPO-BENCH at https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench. |
| Open Datasets | Yes | All these datasets are publicly available and can be downloaded and preprocessed by our prepared scripts. |
| Dataset Splits | Yes | FEMNIST... And we use the default train/valid/test splits for each client, where the ratio is 60% : 20% : 20%. |
| Hardware Specification | Yes | In creating them, we spent about two months of computation time on six machines, each with four Nvidia V100 GPUs. |
| Software Dependencies | Yes | Table 5. Overview of the optimizers from widely adopted libraries. RS (Bergstra & Bengio, 2012): package HPBandster 0.7.4. BOGP (Hutter et al., 2011; Lindauer et al., 2022): model GP, package SMAC3 1.3.3. BOKDE (Falkner et al., 2018): model KDE, package HPBandster 0.7.4. TPEHB (Akiba et al., 2019): model TPE, package Optuna 2.10.0. |
| Experiment Setup | Yes | As a full-fidelity function evaluation consumes 500 rounds on these datasets, we specify the total budget to 2,500 (i.e., 5 times the budget of a full-fidelity evaluation) in terms of #round. Precisely, each BBO method consists of 50 trials, each of which runs for 50 rounds. For MF optimizers, we set η of Successive Halving Algorithm (SHA) (Jamieson & Talwalkar, 2016) to 3, the min and max budget to 9 and 81 rounds, respectively. |
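
The budget figures quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the FedHPO-Bench code base: the function name `sha_rungs` and the constants are hypothetical, and it assumes the usual successive-halving convention that the per-configuration budget grows by a factor of η from one rung to the next. The excerpt does not state how many configurations start at the lowest rung, so that is left out.

```python
def sha_rungs(min_budget, max_budget, eta):
    """Per-rung budgets of a geometric successive-halving ladder.

    Assumption (not stated in the excerpt): each rung multiplies the
    previous budget by eta until max_budget is reached.
    """
    rungs = []
    budget = min_budget
    while budget <= max_budget:
        rungs.append(budget)
        budget *= eta
    return rungs


# Values quoted in the Experiment Setup row.
FULL_FIDELITY_ROUNDS = 500
TOTAL_BUDGET = 5 * FULL_FIDELITY_ROUNDS      # total budget in FL rounds
BBO_TRIALS = 50
BBO_ROUNDS_PER_TRIAL = 50

# 50 trials of 50 rounds each use exactly the stated total budget.
assert BBO_TRIALS * BBO_ROUNDS_PER_TRIAL == TOTAL_BUDGET

# Budget ladder for the multi-fidelity optimizers (eta = 3, min 9, max 81).
MF_RUNGS = sha_rungs(min_budget=9, max_budget=81, eta=3)
print(MF_RUNGS)
```

Under these assumptions the multi-fidelity ladder has three rungs, so a configuration that survives every rung is trained for at most 81 rounds, well below the 500 rounds of a full-fidelity evaluation.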