Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
FedAIoT: A Federated Learning Benchmark for Artificial Intelligence of Things
Authors: Samiul Alam, Tuo Zhang, Tiantian Feng, Hui Shen, Zhichao Cao, Dong Zhao, Jeonggil Ko, Kiran Somasundaram, Shrikanth Narayanan, Salman Avestimehr, Mi Zhang
DMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our benchmark results shed light on the opportunities and challenges of FL for AIoT. We have conducted systematic benchmarking on the eight datasets using the end-to-end framework. Specifically, we examine the impact of varying degrees of non-IID data distributions, FL optimizers, and client sampling ratios on the performance of FL. We also evaluate the impact of noisy labels, a prevalent challenge in IoT datasets, as well as the effects of quantized training, a technique that tackles the practical limitation of resource-constrained IoT devices. Our benchmark results provide valuable information about both the opportunities and challenges of FL for AIoT. Sections like "4 Benchmark Results and Analysis", "Table 5: Overall performance", "Table 6: Impact of client sampling ratio", "Table 7: Impact of noisy labels", and "Table 8: Performance on quantized training" demonstrate empirical studies with data analysis. |
| Researcher Affiliation | Collaboration | Samiul Alam (1), Tuo Zhang (2), Tiantian Feng (2), Hui Shen (1), Zhichao Cao (3), Dong Zhao (3), Jeong Gil Ko (4), Kiran Somasundaram (5), Shrikanth S. Narayanan (2), Salman Avestimehr (2), Mi Zhang (1). Affiliations: 1 The Ohio State University; 2 University of Southern California; 3 Michigan State University; 4 Yonsei University; 5 Meta |
| Pseudocode | No | The paper describes methodologies in prose and with figures like Figure 2, but does not contain any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | The repository of FedAIoT is maintained at https://github.com/AIoT-MLSys-Lab/FedAIoT. |
| Open Datasets | Yes | FedAIoT includes eight datasets collected from a wide range of IoT devices. These datasets cover unique IoT modalities and target representative applications of AIoT. The paper lists and cites each dataset used, such as "WISDM: The Wireless Sensor Data Mining (WISDM) dataset (Weiss et al., 2019; Lockhart et al., 2011)", "UT-HAR: The UT-HAR dataset (Yousefi et al., 2017)", "Widar: The Widar dataset (Yang, 2020; Zheng et al., 2019)", "VisDrone: The VisDrone dataset (Zhu et al., 2021)", "CASAS: The CASAS dataset (Schmitter-Edgecombe and Cook, 2009)", "AEP: The Appliances Energy Prediction (AEP) dataset (Candanedo et al., 2017)", and "EPIC-SOUNDS: The EPIC-SOUNDS dataset (Huh et al., 2023)". Some datasets also explicitly mention their licenses, e.g., "The dataset is licensed under the Creative Commons Attribution-NonCommercial 4.0 International Licence (CC BY 4)" for Widar. |
| Dataset Splits | Yes | For WISDM-W/P: "We randomly selected 45 subjects as the training set and the remaining six subjects were assigned to the test set. The total number of samples in the training and test set is 16,569 and 4,103 for WISDM-W and 13,714 and 4,073 for WISDM-P respectively." For UT-HAR: "UT-HAR contains a pre-determined training and test set. The total number of training and test samples is 3,977 and 500 respectively." For Widar: "data from two subjects used during training and the remaining one for the test set. The resulting dataset includes nine gestures with 11,372 samples in the training set and 5,222 in the test set." For VisDrone: "The dataset contains a pre-determined training and test set. The total number of samples in the training and test set is 6,471 and 1,610 respectively." For CASAS: "The training and test set was made using an 80-20 split." For AEP: "The number of samples in the training and test set is 15,788 and 3,947 respectively." For EPIC-SOUNDS: "The total number of training and test samples is 60,055 and 40,175 respectively." |
| Hardware Specification | Yes | We implemented FedAIoT using PyTorch (Paszke et al., 2019) and Ray (Moritz et al., 2018) and conducted our experiments on NVIDIA A6000 GPUs. |
| Software Dependencies | No | We implemented FedAIoT using PyTorch (Paszke et al., 2019) and Ray (Moritz et al., 2018). The paper mentions PyTorch and Ray by name and cites papers for them, but does not provide specific version numbers for these software dependencies used in the experiments. |
| Experiment Setup | Yes | Specifically, we examine the impact of varying degrees of non-IID data distributions, FL optimizers, and client sampling ratios on the performance of FL. Tables 5, 6, and 7 provide specific values such as "Low Data Heterogeneity (α = 0.5)", "High Data Heterogeneity (α = 0.1)", "Low Client Sampling Ratio (10%)", "High Client Sampling Ratio (30%)", and "Noisy Label Ratio... 10% and 30%". Also, it states "Details on other hyperparameters used for each experiment are described in Appendix E." |
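The non-IID settings cited in the table are parameterized by a Dirichlet concentration α (α = 0.5 for low heterogeneity, α = 0.1 for high). A common way to realize such partitions is to draw, for each class, a Dirichlet-distributed vector of per-client proportions. The sketch below is illustrative only, not the FedAIoT codebase's implementation; the function name and defaults are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-skewed label mixes.

    Lower alpha concentrates each class on fewer clients (more non-IID);
    higher alpha approaches a uniform (IID-like) split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.where(labels == cls)[0])
        # Per-client share of this class, drawn from Dirichlet(alpha, ..., alpha).
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

# Example: 1000 samples over 10 classes, partitioned for 10 clients
# at the two heterogeneity levels used in the benchmark.
labels = np.repeat(np.arange(10), 100)
low_het = dirichlet_partition(labels, num_clients=10, alpha=0.5)
high_het = dirichlet_partition(labels, num_clients=10, alpha=0.1)
```

Every sample index lands on exactly one client, so the union of the client shards always reconstructs the full dataset regardless of α.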