Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
Authors: Yuchen Liu, Chen Chen, Lingjuan Lyu, Fangzhao Wu, Sai Wu, Gang Chen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various real-world datasets verify the efficacy of our proposed GAS. |
| Researcher Affiliation | Collaboration | Yuchen Liu¹*, Chen Chen²*, Lingjuan Lyu², Fangzhao Wu³, Sai Wu¹, Gang Chen¹ — ¹Key Lab of Intelligent Computing Based Big Data of Zhejiang Province, Zhejiang University, Hangzhou, China; ²Sony AI; ³Microsoft. |
| Pseudocode | No | The paper describes the proposed GAS approach in three steps (Splitting, Identification, Aggregation) in paragraph form, but it does not include a formal pseudocode block or algorithm box; a hedged sketch of these three steps is given below the table. |
| Open Source Code | Yes | The implementation code is provided at https://github.com/YuchenLiu-a/byzantine-gas. |
| Open Datasets | Yes | Our experiments are conducted on four real-world datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), a subset of ImageNet (Russakovsky et al., 2015) referred to as ImageNet-12 (Li et al., 2021b), and FEMNIST (Caldas et al., 2018). |
| Dataset Splits | Yes | For each client, we randomly sample 90% of the data as training data and use the remaining 10% as test data, following Caldas et al. (2018). A minimal split sketch is given below the table. |
| Hardware Specification | No | The paper mentions model architectures such as AlexNet, SqueezeNet, ResNet-18, and a CNN, but does not specify the hardware (e.g., GPU models, CPU types, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using SGD optimizer and refers to various existing robust AGRs but does not provide specific version numbers for any software libraries or programming languages used in the implementation (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For local training, the number of local epochs is set to 1, the batch size to 64, and the optimizer to SGD with learning rate 0.1, momentum 0.5, and weight decay coefficient 0.0001. Gradient clipping with clipping norm 2 is also adopted. A training-loop sketch using these values is given below the table. |
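
The table notes that GAS is described only in prose as three steps: Splitting, Identification, and Aggregation. The sketch below is one possible reading of those steps, not the authors' reference implementation: the number of splits, the distance-based scoring rule, and the base robust aggregator passed in as `robust_agr` are assumptions introduced here for illustration.

```python
import numpy as np

def gas_aggregate(gradients, robust_agr, num_splits, num_byzantine):
    """Hedged sketch of the three GAS steps: Splitting, Identification, Aggregation.

    gradients:     (n_clients, d) array of client gradients.
    robust_agr:    callable mapping a (n_clients, k) array to a (k,) robust aggregate.
    num_splits:    number of coordinate groups each gradient is split into (assumed).
    num_byzantine: assumed upper bound on the number of Byzantine clients.
    """
    n_clients, d = gradients.shape

    # Splitting: partition the d coordinates into num_splits groups.
    groups = np.array_split(np.arange(d), num_splits)

    # Identification: score each client by its distance to the robust
    # aggregate of every coordinate group, then keep the lowest-scoring clients.
    scores = np.zeros(n_clients)
    for idx in groups:
        sub = gradients[:, idx]                      # sub-gradients for this group
        agg = robust_agr(sub)                        # robust aggregate of the group
        scores += np.linalg.norm(sub - agg, axis=1)  # accumulate per-client distances
    selected = np.argsort(scores)[: n_clients - num_byzantine]

    # Aggregation: average the full gradients of the identified clients.
    return gradients[selected].mean(axis=0)
```

As a quick check, calling `gas_aggregate(grads, lambda g: np.median(g, axis=0), num_splits=10, num_byzantine=4)` with a few simulated outlier clients should return an aggregate close to the honest mean; the coordinate-wise median here stands in for whichever robust AGR is used as the base.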
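
The per-client 90/10 split quoted in the Dataset Splits row can be reproduced in a few lines. The function name and the use of NumPy are illustrative choices, not taken from the released code.

```python
import numpy as np

def split_client_data(num_samples, train_frac=0.9, seed=0):
    """Randomly split one client's sample indices into train/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    cut = int(train_frac * num_samples)
    return idx[:cut], idx[cut:]  # train indices, test indices
```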
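
The hyperparameters in the Experiment Setup row translate directly into a local training loop. The sketch below assumes a PyTorch implementation (the paper does not name its framework or versions) and a standard classification loss; everything else follows the quoted values.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def local_train(model: nn.Module, dataset, device="cpu"):
    """One local epoch with the reported hyperparameters (assumed PyTorch)."""
    loader = DataLoader(dataset, batch_size=64, shuffle=True)      # batch size 64
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,        # learning rate 0.1
                                momentum=0.5, weight_decay=1e-4)   # momentum 0.5, weight decay 1e-4
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()

    for inputs, labels in loader:  # number of local epochs is 1
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        # Gradient clipping with clipping norm 2, as reported.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
        optimizer.step()
    return model
```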