Robust Distributed Gradient Aggregation Using Projections onto Gradient Manifolds

Authors: Kwang In Kim

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate consistent performance improvements over state-of-the-art robust aggregation algorithms.
Researcher Affiliation | Academia | Kwang In Kim, POSTECH, kimkin@postech.ac.kr
Pseudocode | Yes | Algorithm 1: Robust collaborative learning algorithm. Training data is distributed across K clients; an unknown number of the K clients will provide erroneous gradients. (A generic round skeleton is sketched after the table.)
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | The CIFAR10 and CIFAR100 datasets consist of 60,000 color images from 10 and 100 classes, respectively (Krizhevsky 2009). Tiny ImageNet is a subset of the ImageNet 2017 benchmark, consisting of 100,000 training and 10,000 testing images evenly covering 200 object categories (Le and Yang 2015). The Kuzushiji49 dataset provides 223,365 training and 38,547 testing images of 49 Japanese characters (Clanuwat et al. 2018). The Fashion-MNIST (FMNIST) and extended MNIST letters (EMNISTL) datasets provide 70,000 images of 10 clothing categories (Xiao, Rasul, and Vollgraf 2017) and 124,800 letter images (LeCun et al. 1998), respectively.
Dataset Splits | Yes | For each dataset, 50,000 images were used for training, and the remaining 10,000 images were reserved for testing.
Hardware Specification | Yes | All experiments were conducted on a machine with two Intel Xeon Silver 4210R CPUs and two NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using convolutional neural networks and ResNet50, but does not provide specific version numbers for any software libraries (e.g., TensorFlow, PyTorch, scikit-learn) or programming languages.
Experiment Setup | Yes | We used K = 100 clients for all datasets. To distribute each dataset to these clients, we extended (McMahan et al. 2017)'s approach (to more than ten classes)... First, we partitioned the dataset into 0.2·C·K shards, where C is the number of classes. Each shard contained only a single class, and all shards were of equal size. Then, we randomly assigned 0.2·C shards to each client... For each number of affected clients in {10, 20, 30, 40, 50, 60, 70} (out of K = 100), we repeat experiments 10 times and report the average results. We used convolutional neural networks with two convolution layers and two fully-connected layers for Kuzushiji49, FMNIST, and EMNISTL, following the setup of (LeCun et al. 1998). The convolution layers comprised 10 and 20 filters of size 5×5, followed by 2×2 max pooling. The fully-connected layers were of size 50 and 10. For the remaining datasets, we combined a ResNet50 pre-trained on the ImageNet dataset with three fully connected layers of size 300 each, following (Kim 2022). Input: gradient neighborhood size H = 10 and the number of random projection steps S = 20. (Data-partitioning and network sketches follow the table.)
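
The Algorithm 1 row above describes the setting only in outline: K clients return gradients and an unknown subset of them is erroneous. The following is a minimal, hypothetical sketch of one such training round with a pluggable aggregation rule. The coordinate-wise median used here is only a stand-in baseline; it is not the paper's projection-onto-gradient-manifolds aggregator, and all function names and signatures are illustrative.

```python
import numpy as np

def coordinate_wise_median(grads):
    """Stand-in robust aggregation rule (coordinate-wise median).
    The paper's own rule, which projects gradients onto gradient
    manifolds, is NOT reproduced here."""
    return np.median(np.stack(grads, axis=0), axis=0)

def robust_training_round(params, client_grad_fns,
                          aggregate=coordinate_wise_median, lr=0.01):
    """One round of a generic robust collaborative-learning loop:
    each of the K clients returns a gradient, an unknown subset of
    which may be erroneous; the server aggregates robustly and takes
    a gradient step. Names here are illustrative, not from the paper."""
    grads = [grad_fn(params) for grad_fn in client_grad_fns]  # one gradient per client
    agg = aggregate(grads)                                     # robust aggregation
    return params - lr * agg                                   # server-side update
```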
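
The setup row describes partitioning each dataset into 0.2·C·K single-class, equal-size shards and assigning 0.2·C shards to each of the K = 100 clients. Below is a minimal sketch of such a partition, assuming NumPy and integer class labels; the function and argument names are hypothetical, not taken from the paper.

```python
import numpy as np

def partition_non_iid(labels, num_clients=100, shard_factor=0.2, seed=0):
    """Shard-based non-IID split as described in the setup: cut the data
    into 0.2*C*K single-class shards of equal size, then give each of the
    K clients 0.2*C randomly chosen shards."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = len(np.unique(labels))                     # C
    shards_per_client = int(shard_factor * num_classes)      # 0.2*C
    total_shards = shards_per_client * num_clients           # 0.2*C*K

    # Build single-class shards: split each class's indices into equal pieces.
    shards_per_class = total_shards // num_classes
    shards = []
    for c in range(num_classes):
        idx = rng.permutation(np.where(labels == c)[0])
        shards.extend(np.array_split(idx, shards_per_class))

    # Randomly assign 0.2*C shards to each client.
    order = rng.permutation(total_shards)
    client_indices = []
    for k in range(num_clients):
        take = order[k * shards_per_client:(k + 1) * shards_per_client]
        client_indices.append(np.concatenate([shards[s] for s in take]))
    return client_indices
```

For CIFAR10 (C = 10, K = 100), this yields 200 shards of 500 training images each, with every client receiving 2 shards, i.e. images from at most two classes.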
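
For Kuzushiji49, FMNIST, and EMNISTL, the setup describes a network with two convolution layers (10 and 20 filters of size 5×5, each followed by 2×2 max pooling) and two fully-connected layers of size 50 and 10. The sketch below assumes 28×28 single-channel inputs and PyTorch (the paper does not name its framework); `num_classes` generalizes the quoted final layer of size 10 to the other class counts.

```python
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Two-conv / two-FC network as described in the setup row."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 1x28x28 -> 10x24x24
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 10x12x12 -> 20x8x8
        self.fc1 = nn.Linear(20 * 4 * 4, 50)           # after two 2x2 poolings
        self.fc2 = nn.Linear(50, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)     # -> 10x12x12
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)     # -> 20x4x4
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```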