DepthFL : Depthwise Federated Learning for Heterogeneous Clients
Authors: Minjae Kim, Sangyoon Yu, Suhyun Kim, Soo-Mook Moon
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that depth-scaled local models build a global model better than width-scaled ones, and that self-distillation is highly effective in training data-insufficient deep layers. (A hedged sketch of such a self-distillation step follows the table.) |
| Researcher Affiliation | Academia | Minjae Kim, Seoul National University (mjkim@snu.ac.kr); Sangyoon Yu, Seoul National University (sangyoonyu@snu.ac.kr); Suhyun Kim, Korea Institute of Science and Technology (dr.suhyun.kim@gmail.com); Soo-Mook Moon, Seoul National University (smoon@snu.ac.kr) |
| Pseudocode | Yes | Algorithm 1: DepthFL (a hedged sketch of the aggregation step it describes follows the table) |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We used MNIST, CIFAR-100, and Tiny-ImageNet datasets for the image classification task, and the WikiText-2 dataset for the masked language modeling task. |
| Dataset Splits | No | The paper mentions the datasets used (MNIST, CIFAR-100, Tiny-ImageNet, WikiText-2) but does not explicitly state the training, validation, and test dataset splits or cite where these splits are defined for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU models, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions) required for replication. |
| Experiment Setup | Yes | Table 13 (Hyperparameters and model architecture used in experiments) provides specific values for Local Epoch E, Local Batch Size B, Optimizer (SGD), Momentum, Weight decay, Temperature, alpha (FedDyn), Consistency ramp-up, Communication rounds, Learning rate, Learning rate decay, Embedding size, Number of heads, Dropout, and Sequence length (collected in the illustrative config sketch after the table). |
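The Research Type row quotes the paper's core finding: mutual self-distillation among the exit classifiers of a depth-scaled model is what trains the data-insufficient deep layers. Below is a minimal PyTorch sketch of such a loss under our own assumptions: the all-pairs teacher/student scheme, the `detach()` on teachers, and the `kd_weight` averaging are illustrative choices, with only the `temperature` knob name taken from the paper's Table 13.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(logits_list, targets, temperature=3.0, kd_weight=1.0):
    """Mutual self-distillation among one client's exit classifiers.

    logits_list: outputs of each exit classifier on the same batch.
    The exact formulation is a hedged sketch, not the authors' code.
    """
    # Supervised cross-entropy at every exit.
    ce = sum(F.cross_entropy(z, targets) for z in logits_list)

    # Each classifier (student) distills from the softened,
    # gradient-detached predictions of every other classifier (teacher).
    kd = torch.zeros((), device=targets.device)
    for i, student in enumerate(logits_list):
        for j, teacher in enumerate(logits_list):
            if i == j:
                continue
            kd = kd + F.kl_div(
                F.log_softmax(student / temperature, dim=1),
                F.softmax(teacher.detach() / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2

    # Average the KD term over the number of teachers per student.
    n_teachers = max(len(logits_list) - 1, 1)
    return ce + kd_weight * kd / n_teachers
```

In use, `kd_weight` would be ramped up over rounds, matching the "Consistency ramp-up" knob listed in the Experiment Setup row.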
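The Pseudocode row points at Algorithm 1 (DepthFL), and the Open Source Code row notes that no implementation is released. Here is a hedged sketch of the server-side aggregation step the algorithm implies: each parameter is averaged only over the clients whose depth-scaled submodel actually contains it. Representing client updates as plain state dicts is our assumption.

```python
from collections import defaultdict
import torch

def aggregate_depthwise(client_states):
    """Layer-wise averaging over heterogeneous, depth-scaled submodels.

    client_states: one state_dict per client; shallow clients simply
    omit the deep layers' keys. Sketch only, not the authors' code.
    """
    sums, counts = {}, defaultdict(int)
    for state in client_states:
        for name, tensor in state.items():
            # Accumulate this parameter only for clients that have it.
            sums[name] = sums.get(name, torch.zeros_like(tensor)) + tensor
            counts[name] += 1
    # Divide each parameter by its own contributor count.
    return {name: sums[name] / counts[name] for name in sums}
```

Under this scheme, shallow layers are averaged over all clients while the deepest layers are updated only by clients running the full model, which is the depthwise analogue of width-based partial aggregation.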
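The Experiment Setup row names the knobs reported in the paper's Table 13 without reproducing their values here. The dataclass below only collects those named knobs into one place; every default is an invented placeholder, not a value from the paper.

```python
from dataclasses import dataclass

@dataclass
class DepthFLConfig:
    """Knobs named in the paper's Table 13.

    All defaults are illustrative placeholders, NOT reported values.
    """
    local_epochs: int = 1            # Local Epoch E (placeholder)
    batch_size: int = 32             # Local Batch Size B (placeholder)
    optimizer: str = "SGD"           # the paper reports SGD
    momentum: float = 0.9            # placeholder
    weight_decay: float = 1e-4       # placeholder
    temperature: float = 3.0         # self-distillation softening (placeholder)
    feddyn_alpha: float = 0.1        # alpha for FedDyn (placeholder)
    consistency_rampup: int = 100    # rounds of KD-weight ramp-up (placeholder)
    communication_rounds: int = 500  # placeholder
    lr: float = 0.1                  # placeholder
    lr_decay: float = 0.99           # per-round decay (placeholder)
    # Transformer knobs for the WikiText-2 masked-LM task:
    embedding_size: int = 128        # placeholder
    num_heads: int = 4               # placeholder
    dropout: float = 0.1             # placeholder
    seq_length: int = 64             # placeholder
```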