A Kernel Perspective on Distillation-based Collaborative Learning
Authors: Sejun Park, Kihun Hong, Ganguk Hwang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct experiments on DCL-KR and DCL-NN. To illustrate the superiority of our algorithms, we compare them with several baselines on various regression tasks. Experimental results show that DCL-KR achieves the same performance as the centralized model, even beyond the theoretical results. We also observe that DCL-NN significantly outperforms previous DCL frameworks in most settings. |
| Researcher Affiliation | Academia | Sejun Park, Kihun Hong, Ganguk Hwang; Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology; {sejunpark, nuri9911, guhwang}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 DCL-KR Algorithm; Algorithm 2 DCL-NN Algorithm (an illustrative, non-authoritative sketch of this style of distillation loop appears after the table) |
| Open Source Code | Yes | The code is provided via the supplementary material. |
| Open Datasets | Yes | Datasets: We use the following six regression datasets to evaluate the performance. ... (1) Toy-1D [33] and (2) Toy-3D [6] are synthetic datasets... (3) Energy is a tabular dataset from the UCI database [12]... (4) Rotated MNIST is an image dataset whose task is to predict the rotation angle of rotated MNIST [11] images. (5) UTKFace [71] and (6) IMDB-WIKI [42, 52] are image datasets for age estimation. |
| Dataset Splits | Yes | We use 12,000 training data points distributed across the parties. We use 6,000 samples as public inputs and 1,000 samples for testing. (Energy dataset) ... We use 200,000 images as the entire training data, 50,000 images as public inputs, and 50,000 images as test data. (Rotated MNIST) ... We use 12,544 samples for training and 1,039 samples for testing. We have 6,234 public inputs. (UTKFace) ... We use 147,107 images as the entire training data, 36,780 images as public inputs, and 56,087 images as test data. (IMDB-WIKI) (A hypothetical partitioning sketch appears after the table.) |
| Hardware Specification | Yes | We simulate a decentralized setting on a single deep learning workstation (Intel(R) Xeon(R) Gold 6430 with one NVIDIA GeForce RTX 4090 GPU and 189GB RAM). |
| Software Dependencies | No | The experiments are implemented in PyTorch. ... All optimizers used are Adam [26]. While PyTorch and Adam are mentioned, no specific version numbers are provided for these software components. |
| Experiment Setup | Yes | Hyperparameters: T: total number of communication rounds, E: number of local iterations per communication round, η: learning rate (Algorithm 1) ... We set the learning rate η = 0.5... The number of local iterations E for DCL-KR is set to 5. ... Tables 5, 6, 7, 8, and 9 list various hyperparameters such as batch size, learning rate, and communication rounds for different algorithms and datasets. (A hedged configuration sketch appears after the table.) |
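The paper's Algorithm 1 (DCL-KR) and Algorithm 2 (DCL-NN) are not reproduced in this summary. As a rough, non-authoritative orientation, the sketch below shows a generic distillation-based collaborative learning round with kernel ridge regression parties: each party fits locally, a server averages the parties' predictions on shared public inputs, and each party distills that averaged prediction back into its local model. The class and function names, the RBF kernel, and the average-then-distill update are assumptions for illustration, not the paper's exact updates.

```python
# Minimal sketch of a distillation-based collaborative learning loop with
# kernel ridge regression parties. Illustrative only; not the paper's DCL-KR.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

class KRRParty:
    """One party holding private data and a local kernel ridge regressor."""
    def __init__(self, X, y, reg=1e-2):
        self.X, self.y, self.reg = X, y, reg

    def fit(self, X_pub=None, t_pub=None):
        # Fit on private data, optionally augmented with distillation
        # targets t_pub evaluated on the shared public inputs X_pub.
        if X_pub is None:
            Xtr, ytr = self.X, self.y
        else:
            Xtr = np.vstack([self.X, X_pub])
            ytr = np.concatenate([self.y, t_pub])
        K = rbf_kernel(Xtr, Xtr)
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(Xtr)), ytr)
        self.Xtr = Xtr

    def predict(self, X):
        return rbf_kernel(X, self.Xtr) @ self.alpha

# Toy run: repeat predict -> average on public inputs -> distill.
rng = np.random.default_rng(0)
X_pub = rng.uniform(-1, 1, size=(50, 1))          # shared public inputs
parties = []
for _ in range(4):                                # four parties with private shards
    X = rng.uniform(-1, 1, size=(30, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)
    parties.append(KRRParty(X, y))

for p in parties:
    p.fit()                                       # initial local fit
for t in range(10):                               # communication rounds (T = 10 here)
    consensus = np.mean([p.predict(X_pub) for p in parties], axis=0)
    for p in parties:
        p.fit(X_pub=X_pub, t_pub=consensus)       # distill the averaged prediction
```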
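For the dataset splits quoted in the table, the hypothetical helper below illustrates one way a sample pool could be divided into per-party training shards, shared public inputs, and a held-out test set, using the Energy dataset sizes (12,000 / 6,000 / 1,000). The uniform shuffle-and-slice scheme and the choice of 10 parties are assumptions, not the paper's partitioning procedure.

```python
# Hypothetical partitioning sketch; the paper's actual data distribution
# across parties may differ (e.g., non-uniform or heterogeneous splits).
import numpy as np

def split_pool(n_total, n_train, n_public, n_test, n_parties, seed=0):
    """Shuffle indices 0..n_total-1 and slice them into per-party training
    shards, shared public inputs, and a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    train_idx = idx[:n_train]
    public_idx = idx[n_train:n_train + n_public]
    test_idx = idx[n_train + n_public:n_train + n_public + n_test]
    party_shards = np.array_split(train_idx, n_parties)  # one shard per party
    return party_shards, public_idx, test_idx

# Energy dataset sizes from the table: 12,000 train / 6,000 public / 1,000 test.
shards, public_idx, test_idx = split_pool(
    n_total=19_000, n_train=12_000, n_public=6_000, n_test=1_000, n_parties=10)
```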
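Finally, a minimal configuration sketch for the DCL-KR hyperparameters named in the Experiment Setup row: only η = 0.5 and E = 5 are stated in the excerpt, so the value of T below is a placeholder, not a reported setting.

```python
from dataclasses import dataclass

@dataclass
class DCLKRConfig:
    T: int        # total number of communication rounds
    E: int        # local iterations per communication round
    eta: float    # learning rate

# eta = 0.5 and E = 5 come from the excerpt above; T = 100 is a placeholder,
# not a value reported in the paper.
config = DCLKRConfig(T=100, E=5, eta=0.5)
```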