Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
Authors: Tehila Dahan, Kfir Yehuda Levy
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted our experiments in a homogeneous setting. Specifically, we utilized the MNIST (LeCun et al., 2010) dataset, which contains 28x28 pixel grayscale images of handwritten digits, and the CIFAR-10 (Krizhevsky et al., 2014) dataset, which includes 32x32 color images spanning 10 classes. The detailed experimental setup and the complete results are provided in Appendix E. |
| Researcher Affiliation | Academia | ¹Department of Data and Decision Sciences, Technion, Haifa, Israel; ²Department of Electrical and Computer Engineering, Technion, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1 Centered Trimmed Meta Aggregator (CTMA); Algorithm 2 Synchronous Robust µ²-SGD (an illustrative aggregation sketch follows this table) |
| Open Source Code | Yes | For the code, please visit our GitHub repository: https://github.com/dahan198/synchronous-fault-tolerant-ml |
| Open Datasets | Yes | We utilized the MNIST (LeCun et al., 2010) dataset, which contains 28x28 pixel grayscale images of handwritten digits, and the CIFAR-10 (Krizhevsky et al., 2014) dataset, which includes 32x32 color images spanning 10 classes. (A dataset-loading sketch follows this table.) |
| Dataset Splits | No | The paper specifies training and testing image counts for MNIST and CIFAR-10, but it does not explicitly mention a separate validation set split or its size. |
| Hardware Specification | Yes | The code was written in Python and executed on an NVIDIA A30 GPU for MNIST and an NVIDIA GeForce RTX 3090 GPU for CIFAR-10. |
| Software Dependencies | No | The implementation was carried out using PyTorch. The paper mentions PyTorch but does not specify a version number, nor versions for Python or any other libraries. |
| Experiment Setup | Yes | Learning Rate: We experimented with a range of learning rates from 10⁻⁴ to 10¹. For experiments requiring a single learning rate, we selected 0.1, which was found to be optimal within this range. ... The table below summarizes the configurations used in our experiments, including the settings for αₜ, βₜ, and γₜ for the µ²-SGD algorithm, as well as the dataset, model, batch size, and gradient clipping values to enhance performance, as implemented in Allouah et al. (2023). (A training-step sketch follows this table.) |
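The Pseudocode row names Algorithm 1 (CTMA) and Algorithm 2 (Synchronous Robust µ²-SGD), whose listings appear only in the paper itself. The sketch below is not the paper's CTMA; it is a minimal illustration of a standard robust aggregation primitive of the kind such meta-aggregators build on, here a coordinate-wise trimmed mean over worker gradients, with `trim_ratio` as a hypothetical parameter chosen only for demonstration.

```python
import torch

def trimmed_mean(worker_grads: torch.Tensor, trim_ratio: float = 0.1) -> torch.Tensor:
    """Coordinate-wise trimmed mean over stacked worker gradients.

    worker_grads: tensor of shape (num_workers, dim).
    trim_ratio:   fraction of the largest and smallest values dropped per
                  coordinate (illustrative value, not taken from the paper).
    """
    num_workers = worker_grads.shape[0]
    k = int(trim_ratio * num_workers)                 # entries to drop on each side
    sorted_vals, _ = torch.sort(worker_grads, dim=0)  # sort each coordinate independently
    kept = sorted_vals[k:num_workers - k]             # discard k smallest and k largest per coordinate
    return kept.mean(dim=0)

# Example: 8 well-behaved gradients plus 2 faulty workers in 4 dimensions.
grads = torch.randn(10, 4)
grads[0] += 100.0
grads[1] -= 100.0
robust_grad = trimmed_mean(grads, trim_ratio=0.2)
```

Per the paper's description, the CTMA meta-aggregator refines the output of a base robust aggregator at a computational cost comparable to simple averaging; the exact centering and trimming rule is specified in Algorithm 1 of the paper.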
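For the Open Datasets row, both MNIST and CIFAR-10 are available through torchvision. The snippet below is a minimal sketch of how a reproduction could load them; the transform and the batch size of 64 are assumptions for illustration, not values confirmed by the excerpts above.

```python
import torch
from torchvision import datasets, transforms

# Plain tensor conversion; any further normalization would be an assumption
# not taken from the paper.
transform = transforms.ToTensor()

mnist_train = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

cifar_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
cifar_test = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Batch size of 64 is a placeholder; the paper's configuration table lists the actual values.
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=64, shuffle=False)
```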
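The Experiment Setup row reports a learning-rate sweep from 10⁻⁴ to 10¹ (with 0.1 selected as optimal) and gradient clipping following Allouah et al. (2023). The sketch below shows one plausible way to wire those pieces together in PyTorch; the model, loss, and clipping threshold of 1.0 are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Placeholder MNIST-sized linear model; the paper's architectures are detailed in its appendix.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # 0.1 reported as optimal in the sweep
criterion = nn.CrossEntropyLoss()
max_grad_norm = 1.0  # assumed clipping threshold, not taken from the paper

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    # Gradient clipping, which the setup says was used to enhance performance.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```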