Weight for Robustness: A Comprehensive Approach towards Optimal Fault-Tolerant Asynchronous ML
Authors: Tehila Dahan, Kfir Y. Levy
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our methodology is rigorously validated through empirical and theoretical analysis, demonstrating its effectiveness in enhancing fault tolerance and optimizing performance in asynchronous ML systems. To evaluate the effectiveness of our proposed approach, we conducted experiments on the MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2014] datasets, two recognized benchmarks in image classification tasks. |
| Researcher Affiliation | Academia | Tehila Dahan, Department of Electrical Engineering, Technion, Haifa, Israel (t.dahan@campus.technion.ac.il); Kfir Y. Levy, Department of Electrical Engineering, Technion, Haifa, Israel (kfirylevy@technion.ac.il) |
| Pseudocode | Yes | Algorithm 1: Weighted Centered Trimmed Meta Aggregator (ω-CTMA); Algorithm 2: Asynchronous Robust µ²-SGD. (A hedged sketch of the aggregation step appears after the table.) |
| Open Source Code | Yes | For more details, please visit our GitHub repository: https://github.com/dahan198/asynchronous-fault-tolerant-ml |
| Open Datasets | Yes | We simulated over the MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2014] datasets. The datasets were accessed through torchvision (version 0.16.2). MNIST Dataset. MNIST is a widely used benchmark dataset in the machine learning community, consisting of 70,000 grayscale images of handwritten digits (0-9) with a resolution of 28x28 pixels. The dataset is split into 60,000 training images and 10,000 test images. CIFAR-10 Dataset. CIFAR-10 is a widely recognized benchmark dataset in the machine learning community, containing 60,000 color images categorized into 10 different classes. Each image has a resolution of 32x32 pixels and represents objects such as airplanes, automobiles, birds, cats, and more. The dataset is evenly split into 50,000 training images and 10,000 test images. (A torchvision loading sketch appears after the table.) |
| Dataset Splits | No | The paper describes training/testing splits but does not explicitly detail a validation split or its size/percentage; it states only that the datasets are split into training and test images. |
| Hardware Specification | Yes | All computations were executed on an NVIDIA L40S GPU. |
| Software Dependencies | Yes | We employed a two-layer convolutional neural network architecture for both datasets, implemented using the PyTorch framework. The datasets were accessed through torchvision (version 0.16.2). |
| Experiment Setup | Yes | Optimization Setup. We optimized the cross-entropy loss across all experiments. For comparisons, we configured µ²-SGD with fixed parameters γ = 0.1 and β = 0.25. This was tested against standard SGD and momentum-based SGD, where the momentum parameter was set to β = 0.9 as recommended by Karimireddy et al. [2021]. MNIST: model Conv(1,20,5), ReLU, MaxPool(2x2), Conv(20,50,5), ReLU, MaxPool(2x2), FC(800→50), BatchNorm, ReLU, FC(50→10); learning rate 0.01; batch size 16; preprocessing Normalize(mean=(0.1307,), std=(0.3081,)). CIFAR-10: model Conv(3,20,5), ReLU, MaxPool(2x2), Conv(20,50,5), ReLU, MaxPool(2x2), FC(1250→50), BatchNorm, ReLU, FC(50→10); learning rate 0.01; batch size 8; preprocessing RandomCrop(size=32, padding=2), RandomHorizontalFlip(p=0.5), Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010)). (PyTorch sketches of the data pipeline and model follow the table.) |
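
The paper's pseudocode (Algorithms 1 and 2) is not reproduced here. Below is a minimal NumPy sketch of one plausible reading of ω-CTMA: worker messages are centered around the output of a base robust aggregator, the tail of messages with the largest-weight deviations is replaced by that base aggregate, and a weighted average is returned. The function name `weighted_ctma`, the trimming rule, and the `trim_frac` parameter are our assumptions, not the authors' exact procedure; consult the repository for the real implementation.

```python
import numpy as np

def weighted_ctma(msgs, weights, base_agg, trim_frac=0.25):
    """Hedged sketch of a weighted centered trimmed meta-aggregator.

    msgs:      (m, d) array of worker messages (e.g., gradient estimates).
    weights:   (m,) nonnegative worker weights summing to 1.
    base_agg:  (d,) output of some base robust aggregator.
    trim_frac: fraction of total weight to trim (assumption: the messages
               farthest from base_agg are replaced by base_agg itself).
    """
    msgs = np.asarray(msgs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Center: measure each message's deviation from the base aggregate.
    dists = np.linalg.norm(msgs - base_agg, axis=1)
    order = np.argsort(dists)               # closest to base_agg first
    cum = np.cumsum(weights[order])         # cumulative weight, sorted order
    # Trim: replace the heavy tail of deviations by the base aggregate.
    kept = np.copy(msgs)
    tail = order[cum > 1.0 - trim_frac]
    kept[tail] = base_agg
    # Aggregate: weighted average of the partially re-centered messages.
    return (weights[:, None] * kept).sum(axis=0)

# Toy usage: 4 equally weighted workers, one Byzantine outlier.
msgs = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [100.0, -100.0]])
w = np.full(4, 0.25)
base = np.median(msgs, axis=0)  # stand-in for a robust base aggregator
print(weighted_ctma(msgs, w, base, trim_frac=0.25))  # ≈ [1.01, 0.99]
```

The appeal of such a meta-aggregator is that it can be wrapped around any base robust aggregator; in the toy run above the outlier's contribution is replaced by the base aggregate, so the result stays near the honest mean.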
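
The dataset access and preprocessing described above map directly onto torchvision. A sketch, assuming the standard torchvision (0.16.2) APIs; `ToTensor` is added here because `Normalize` operates on tensors, even though it is not listed in the setup table:

```python
import torchvision
import torchvision.transforms as T

# MNIST preprocessing, per the setup table.
mnist_tf = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=(0.1307,), std=(0.3081,)),
])

# CIFAR-10 preprocessing and augmentation, per the setup table.
cifar_tf = T.Compose([
    T.RandomCrop(size=32, padding=2),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),
                std=(0.2023, 0.1994, 0.2010)),
])

mnist_train = torchvision.datasets.MNIST(
    "./data", train=True, download=True, transform=mnist_tf)
cifar_train = torchvision.datasets.CIFAR10(
    "./data", train=True, download=True, transform=cifar_tf)
```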
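
The architecture column pins the two-layer CNN down completely: the FC input sizes 800 and 1250 follow from 28x28 (MNIST) and 32x32 (CIFAR-10) inputs after two conv(5)+maxpool(2) stages. A hedged PyTorch sketch; the helper name `make_cnn` is ours, not the authors':

```python
import torch.nn as nn

def make_cnn(in_channels: int, fc_in: int) -> nn.Sequential:
    """Two-layer CNN from the setup table.

    MNIST:    in_channels=1, fc_in=800  (50 * 4 * 4)
    CIFAR-10: in_channels=3, fc_in=1250 (50 * 5 * 5)
    """
    return nn.Sequential(
        nn.Conv2d(in_channels, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(20, 50, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(fc_in, 50), nn.BatchNorm1d(50), nn.ReLU(),
        nn.Linear(50, 10),
    )

mnist_model = make_cnn(in_channels=1, fc_in=800)
cifar_model = make_cnn(in_channels=3, fc_in=1250)
```

For the momentum-SGD baseline in the optimization setup, training would use, e.g., `torch.optim.SGD(mnist_model.parameters(), lr=0.01, momentum=0.9)` with `nn.CrossEntropyLoss()`; the µ²-SGD optimizer itself (γ = 0.1, β = 0.25) is the paper's method and is provided in the authors' repository.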