Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
Authors: Cong Xie, Sanmi Koyejo, Indranil Gupta
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that Zeno outperforms existing approaches. In this section, we evaluate the fault tolerance of the proposed algorithm. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Illinois, Urbana-Champaign, USA. Correspondence to: Cong Xie <cx2@illinois.edu>. |
| Pseudocode | Yes | Algorithm 1 Zeno |
| Open Source Code | Yes | The detailed network architecture can be found in https://github.com/xcgoner/icml2019_zeno. |
| Open Datasets | Yes | We conduct experiments on benchmark CIFAR-10 image classification dataset (Krizhevsky & Hinton, 2009) |
| Dataset Splits | No | The paper mentions '50k images for training and 10k images for testing' but does not specify a validation set, split percentages, or a splitting methodology beyond this general training/testing division. |
| Hardware Specification | No | The paper states 'In each experiment, we launch 20 worker processes' but provides no specific details on the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) that are required to replicate the experiments. |
| Experiment Setup | Yes | In all the experiments, we take the learning rate γ = 0.1, worker batch size 100, Zeno batch size nr = 4, and ρ = 0.0005. Each epoch has 25 iterations. |
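
The 'Pseudocode' and 'Experiment Setup' rows above point to Algorithm 1 (Zeno) and its hyperparameters (γ = 0.1, ρ = 0.0005, Zeno batch size nr = 4). As a reading aid, here is a minimal NumPy sketch of the suspicion-based scoring and aggregation step that Algorithm 1 performs. The names `loss_fn`, `params`, and the flat-array gradient representation are assumptions for illustration only, not the authors' implementation (their code is in the linked repository).

```python
import numpy as np

# Hedged sketch of Zeno's suspicion-based aggregation (Algorithm 1).
# Assumed interface (not specified in the table above): loss_fn(params, batch)
# returns the empirical loss of model `params` on `batch`; candidate
# gradients arrive as a list of flat NumPy arrays, one per worker.

def zeno_score(g, params, batch, loss_fn, gamma=0.1, rho=0.0005):
    """Stochastic descendant score: the estimated loss decrease after
    taking the step -gamma * g, penalized by rho * ||g||^2."""
    return (loss_fn(params, batch)
            - loss_fn(params - gamma * g, batch)
            - rho * np.dot(g, g))

def zeno_aggregate(gradients, params, batch, loss_fn, b,
                   gamma=0.1, rho=0.0005):
    """Score all m candidate gradients on a small server-side batch
    (the Zeno batch, nr = 4 samples in the paper's setup), mark the
    b lowest-scoring candidates as suspicious, and average the rest."""
    scores = [zeno_score(g, params, batch, loss_fn, gamma, rho)
              for g in gradients]
    keep = np.argsort(scores)[b:]  # drop the b most suspicious candidates
    return np.mean([gradients[i] for i in keep], axis=0)
```

The score rewards candidate updates that actually reduce the server's estimated loss and penalizes overly large updates through the ρ‖g‖² term; with a Zeno batch as small as nr = 4, the estimate is noisy but cheap to compute at every iteration.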