Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training
Authors: Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments aim to validate our attack methods and robust training framework. Specifically, we demonstrate: i) Decentralized pipeline parallelism training is vulnerable to both forward attack and backward attack, which negatively impact the model’s convergence and final metrics; ii) When employing our robust training strategy, the training effectiveness of LLMs is comparable to that under normal conditions and may even be better; and iii) Our robust training framework enhances the training process of the pipeline parallel strategy compared to the restart framework. It is important to note that our experiments do not solely assess the detection strategy, as we have already logically proven its ability to detect malicious behaviors with one hundred percent accuracy in Subsection 5.1. However, its significance as the foundation of the robust training framework should not be overlooked. |
| Researcher Affiliation | Academia | 1 Huazhong University of Science and Technology; 2 Hong Kong University of Science and Technology; 3 Sichuan University. Correspondence to: Pan Zhou <panzhou@hust.edu.cn>. |
| Pseudocode | No | This section addresses RQ3 by introducing our robust training framework consisting of two main components: attack detection and efficient training, as depicted in Figure 3. |
| Open Source Code | Yes | The code is available at https://github.com/dcx001016/pipeline_attack |
| Open Datasets | Yes | We employ text-generation tasks on wikitext2, arxiv abstracts, and openwebtext datasets to conduct our evaluations. All datasets are publicly available and do not contain sensitive or offensive content. |
| Dataset Splits | No | We employ text-generation tasks on wikitext2, arxiv abstracts, and openwebtext datasets to conduct our evaluations. |
| Hardware Specification | Yes | To simulate heterogeneous computing resources in real scenarios and to train LLMs of varying sizes, we utilize several types of GPU devices, including A40, V100, RTX 3090, and Quadro RTX 5000. |
| Software Dependencies | No | Perplexity serves as our primary metric for evaluating model performance, and our experiments are founded on the GPipe (Huang et al., 2019) framework. Checkpoints for all models above can be obtained from Hugging Face. |
| Experiment Setup | Yes | Hyperparameter tuning. We set the learning rate to 5e-6 during training, and the batch size and micro-batch size to 4 and 1, respectively. |
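
The reported hyperparameters (learning rate 5e-6, batch size 4, micro-batch size 1) correspond to a GPipe-style micro-batched pipeline schedule: each global batch of 4 is split into 4 micro-batches of size 1 whose gradients are accumulated before a single parameter update. The sketch below is not the authors' code; the stage partitioning (`partition_model`), the optimizer choice (AdamW), and the `train_step` helper are assumptions made purely to illustrate how these settings fit together.

```python
import torch

# Reported hyperparameters from the Experiment Setup row.
LEARNING_RATE = 5e-6
BATCH_SIZE = 4
MICRO_BATCH_SIZE = 1  # 4 micro-batches per optimizer step

def train_step(stages, input_ids, labels, optimizer, loss_fn):
    """One pipeline-parallel step in the GPipe style: split the global batch
    into micro-batches, push each through the stages in order (a real GPipe
    schedule overlaps these across devices), accumulate gradients, then
    apply a single optimizer update."""
    optimizer.zero_grad()
    for x, y in zip(input_ids.split(MICRO_BATCH_SIZE),
                    labels.split(MICRO_BATCH_SIZE)):
        h = x
        for stage in stages:          # forward pass, stage by stage
            h = stage(h)
        loss = loss_fn(h, y)          # e.g., token-level cross-entropy
        loss.backward()               # backward traverses stages in reverse
    optimizer.step()

# Hypothetical usage: `stages` would come from partitioning the model across
# the heterogeneous GPUs mentioned in the paper (A40, V100, RTX 3090,
# Quadro RTX 5000); `partition_model` is an assumed helper, not a real API.
# stages = partition_model(model, num_stages=4)
# optimizer = torch.optim.AdamW(
#     (p for s in stages for p in s.parameters()), lr=LEARNING_RATE)
```

In the paper's setting, the attack-detection and robust-training components wrap around a step like this one; only the baseline pipeline schedule is sketched here.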