Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training
Authors: Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments aim to validate our attack methods and robust training framework. Specifically, we demonstrate: i) Decentralized pipeline parallelism training is vulnerable to both forward attack and backward attack, which negatively impact the model’s convergence and final metrics; ii) When employing our robust training strategy, the training effectiveness of LLMs is comparable to that under normal conditions and may even be better; and iii) Our robust training framework enhances the training process of the pipeline parallel strategy compared to the restart framework. It is important to note that our experiments do not solely assess the detection strategy, as we have already logically proven its ability to detect malicious behaviors with one hundred percent accuracy in Subsection 5.1. However, its significance as the foundation of the robust training framework should not be overlooked. |
| Researcher Affiliation | Academia | 1 Huazhong University of Science and Technology; 2 Hong Kong University of Science and Technology; 3 Sichuan University. Correspondence to: Pan Zhou <panzhou@hust.edu.cn>. |
| Pseudocode | No | This section addresses RQ3 by introducing our robust training framework consisting of two main components: attack detection and efficient training, as depicted in Figure 3. |
| Open Source Code | Yes | The code is available at https://github.com/dcx001016/pipeline_attack |
| Open Datasets | Yes | We employ text-generation tasks on wikitext2, arxiv abstracts, and openwebtext datasets to conduct our evaluations. All datasets are publicly available and do not contain sensitive or offensive content. |
| Dataset Splits | No | We employ text-generation tasks on wikitext2, arxiv abstracts, and openwebtext datasets to conduct our evaluations. |
| Hardware Specification | Yes | To simulate heterogeneous computing resources in real scenarios and to train LLMs of varying sizes, we utilize several types of GPU devices, including A40, V100, RTX 3090, and Quadro RTX 5000. |
| Software Dependencies | No | Perplexity serves as our primary metric for evaluating model performance, and our experiments are founded on the GPipe (Huang et al., 2019) framework. Checkpoints for all models above can be obtained from Hugging Face. |
| Experiment Setup | Yes | Hyperparameter tuning. We set the learning rate to 5e-6 during training, and the batch size and micro-batch size to 4 and 1, respectively. |
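
The reported hyperparameters (learning rate 5e-6, batch size 4, micro-batch size 1) correspond to a GPipe-style micro-batched pipeline schedule: each global batch of 4 is split into 4 micro-batches of size 1 whose gradients are accumulated before a single parameter update. The sketch below is not the authors' code; the stage partitioning (`partition_model`), the optimizer choice (AdamW), and the `train_step` helper are assumptions made purely to illustrate how these settings fit together.

```python
import torch

# Reported hyperparameters from the Experiment Setup row.
LEARNING_RATE = 5e-6
BATCH_SIZE = 4
MICRO_BATCH_SIZE = 1  # 4 micro-batches per optimizer step

def train_step(stages, input_ids, labels, optimizer, loss_fn):
    """One pipeline-parallel step in the GPipe style: split the global batch
    into micro-batches, push each through the stages in order (a real GPipe
    schedule overlaps these across devices), accumulate gradients, then
    apply a single optimizer update."""
    optimizer.zero_grad()
    for x, y in zip(input_ids.split(MICRO_BATCH_SIZE),
                    labels.split(MICRO_BATCH_SIZE)):
        h = x
        for stage in stages:          # forward pass, stage by stage
            h = stage(h)
        loss = loss_fn(h, y)          # e.g., token-level cross-entropy
        loss.backward()               # backward traverses stages in reverse
    optimizer.step()

# Hypothetical usage: `stages` would come from partitioning the model across
# the heterogeneous GPUs mentioned in the paper (A40, V100, RTX 3090,
# Quadro RTX 5000); `partition_model` is an assumed helper, not a real API.
# stages = partition_model(model, num_stages=4)
# optimizer = torch.optim.AdamW(
#     (p for s in stages for p in s.parameters()), lr=LEARNING_RATE)
```

In the paper's setting, the attack-detection and robust-training components wrap around a step like this one; only the baseline pipeline schedule is sketched here.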