OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

Authors: Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results of OSDP on multiple different kinds of large-scale models demonstrate that the proposed strategy outperforms the state-of-the-art in multiple regards.
Researcher Affiliation | Academia | Youhe Jiang1, Fangcheng Fu1, Xupeng Miao2, Xiaonan Nie1, Bin Cui1,3; 1School of CS & Key Lab of High Confidence Software Technologies (MOE), Peking University; 2Computer Science Department, Carnegie Mellon University; 3Institute of Computational Social Science, Peking University (Qingdao)
Pseudocode | Yes | Algorithm 1: Routines of OSDP.
Open Source Code | Yes | Our code is available at https://github.com/Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel.
Open Datasets | No | The paper mentions using minGPT as the model base, which is a re-implementation of GPT training. However, it does not explicitly provide access information (link, DOI, formal citation) for the specific dataset used for training, only a link to the model's implementation.
Dataset Splits | No | The paper does not explicitly provide specific details about training, validation, and test dataset splits, percentages, or sample counts.
Hardware Specification | Yes | Most of our experiments are performed on a laboratorial server equipped with 8 NVIDIA RTX TITAN 24 GB GPUs using PCIe 3.0. For the multi-server experiments, two cloud servers equipped with NVIDIA A100 GPUs are utilized.
Software Dependencies | No | The paper states 'We implement OSDP on top of PyTorch and FairScale' but does not specify version numbers for these software dependencies (see the environment sketch after the table).
Experiment Setup | Yes | All experiments are executed for 100 iterations and the averaged statistics are reported. By default, we set the slice granularity of our operator splitting technique as 4. We conduct experiments... under the GPU memory limit of 8G and 16G, respectively. (See the configuration sketch after the table.)
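
On the Software Dependencies row: since the paper names PyTorch and FairScale but not their versions, a minimal environment snapshot like the one below could be recorded when re-running the experiments. This is an illustrative sketch, not part of the OSDP release; it only assumes that torch and fairscale are importable in the environment under test.

```python
# Illustrative environment snapshot for reproducing OSDP runs (not from the paper).
import json
import platform

import torch
import fairscale  # OSDP is implemented on top of PyTorch and FairScale

environment = {
    "python": platform.python_version(),
    "torch": torch.__version__,
    "torch_cuda": torch.version.cuda,  # CUDA version PyTorch was built against (None for CPU builds)
    "fairscale": fairscale.__version__,
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu-only",
}
print(json.dumps(environment, indent=2))
```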
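
On the Experiment Setup row: the reported settings (100 iterations with averaged statistics, a default operator-splitting slice granularity of 4, and GPU memory limits of 8 GB and 16 GB) could be collected in a small configuration object such as the hypothetical sketch below. The class and field names are assumptions for illustration and do not come from the OSDP code base.

```python
from dataclasses import dataclass

@dataclass
class OSDPExperimentConfig:
    # Hypothetical container for the setup reported in the paper; names are illustrative.
    iterations: int = 100          # each experiment runs 100 iterations, averaged statistics reported
    slice_granularity: int = 4     # default slice granularity of the operator splitting technique
    memory_limit_gb: int = 8       # experiments are repeated under 8 GB and 16 GB limits

# The paper evaluates under both memory limits.
configs = [OSDPExperimentConfig(memory_limit_gb=limit) for limit in (8, 16)]
print(configs)
```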