Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

Authors: Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that Placeto requires up to 6.1× fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family.
Researcher Affiliation | Academia | Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh; MIT Computer Science and Artificial Intelligence Laboratory; {addanki, bjjvnkt, shreyang, hongzi, alizadeh}@mit.edu
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | Yes | We open-source our implementation, datasets and the simulator. https://github.com/aravic/generalizable-device-placement
Open Datasets | Yes | We use TensorFlow to generate a computation graph given any neural network model... We evaluate our approach on computation graphs corresponding to the following three popular deep learning models: (1) Inception-V3 [23], (2) NMT [27], (3) NASNet [28]. We also evaluate on three synthetic datasets, each comprising 32 graphs, spanning a wide range of graph sizes and structures. We refer to these datasets as cifar10, ptb and nmt. Graphs from cifar10 and ptb datasets are synthesized using an automatic model design approach called ENAS [19]. The nmt dataset is constructed by varying the RNN length and batch size hyperparameters of the NMT model [27]. (A sketch of this graph generation and the train/test split follows after this table.)
Dataset Splits | Yes | We randomly split these datasets for training and test purposes.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, memory amounts) are given for the experiments; the paper only states "on real hardware" or mentions "2 GPUs" and "4 GPUs" without further specifics.
Software Dependencies | No | The paper mentions "TensorFlow" but does not specify a version number. No other software libraries or solvers are mentioned with version numbers.
Experiment Setup | Yes | We consider an MDP where a state observation s comprises a graph G(V, E) ∈ 𝒢 with the following features on each node v ∈ V: (1) estimated run time of v, (2) total size of tensors output by v, (3) the current device placement of v, (4) a flag indicating whether v has been visited before, and (5) a flag indicating whether v is the current node for which the placement has to be updated... The episode ends in n|V| steps, after the placement of each node has been updated n times, where n is a tunable hyper-parameter... Further details on training of Placeto and the RNN-based approach, including our choice of hyperparameter values, are given in Appendix A.7. (A sketch of this state encoding follows after this table.)
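
The Open Datasets and Dataset Splits rows describe generating a TensorFlow computation graph from a model definition and randomly splitting the resulting graph datasets into training and test sets. Below is a minimal sketch of that pipeline, assuming TensorFlow 2.x; the toy Keras models, the 4-byte float32 sizing, and the 80/20 split ratio are illustrative assumptions, not details taken from the paper.

```python
import math
import random
import tensorflow as tf

def model_to_graph(model, input_shape):
    """Trace a Keras model into a tf.Graph and extract simple per-op
    features: op name, op type, and total output tensor size in bytes."""
    fn = tf.function(lambda x: model(x))
    concrete = fn.get_concrete_function(
        tf.TensorSpec(shape=input_shape, dtype=tf.float32))
    graph = concrete.graph  # a FuncGraph holding the traced ops

    nodes = []
    for op in graph.get_operations():
        out_bytes = 0
        for t in op.outputs:
            if t.shape.is_fully_defined():
                out_bytes += 4 * math.prod(t.shape.as_list())  # assume float32
        nodes.append({"name": op.name, "type": op.type, "output_bytes": out_bytes})
    return nodes

# Illustrative "dataset" of graphs from models of varying width
# (a stand-in for the ENAS-synthesized cifar10/ptb graphs in the paper).
graphs = []
for width in (32, 64, 128, 256):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    graphs.append(model_to_graph(model, input_shape=[1, 784]))

# Random train/test split; the 80/20 ratio is an assumption.
random.shuffle(graphs)
cut = int(0.8 * len(graphs))
train_graphs, test_graphs = graphs[:cut], graphs[cut:]
```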
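
The Experiment Setup row lists the five per-node features in the MDP state observation and the n·|V| episode length. The following sketch shows one way such a state could be encoded and stepped through; the `NodeState` dataclass, the feature ordering, the round-robin node traversal, and the `policy` callable are hypothetical names introduced here for illustration, not Placeto's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class NodeState:
    run_time: float           # (1) estimated run time of the op
    output_bytes: float       # (2) total size of tensors output by the op
    placement: int            # (3) current device index for the op
    visited: bool = False     # (4) whether this node was already visited
    is_current: bool = False  # (5) whether this node's placement is being updated

def node_features(node: NodeState) -> List[float]:
    """Flatten one node's state into a 5-dimensional feature vector."""
    return [node.run_time, node.output_bytes, float(node.placement),
            float(node.visited), float(node.is_current)]

def run_episode(nodes: List[NodeState],
                policy: Callable[[List[List[float]]], int],
                n: int) -> None:
    """One MDP episode: each node's placement is updated n times,
    so the episode lasts n * |V| steps (n is a tunable hyper-parameter)."""
    for _ in range(n):
        for v in nodes:
            v.is_current = True
            state = [node_features(u) for u in nodes]  # graph-level observation
            v.placement = policy(state)                # choose a device for v
            v.visited = True
            v.is_current = False
```

Here `policy` stands in for Placeto's learned placement policy over graph embeddings; the reward computation from measured or simulated placement runtimes is omitted from this sketch.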