Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Authors: Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Placeto requires up to 6.1× fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. |
| Researcher Affiliation | Academia | Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh. MIT Computer Science and Artificial Intelligence Laboratory. {addanki, bjjvnkt, shreyang, hongzi, alizadeh}@mit.edu |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | We open-source our implementation, datasets and the simulator: https://github.com/aravic/generalizable-device-placement |
| Open Datasets | Yes | We use TensorFlow to generate a computation graph given any neural network model... We evaluate our approach on computation graphs corresponding to the following three popular deep learning models: (1) Inception-V3 [23], (2) NMT [27], (3) NASNet [28]. We also evaluate on three synthetic datasets, each comprising 32 graphs, spanning a wide range of graph sizes and structures. We refer to these datasets as cifar10, ptb and nmt. Graphs from the cifar10 and ptb datasets are synthesized using an automatic model design approach called ENAS [19]. The nmt dataset is constructed by varying the RNN length and batch size hyperparameters of the NMT model [27]. (A hedged sketch of exporting such a TensorFlow computation graph appears after this table.) |
| Dataset Splits | Yes | We randomly split these datasets for training and test purposes. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, memory amounts) are given for the experiments. The paper only states that evaluations are run "on real hardware" or mentions "2 GPUs" and "4 GPUs" without further specifics. |
| Software Dependencies | No | The paper mentions "TensorFlow" but does not specify a version number. No other software libraries or solvers are mentioned with version numbers. |
| Experiment Setup | Yes | We consider an MDP where a state observation s comprises a graph G(V, E) ∈ 𝒢 with the following features on each node v ∈ V: (1) estimated run time of v, (2) total size of tensors output by v, (3) the current device placement of v, (4) a flag indicating whether v has been visited before, and (5) a flag indicating whether v is the current node for which the placement has to be updated... The episode ends in n·\|V\| steps, after the placement of each node has been updated n times, where n is a tunable hyper-parameter... Further details on the training of Placeto and the RNN-based approach, including our choice of hyperparameter values, are given in Appendix A.7. (A hedged sketch of these state features appears after this table.) |
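
The state features quoted in the Experiment Setup row translate fairly directly into code. Below is a minimal sketch, not the authors' released implementation: the node attribute names (`op_runtime`, `output_bytes`), the `policy` callable, and the default `n=2` are assumptions made for illustration.

```python
# Minimal sketch of the MDP state described in the Experiment Setup row.
# Attribute names (op_runtime, output_bytes), the policy callable, and n=2
# are illustrative assumptions, not the authors' released code.
import numpy as np
import networkx as nx

def node_features(g, current, placement, visited, num_devices):
    """One state observation: a feature row per node v in V."""
    feats = []
    for v in g.nodes:
        feats.append([
            g.nodes[v]["op_runtime"],                # (1) estimated run time of v
            g.nodes[v]["output_bytes"],              # (2) total size of tensors output by v
            placement[v] / max(num_devices - 1, 1),  # (3) current device placement of v
            float(v in visited),                     # (4) visited-before flag
            float(v == current),                     # (5) current-node flag
        ])
    return np.asarray(feats, dtype=np.float32)

def run_episode(g, policy, placement, num_devices, n=2):
    """Episode ends after n * |V| steps: each node's placement is updated n times."""
    visited = set()
    for _ in range(n):
        for v in g.nodes:
            state = node_features(g, v, placement, visited, num_devices)
            placement[v] = policy(state, v)          # agent picks a device for v
            visited.add(v)
    return placement
```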
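
For the Open Datasets row, the paper's statement that TensorFlow is used to generate a computation graph for any model can be illustrated with a short sketch. This assumes TensorFlow 1.x-style graph construction (the paper predates TF 2) and a hypothetical `build_model_fn`, such as an Inception-V3 constructor; it is not the authors' dataset-generation pipeline.

```python
# Minimal sketch of exporting a TensorFlow computation graph for a model.
# build_model_fn is a hypothetical model constructor (e.g. Inception-V3);
# this is not the authors' dataset-generation pipeline.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # use graph mode, as in TF 1.x

def export_computation_graph(build_model_fn, out_name="graph.pbtxt"):
    """Build the model in a fresh graph and dump its GraphDef for placement experiments."""
    graph = tf.Graph()
    with graph.as_default():
        build_model_fn()                 # adds the model's ops to `graph`
    graph_def = graph.as_graph_def()
    tf.io.write_graph(graph_def, logdir=".", name=out_name, as_text=True)
    return graph_def
```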