Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Authors: Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Placeto requires up to 6.1× fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. |
| Researcher Affiliation | Academia | Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh. MIT Computer Science and Artificial Intelligence Laboratory. {addanki, bjjvnkt, shreyang, hongzi, alizadeh}@mit.edu |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | We open-source our implementation, datasets and the simulator: https://github.com/aravic/generalizable-device-placement |
| Open Datasets | Yes | We use TensorFlow to generate a computation graph given any neural network model... We evaluate our approach on computation graphs corresponding to the following three popular deep learning models: (1) Inception-V3 [23], (2) NMT [27], (3) NASNet [28]. We also evaluate on three synthetic datasets, each comprising 32 graphs, spanning a wide range of graph sizes and structures. We refer to these datasets as cifar10, ptb and nmt. Graphs from the cifar10 and ptb datasets are synthesized using an automatic model design approach called ENAS [19]. The nmt dataset is constructed by varying the RNN length and batch size hyperparameters of the NMT model [27]. (A hedged sketch of exporting such a TensorFlow computation graph appears after this table.) |
| Dataset Splits | Yes | We randomly split these datasets for training and test purposes. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, memory amounts) are given for the experiments. The paper only states that evaluations are run "on real hardware" or mentions "2 GPUs" and "4 GPUs" without further specifics. |
| Software Dependencies | No | The paper mentions "TensorFlow" but does not specify a version number. No other software libraries or solvers are mentioned with version numbers. |
| Experiment Setup | Yes | We consider an MDP where a state observation s comprises a graph G(V, E) ∈ 𝒢 with the following features on each node v ∈ V: (1) estimated run time of v, (2) total size of tensors output by v, (3) the current device placement of v, (4) a flag indicating whether v has been visited before, and (5) a flag indicating whether v is the current node for which the placement has to be updated... The episode ends in n·\|V\| steps, after the placement of each node has been updated n times, where n is a tunable hyper-parameter... Further details on the training of Placeto and the RNN-based approach, including our choice of hyperparameter values, are given in Appendix A.7. (A hedged sketch of these state features appears after this table.) |
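
The state features quoted in the Experiment Setup row translate fairly directly into code. Below is a minimal sketch, not the authors' released implementation: the node attribute names (`op_runtime`, `output_bytes`), the `policy` callable, and the default `n=2` are assumptions made for illustration.

```python
# Minimal sketch of the MDP state described in the Experiment Setup row.
# Attribute names (op_runtime, output_bytes), the policy callable, and n=2
# are illustrative assumptions, not the authors' released code.
import numpy as np
import networkx as nx

def node_features(g, current, placement, visited, num_devices):
    """One state observation: a feature row per node v in V."""
    feats = []
    for v in g.nodes:
        feats.append([
            g.nodes[v]["op_runtime"],                # (1) estimated run time of v
            g.nodes[v]["output_bytes"],              # (2) total size of tensors output by v
            placement[v] / max(num_devices - 1, 1),  # (3) current device placement of v
            float(v in visited),                     # (4) visited-before flag
            float(v == current),                     # (5) current-node flag
        ])
    return np.asarray(feats, dtype=np.float32)

def run_episode(g, policy, placement, num_devices, n=2):
    """Episode ends after n * |V| steps: each node's placement is updated n times."""
    visited = set()
    for _ in range(n):
        for v in g.nodes:
            state = node_features(g, v, placement, visited, num_devices)
            placement[v] = policy(state, v)          # agent picks a device for v
            visited.add(v)
    return placement
```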
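
For the Open Datasets row, the paper's statement that TensorFlow is used to generate a computation graph for any model can be illustrated with a short sketch. This assumes TensorFlow 1.x-style graph construction (the paper predates TF 2) and a hypothetical `build_model_fn`, such as an Inception-V3 constructor; it is not the authors' dataset-generation pipeline.

```python
# Minimal sketch of exporting a TensorFlow computation graph for a model.
# build_model_fn is a hypothetical model constructor (e.g. Inception-V3);
# this is not the authors' dataset-generation pipeline.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # use graph mode, as in TF 1.x

def export_computation_graph(build_model_fn, out_name="graph.pbtxt"):
    """Build the model in a fresh graph and dump its GraphDef for placement experiments."""
    graph = tf.Graph()
    with graph.as_default():
        build_model_fn()                 # adds the model's ops to `graph`
    graph_def = graph.as_graph_def()
    tf.io.write_graph(graph_def, logdir=".", name=out_name, as_text=True)
    return graph_def
```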