Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization
Authors: Yuanxiang Gao, Li Chen, Baochun Li
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have implemented Post in the Google Cloud platform, and our extensive experiments with several popular neural network training benchmarks have demonstrated clear evidence of superior performance: with the same amount of learning time, it leads to placements that have training times up to 63.7% shorter than the state-of-the-art. |
| Researcher Affiliation | Academia | Yuanxiang Gao (1, 2), Li Chen (3), Baochun Li (1); (1) Department of Electrical and Computer Engineering, University of Toronto; (2) School of Information and Communication Engineering, University of Electronic Science and Technology of China; (3) School of Computing and Informatics, University of Louisiana at Lafayette; yuanxiang@ece.utoronto.ca, li.chen@louisiana.edu, bli@ece.toronto.edu |
| Pseudocode | Yes | Algorithm 1 Post: Joint Policy Optimization. 1: Initialize parameters u^(0) as all zeros; initialize t = 0; 2: for n = 1, 2, ..., L do 3: Sample a placement d^(n) ~ f(d \| u^(t)); 4: Train the DNN under d^(n) and record T(d^(n)); 5: if n % K == 0 and n % N ≠ 0 then 6: Perform several (e.g., 10) stochastic gradient ascent steps w.r.t. the objective of proximal policy optimization in Eq. (11); 7: t = t + 1; 8: end if 9: if n % N == 0 then 10: Solve the cross-entropy minimization using Eq. (10) to achieve a global minimum; 11: t = t + 1; 12: Mix the new distribution f(d_m \| u_m^(t+1)) = (1 - ϵ) f(d_m \| u_m^(t+1)) + ϵ/D, ∀m; 13: end if 14: end for |
| Open Source Code | No | The paper does not provide a direct link or an explicit statement about the availability of the source code for the Post algorithm. |
| Open Datasets | Yes | Inception-V3 and ResNet are two popular deep convolutional neural networks trained on the ImageNet dataset using a batch size of 32. RNNLM is a 4-layer RNN trained with a batch size of 64. NMT is a sequence-to-sequence encoder-decoder architecture trained on the WMT16 dataset with a batch size of 64. |
| Dataset Splits | No | The paper does not explicitly mention the use of a validation dataset split or provide details on how data was partitioned into training, validation, and test sets. It describes training for a number of steps to measure performance but not data splits for model evaluation. |
| Hardware Specification | Yes | We have conducted our experiments with 12 machines on the Google Cloud platform. Each machine is equipped with 26 GB of main memory, an Intel Broadwell 8-core CPU and 2, 4 or 8 NVIDIA Tesla K80 GPUs, each with 11 GB of memory. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' for benchmarks but does not specify its version number or any other software dependencies with version details. |
| Experiment Setup | Yes | Our detailed setting of the parameters in Algorithm 1 is as follows: the learning rate of SGA is 1; the ratio for choosing promising placements is 0.1 (6 best placements out of 60 samples over 5 iterations); the exploration factor ϵ is 0.1, which is linearly reduced to zero during learning; the KL penalty coefficient β is initialized as 1 and adapted based on a target KL divergence of 0.03. |
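
The pseudocode row above flattens Algorithm 1 into a single cell, so a control-flow sketch may be easier to follow. The snippet below is not the authors' implementation: the runtime measurement is replaced by a synthetic cost, the PPO step of Eq. (11) is reduced to a plain policy-gradient (REINFORCE-style) update without the KL-penalized surrogate, and the group count, device count, K, N, and L values are assumptions chosen only to make the loop run end to end. What it does preserve is the interleaving that defines Post: gradient-ascent steps on sampled placements every K samples, a cross-entropy minimization every N samples, and ϵ-mixing with the uniform distribution afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(u):
    """Row-wise softmax: f(d_m = k | u_m) for each operation group m."""
    z = np.exp(u - u.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def sample_placement(u):
    """Draw one device per group from the factorized placement distribution f(d | u)."""
    probs = softmax(u)
    return np.array([rng.choice(p.size, p=p) for p in probs])

def measure_runtime(placement):
    """Stand-in for training the DNN under `placement` and timing a step.
    A real run would launch TensorFlow on the cluster; here a synthetic cost
    (load imbalance across devices) keeps the sketch self-contained."""
    _, counts = np.unique(placement, return_counts=True)
    return counts.max() + 0.01 * rng.standard_normal()

def cem_update(placements, runtimes, num_devices, elite_ratio=0.1):
    """Cross-entropy minimization in the spirit of Eq. (10): refit the softmax
    parameters to the empirical frequencies of the elite (fastest) placements."""
    k = max(1, int(elite_ratio * len(placements)))
    elite = np.array(placements)[np.argsort(runtimes)[:k]]
    freq = np.zeros((elite.shape[1], num_devices))
    for d in elite:
        freq[np.arange(d.size), d] += 1.0
    freq = (freq + 1e-6) / (freq + 1e-6).sum(axis=1, keepdims=True)
    return np.log(freq)

def ppo_update(u, placements, runtimes, lr=1.0, steps=10):
    """Simplified stand-in for Eq. (11): several gradient-ascent steps using
    negative, baseline-subtracted runtime as the reward. The paper's objective
    additionally carries an importance ratio and an adaptive KL penalty."""
    rewards = -(np.array(runtimes) - np.mean(runtimes))
    for _ in range(steps):
        probs = softmax(u)
        grad = np.zeros_like(u)
        for d, r in zip(placements, rewards):
            one_hot = np.zeros_like(u)
            one_hot[np.arange(d.size), d] = 1.0
            grad += r * (one_hot - probs)   # gradient of log f(d | u) for a softmax policy
        u = u + lr * grad / len(placements)
    return u

def post_learning_loop(num_groups=8, num_devices=4, L=300, K=12, N=60, eps=0.1):
    """Control flow of Algorithm 1; K, N, L and the sizes are assumed values."""
    u = np.zeros((num_groups, num_devices))     # u^(0): all zeros
    batch = []                                   # (placement, runtime) samples of the current iteration
    for n in range(1, L + 1):
        d = sample_placement(u)                  # d^(n) ~ f(d | u^(t))
        batch.append((d, measure_runtime(d)))    # train under d^(n) and record T(d^(n))
        placements, runtimes = zip(*batch)
        if n % K == 0 and n % N != 0:            # PPO steps between CEM iterations
            u = ppo_update(u, placements, runtimes)
        if n % N == 0:                           # CEM iteration, then epsilon-mix with uniform
            u = cem_update(placements, runtimes, num_devices)
            mixed = (1 - eps) * softmax(u) + eps / num_devices
            u = np.log(mixed)
            batch = []
    return softmax(u)

if __name__ == "__main__":
    print(post_learning_loop().round(2))
```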
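The experiment-setup row also mentions two schedules that the pseudocode leaves implicit: the exploration factor ϵ starts at 0.1 and is linearly reduced to zero, and the KL penalty coefficient β starts at 1 and is adapted toward a target KL divergence of 0.03. A minimal sketch of both, assuming the standard adaptive-KL rule from the PPO paper (Schulman et al., 2017), since the excerpt does not spell out Post's exact adaptation thresholds:

```python
def adapt_kl_coefficient(beta, observed_kl, target_kl=0.03):
    """Adaptive KL penalty coefficient for the PPO objective.
    The 1.5x / 2x thresholds follow the adaptive-KL variant of the PPO paper;
    Post only reports beta = 1 initially and a target KL of 0.03, so the exact
    rule here is an assumption."""
    if observed_kl > 1.5 * target_kl:
        return beta * 2.0
    if observed_kl < target_kl / 1.5:
        return beta / 2.0
    return beta

def exploration_factor(step, total_steps, eps_init=0.1):
    """Exploration factor: 0.1 at the start, linearly reduced to zero over learning,
    as stated in the experiment setup."""
    return eps_init * max(0.0, 1.0 - step / total_steps)
```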