A Hierarchical Model for Device Placement
Authors: Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with widely-used computer vision and natural language models show that our algorithm can find optimized, non-trivial placements for TensorFlow computational graphs with over 80,000 operations. |
| Researcher Affiliation | Industry | {azalia,agoldie,hyhieu,bsteiner,qvl,jeff}@google.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology was found. |
| Open Datasets | Yes | For a fair comparison to previous state-of-the-art deep RL methods (Mirhoseini et al., 2017), we use the same model architectures (Inception-V3, RNNLM, and 2-Layer NMT models), hyperparameters and input data. In addition, we evaluate our model on a 152-layer ResNet (He et al., 2016) with ImageNet data (Deng et al., 2009), as well as more complex NMT models with 4 and 8 layers. |
| Dataset Splits | No | The paper uses well-known models (e.g., Inception-V3, ResNet) and mentions ImageNet data, but does not provide specific training/validation/test dataset split percentages, sample counts, or explicit instructions for how to partition the data for reproducibility. |
| Hardware Specification | Yes | Our experiments are run on machines with 1 Intel Haswell 2300 CPU and up to 8 Nvidia Tesla K40 GPUs. |
| Software Dependencies | Yes | We use TensorFlow r1.3 to run our experiments. |
| Experiment Setup | Yes | We train both policies using Adam (Kingma & Ba, 2015) optimizer with a fixed learning rate of 0.1, gradient clipping of norm 1.0, tanh constant C = 5.0, and temperature T = 10.0. The number of Grouper and Placer samples in Eqs. 4 and 6 are m = 1 and k = 4, respectively. (A hedged sketch of this setup follows the table.) |
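
The quoted setup pins down the policy-training hyperparameters but not the surrounding code. The sketch below is a minimal, hedged illustration of how those values could be wired together in the TensorFlow r1.x API the paper reports using; the tensor names (`policy_logits`, `loss`) and the exact logit-shaping form (`C * tanh(logits / T)`, following Bello et al., 2017) are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch of the reported training configuration, assuming the
# TensorFlow r1.x API mentioned in the paper. Only the hyperparameter
# values are taken from the paper's quoted setup; tensor names and the
# exact logit-shaping combination are illustrative assumptions.
import tensorflow as tf

LEARNING_RATE = 0.1   # fixed Adam learning rate (quoted)
CLIP_NORM = 1.0       # gradient clipping of norm 1.0 (quoted)
TANH_CONSTANT = 5.0   # tanh constant C = 5.0 (quoted)
TEMPERATURE = 10.0    # temperature T = 10.0 (quoted)

def shape_logits(policy_logits):
    # One common combination from the related literature (Bello et al., 2017):
    # divide by the temperature, then squash with a tanh constant. Whether
    # the paper applies exactly this form is an assumption.
    return TANH_CONSTANT * tf.tanh(policy_logits / TEMPERATURE)

def build_train_op(loss, variables):
    # Adam with a fixed learning rate and (assumed global-norm) clipping.
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
    grads = tf.gradients(loss, variables)
    clipped, _ = tf.clip_by_global_norm(grads, CLIP_NORM)
    return optimizer.apply_gradients(zip(clipped, variables))
```

The sample counts m = 1 (Grouper) and k = 4 (Placer) govern how many samples are drawn per policy update in the paper's Eqs. 4 and 6, and would live in the sampling loop rather than in this optimizer setup.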