Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Integrated Hardware Architecture and Device Placement Search
Authors: Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our approach achieves higher throughput on large language models compared to the state-of-the-art TPUv4 and the Spotlight accelerator search framework." and "We evaluate PHAZE, the architecture search and solver, on a diverse set of large language models deployed in distributed training environments." |
| Researcher Affiliation | Collaboration | 1Georgia Institute of Technology, GA, USA 2Microsoft Research, WA, USA. |
| Pseudocode | Yes | Algorithm 1 PHAZE workflow algorithm |
| Open Source Code | Yes | The entire source code of PHAZE is available at https://github.com/msr-fiddle/phaze. |
| Open Datasets | Yes | We obtain OPT (Zhang et al., 2022b), Bertlarge (Devlin et al., 2019), GPT2 (Radford et al., 2019), and Llama2-7B (Touvron et al., 2023) from the Hugging Face library (Wolf et al., 2019) and TMP graphs and hyper-parameters from public source code of Megatron-LM (Nvidia, b; Shoeybi et al., 2020). |
| Dataset Splits | No | The paper refers to training and evaluation but does not explicitly provide percentages or absolute counts for dataset splits like train/validation/test, nor does it refer to standard dataset splits for reproduction. |
| Hardware Specification | Yes | PHAZE is executed on a V100 GPU and a Dual AMD Epyc 7713 CPU at 2.0 GHz with 128 cores, running Ubuntu 20.04. The GPU runs CUDA 12.1 and is only used to extract the operator graphs. |
| Software Dependencies | Yes | The overall PHAZE process is executed using Python 3.8. The ILP formulations are solved using Gurobi 10.0.1 (Gurobi Optimization, 2019). The dynamic programming algorithm is implemented in C++, compiled with g++ version 11.3.0 and -O3 optimization flag. The GPU runs CUDA 12.1... |
| Experiment Setup | Yes | "Table 1: Architecture and training search parameters explored in PHAZE for per device execution. ... Microbatch Size (mbs): 1 to 8, powers of 2; Activation Recomputation: True/False." and "PHAZE is optimized over 1024 accelerators and a global batch size of 4096." |
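
The Experiment Setup row quotes two per-device training parameters (microbatch size and activation recomputation) and a global batch size of 4096. As a rough illustration only, the sketch below enumerates the combinations those quoted values imply. It is not taken from the PHAZE source code; the function names `per_device_configs` and `total_microbatches` are hypothetical, and the remaining parameters of the paper's Table 1 are elided in the excerpt and therefore not modeled here.

```python
from itertools import product

# Values quoted in the Experiment Setup row; everything else is an assumption.
MICROBATCH_SIZES = [1, 2, 4, 8]           # "1 to 8, powers of 2"
ACTIVATION_RECOMPUTATION = [True, False]  # "True/False"
GLOBAL_BATCH_SIZE = 4096                  # "a global batch size of 4096"


def per_device_configs():
    """Hypothetical helper: list every (mbs, recompute) pair."""
    return list(product(MICROBATCH_SIZES, ACTIVATION_RECOMPUTATION))


def total_microbatches(mbs):
    """Hypothetical helper: microbatches needed to cover one global batch."""
    if GLOBAL_BATCH_SIZE % mbs != 0:
        raise ValueError("microbatch size must divide the global batch size")
    return GLOBAL_BATCH_SIZE // mbs


if __name__ == "__main__":
    for mbs, recompute in per_device_configs():
        print(f"mbs={mbs} recompute={recompute} "
              f"microbatches_per_global_batch={total_microbatches(mbs)}")
```

Under these assumptions the two quoted parameters alone give eight combinations; the search space described in the paper is larger because it also covers architecture parameters.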