Integrated Hardware Architecture and Device Placement Search

Authors: Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our approach achieves higher throughput on large language models compared to the state-of-the-art TPUv4 and the Spotlight accelerator search framework." and "We evaluate PHAZE, the architecture search and solver, on a diverse set of large language models deployed in distributed training environments."
Researcher Affiliation | Collaboration | "¹Georgia Institute of Technology, GA, USA; ²Microsoft Research, WA, USA."
Pseudocode | Yes | "Algorithm 1: PHAZE workflow algorithm"
Open Source Code | Yes | "The entire source code of PHAZE is available at https://github.com/msr-fiddle/phaze."
Open Datasets | Yes | "We obtain OPT (Zhang et al., 2022b), BERT-large (Devlin et al., 2019), GPT2 (Radford et al., 2019), and Llama2-7B (Touvron et al., 2023) from the Hugging Face library (Wolf et al., 2019) and TMP graphs and hyper-parameters from public source code of Megatron-LM (Nvidia, b; Shoeybi et al., 2020)." (an illustrative model-loading sketch follows the table)
Dataset Splits | No | The paper refers to training and evaluation but does not explicitly provide percentages or absolute counts for dataset splits (train/validation/test), nor does it refer to standard dataset splits for reproduction.
Hardware Specification | Yes | "PHAZE is executed on a V100 GPU and a Dual AMD Epyc 7713 CPU at 2.0 GHz with 128 cores, running Ubuntu 20.04. The GPU runs CUDA 12.1 and is only used to extract the operator graphs."
Software Dependencies | Yes | "The overall PHAZE process is executed using Python 3.8. The ILP formulations are solved using Gurobi 10.0.1 (Gurobi Optimization, 2019). The dynamic programming algorithm is implemented in C++, compiled with g++ version 11.3.0 and the -O3 optimization flag. The GPU runs CUDA 12.1..." (a trivial Gurobi check is sketched below the table)
Experiment Setup | Yes | "Table 1: Architecture and training search parameters explored in PHAZE for per-device execution. ... Microbatch Size (mbs): 1 to 8, powers of 2; Activation Recomputation: True/False." and "PHAZE is optimized over 1024 accelerators and a global batch size of 4096." (the per-device search space is sketched below the table)
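
The Open Datasets row lists the models the paper obtains from the Hugging Face library. The sketch below is a minimal, hypothetical illustration, not code from the PHAZE repository: the hub model IDs are assumptions inferred from the citations, and PHAZE's actual operator-graph extraction happens separately on the V100 GPU.

```python
# Hypothetical illustration, not PHAZE repository code: fetch the
# configurations of the models named in the paper from the Hugging Face hub.
# The model IDs below are assumptions based on the citations.
from transformers import AutoConfig

MODEL_IDS = [
    "facebook/opt-350m",         # OPT (Zhang et al., 2022b); size variant assumed
    "bert-large-uncased",        # BERT-large (Devlin et al., 2019)
    "gpt2",                      # GPT2 (Radford et al., 2019)
    "meta-llama/Llama-2-7b-hf",  # Llama2-7B (Touvron et al., 2023); gated, needs hub access
]

for model_id in MODEL_IDS:
    config = AutoConfig.from_pretrained(model_id)
    print(f"{model_id}: {config.model_type}, {config.num_hidden_layers} layers")
```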
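The Software Dependencies row names Gurobi 10.0.1 as the ILP solver. The following trivial binary program is illustrative only: it confirms that gurobipy is installed and can solve a model, and is unrelated to PHAZE's actual device-placement formulation.

```python
# Illustrative only: a trivial ILP solved with gurobipy to confirm the
# Gurobi dependency is usable; not PHAZE's placement formulation.
import gurobipy as gp
from gurobipy import GRB

model = gp.Model("dependency_check")
x = model.addVar(vtype=GRB.BINARY, name="x")
y = model.addVar(vtype=GRB.BINARY, name="y")
model.setObjective(2 * x + 3 * y, GRB.MAXIMIZE)
model.addConstr(x + y <= 1, name="choose_at_most_one")
model.optimize()
print("status:", model.Status, "objective:", model.ObjVal)
```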
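The Experiment Setup row quotes the per-device search parameters from Table 1. The sketch below is a minimal enumeration of that grid, assuming the microbatch sizes are the powers of two from 1 to 8 and activation recomputation is a boolean flag; the accelerator count and global batch size constants come from the quoted setup. It is not PHAZE's search code.

```python
# Minimal sketch, not PHAZE's search code: enumerate the per-device training
# parameters quoted from Table 1.
from itertools import product

MICROBATCH_SIZES = (1, 2, 4, 8)           # mbs: 1 to 8, powers of 2
ACTIVATION_RECOMPUTATION = (False, True)  # recompute activations during backprop
NUM_ACCELERATORS = 1024                   # devices PHAZE optimizes over
GLOBAL_BATCH_SIZE = 4096                  # fixed global batch size

for mbs, recompute in product(MICROBATCH_SIZES, ACTIVATION_RECOMPUTATION):
    # In PHAZE each candidate would be scored by the placement solver;
    # here we only list the points of the per-device search space.
    print(f"mbs={mbs}, activation_recompute={recompute}, "
          f"accelerators={NUM_ACCELERATORS}, global_batch_size={GLOBAL_BATCH_SIZE}")
```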