Paleo: A Performance Model for Deep Neural Networks
Authors: Hang Qi, Evan R. Sparks, Ameet Talwalkar
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that PALEO is robust to the choice of network architecture, hardware, software, communication schemes, and parallelization strategies. We further demonstrate its ability to accurately model various recently published scalability results for CNNs such as NiN, Inception and AlexNet. |
| Researcher Affiliation | Academia | Hang Qi, UCLA (hangqi@cs.ucla.edu); Evan R. Sparks, UC Berkeley (sparks@cs.berkeley.edu); Ameet Talwalkar, UCLA (ameet@cs.ucla.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It describes the model conceptually and mathematically. |
| Open Source Code | Yes | Open-sourced at https://github.com/TalwalkarLab/paleo. |
| Open Datasets | Yes | We first compare PALEO-estimated runtimes with actual runtimes measured from TensorFlow (Abadi et al., 2015) execution in two popular CNN architectures: the one-tower variant of AlexNet (Krizhevsky, 2014b) and the 16-layer VGG network (Simonyan & Zisserman, 2014). [...] FireCaffe (Iandola et al., 2016) adopts the Tree AllReduce communication scheme when training a NiN model (Lin et al., 2013) in data parallel settings [...] Murray et al. (2016) reported their results in synchronously training the Inception model (Szegedy et al., 2015b) [...] We profile a generative adversarial network (GAN) inspired by Radford et al. (2015) for the LSUN dataset. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts). It mentions batch sizes but gives no information on how the data was partitioned for training, validation, or testing. |
| Hardware Specification | Yes | Experiments are run on a NVIDIA TITAN X GPU with a 4 GB workspace limit. [...] Case 1: Net NiN Device NVIDIA K20X [...] Case 2: Net Inception v3 Device NVIDIA K20 [...] Case 3: Net AlexNet Device NVIDIA K20 [...] in one of the experiments, their model completes 20 epochs of training after 100 hours when using 8 Tesla K40s [...] each equipped with a NVIDIA K80 GPU. |
| Software Dependencies | Yes | TensorFlow (Abadi et al., 2015) execution [...] TensorFlow 0.9 with cuDNN 4 backend. |
| Experiment Setup | Yes | PALEO uses cuDNN heuristics to choose algorithms and the auto-tuning mechanism in TensorFlow is disabled. Experiments are run on a NVIDIA TITAN X GPU with a 4 GB workspace limit. [...] Table 2: PALEO configurations used in the case studies. [...] Batch size 1024 [...] batch size 256 [...] batch size 2048. |
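
The rows above report PALEO-estimated versus measured runtimes for fixed devices, batch sizes, and software stacks. To make the kind of estimate being validated concrete, here is a minimal, hypothetical sketch of a roofline-style per-layer runtime estimate (FLOP count and memory traffic weighed against device peak throughput and bandwidth). It is not PALEO's actual implementation; the layer shape, peak-throughput figure, bandwidth, and efficiency factor below are assumptions chosen for illustration.

```python
# Hypothetical roofline-style layer runtime estimate, in the spirit of a
# FLOP-and-bandwidth performance model. NOT Paleo's actual code; the device
# figures and efficiency factor below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Conv2DLayer:
    batch: int      # mini-batch size
    in_h: int       # input height
    in_w: int       # input width
    in_c: int       # input channels
    out_c: int      # output channels (number of filters)
    k: int          # square kernel size
    stride: int = 1

    def output_hw(self):
        return self.in_h // self.stride, self.in_w // self.stride

    def flops(self):
        # Count each multiply-add as 2 FLOPs.
        out_h, out_w = self.output_hw()
        return 2 * self.batch * out_h * out_w * self.out_c * self.in_c * self.k * self.k

    def bytes_moved(self, dtype_bytes=4):
        out_h, out_w = self.output_hw()
        inputs = self.batch * self.in_h * self.in_w * self.in_c
        outputs = self.batch * out_h * out_w * self.out_c
        weights = self.out_c * self.in_c * self.k * self.k
        return dtype_bytes * (inputs + outputs + weights)


def estimate_layer_time(layer, peak_flops, mem_bandwidth, efficiency=0.5):
    """Forward-pass time (s) as the max of compute-bound and memory-bound
    times, with compute scaled by an assumed fraction-of-peak efficiency."""
    compute_time = layer.flops() / (peak_flops * efficiency)
    memory_time = layer.bytes_moved() / mem_bandwidth
    return max(compute_time, memory_time)


# Example: one 3x3 convolution at batch size 256 on a GPU assumed to sustain
# ~6 TFLOP/s and ~336 GB/s (illustrative TITAN-X-like figures).
layer = Conv2DLayer(batch=256, in_h=56, in_w=56, in_c=64, out_c=128, k=3)
print(f"estimated layer time: {estimate_layer_time(layer, 6e12, 336e9) * 1e3:.2f} ms")
```

Summing estimates of this kind over a network's layers, forward and backward, is the general shape of the runtime comparison against measured TensorFlow execution described in the Experiment Setup row.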
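The FireCaffe and multi-GPU case studies quoted above also depend on modeling communication, not just computation. Below is a hypothetical sketch of per-step gradient synchronization cost under a tree-structured AllReduce in data-parallel training; it is not PALEO's communication model, and the parameter count, per-worker compute time, and interconnect bandwidth are illustrative assumptions.

```python
# Hypothetical data-parallel step-time estimate with a tree AllReduce.
# NOT Paleo's actual model; all numeric inputs below are assumptions.
import math


def tree_allreduce_time(param_count, workers, bandwidth, dtype_bytes=4):
    """Approximate AllReduce time: a reduction up the tree followed by a
    broadcast down it, each traversing ~log2(workers) levels and moving the
    full gradient at every level."""
    if workers <= 1:
        return 0.0
    grad_bytes = param_count * dtype_bytes
    levels = math.ceil(math.log2(workers))
    return 2 * levels * grad_bytes / bandwidth


def data_parallel_step_time(per_worker_compute_time, workers, param_count, bandwidth):
    """Weak-scaling step time: each worker keeps the same local batch, so
    per-worker compute stays fixed and synchronization cost is added on top."""
    return per_worker_compute_time + tree_allreduce_time(param_count, workers, bandwidth)


# Example: a ~7.6M-parameter NiN-like model, an assumed 0.5 s of compute per
# step per worker, and an assumed 7 GB/s effective interconnect bandwidth.
for n in (1, 2, 4, 8, 16, 32):
    t = data_parallel_step_time(0.5, n, 7.6e6, 7e9)
    print(f"{n:>2} workers: {t * 1e3:.1f} ms per step")
```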