Predicting Training Time Without Training
Authors: Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we are able to predict training time of a ResNet within a 20% error margin on a variety of datasets and hyper-parameters, at a 30 to 45-fold reduction in cost compared to actual training. |
| Researcher Affiliation | Collaboration | Luca Zancato, Department of Information Engineering, University of Padova (luca.zancato@phd.unipd.it); Alessandro Achille, Amazon Web Services (aachille@amazon.com); Avinash Ravichandran, Amazon Web Services (ravinash@amazon.com); Rahul Bhotika, Amazon Web Services (bhotikar@amazon.com); Stefano Soatto, Amazon Web Services (soattos@amazon.com) |
| Pseudocode | No | The paper refers to a "complete algorithm" in the Supplementary Material but does not provide pseudocode or an algorithm block within the main content of the paper. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | For all the experiments we extract 5 random classes from each dataset (Table 1) and sample 150 images (or the maximum available for the specific dataset). Then we fine-tune ResNet18/34 using either GD or SGD. Table 1 lists the datasets: Cars [18], Surfaces [4], Mit67 [26], Aircrafts [24], CUB200 [30], CIFAR100 [19], CIFAR10 [19]. |
| Dataset Splits | No | The paper defines and measures 'training time' in terms of the training loss reaching a normalized threshold (a toy illustration of this definition follows the table), but it does not specify explicit train/validation/test splits for the datasets themselves or for validating the model's performance on unseen data. |
| Hardware Specification | No | The paper mentions that experiments were run on GPU and CPU, stating 'Our method is 30-40 times faster. Moreover, we note that it can be run completely on CPU without a drastic drop in performance. This allows to cheaply estimate TT and allocate/manage resources even without access to a GPU.' However, it does not provide specific GPU/CPU models or detailed hardware specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, specific library versions) required to replicate the experiments. |
| Experiment Setup | Yes | Each task is obtained by randomly sampling a subset of five classes, with 150 images each (when possible), from one popular dataset, under different hyperparameters (batch size, learning rate). The closer the scatter plots are to the bisector, the better the TT estimate. Our prediction is (a) within 13% of the real training time 95% of the time when using GD and (b) within 20% of the real training time when using SGD. We fine-tuned ResNet18/34 using either GD or SGD (a minimal sketch of this setup follows the table). |
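
The snippet below is a minimal sketch of the subsampling and fine-tuning protocol summarized in the Open Datasets and Experiment Setup rows, not the authors' code. It assumes torchvision's CIFAR-10 as a stand-in for the Table 1 datasets, an ImageNet-pretrained ResNet-18, and illustrative hyperparameters (batch size 64, learning rate 0.01, 5 epochs); the paper sweeps batch size and learning rate across tasks.

```python
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms

# Stand-in dataset: CIFAR-10 from torchvision (any of the Table 1 datasets could be used).
transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
full_set = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)

# Randomly sample 5 classes and keep at most 150 images per class.
classes = random.sample(range(10), 5)
per_class = {c: [] for c in classes}
for idx, label in enumerate(full_set.targets):
    if label in per_class and len(per_class[label]) < 150:
        per_class[label].append(idx)
subset = Subset(full_set, [i for idxs in per_class.values() for i in idxs])
loader = DataLoader(subset, batch_size=64, shuffle=True)

# Fine-tune an ImageNet-pretrained ResNet-18 on the resulting 5-way task with SGD.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
label_map = {c: i for i, c in enumerate(classes)}  # remap sampled class labels to 0..4

model.train()
for epoch in range(5):
    for images, labels in loader:
        targets = torch.tensor([label_map[int(l)] for l in labels])
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
```

Full-batch GD, the other optimizer the paper compares against, would correspond to setting the batch size to the size of the sampled subset (at most 750 images for a 5-class task).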
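The 'training time' (TT) that the paper predicts is defined from the training loss and a normalized threshold. The toy function below illustrates one such threshold-based reading of a loss curve; the threshold value and the loss values are invented for illustration and are not taken from the paper.

```python
def training_time(loss_curve, threshold=0.1):
    """Index of the first step whose loss, normalized by the initial loss, is <= threshold."""
    initial = loss_curve[0]
    for step, loss in enumerate(loss_curve):
        if loss / initial <= threshold:
            return step
    return None  # the threshold was never reached within the recorded horizon

# Illustrative (made-up) training-loss curve: TT is reached at step 6.
losses = [2.3, 1.8, 1.1, 0.7, 0.45, 0.30, 0.21, 0.15]
print(training_time(losses))  # -> 6, since 0.21 / 2.3 ≈ 0.09 <= 0.1
```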