Predicting Training Time Without Training
Authors: Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we are able to predict training time of a ResNet within a 20% error margin on a variety of datasets and hyper-parameters, at a 30 to 45-fold reduction in cost compared to actual training. |
| Researcher Affiliation | Collaboration | Luca Zancato, Department of Information Engineering, University of Padova (luca.zancato@phd.unipd.it); Alessandro Achille, Amazon Web Services (aachille@amazon.com); Avinash Ravichandran, Amazon Web Services (ravinash@amazon.com); Rahul Bhotika, Amazon Web Services (bhotikar@amazon.com); Stefano Soatto, Amazon Web Services (soattos@amazon.com) |
| Pseudocode | No | The paper refers to a "complete algorithm" in the Supplementary Material but does not provide pseudocode or an algorithm block within the main content of the paper. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | For all the experiments we extract 5 random classes from each dataset (Table 1) and sample 150 images (or the maximum available for the specific dataset). Then we fine-tune ResNet18/34 using either GD or SGD. Table 1 lists the datasets: Cars [18], Surfaces [4], Mit67 [26], Aircrafts [24], CUB200 [30], CIFAR100 [19], CIFAR10 [19]. |
| Dataset Splits | No | The paper defines and measures 'training time' in terms of the training loss reaching a normalized threshold (a toy illustration of this definition follows the table), but it does not specify explicit train/validation/test splits for the datasets themselves or for validating the model's performance on unseen data. |
| Hardware Specification | No | The paper mentions that experiments were run on GPU and CPU, stating 'Our method is 30-40 times faster. Moreover, we note that it can be run completely on CPU without a drastic drop in performance. This allows to cheaply estimate TT and allocate/manage resources even without access to a GPU.' However, it does not provide specific GPU/CPU models or detailed hardware specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python version, PyTorch/TensorFlow version, specific library versions) required to replicate the experiments. |
| Experiment Setup | Yes | Each task is obtained by randomly sampling a subset of five classes, with 150 images each (when possible), from one popular dataset, under different hyperparameters (batch size, learning rate). The closer the scatter plots are to the bisector, the better the TT estimate. Our prediction is (a) within 13% of the real training time 95% of the time when using GD and (b) within 20% of the real training time when using SGD. We fine-tuned ResNet18/34 using either GD or SGD (a minimal sketch of this setup follows the table). |
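
The snippet below is a minimal sketch of the subsampling and fine-tuning protocol summarized in the Open Datasets and Experiment Setup rows, not the authors' code. It assumes torchvision's CIFAR-10 as a stand-in for the Table 1 datasets, an ImageNet-pretrained ResNet-18, and illustrative hyperparameters (batch size 64, learning rate 0.01, 5 epochs); the paper sweeps batch size and learning rate across tasks.

```python
import random
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms

# Stand-in dataset: CIFAR-10 from torchvision (any of the Table 1 datasets could be used).
transform = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
full_set = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)

# Randomly sample 5 classes and keep at most 150 images per class.
classes = random.sample(range(10), 5)
per_class = {c: [] for c in classes}
for idx, label in enumerate(full_set.targets):
    if label in per_class and len(per_class[label]) < 150:
        per_class[label].append(idx)
subset = Subset(full_set, [i for idxs in per_class.values() for i in idxs])
loader = DataLoader(subset, batch_size=64, shuffle=True)

# Fine-tune an ImageNet-pretrained ResNet-18 on the resulting 5-way task with SGD.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()
label_map = {c: i for i, c in enumerate(classes)}  # remap sampled class labels to 0..4

model.train()
for epoch in range(5):
    for images, labels in loader:
        targets = torch.tensor([label_map[int(l)] for l in labels])
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
```

Full-batch GD, the other optimizer the paper compares against, would correspond to setting the batch size to the size of the sampled subset (at most 750 images for a 5-class task).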
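The 'training time' (TT) that the paper predicts is defined from the training loss and a normalized threshold. The toy function below illustrates one such threshold-based reading of a loss curve; the threshold value and the loss values are invented for illustration and are not taken from the paper.

```python
def training_time(loss_curve, threshold=0.1):
    """Index of the first step whose loss, normalized by the initial loss, is <= threshold."""
    initial = loss_curve[0]
    for step, loss in enumerate(loss_curve):
        if loss / initial <= threshold:
            return step
    return None  # the threshold was never reached within the recorded horizon

# Illustrative (made-up) training-loss curve: TT is reached at step 6.
losses = [2.3, 1.8, 1.1, 0.7, 0.45, 0.30, 0.21, 0.15]
print(training_time(losses))  # -> 6, since 0.21 / 2.3 ≈ 0.09 <= 0.1
```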