Learning to Rank Learning Curves

Authors: Martin Wistuba, Tejaswini Pedapati

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our ranking method with respect to a ranking measure against different methods on five different image classification and four tabular regression datasets. We also show that our method is capable of significantly accelerating neural architecture search (NAS) and hyperparameter optimization. Furthermore, we conduct several ablation studies to provide a better motivation of our model and its behavior.
Researcher Affiliation | Industry | IBM Research. Correspondence to: Martin Wistuba <martin.wistuba@ibm.com>.
Pseudocode | Yes | Algorithm 1: Early Termination Method
Open Source Code | No | The paper does not contain an explicit statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We compare our method to similar methods on five different datasets: CIFAR-10, CIFAR-100, Fashion-MNIST, Quickdraw, and SVHN. ... To create the meta-knowledge, we choose 200 architectures per dataset at random from the NASNet search space (Zoph et al., 2018)... For the experiments in Section 4.6 we rely on the tabular benchmark (Klein & Hutter, 2019).
Dataset Splits | Yes | We use the original train/test splits if available. Quickdraw has a total of 50 million data points and 345 classes. To reduce the training time, we select a subset of this dataset. We use 100 different randomly selected classes and choose 300 examples per class for the training split and 100 per class for the test split. 5,000 random data points of the training dataset serve as validation split for all datasets.
Hardware Specification | No | The paper mentions 'GPU hours' for computational effort (e.g., '20 GPU hours were searched', '36 GPU hours'), but does not specify any particular GPU model or other hardware specifications used for running the experiments.
Software Dependencies | No | The paper mentions software components and algorithms like Adam and CNNs, but does not specify version numbers for any programming languages, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Each architecture is trained for 100 epochs with stochastic gradient descent and a cosine learning rate schedule without restart (Loshchilov & Hutter, 2017). ... All parameters of the layers in f are trained jointly by means of Adam (Kingma & Ba, 2015) by minimizing L = α·L_ce + (1 − α)·L_rec (Equation 4), a weighted linear combination of the ranking loss (Equation 3) and the reconstruction loss, with α = 0.8. ... For our experiment we set δ = 0.45, which means that if the predicted probability that the new model is better than the best one is below 45%, the run is terminated early.
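
The Quickdraw subsetting described in the "Dataset Splits" row is concrete enough to illustrate. Below is a minimal NumPy sketch: the counts (100 random classes, 300 train and 100 test examples per class, 5,000 held-out validation points) come from the paper, while the function names, in-memory array layout, and fixed random seed are assumptions.

```python
# Hypothetical sketch of the Quickdraw subsetting from the "Dataset Splits"
# row. Only the class/example counts come from the paper; names, seed, and
# array layout are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper states none

def subsample_quickdraw(labels, n_classes=100,
                        n_train_per_class=300, n_test_per_class=100):
    """Pick 100 random classes, then 300 train / 100 test examples per class."""
    classes = rng.choice(np.unique(labels), size=n_classes, replace=False)
    train_idx, test_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train_per_class])
        test_idx.extend(idx[n_train_per_class:n_train_per_class + n_test_per_class])
    return np.array(train_idx), np.array(test_idx)

def hold_out_validation(train_idx, n_val=5000):
    """5,000 random training points serve as the validation split."""
    idx = rng.permutation(train_idx)
    return idx[n_val:], idx[:n_val]  # (train, validation)
```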
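
The "Experiment Setup" row also pins down the training objective for the ranking model f: Adam minimizes L = α·L_ce + (1 − α)·L_rec with α = 0.8. The PyTorch sketch below shows that combination; the stand-in encoder, the pairwise surrogate used in place of the paper's ranking loss (Equation 3), and the MSE form of the reconstruction loss are all assumptions, since the paper's exact architecture and loss definitions are not reproduced in this summary.

```python
# Minimal sketch of Equation (4): L = α·L_ce + (1 − α)·L_rec, α = 0.8,
# minimized with Adam. The model and both loss forms are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CurveModel(nn.Module):
    """Placeholder for f; the paper's actual architecture is not shown here."""
    def __init__(self, curve_len=100, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(curve_len, hidden), nn.ReLU())
        self.score_head = nn.Linear(hidden, 1)       # ranking score
        self.decoder = nn.Linear(hidden, curve_len)  # curve reconstruction

    def forward(self, curves):
        h = self.encoder(curves)
        return self.score_head(h).squeeze(-1), self.decoder(h)

alpha = 0.8  # weighting from the paper
model = CurveModel()
optimizer = torch.optim.Adam(model.parameters())

def pairwise_ranking_loss(s_i, s_j):
    # Hypothetical pairwise surrogate; the paper's Equation (3) is not
    # reproduced in this summary.
    return F.binary_cross_entropy_with_logits(s_i - s_j, torch.ones_like(s_i))

def training_step(curves_i, curves_j):
    """curves_i should outrank curves_j in the ground-truth ordering."""
    s_i, rec_i = model(curves_i)
    s_j, rec_j = model(curves_j)
    l_rank = pairwise_ranking_loss(s_i, s_j)
    l_rec = F.mse_loss(rec_i, curves_i) + F.mse_loss(rec_j, curves_j)
    loss = alpha * l_rank + (1 - alpha) * l_rec  # Equation (4)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only the weighting α = 0.8 and the choice of Adam are taken from the paper; swapping in the paper's actual Equation (3) would change only pairwise_ranking_loss.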
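
Finally, the "Pseudocode" row (Algorithm 1: Early Termination Method) together with the δ = 0.45 threshold from the "Experiment Setup" row implies a simple control flow, sketched here. train_one_epoch and predict_prob_better are hypothetical callables standing in for the candidate's training loop and the ranking model's pairwise probability estimate; this is a hedged reading of the rule, not the paper's Algorithm 1 verbatim.

```python
# Sketch of the early-termination rule: stop a run as soon as the predicted
# probability that the new model beats the incumbent drops below δ = 0.45.
# Both callables are hypothetical stand-ins.
DELTA = 0.45      # threshold from the paper
MAX_EPOCHS = 100  # each architecture is trained for 100 epochs

def train_with_early_termination(train_one_epoch, predict_prob_better, best_curve):
    curve = []  # partial learning curve (e.g., validation accuracy per epoch)
    for _ in range(MAX_EPOCHS):
        curve.append(train_one_epoch())
        if predict_prob_better(curve, best_curve) < DELTA:
            return curve, True   # terminated early
    return curve, False          # trained to completion
```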