Understanding and Robustifying Differentiable Architecture Search
Authors: Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling. and Table 1 (first column) confirms the very poor performance standard DARTS yields on all of these search spaces and on different datasets. |
| Researcher Affiliation | Collaboration | Arber Zela1, Thomas Elsken2,1, Tonmoy Saikia1, Yassine Marrakchi1, Thomas Brox1 & Frank Hutter1,2 1Department of Computer Science, University of Freiburg 2Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1: DARTS-ADA |
| Open Source Code | Yes | We provide our implementation and scripts to facilitate reproducibility. and https://github.com/automl/RobustDARTS |
| Open Datasets | Yes | across three different datasets (CIFAR-10, CIFAR-100 and SVHN), language modelling (PTB), FlyingThings3D dataset (Mayer et al., 2016), Sintel dataset (Butler et al., 2012). |
| Dataset Splits | Yes | The network weights and the architecture parameters are optimized on the training and validation set, respectively. and We ran DARTS on this search space three times for each dataset and compared its result to the baseline of Random Search with weight sharing (RS-ws) by Li & Talwalkar (2019). Figure 2 shows the test regret of the architectures selected by DARTS (blue) and RS-ws (green) throughout the search. DARTS manages to find an architecture close to the global minimum, but around epoch 40 the test performance deteriorated. Note that the search model validation error (dashed red line) did not deteriorate but rather converged, indicating that the architectural parameters are overfitting to the validation set. and selecting the architecture used for the final evaluation based on a validation run. (A minimal sketch of this alternating training-/validation-set update appears below the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, cloud instances) are provided for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.x') are explicitly mentioned in the paper. |
| Experiment Setup | Yes | On CIFAR-10, when scaling the Scheduled Drop Path drop probability, we use the same settings for training from scratch the found architectures as in the original DARTS paper, i.e. 36 initial filters and 20 stacked cells. However, for search space S2 and S4 we reduce the number of initial filters to 16 in order to avoid memory issues, since the cells found with more regularization usually are composed only with separable convolutions. When scaling the L2 factor on CIFAR-10 experiments we use 16 initial filters and 8 stacked cells, except the experiments on S1, where the settings are the same as in Liu et al. (2019), i.e. 36 initial filters and 20 stacked cells. and For training the search network, images are downsampled by a factor of two and trained for 300k mini-batch iterations. During search, we use SGD and ADAM to optimize the inner and outer objectives respectively. ... The extracted network is also trained for 300k mini-batch iterations but full resolution images are used. Here, ADAM is used for optimization and the learning rate is annealed to 0 from 1e-4, using a cosine decay schedule. (A self-contained sketch of this cosine-decay retraining schedule appears below the table.) |
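
The dataset-splits row quotes the bilevel setup of DARTS: network weights are updated on the training split while the architecture parameters are updated on the validation split. The snippet below is a minimal sketch of this alternating (first-order) update, assuming a PyTorch-style search model; `model.weight_params()`, `model.arch_params()`, the data loaders, and the criterion are illustrative placeholders, not names from the paper or its code release.

```python
import torch


def search_epoch(model, train_loader, valid_loader, criterion,
                 w_optimizer, alpha_optimizer):
    """One epoch of alternating first-order DARTS-style updates (sketch)."""
    for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, valid_loader):
        # Architecture step: minimise the validation loss w.r.t. alpha.
        alpha_optimizer.zero_grad()
        criterion(model(x_val), y_val).backward()
        alpha_optimizer.step()

        # Weight step: minimise the training loss w.r.t. the network weights.
        w_optimizer.zero_grad()
        criterion(model(x_tr), y_tr).backward()
        w_optimizer.step()


# Illustrative optimizer construction, mirroring the quoted use of SGD for the
# inner (weight) objective and Adam for the outer (architecture) objective;
# the accessors and learning rates below are hypothetical placeholders.
# w_optimizer = torch.optim.SGD(model.weight_params(), lr=..., momentum=...)
# alpha_optimizer = torch.optim.Adam(model.arch_params(), lr=...)
```

DARTS also has a second-order variant that differentiates through a virtual weight step; the sketch above corresponds only to the simpler first-order alternation implied by the quoted split of training and validation data.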
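The experiment-setup row quotes a retraining schedule for the extracted disparity network: 300k mini-batch iterations with ADAM and a cosine decay of the learning rate from 1e-4 to 0. Below is a minimal, self-contained sketch of that schedule using PyTorch's `CosineAnnealingLR`; the tiny convolutional network, dummy mini-batches, and L1 loss are placeholders, not the authors' architecture or objective.

```python
import torch
import torch.nn as nn

TOTAL_ITERS = 300_000  # "300k mini-batch iterations" from the quoted setup

# Placeholder network standing in for the extracted disparity network.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
loss_fn = nn.L1Loss()  # illustrative regression loss; not specified in the quote

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
# Cosine decay of the learning rate from 1e-4 down to 0 over the full run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=TOTAL_ITERS, eta_min=0.0
)

for step in range(TOTAL_ITERS):
    x = torch.randn(4, 3, 64, 64)       # dummy full-resolution mini-batch
    target = torch.randn(4, 1, 64, 64)  # dummy dense regression targets
    optimizer.zero_grad()
    loss = loss_fn(net(x), target)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per mini-batch iteration
```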