Understanding and Robustifying Differentiable Architecture Search
Authors: Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our observations are robust across five search spaces on three image classification tasks and also hold for the very different domains of disparity estimation (a dense regression task) and language modelling. and Table 1 (first column) confirms the very poor performance standard DARTS yields on all of these search spaces and on different datasets. |
| Researcher Affiliation | Collaboration | Arber Zela1, Thomas Elsken2,1, Tonmoy Saikia1, Yassine Marrakchi1, Thomas Brox1 & Frank Hutter1,2 1Department of Computer Science, University of Freiburg 2Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1: DARTS-ADA |
| Open Source Code | Yes | We provide our implementation and scripts to facilitate reproducibility. and https://github.com/automl/RobustDARTS |
| Open Datasets | Yes | across three different datasets (CIFAR-10, CIFAR-100 and SVHN), language modelling (PTB), FlyingThings3D dataset (Mayer et al., 2016), Sintel dataset (Butler et al., 2012). |
| Dataset Splits | Yes | The network weights and the architecture parameters are optimized on the training and validation set, respectively. and We ran DARTS on this search space three times for each dataset and compared its result to the baseline of Random Search with weight sharing (RS-ws) by Li & Talwalkar (2019). Figure 2 shows the test regret of the architectures selected by DARTS (blue) and RS-ws (green) throughout the search. DARTS manages to find an architecture close to the global minimum, but around epoch 40 the test performance deteriorated. Note that the search model validation error (dashed red line) did not deteriorate but rather converged, indicating that the architectural parameters are overfitting to the validation set. and selecting the architecture used for the final evaluation based on a validation run. (A minimal sketch of this alternating training-/validation-set update appears below the table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, cloud instances) are provided for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.x') are explicitly mentioned in the paper. |
| Experiment Setup | Yes | On CIFAR-10, when scaling the Scheduled Drop Path drop probability, we use the same settings for training from scratch the found architectures as in the original DARTS paper, i.e. 36 initial filters and 20 stacked cells. However, for search space S2 and S4 we reduce the number of initial filters to 16 in order to avoid memory issues, since the cells found with more regularization usually are composed only with separable convolutions. When scaling the L2 factor on CIFAR-10 experiments we use 16 initial filters and 8 stacked cells, except the experiments on S1, where the settings are the same as in Liu et al. (2019), i.e. 36 initial filters and 20 stacked cells. and For training the search network, images are downsampled by a factor of two and trained for 300k mini-batch iterations. During search, we use SGD and ADAM to optimize the inner and outer objectives respectively. ... The extracted network is also trained for 300k mini-batch iterations but full resolution images are used. Here, ADAM is used for optimization and the learning rate is annealed to 0 from 1e-4, using a cosine decay schedule. (A self-contained sketch of this cosine-decay retraining schedule appears below the table.) |
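
The dataset-splits row quotes the bilevel setup of DARTS: network weights are updated on the training split while the architecture parameters are updated on the validation split. The snippet below is a minimal sketch of this alternating (first-order) update, assuming a PyTorch-style search model; `model.weight_params()`, `model.arch_params()`, the data loaders, and the criterion are illustrative placeholders, not names from the paper or its code release.

```python
import torch


def search_epoch(model, train_loader, valid_loader, criterion,
                 w_optimizer, alpha_optimizer):
    """One epoch of alternating first-order DARTS-style updates (sketch)."""
    for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, valid_loader):
        # Architecture step: minimise the validation loss w.r.t. alpha.
        alpha_optimizer.zero_grad()
        criterion(model(x_val), y_val).backward()
        alpha_optimizer.step()

        # Weight step: minimise the training loss w.r.t. the network weights.
        w_optimizer.zero_grad()
        criterion(model(x_tr), y_tr).backward()
        w_optimizer.step()


# Illustrative optimizer construction, mirroring the quoted use of SGD for the
# inner (weight) objective and Adam for the outer (architecture) objective;
# the accessors and learning rates below are hypothetical placeholders.
# w_optimizer = torch.optim.SGD(model.weight_params(), lr=..., momentum=...)
# alpha_optimizer = torch.optim.Adam(model.arch_params(), lr=...)
```

DARTS also has a second-order variant that differentiates through a virtual weight step; the sketch above corresponds only to the simpler first-order alternation implied by the quoted split of training and validation data.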
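The experiment-setup row quotes a retraining schedule for the extracted disparity network: 300k mini-batch iterations with ADAM and a cosine decay of the learning rate from 1e-4 to 0. Below is a minimal, self-contained sketch of that schedule using PyTorch's `CosineAnnealingLR`; the tiny convolutional network, dummy mini-batches, and L1 loss are placeholders, not the authors' architecture or objective.

```python
import torch
import torch.nn as nn

TOTAL_ITERS = 300_000  # "300k mini-batch iterations" from the quoted setup

# Placeholder network standing in for the extracted disparity network.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
loss_fn = nn.L1Loss()  # illustrative regression loss; not specified in the quote

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
# Cosine decay of the learning rate from 1e-4 down to 0 over the full run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=TOTAL_ITERS, eta_min=0.0
)

for step in range(TOTAL_ITERS):
    x = torch.randn(4, 3, 64, 64)       # dummy full-resolution mini-batch
    target = torch.randn(4, 1, 64, 64)  # dummy dense regression targets
    optimizer.zero_grad()
    loss = loss_fn(net(x), target)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per mini-batch iteration
```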