$\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells

Authors: Sajad Movahedi, Melika Adabinejad, Ayyoob Imani, Arezou Keshavarz, Mostafa Dehghani, Azadeh Shakery, Babak N Araabi

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on six different search spaces and three different datasets show that our method (Λ-DARTS) does indeed prevent performance collapse, providing justification for our theoretical analysis and the proposed remedy.
Researcher Affiliation | Collaboration | (1) University of Tehran, (2) LMU Munich, (3) Google Brain, (4) Institute for Research in Fundamental Sciences (IPM)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; it uses mathematical equations and descriptive text.
Open Source Code | Yes | We have published our code at https://github.com/dr-faustus/Lambda-DARTS.
Open Datasets | Yes | Experimental results on six different search spaces and three different datasets show that our method (Λ-DARTS) does indeed prevent performance collapse, providing justification for our theoretical analysis and the proposed remedy. [...] NAS-Bench-201 (Dong & Yang, 2020) and DARTS (Liu et al., 2019) search spaces. [...] CIFAR-10 and CIFAR-100 datasets on the DARTS search space. [...] ImageNet16-120 datasets can be attained by querying the database provided by (Dong & Yang, 2020). [See the NAS-Bench-201 query sketch below.]
Dataset Splits | Yes | $\mathcal{L}_{val}(\omega^*(\alpha), \alpha)$ [...] where $\mathcal{L}_{val}(\cdot,\cdot)$ is the loss function calculated over the validation set. [...] The search is performed on the CIFAR-10 dataset. [...] In Figure 1, we can see a clear correlation between the layer alignment and the accuracy of the architecture on CIFAR-10, which reaches its optimal point at around the 40th search epoch. [See the data-split sketch below.]
Hardware Specification | Yes | The cost of our method in terms of GPU days is 0.8 days on a GTX 1080 Ti GPU on the DARTS search space and CIFAR-10 dataset.
Software Dependencies | No | The paper mentions software components like "stochastic gradient descent with Nesterov momentum" and "Adam", and refers to using "the implementation of (Dong & Yang, 2020)", but it does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | The learning rate is set to 0.025, gradually reduced to 0.001 using cosine scheduling. The weight decay and momentum are set to 0.0005 and 0.9, respectively. For architecture parameters ($\alpha$), we use Adam with the learning rate set to $10^{-4}$ and the weight decay rate set to 0.001. The momentum values $\beta_1$ and $\beta_2$ are set to 0.5 and 0.999, respectively. We perform the search on CIFAR-10 for 100 epochs, and we set $\epsilon_0 = 0.0001$ and $\lambda = 0.125$ for this experiment. [...] For SVHN and CIFAR-100, we increase the batch size and the value of $\epsilon_0$ to 96 and 0.001, respectively. [See the optimizer sketch below.]
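
The Open Datasets row notes that results on the NAS-Bench-201 search space (including ImageNet16-120) are obtained by querying the benchmark database of Dong & Yang (2020). Below is a minimal sketch of such a query, assuming the nas_201_api package released with NAS-Bench-201; the benchmark file name, the example architecture string, and the exact method names and return keys are assumptions and may differ between releases.

```python
# Hedged sketch: querying the NAS-Bench-201 database for an architecture's
# stored training statistics. File name, architecture string, and API details
# are assumptions; they vary between NAS-Bench-201 releases.
from nas_201_api import NASBench201API

# Assumed path to a downloaded benchmark file (version suffix may differ).
api = NASBench201API("NAS-Bench-201-v1_1-096897.pth")

# Illustrative cell encoding in the NAS-Bench-201 string format; this is NOT
# the architecture found by Λ-DARTS, just a placeholder for the lookup.
arch = "|nor_conv_3x3~0|+|skip_connect~0|nor_conv_3x3~1|+|skip_connect~0|avg_pool_3x3~1|nor_conv_3x3~2|"

index = api.query_index_by_arch(arch)              # architecture -> table index
info = api.get_more_info(index, "ImageNet16-120")  # dict of train/valid/test stats
print(info)                                        # keys vary by API version
```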
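
The Dataset Splits row quotes the bilevel objective that motivates the split: architecture parameters $\alpha$ are updated against a validation loss while network weights $\omega$ are trained on the remaining data,

$$\min_\alpha \; \mathcal{L}_{val}(\omega^*(\alpha), \alpha) \quad \text{s.t.} \quad \omega^*(\alpha) = \arg\min_\omega \mathcal{L}_{train}(\omega, \alpha).$$

The sketch below builds such a split for CIFAR-10; the 50/50 ratio and batch size of 64 follow the original DARTS recipe and are assumptions, since the quoted excerpt does not state them.

```python
# Hedged sketch: a DARTS-style search split of the CIFAR-10 training set,
# with one half feeding weight updates (L_train) and the other half feeding
# architecture updates (L_val). The 50/50 ratio and batch size are assumed
# from the original DARTS recipe, not quoted from this paper.
import torch
import torchvision
import torchvision.transforms as T

train_data = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

num_train = len(train_data)   # 50,000 images
split = num_train // 2        # assumed 50/50 split between L_train and L_val
indices = list(range(num_train))

train_queue = torch.utils.data.DataLoader(   # updates network weights w
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[:split]))

val_queue = torch.utils.data.DataLoader(     # drives L_val(w*(alpha), alpha)
    train_data, batch_size=64,
    sampler=torch.utils.data.SubsetRandomSampler(indices[split:]))
```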
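
The Experiment Setup row lists the optimizer hyperparameters as reported. The sketch below wires those numbers into standard PyTorch optimizers; the TinySearchNet stand-in and its weights()/arch_parameters() helpers are hypothetical placeholders for the actual supernet, and the Λ-DARTS regularization controlled by $\epsilon_0$ and $\lambda$ is only noted, not implemented.

```python
# Hedged sketch: optimizer configuration with the hyperparameters quoted in
# the Experiment Setup row. TinySearchNet is a hypothetical stand-in for the
# supernet; the real Λ-DARTS search space and regularization are not shown.
import torch
import torch.nn as nn

class TinySearchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)             # stands in for weights w
        self.alpha = nn.Parameter(1e-3 * torch.randn(14, 8))   # stands in for alpha

    def weights(self):
        return self.stem.parameters()

    def arch_parameters(self):
        return [self.alpha]

model = TinySearchNet()
epochs = 100  # search epochs on CIFAR-10, as reported

# Network weights w: SGD with Nesterov momentum, cosine-annealed 0.025 -> 0.001.
w_optimizer = torch.optim.SGD(
    model.weights(), lr=0.025, momentum=0.9, weight_decay=5e-4, nesterov=True)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    w_optimizer, T_max=epochs, eta_min=0.001)

# Architecture parameters alpha: Adam with the reported betas and weight decay.
alpha_optimizer = torch.optim.Adam(
    model.arch_parameters(), lr=1e-4, betas=(0.5, 0.999), weight_decay=1e-3)

# Reported Λ-DARTS-specific constants (used by the method's regularization,
# which is outside the scope of this sketch).
epsilon_0 = 1e-4
lam = 0.125
```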