Theory-Inspired Path-Regularized Differential Network Architecture Search

Authors: Pan Zhou, Caiming Xiong, Richard Socher, Steven Chu Hong Hoi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on image classification tasks validate its advantages. Here we evaluate PR-DARTS on classification task and compare it with representative state-of-the-art NAS approaches
Researcher Affiliation | Industry | Pan Zhou, Caiming Xiong, Richard Socher, Steven C.H. Hoi, Salesforce Research, {pzhou, cxiong, rsocher, shoi}@salesforce.com
Pseudocode | Yes | See optimization details in Algorithm 1 of Appendix A.
Open Source Code | Yes | Code is available at https://panzhous.github.io/.
Open Datasets | Yes | Datasets. CIFAR10 [40] and CIFAR100 [40] contain 50K training and 10K test images of size 32×32, distributed over 10 classes in CIFAR10 and 100 classes in CIFAR100. ImageNet [41] has 1.28M training and 50K test images roughly equally distributed over 1K object categories.
Dataset Splits | Yes | We divide the 50K training samples in CIFAR10 into two equal-sized training and validation datasets (see the split sketch below the table).
Hardware Specification | Yes | In merely 0.17 GPU-days on Tesla V100, PR-DARTS respectively achieves...
Software Dependencies | No | The paper mentions optimizers like SGD and ADAM but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In PR-DARTS, we set λ1 = 0.01, λ2 = 0.005, and λ3 = 0.005 for regularization. Then we train the network for 200 epochs with a mini-batch size of 128. We set the temperature τ = 10 and linearly reduce it to 0.1, with a = 0.1 and b = 1.1. We train the network for 600 epochs with a mini-batch size of 128 from scratch. We also use drop-path with probability 0.2 and cutout [46] with length 16 for regularization. (See the hyperparameter sketch below the table.)
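
The dataset-split row above can be illustrated with a minimal sketch, assuming a standard PyTorch/torchvision workflow; the variable names (search_train, search_val) and the omission of augmentation transforms are our assumptions for illustration, not the authors' released code.

    # Minimal sketch: split CIFAR-10's 50K training images into two equal halves,
    # one for updating network weights and one for updating architecture parameters.
    import torch
    from torch.utils.data import DataLoader, Subset
    from torchvision import datasets, transforms

    train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor())

    num_train = len(train_set)              # 50,000 images
    split = num_train // 2                  # 25,000 / 25,000
    indices = torch.randperm(num_train).tolist()

    search_train = Subset(train_set, indices[:split])   # weight updates
    search_val = Subset(train_set, indices[split:])     # architecture updates

    train_loader = DataLoader(search_train, batch_size=128, shuffle=True)
    val_loader = DataLoader(search_val, batch_size=128, shuffle=True)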
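The experiment-setup row quotes the search-phase hyperparameters (λ1, λ2, λ3, batch size, epochs, and a temperature linearly annealed from 10 to 0.1). Below is a minimal sketch of how such a schedule could be wired up; the constant names and the temperature() helper are hypothetical, and the per-epoch interpretation of "linearly reduce" is an assumption the excerpt does not specify.

    # Sketch of the quoted search-phase hyperparameters (names are illustrative).
    LAMBDA1, LAMBDA2, LAMBDA3 = 0.01, 0.005, 0.005   # regularization weights
    A, B = 0.1, 1.1                                   # quoted constants a and b
    BATCH_SIZE = 128
    SEARCH_EPOCHS = 200
    TAU_START, TAU_END = 10.0, 0.1

    def temperature(epoch: int) -> float:
        """Linearly anneal the softmax temperature from 10 down to 0.1."""
        frac = epoch / max(SEARCH_EPOCHS - 1, 1)
        return TAU_START + (TAU_END - TAU_START) * frac

    for epoch in range(SEARCH_EPOCHS):
        tau = temperature(epoch)
        # ... one epoch of bi-level architecture search would run here,
        # with the path-regularization terms weighted by LAMBDA1-LAMBDA3 ...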