Theory-Inspired Path-Regularized Differential Network Architecture Search
Authors: Pan Zhou, Caiming Xiong, Richard Socher, Steven Chu Hong Hoi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on image classification tasks validate its advantages. Here we evaluate PR-DARTS on the classification task and compare it with representative state-of-the-art NAS approaches. |
| Researcher Affiliation | Industry | Pan Zhou, Caiming Xiong, Richard Socher, Steven C.H. Hoi (Salesforce Research) {pzhou, cxiong, rsocher, shoi}@salesforce.com |
| Pseudocode | Yes | See optimization details in Algorithm 1 of Appendix A. |
| Open Source Code | Yes | Code is available at https://panzhous.github.io/. |
| Open Datasets | Yes | Datasets. CIFAR10 [40] and CIFAR100 [40] contain 50K training and 10K test images which are of size 32×32 and are distributed over 10 classes in CIFAR10 and 100 classes in CIFAR100. ImageNet [41] has 1.28M training and 50K test images which are roughly equally distributed over 1K object categories. |
| Dataset Splits | Yes | We divide 50K training samples in CIFAR10 into two equal-sized training and validation datasets. |
| Hardware Specification | Yes | In merely 0.17 GPU-days on Tesla V100, PR-DARTS respectively achieves... |
| Software Dependencies | No | The paper mentions optimizers like SGD and ADAM but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In PR-DARTS, we set λ1 = 0.01, λ2 = 0.005, and λ3 = 0.005 for regularization. Then we train the network 200 epochs with mini-batch size 128. We set temperature τ = 10 and linearly reduce it to 0.1, a = 0.1 and b = 1.1. We train the network 600 epochs with a mini-batch size of 128 from scratch. We also use drop-path with probability 0.2 and cutout [46] with length 16, for regularization. |
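
The Dataset Splits row above describes dividing CIFAR10's 50K training images into two equal halves, one for updating network weights and one for updating architecture parameters. The snippet below is a minimal sketch of such a split, assuming torchvision; the seed, transform, and variable names are illustrative and not taken from the authors' code.

```python
# Sketch of the 50/50 CIFAR10 split described in the "Dataset Splits" row.
# Seed, transform, and names are illustrative assumptions, not the paper's code.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

# Split the 50K training images into two equal-sized sets: one trains the
# network weights, the other serves as the validation set for architecture search.
generator = torch.Generator().manual_seed(0)
search_train, search_val = random_split(train_full, [25_000, 25_000],
                                        generator=generator)
```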
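
The Experiment Setup row quotes the search-phase hyperparameters (λ1, λ2, λ3, 200 epochs, batch size 128, temperature τ linearly reduced from 10 to 0.1, a = 0.1, b = 1.1) and the retraining settings (600 epochs, drop-path 0.2, cutout length 16). The sketch below only collects those quoted values into a configuration and shows one plausible linear annealing schedule for τ; the class and function names are hypothetical, and the schedule is an assumption rather than the authors' implementation.

```python
# Hypothetical container for the hyperparameters quoted in the Experiment Setup row.
from dataclasses import dataclass

@dataclass
class PRDartsSearchConfig:
    lambda1: float = 0.01    # regularization weight (quoted)
    lambda2: float = 0.005   # regularization weight (quoted)
    lambda3: float = 0.005   # regularization weight (quoted)
    search_epochs: int = 200 # architecture-search epochs (quoted)
    batch_size: int = 128    # mini-batch size (quoted)
    tau_start: float = 10.0  # initial temperature τ (quoted)
    tau_end: float = 0.1     # final temperature τ (quoted)
    a: float = 0.1           # quoted constant a
    b: float = 1.1           # quoted constant b
    retrain_epochs: int = 600     # training from scratch (quoted)
    drop_path_prob: float = 0.2   # drop-path probability (quoted)
    cutout_length: int = 16       # cutout length (quoted)

def temperature_at(epoch: int, cfg: PRDartsSearchConfig) -> float:
    """Linearly reduce τ from tau_start to tau_end over the search epochs
    (one plausible reading of "linearly reduce it to 0.1")."""
    frac = min(epoch / max(cfg.search_epochs - 1, 1), 1.0)
    return cfg.tau_start + frac * (cfg.tau_end - cfg.tau_start)
```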