Generalization Properties of NAS under Activation and Skip Connection Search
Authors: Zhenyu Zhu, Fanghui Liu, Grigorios Chrysos, Volkan Cevher
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our theoretical results, we conduct a series of experiments on NAS. First, we simulate the NTK matrices under different depths in Appendix F.4 to verify the relationship between the minimum eigenvalue of the NTK and the network depth L in Theorem 1. In Sec. 5.1 we use the DARTS algorithm [Liu et al., 2019b] to conduct experiments on activation function search and skip connection search under the search space of Equation (1). Finally, we use the minimum eigenvalue of the NTK to guide the training of NAS on the benchmark NAS-Bench-201 [Dong and Yang, 2020], with a comparison against recent NAS algorithms. (A minimal NTK eigenvalue sketch follows the table.) |
| Researcher Affiliation | Academia | Zhenyu Zhu, Fanghui Liu, Grigorios G Chrysos, Volkan Cevher EPFL, Switzerland {[first name].[surname]}@epfl.ch |
| Pseudocode | Yes | Algorithm 1: SGD for training DNNs by NAS (a hedged SGD sketch follows the table) |
| Open Source Code | No | The code will be open-sourced upon the acceptance of the paper. |
| Open Datasets | Yes | We select Fashion-MNIST [Xiao et al., 2017] as a standard benchmark. NAS-Bench-201 [Dong and Yang, 2020] is a commonly used benchmark for NAS algorithm evaluation, which includes three datasets: a) CIFAR-10 [Krizhevsky et al., 2014], b) CIFAR-100 [Krizhevsky et al., 2014] and c) ImageNet-16 [Chrabaszcz et al., 2017] for image classification. |
| Dataset Splits | Yes | Then, we conduct neural network training on the selected architecture by SGD. For ease of theoretical analysis, we employ constant step-size SGD with one epoch and randomly choose the weight parameters during all the iterations, which is commonly used in deep learning theory [Cao and Gu, 2019, Zou et al., 2019]. Subsequently, the top-k best candidate architectures are chosen in KNAS and our Eigen-NAS, and then the best architecture is chosen by the validation error. (See the selection sketch after the table.) |
| Hardware Specification | No | All our experiments are conducted on a single GPU in our internal cluster. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are mentioned in the paper. |
| Experiment Setup | Yes | We conduct the experiment via DARTS on a feedforward neural network with L = 10 and m = 1024, with 5 runs. Gaussian initialization: W_l^(1) ∼ N(0, 1/m), l ∈ [L]. Input: search space S, data D_tr = {(x_i, y_i)}_{i=1}^N, step size γ, and Flag_method ∈ {Eigen-NAS, DARTS}. (The sketches below instantiate this setup.) |
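
The quotes above center on the minimum eigenvalue of the empirical NTK, so the following minimal sketch (not the authors' code, which was withheld pending acceptance) shows one way to reproduce the depth study: build a feedforward ReLU network with the Gaussian initialization W ∼ N(0, 1/m) from the setup row, form the NTK Gram matrix from per-example gradients, and read off its smallest eigenvalue. PyTorch is an assumption here, since the paper names no software stack; `make_mlp`, `ntk_min_eigenvalue`, and the small width 128 (for speed, versus m = 1024 in the paper) are illustrative choices.

```python
import torch

def make_mlp(depth: int, width: int, d_in: int) -> torch.nn.Sequential:
    """Feedforward ReLU net with Gaussian init W ~ N(0, 1/m) (variance 1/width)."""
    layers, d = [], d_in
    for _ in range(depth):
        lin = torch.nn.Linear(d, width, bias=False)
        torch.nn.init.normal_(lin.weight, std=(1.0 / width) ** 0.5)
        layers += [lin, torch.nn.ReLU()]
        d = width
    head = torch.nn.Linear(d, 1, bias=False)
    torch.nn.init.normal_(head.weight, std=(1.0 / width) ** 0.5)
    layers.append(head)
    return torch.nn.Sequential(*layers)

def ntk_min_eigenvalue(model: torch.nn.Module, x: torch.Tensor) -> float:
    """lambda_min of K, where K_ij = <grad_theta f(x_i), grad_theta f(x_j)>."""
    params = list(model.parameters())
    rows = []
    for i in range(x.shape[0]):
        g = torch.autograd.grad(model(x[i:i + 1]).squeeze(), params)
        rows.append(torch.cat([gi.reshape(-1) for gi in g]))
    J = torch.stack(rows)                       # (N, #params) Jacobian
    K = J @ J.T                                 # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(K)[0].item()   # eigenvalues sorted ascending

if __name__ == "__main__":
    x = torch.randn(32, 16)                     # toy probe inputs (N=32, d=16)
    for L in (2, 4, 8, 10):                     # vary depth as in Appendix F.4
        print(L, ntk_min_eigenvalue(make_mlp(L, 128, 16), x))
```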
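Algorithm 1 in the paper couples architecture search with plain SGD training; per the dataset-splits row, training uses constant step-size SGD for a single epoch. Below is a hedged sketch of that training step under the usual PyTorch loader and loss conventions; the helper name `train_one_epoch` and the step-size argument `gamma` are ours, not the paper's.

```python
def train_one_epoch(model: torch.nn.Module, loader, gamma: float) -> torch.nn.Module:
    """Constant step-size SGD, one pass over D_tr (one epoch), as in the quotes."""
    opt = torch.optim.SGD(model.parameters(), lr=gamma)  # fixed learning rate
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    return model
```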
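Finally, the dataset-splits row describes the selection protocol shared by KNAS and Eigen-NAS: rank candidates by the NTK minimum eigenvalue, keep the top-k, train each briefly, and pick the winner by validation error. The sketch below chains the two helpers above; `candidates` (a list of zero-argument model constructors), `eigen_nas_select`, and `validation_error` are hypothetical stand-ins for the paper's procedure, since no code was released.

```python
@torch.no_grad()
def validation_error(model: torch.nn.Module, loader) -> float:
    """Fraction of misclassified validation examples."""
    model.eval()
    wrong = total = 0
    for xb, yb in loader:
        wrong += (model(xb).argmax(dim=1) != yb).sum().item()
        total += yb.numel()
    return wrong / total

def eigen_nas_select(candidates, x_probe, train_loader, val_loader, k, gamma):
    """Score by lambda_min(NTK), keep top-k, then choose by validation error."""
    ranked = sorted(candidates, reverse=True,
                    key=lambda build: ntk_min_eigenvalue(build(), x_probe))
    best, best_err = None, float("inf")
    for build in ranked[:k]:                     # top-k candidates only
        model = train_one_epoch(build(), train_loader, gamma)
        err = validation_error(model, val_loader)
        if err < best_err:
            best, best_err = model, err
    return best, best_err
```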