Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis

Authors: Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first experimentally verify our convergence analysis in Section 3.3. In all cases we use ReLU nonlinearities with Kaiming normal initialization [24]. We build the same three computational graphs of fully-connected layers as in Figure 3. The three networks have hidden layers of a constant width of 1024. We train the networks using SGD with a mini-batch size of 128. The learning rate is fixed at 1 × 10^-5. No augmentation, weight decay, learning rate decay, or momentum is adopted. (A hedged sketch of this setup is given after the table.)
Researcher Affiliation | Academia | Wuyang Chen (University of Texas at Austin), Wei Huang (RIKEN AIP), Xinyu Gong (University of Texas at Austin), Boris Hanin (Princeton University), Zhangyang Wang (University of Texas at Austin)
Pseudocode | Yes | We provide a pseudocode algorithm in Appendix A to demonstrate the usage of our method.
Open Source Code | Yes | Code is available at: https://github.com/VITA-Group/architecture_convergence.
Open Datasets | Yes | On both MNIST and CIFAR-10, the convergence rate of DAG#1 (Figure 3, left) is worse than that of DAG#2 (Figure 3, middle), which in turn is worse than that of DAG#3 (Figure 3, right).
Dataset Splits | Yes | NAS-Bench-201 [17] provides 15,625 architectures that are stacked from repeated DAGs of four nodes (exactly the same DAG considered in Section 3 and Figure 2). It contains the architectures' performance on three datasets (CIFAR-10, CIFAR-100, ImageNet-16-120 [15]) evaluated under a unified protocol (i.e., the same learning rate, batch size, etc., for all architectures). (See the search-space enumeration sketch after the table.)
Hardware Specification | Yes | Recorded on a single GTX 1080Ti GPU.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | We train the searched architectures for 250 epochs using SGD, with a learning rate of 0.5, a cosine scheduler, momentum of 0.9, weight decay of 3 × 10^-5, and a batch size of 768. This setting follows previous works [1, 44, 69, 41, 66, 26, 12, 10]. (See the training-recipe sketch after the table.)
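
The following is a minimal PyTorch sketch of the fully-connected setup quoted in the Research Type row: a ReLU MLP with Kaiming-normal initialization, hidden width 1024, trained with plain SGD at a fixed learning rate of 1e-5 and mini-batch size 128, with no momentum, weight decay, augmentation, or learning-rate decay. The depth of three hidden layers, the MNIST choice, and all names below are illustrative assumptions; the paper's experiments compare three differently connected computational graphs rather than a single chain.

    # Minimal sketch of the quoted setup; depth and dataset are assumptions.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    def make_mlp(in_dim=28 * 28, width=1024, depth=3, num_classes=10):
        layers, dim = [], in_dim
        for _ in range(depth):
            linear = nn.Linear(dim, width)
            nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
            nn.init.zeros_(linear.bias)
            layers += [linear, nn.ReLU()]
            dim = width
        layers.append(nn.Linear(dim, num_classes))
        return nn.Sequential(nn.Flatten(), *layers)

    train_set = datasets.MNIST("data", train=True, download=True,
                               transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=128, shuffle=True)

    model = make_mlp()
    opt = torch.optim.SGD(model.parameters(), lr=1e-5)  # no momentum / weight decay
    loss_fn = nn.CrossEntropyLoss()

    for x, y in loader:                  # one pass shown for brevity
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()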
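The NAS-Bench-201 search space quoted in the Dataset Splits row contains 15,625 architectures because each 4-node cell has 6 directed edges and each edge takes one of 5 candidate operations, giving 5^6 = 15,625 cells. The sketch below enumerates these architecture strings in the NAS-Bench-201 naming convention; querying their trained accuracies would additionally require the benchmark's API and data file, which are not shown here.

    # Enumerate the 5**6 = 15,625 NAS-Bench-201 cells (sketch, not the authors' code).
    from itertools import product

    OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]

    def arch_str(ops):
        # ops = operations on edges (0->1, 0->2, 1->2, 0->3, 1->3, 2->3)
        o01, o02, o12, o03, o13, o23 = ops
        return (f"|{o01}~0|+"
                f"|{o02}~0|{o12}~1|+"
                f"|{o03}~0|{o13}~1|{o23}~2|")

    all_archs = [arch_str(ops) for ops in product(OPS, repeat=6)]
    print(len(all_archs))   # 15625
    print(all_archs[0])     # |none~0|+|none~0|none~1|+|none~0|none~1|none~2|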
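Finally, a hedged sketch of the training recipe quoted in the Experiment Setup row: SGD with learning rate 0.5, momentum 0.9, weight decay 3e-5, a cosine schedule over 250 epochs, and batch size 768. The synthetic tensors and the linear placeholder stand in for the searched network and the real data loader; details from the cited works such as label smoothing or warm-up are omitted.

    # Sketch of the quoted recipe; model and data are placeholders, not the searched network.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    EPOCHS, BATCH_SIZE = 250, 768
    data = TensorDataset(torch.randn(2048, 3072), torch.randint(0, 10, (2048,)))
    train_loader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)
    model = nn.Linear(3072, 10)          # placeholder for a searched architecture

    optimizer = torch.optim.SGD(model.parameters(), lr=0.5,
                                momentum=0.9, weight_decay=3e-5)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(EPOCHS):
        for x, y in train_loader:
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
        scheduler.step()                 # cosine decay stepped once per epoch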