Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
Authors: Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first experimentally verify our convergence analysis in Section 3.3. In all cases we use ReLU nonlinearities with Kaiming normal initialization [24]. We build the same three computational graphs of fully-connected layers in Figure 3. Three networks have hidden layers of a constant width of 1024. We train the network using SGD with a mini-batch of size 128. The learning rate is fixed at 1×10⁻⁵. No augmentation, weight decay, learning rate decay, or momentum is adopted. (A minimal training sketch for this setup follows the table.) |
| Researcher Affiliation | Academia | Wuyang Chen (University of Texas at Austin), Wei Huang (RIKEN AIP), Xinyu Gong (University of Texas at Austin), Boris Hanin (Princeton University), Zhangyang Wang (University of Texas at Austin) |
| Pseudocode | Yes | We provide a pseudocode algorithm in Appendix A to demonstrate the usage of our method. |
| Open Source Code | Yes | Code is available at: https://github.com/VITA-Group/architecture_convergence. |
| Open Datasets | Yes | On both MNIST and CIFAR-10, the convergence rate of DAG#1 (Figure 3 left) is worse than DAG#2 (Figure 3 middle), and is further worse than DAG#3 (Figure 3 right). |
| Dataset Splits | Yes | The NAS-Bench-201 [17] provides 15,625 architectures that are stacked by repeated DAGs of four nodes (exactly the same DAG we considered in Section 3 and Figure 2). It contains architectures' performance on three datasets (CIFAR-10, CIFAR-100, ImageNet-16-120 [15]) evaluated under a unified protocol (i.e. same learning rate, batch size, etc., for all architectures). |
| Hardware Specification | Yes | Recorded on a single GTX 1080Ti GPU. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | We train searched architectures for 250 epochs using SGD, with a learning rate of 0.5, a cosine scheduler, momentum of 0.9, weight decay of 3×10⁻⁵, and a batch size of 768. This setting follows previous works [1, 44, 69, 41, 66, 26, 12, 10]. (A sketch of this schedule follows the table.) |
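
The convergence-verification setup quoted in the "Research Type" row amounts to a plain SGD training loop. Below is a minimal PyTorch sketch, assuming a simple chain of fully-connected layers stands in for one of the three computational graphs of Figure 3; the exact DAG wiring, the network depth, and the MNIST loader details are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the convergence-verification setup (width 1024, ReLU,
# Kaiming normal init, SGD with batch 128 and fixed lr 1e-5, no momentum,
# no weight decay, no augmentation). DEPTH and the single training pass
# shown here are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

WIDTH, BATCH, LR, DEPTH = 1024, 128, 1e-5, 4

def make_fc_net(in_dim=28 * 28, width=WIDTH, depth=DEPTH, num_classes=10):
    layers, dim = [], in_dim
    for _ in range(depth):
        linear = nn.Linear(dim, width)
        nn.init.kaiming_normal_(linear.weight)  # Kaiming normal initialization
        nn.init.zeros_(linear.bias)
        layers += [linear, nn.ReLU()]
        dim = width
    layers.append(nn.Linear(dim, num_classes))
    return nn.Sequential(*layers)

loader = torch.utils.data.DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),  # no augmentation
    batch_size=BATCH, shuffle=True)

net = make_fc_net()
opt = torch.optim.SGD(net.parameters(), lr=LR)  # fixed lr, no momentum/decay
criterion = nn.CrossEntropyLoss()

for x, y in loader:  # one pass over MNIST shown
    opt.zero_grad()
    loss = criterion(net(x.flatten(1)), y)
    loss.backward()
    opt.step()
```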
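
The retraining schedule quoted in the "Experiment Setup" row (250 epochs of SGD, learning rate 0.5 with a cosine scheduler, momentum 0.9, weight decay 3×10⁻⁵, batch size 768) maps onto a standard optimizer/scheduler pair. The sketch below substitutes a small stand-in convnet and a CIFAR-10 loader for the searched architecture, which is not reconstructed here; both are assumptions.

```python
# Sketch of the searched-architecture retraining schedule. The tiny convnet
# below is only a placeholder for the searched cell-based network.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

EPOCHS, BATCH = 250, 768

model = nn.Sequential(  # placeholder model, not the searched architecture
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))

train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR10("data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=BATCH, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.5,
                            momentum=0.9, weight_decay=3e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine-annealing step per epoch
```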