Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56,800 conference papers," under review, 2026).
Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
Authors: Wuyang Chen, Wei Huang, Xinyu Gong, Boris Hanin, Zhangyang Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first experimentally verify our convergence analysis in Section 3.3. In all cases we use ReLU nonlinearities with Kaiming normal initialization [24]. We build the same three computational graphs of fully-connected layers in Figure 3. Three networks have hidden layers of a constant width of 1024. We train the network using SGD with a mini-batch of size 128. The learning rate is fixed at 1 × 10⁻⁵. No augmentation, weight decay, learning rate decay, or momentum is adopted. |
| Researcher Affiliation | Academia | Wuyang Chen (University of Texas at Austin); Wei Huang (RIKEN AIP); Xinyu Gong (University of Texas at Austin); Boris Hanin (Princeton University); Zhangyang Wang (University of Texas at Austin) |
| Pseudocode | Yes | We provide a pseudocode algorithm in Appendix A to demonstrate the usage of our method. |
| Open Source Code | Yes | Code is available at: https://github.com/VITA-Group/architecture_convergence. |
| Open Datasets | Yes | On both MNIST and CIFAR-10, the convergence rate of DAG#1 (Figure 3 left) is worse than DAG#2 (Figure 3 middle), and is further worse than DAG#3 (Figure 3 right). |
| Dataset Splits | Yes | The NAS-Bench-201 [17] provides 15,625 architectures that are stacked by repeated DAGs of four nodes (exactly the same DAG we considered in Section 3 and Figure 2). It contains architectures' performance on three datasets (CIFAR-10, CIFAR-100, ImageNet-16-120 [15]) evaluated under a unified protocol (i.e. same learning rate, batch size, etc., for all architectures). |
| Hardware Specification | Yes | Recorded on a single GTX 1080Ti GPU. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | We train searched architectures for 250 epochs using SGD, with a learning rate as 0.5, a cosine scheduler, momentum as 0.9, weight decay as 3 × 10⁻⁵, and a batch size as 768. This setting follows previous works [1, 44, 69, 41, 66, 26, 12, 10]. |
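The Research Type row describes the fully-connected setup used to verify the convergence analysis: width 1024, ReLU, Kaiming normal initialization, plain SGD with batch size 128 and a fixed learning rate of 1e-5. The sketch below is a hypothetical, dependency-free illustration of two of those ingredients, not the authors' code (their repository is linked in the Open Source Code row). It assumes Kaiming normal means "fan_in" mode with the ReLU gain, i.e. weights drawn from N(0, 2 / fan_in), and the names `MLPTrainConfig`, `kaiming_normal`, and `sgd_step` are made up for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MLPTrainConfig:
    # Values taken from the Research Type row; the field names are hypothetical.
    width: int = 1024          # constant hidden width of all three networks
    batch_size: int = 128      # SGD mini-batch size
    lr: float = 1e-5           # fixed learning rate (no decay)
    momentum: float = 0.0      # no momentum
    weight_decay: float = 0.0  # no weight decay


def kaiming_normal(fan_in: int, fan_out: int, rng: random.Random) -> list[list[float]]:
    """Return a fan_out x fan_in matrix with entries drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)  # ReLU gain sqrt(2) divided by sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def sgd_step(weights, grads, cfg: MLPTrainConfig):
    """Plain SGD update w <- w - lr * g; only valid because momentum and weight decay are 0."""
    assert cfg.momentum == 0.0 and cfg.weight_decay == 0.0
    return [[w - cfg.lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


cfg = MLPTrainConfig()
rng = random.Random(0)
W = kaiming_normal(cfg.width, 64, rng)  # only 64 rows, to keep this pure-Python demo fast
flat = [v for row in W for v in row]
rms = math.sqrt(sum(v * v for v in flat) / len(flat))
print(f"target std {math.sqrt(2.0 / cfg.width):.4f}, empirical RMS {rms:.4f}")
```

With a learning rate this small and no momentum, each step moves a weight by only 1e-5 times its gradient, which is consistent with the row's intent of isolating the effect of the graph topology rather than optimizer tricks.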
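The Dataset Splits row states that NAS-Bench-201 contains 15,625 architectures built from a repeated four-node DAG. That number follows from simple counting: in the NAS-Bench-201 cell every earlier node feeds every later node, giving 6 edges, and each edge takes one of 5 candidate operations, so there are 5⁶ = 15,625 labeled cells. The sketch below enumerates them; the operation strings follow the NAS-Bench-201 naming convention as we understand it and should be treated as labels only.

```python
from itertools import combinations, product

# Candidate operations per edge (names assumed from the NAS-Bench-201 convention).
OPS = ("none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3")
NUM_NODES = 4

# Edges (i, j) with i < j: every earlier node feeds every later node in the cell.
EDGES = list(combinations(range(NUM_NODES), 2))


def enumerate_cells():
    """Yield every assignment of one operation to each edge of the 4-node DAG."""
    for ops in product(OPS, repeat=len(EDGES)):
        yield dict(zip(EDGES, ops))


num_cells = sum(1 for _ in enumerate_cells())
print(len(EDGES), num_cells)  # 6 15625
```

The count is over labeled cells; some of them compute the same function (for instance when `none` edges disconnect part of the graph), so it is an upper bound on the number of functionally distinct architectures.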
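The Experiment Setup row lists the schedule for training searched architectures: 250 epochs of SGD, base learning rate 0.5, a cosine scheduler, momentum 0.9, weight decay 3e-5, and batch size 768. The row does not say whether the cosine schedule has a warmup phase or a nonzero floor, so the sketch below assumes per-epoch cosine annealing from 0.5 to 0 with no warmup. The momentum update uses the common PyTorch-style form (weight decay added to the gradient, no dampening, no Nesterov); all names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchedArchTrainConfig:
    # Values taken from the Experiment Setup row; the field names are hypothetical.
    epochs: int = 250
    base_lr: float = 0.5
    momentum: float = 0.9
    weight_decay: float = 3e-5
    batch_size: int = 768


def cosine_lr(epoch: int, cfg: SearchedArchTrainConfig, eta_min: float = 0.0) -> float:
    """Cosine annealing from base_lr to eta_min (assumed 0, no warmup)."""
    progress = min(epoch, cfg.epochs) / cfg.epochs
    return eta_min + 0.5 * (cfg.base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))


def sgd_momentum_step(w: float, v: float, g: float, lr: float,
                      cfg: SearchedArchTrainConfig) -> tuple[float, float]:
    """Scalar SGD step: v <- momentum * v + (g + wd * w); w <- w - lr * v."""
    v = cfg.momentum * v + (g + cfg.weight_decay * w)
    return w - lr * v, v


cfg = SearchedArchTrainConfig()
for epoch in (0, 125, 249, 250):
    print(epoch, round(cosine_lr(epoch, cfg), 6))
```

Under these assumptions the learning rate starts at 0.5, passes 0.25 at the halfway point, and decays smoothly to 0 by epoch 250.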