How to Characterize The Landscape of Overparameterized Convolutional Neural Networks

Authors: Yihong Gu, Weizhong Zhang, Cong Fang, Jason D. Lee, Tong Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We start by using our proposed technique, NNG, to show that two overparameterized CNNs with the same architecture and initialization method, though initialized differently at the beginning of training, follow the same solution path throughout training, which can be explained by our theory in Section 5.1. We then give empirical evidence for the convexity of the overparameterized CNN by showing the uniqueness of its optimal solution and visualizing its loss landscape in Section 6.2. All our empirical findings are consistent across a range of architectures and datasets; we present only the results on CIFAR-10 with VGG-16 below and defer the results on other architectures and datasets to the appendix.
Researcher Affiliation | Academia | Princeton University; Hong Kong University of Science and Technology
Pseudocode | Yes | Detailed steps are given in Alg. 1 in the Appendix. An optimization-based algorithm, Alg. 2 in the Appendix, is designed to construct θγ (an illustrative interpolation sketch follows the table below).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a repository link or an explicit statement of code release.
Open Datasets | Yes | All our empirical findings are consistent across a range of architectures and datasets; we present only the results on CIFAR-10 with VGG-16 below and defer the results on other architectures and datasets to the appendix.
Dataset Splits | No | The paper mentions "validation error" but does not specify the dataset splits (e.g., percentages, sample counts, or a citation to a predefined split) needed to reproduce the data partitioning for validation.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments (e.g., exact CPU/GPU models, memory details, or specific cluster configurations).
Software Dependencies | No | The paper does not provide specific ancillary software dependencies with version numbers needed to replicate the experiment.
Experiment Setup | Yes | We use the ℓ1,2 regularizer and save intermediate checkpoints of the NN parameters at time-steps t ∈ {1, 2, 5, 8} ∪ {10k : k ∈ ℕ+} throughout this section. We let θ1 and θ2 be VGG-16 networks trained for 2 epochs using the he_uniform and he_normal initializers, attaining 19% and 57% accuracy respectively...
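
The Experiment Setup row pins down two implementable details: the checkpoint schedule t ∈ {1, 2, 5, 8} ∪ {10k : k ∈ ℕ+} and an ℓ1,2 regularizer applied while training two differently initialized VGG-16 networks. The following is a minimal sketch of that setup, assuming PyTorch; the per-filter grouping in l12_penalty and the mapping of Keras's he_uniform/he_normal to PyTorch's Kaiming initializers are our assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

def he_init(model: nn.Module, mode: str) -> None:
    """Apply He (Kaiming) initialization to all conv/linear weights.
    mode is "uniform" or "normal", mirroring Keras's he_uniform / he_normal."""
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            init = nn.init.kaiming_uniform_ if mode == "uniform" else nn.init.kaiming_normal_
            init(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)

def l12_penalty(model: nn.Module) -> torch.Tensor:
    """ell_{1,2} penalty: sum over groups of each group's L2 norm.
    Grouping by output filter of each conv layer is an assumption;
    the paper's appendix defines the exact grouping."""
    return sum(p.flatten(1).norm(dim=1).sum()
               for p in model.parameters() if p.dim() == 4)

def is_checkpoint_step(t: int) -> bool:
    """Checkpoint schedule quoted above: t in {1, 2, 5, 8} or a positive multiple of 10."""
    return t in {1, 2, 5, 8} or (t > 0 and t % 10 == 0)

# Two VGG-16 networks sharing an architecture and initialization method
# family but starting from different random initializations.
theta1 = vgg16(num_classes=10)  # CIFAR-10 has 10 classes
theta2 = vgg16(num_classes=10)
he_init(theta1, "uniform")  # he_uniform
he_init(theta2, "normal")   # he_normal
```

In a training loop, one would add lam * l12_penalty(model) to the data loss and call torch.save(model.state_dict(), ...) whenever is_checkpoint_step(t) is true; lam is a hypothetical regularization weight that the quoted text does not specify.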
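The Pseudocode row's θγ is constructed by an optimization-based procedure (Alg. 2) that this report does not reproduce. As a generic illustration only, and assuming θγ denotes a point on a path between two solutions θ1 and θ2, the simplest such construction is linear interpolation of parameters; the paper's actual algorithm is optimization-based and may differ substantially.

```python
import torch

@torch.no_grad()
def interpolate_state(sd1: dict, sd2: dict, gamma: float) -> dict:
    """Entrywise theta_gamma = (1 - gamma) * theta1 + gamma * theta2 over two
    state_dicts with matching keys. An illustrative stand-in, not the paper's Alg. 2."""
    out = {}
    for k, v in sd1.items():
        if torch.is_floating_point(v):
            out[k] = (1.0 - gamma) * v + gamma * sd2[k]
        else:
            out[k] = v  # non-float buffers (e.g., counters) copied as-is
    return out

# Hypothetical usage: sweep gamma and record validation loss at each theta_gamma.
# for gamma in [i / 10 for i in range(11)]:
#     model.load_state_dict(interpolate_state(sd1, sd2, gamma))
```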