Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets

Authors: Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Rong Ge, Sanjeev Arora

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory. ... We now demonstrate that our assumptions and theoretical findings accurately characterize mode connectivity in practical settings. In particular, we empirically validate our claims using standard convolutional architectures for which we treat individual filters as the hidden units and apply channel-wise dropout (see Remark 1) trained on datasets such as CIFAR-10 and MNIST." |
| Researcher Affiliation | Academia | Rohith Kuditipudi (Duke University, rohith.kuditipudi@duke.edu); Xiang Wang (Duke University, xwang@cs.duke.edu); Holden Lee (Princeton University, holdenl@princeton.edu); Yi Zhang (Princeton University, y.zhang@cs.princeton.edu); Zhiyuan Li (Princeton University, zhiyuanli@cs.princeton.edu); Wei Hu (Princeton University, huwei@cs.princeton.edu); Sanjeev Arora (Princeton University and Institute for Advanced Study, arora@cs.princeton.edu); Rong Ge (Duke University, rongge@cs.duke.edu) |
| Pseudocode | Yes | Algorithm 1: Dropout(A_i, p) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code, nor a link to a code repository. |
| Open Datasets | Yes | "In particular, we empirically validate our claims using standard convolutional architectures for which we treat individual filters as the hidden units and apply channel-wise dropout (see Remark 1) trained on datasets such as CIFAR-10 and MNIST." |
| Dataset Splits | No | The paper mentions the MNIST and CIFAR-10 datasets and discusses training and test accuracy, but does not specify explicit percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names with versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | "We also demonstrate that the VGG-11 (Simonyan and Zisserman, 2014) architecture trained with channel-wise dropout (Tompson et al., 2015; Keshari et al., 2018) with p = 0.25 at the first three layers and p = 0.5 at the others on CIFAR-10 converges to a noise stable minimum as measured by layer cushion, interlayer cushion, activation contraction and interlayer smoothness." |
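The pseudocode and experiment-setup rows both center on channel-wise dropout: treating each convolutional filter (channel) as a hidden unit, dropping whole channels with probability p, and rescaling survivors so expected activations are unchanged. A minimal NumPy sketch of that operation, under the usual inverted-dropout convention (the function name and array shapes here are illustrative, not taken from the paper):

```python
import numpy as np

def channel_dropout(activations, p, rng=None):
    """Zero out entire channels independently with probability p,
    rescaling surviving channels by 1/(1-p) so each activation's
    expected value is unchanged (inverted dropout)."""
    if rng is None:
        rng = np.random.default_rng()
    n, c, h, w = activations.shape  # batch, channels, height, width
    # One keep/drop decision per (sample, channel), broadcast over
    # the spatial dimensions: this is channel-wise dropout.
    mask = rng.random((n, c, 1, 1)) >= p
    return activations * mask / (1.0 - p)

x = np.ones((2, 4, 3, 3))
y = channel_dropout(x, p=0.5, rng=np.random.default_rng(0))
# Each channel of y is either all zeros or all 2.0 (= 1 / (1 - 0.5))
```

In the VGG-11 experiment quoted above, p would be 0.25 for the first three layers and 0.5 elsewhere; how dropout is wired into the network's forward pass is an implementation detail the review does not specify.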