Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets
Authors: Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Rong Ge, Sanjeev Arora
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory. ... We now demonstrate that our assumptions and theoretical findings accurately characterize mode connectivity in practical settings. In particular, we empirically validate our claims using standard convolutional architectures for which we treat individual filters as the hidden units and apply channel-wise dropout (see Remark 1) trained on datasets such as CIFAR-10 and MNIST. |
| Researcher Affiliation | Academia | Rohith Kuditipudi (Duke University, rohith.kuditipudi@duke.edu); Xiang Wang (Duke University, xwang@cs.duke.edu); Holden Lee (Princeton University, holdenl@princeton.edu); Yi Zhang (Princeton University, y.zhang@cs.princeton.edu); Zhiyuan Li (Princeton University, zhiyuanli@cs.princeton.edu); Wei Hu (Princeton University, huwei@cs.princeton.edu); Sanjeev Arora (Princeton University and Institute for Advanced Study, arora@cs.princeton.edu); Rong Ge (Duke University, rongge@cs.duke.edu) |
| Pseudocode | Yes | Algorithm 1 Dropout (Ai, p) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a link to a code repository. |
| Open Datasets | Yes | In particular, we empirically validate our claims using standard convolutional architectures for which we treat individual filters as the hidden units and apply channel-wise dropout (see Remark 1) trained on datasets such as CIFAR-10 and MNIST. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets and discusses training and test accuracy, but it does not specify the explicit percentages or sample counts for training, validation, and test splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We also demonstrate that the VGG-11 (Simonyan and Zisserman, 2014) architecture trained with channel-wise dropout (Tompson et al., 2015; Keshari et al., 2018) with p = 0.25 at the first three layers and p = 0.5 at the others on CIFAR-10 converges to a noise stable minima as measured by layer cushion, interlayer cushion, activation contraction and interlayer smoothness. |
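The channel-wise dropout referenced in the Pseudocode and Experiment Setup rows zeroes entire feature maps (filters) rather than individual units. A minimal sketch of that operation in NumPy, assuming standard inverted-dropout rescaling; the function name, array shapes, and rescaling convention are illustrative, not taken from the paper's Algorithm 1:

```python
import numpy as np

def channel_dropout(x, p, rng=None):
    """Channel-wise dropout: drop whole channels with probability p.

    x   : activations of shape (batch, channels, height, width)
    p   : probability of zeroing each channel independently
    Surviving channels are scaled by 1/(1-p) so the expected
    activation matches the no-dropout forward pass (inverted dropout).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One Bernoulli draw per (example, channel), broadcast over H and W
    keep = (rng.random((x.shape[0], x.shape[1], 1, 1)) >= p).astype(x.dtype)
    return x * keep / (1.0 - p)

# Example with p = 0.25, the rate the paper uses at the first three layers
x = np.ones((8, 64, 16, 16), dtype=np.float32)
y = channel_dropout(x, p=0.25)
```

Each channel in the output is either entirely zero or uniformly rescaled, which is what "treating individual filters as the hidden units" amounts to in the paper's convolutional experiments.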