Optimizing Mode Connectivity via Neuron Alignment

Authors: Norman Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarially robust models and find more robust and accurate models on the path. Code is available at https://github.com/IBM/NeuronAlignment.
Researcher Affiliation | Collaboration | N. Joseph Tatro (Dept. of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY; tatron@rpi.edu); Pin-Yu Chen (IBM Research, Yorktown Heights, NY; pin-yu.chen@ibm.com); Payel Das (IBM Research, Yorktown Heights, NY; daspa@us.ibm.com); Igor Melnyk (IBM Research, Yorktown Heights, NY; igor.melnyk@ibm.com); Prasanna Sattigeri (IBM Research, Yorktown Heights, NY; psattig@us.ibm.com); Rongjie Lai (Dept. of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY; lair@rpi.edu)
Pseudocode | Yes | Algorithm 1: Permutation via Neuron Alignment (a hedged sketch of the alignment step follows the table).
Open Source Code | Yes | Code is available at https://github.com/IBM/NeuronAlignment.
Open Datasets | Yes | We trained neural networks to classify images from CIFAR10 and CIFAR100 (Krizhevsky et al., 2009), as well as Tiny ImageNet (Deng et al., 2009).
Dataset Splits | Yes | The default training and test set splits are used for each dataset. 20% of the images in the training set are used for computing alignments between pairs of models (a data-setup sketch follows the table).
Hardware Specification | Yes | Models were trained on NVIDIA 2080 Ti GPUs.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We set a learning rate of 1E-1 that decays by a factor of 0.5 every 20 epochs. Weight decay of 5E-4 was used for regularization. Each model was trained for 250 epochs, and all models were seen to converge. Curves are trained for 250 epochs using SGD with a learning rate of 1E-2 and a batch size of 128. The rate anneals by a factor of 0.5 every 20 epochs (an optimizer and scheduler sketch follows the table).
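For the Pseudocode row: Algorithm 1 permutes the hidden units of one trained network so that they line up with those of another before a connecting curve is learned. The snippet below is a minimal sketch of that idea, assuming the usual recipe of matching units by the cross-correlation of their activations on held-out data and solving the resulting assignment problem; the function names, the normalization details, and the use of SciPy's linear_sum_assignment are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of per-layer neuron alignment (illustrative, not the released code).
# Idea: match hidden units of model B to model A by maximizing the cross-correlation
# of their activations on the alignment subset, then permute B's weights accordingly.
import numpy as np
from scipy.optimize import linear_sum_assignment


def correlation_matrix(acts_a, acts_b, eps=1e-8):
    """Cross-correlation between units of two layers.

    acts_a, acts_b: (num_samples, num_units) activations collected on the
    alignment data (e.g. the 20% of training images set aside for alignment).
    """
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + eps)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + eps)
    return a.T @ b / acts_a.shape[0]          # shape: (units_a, units_b)


def match_units(acts_a, acts_b):
    """Assignment problem: unit perm[i] of model B is matched to unit i of model A."""
    corr = correlation_matrix(acts_a, acts_b)
    _, perm = linear_sum_assignment(corr, maximize=True)
    return perm


def permute_layer(w_in, b_in, w_out, perm):
    """Apply a hidden-unit permutation to one fully connected layer of model B.

    w_in:  (units, fan_in) weights that produce the layer's activations
    b_in:  (units,) bias of that layer
    w_out: (fan_out, units) weights of the next layer, which consume them
    """
    return w_in[perm], b_in[perm], w_out[:, perm]
```

Convolutional layers are handled analogously by permuting output channels; because the next layer's input dimension is permuted to match, the aligned network computes exactly the same function as before alignment.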
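For the Open Datasets and Dataset Splits rows, one way to set up the data with standard torchvision loaders is sketched below: the default train/test splits are kept, and 20% of the training images are reserved for computing alignments. The transform, batch sizes, and split seed here are placeholders rather than values reported in the paper; CIFAR100 and Tiny ImageNet would be set up analogously.

```python
# Sketch of the data setup with standard torchvision loaders (values not from the paper
# are placeholders): default train/test splits, 20% of training images for alignment.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's augmentation is not quoted here
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

n_align = int(0.2 * len(train_set))            # 20% of the training set for alignment
align_set, _ = random_split(
    train_set, [n_align, len(train_set) - n_align],
    generator=torch.Generator().manual_seed(0),  # seed is an assumption
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
align_loader = DataLoader(align_set, batch_size=128, shuffle=False)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```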
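The Experiment Setup row maps directly onto standard PyTorch optimizer and scheduler components, as sketched below. The learning rates, the 0.5 decay every 20 epochs, the 5E-4 weight decay, the 250 epochs, and the batch size of 128 come from the row above; the momentum value and the tiny placeholder modules are assumptions.

```python
# Sketch of the quoted optimization settings with standard PyTorch components.
import torch
import torch.nn as nn

# Placeholders standing in for the paper's CNN endpoints and the curve model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
curve_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Endpoint training: SGD, lr 1e-1 halved every 20 epochs, weight decay 5e-4.
opt = torch.optim.SGD(model.parameters(), lr=1e-1,
                      momentum=0.9,            # momentum is assumed, not quoted
                      weight_decay=5e-4)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.5)

# Curve training: SGD, lr 1e-2, batch size 128, same annealing schedule.
curve_opt = torch.optim.SGD(curve_model.parameters(), lr=1e-2,
                            momentum=0.9)      # regularization for curves not quoted
curve_sched = torch.optim.lr_scheduler.StepLR(curve_opt, step_size=20, gamma=0.5)


def run(optimizer, scheduler, epochs=250):
    """Skeleton of one 250-epoch training run; the per-batch update loop is omitted."""
    for _ in range(epochs):
        # ... one epoch of SGD updates over the training loader (batch size 128) ...
        scheduler.step()
```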