Optimizing Mode Connectivity via Neuron Alignment
Authors: Norman Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that generalizes successfully. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarially robust models and find more robust and accurate models on the path. Code is available at https://github.com/IBM/NeuronAlignment. |
| Researcher Affiliation | Collaboration | N. Joseph Tatro (Dept. of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY; tatron@rpi.edu); Pin-Yu Chen (IBM Research, Yorktown Heights, NY; pin-yu.chen@ibm.com); Payel Das (IBM Research, Yorktown Heights, NY; daspa@us.ibm.com); Igor Melnyk (IBM Research, Yorktown Heights, NY; igor.melnyk@ibm.com); Prasanna Sattigeri (IBM Research, Yorktown Heights, NY; psattig@us.ibm.com); Rongjie Lai (Dept. of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY; lair@rpi.edu) |
| Pseudocode | Yes | Algorithm 1: Permutation via Neuron Alignment (a hedged sketch of the alignment step follows this table) |
| Open Source Code | Yes | Code is available at https://github.com/IBM/NeuronAlignment. |
| Open Datasets | Yes | We trained neural networks to classify images from CIFAR10 and CIFAR100 (Krizhevsky et al., 2009), as well as Tiny ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | The default training and test set splits are used for each dataset. 20% of the images in the training set are used for computing alignments between pairs of models. |
| Hardware Specification | Yes | Models were trained on NVIDIA 2080 Ti GPUs. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We set a learning rate of 1E-1 that decays by a factor of 0.5 every 20 epochs. Weight decay of 5E-4 was used for regularization. Each model was trained for 250 epochs, and all models were seen to converge. Curves are trained for 250 epochs using SGD with a learning rate of 1E-2 and a batch size of 128. The learning rate anneals by a factor of 0.5 every 20 epochs. (A minimal training-schedule sketch follows this table.) |
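
The pseudocode row refers to the paper's Algorithm 1, which permutes the neurons of one network so that their activations best match those of a second network. Below is a minimal sketch of one such alignment step, assuming activations for a single layer have already been collected on the alignment subset; the function name `align_layer` and the use of a cross-correlation matrix with a Hungarian assignment are our illustration of the idea, not the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_layer(acts_a, acts_b):
    """Permutation of model B's neurons that best matches model A's (one layer).

    acts_a, acts_b: (n_samples, n_neurons) activation matrices for the same
    layer of two trained networks, computed on the alignment subset.
    """
    # Standardize activations so the matching score behaves like a correlation.
    za = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    zb = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    # Cross-correlation between every neuron pair (A_i, B_j).
    corr = za.T @ zb / acts_a.shape[0]
    # Hungarian algorithm: one-to-one assignment maximizing total correlation.
    _, col_ind = linear_sum_assignment(-corr)
    return col_ind  # col_ind[i] = index of the B-neuron matched to A-neuron i
```

Applying such a permutation to a layer's outgoing weights, and the corresponding reordering to the next layer's incoming weights, leaves the permuted network's function unchanged; the low-loss curve is then trained between the aligned pair of models.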
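
As a reading aid for the experiment setup row, here is a minimal PyTorch sketch of the reported optimizer and learning-rate schedule (SGD, weight decay 5E-4, decay by a factor of 0.5 every 20 epochs, 250 epochs). The model and training loop are placeholders, and momentum or other settings not stated in the text are omitted; curve training reportedly uses the same schedule but starts from a learning rate of 1E-2 with batch size 128.

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder for the actual architecture
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1, weight_decay=5e-4)
# Learning rate decays by a factor of 0.5 every 20 epochs, as reported.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(250):
    # ... one pass over the training loader would go here ...
    scheduler.step()
```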