Equivariant Deep Weight Space Alignment

Authors: Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, Haggai Maron

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results indicate that a feed-forward pass with DEEP-ALIGN produces better or equivalent alignments compared to those produced by current optimization algorithms. Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.
Researcher Affiliation | Collaboration | 1 Bar-Ilan University, 2 NVIDIA Research, 3 Technion.
Pseudocode | No | The paper describes the architecture and its components textually and with a diagram (Figure 2), but does not provide structured pseudocode or an algorithm block.
Open Source Code | Yes | To support future research and the reproducibility of our results, we made our source code and datasets publicly available at: https://github.com/AvivNavon/deep-align.
Open Datasets | Yes | We use six network datasets. Two datasets consist of MLP classifiers for MNIST and CIFAR10, and four datasets consist of CNN classifiers trained using CIFAR10 and STL10.
Dataset Splits | Yes | When using image datasets, we use the standard train-test split and allocate 10% of the training data for validation.
Hardware Specification | Yes | We compare DEEP-ALIGN to baselines by measuring the time required to align a pair of models in the CIFAR10 CNN and VGG classifiers datasets, and report the averaged alignment time using 1000 random pairs on a single A100 Nvidia GPU.
Software Dependencies | No | The paper mentions using the AdamW optimizer and various datasets but does not specify versions for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | We use a 4-hidden-layer DEEP-ALIGN network with a hidden dimension of 64 and an output dimension of 128 from the F_DWS block. We optimize our method with a learning rate of 5e-4 using the AdamW (Loshchilov & Hutter, 2017) optimizer.
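The Dataset Splits row quotes a standard train-test split with 10% of the training data held out for validation. A minimal sketch of such a split, assuming a PyTorch/torchvision data pipeline (not taken from the authors' code), is:

```python
# Sketch: standard train/test split with 10% of the training data held out
# for validation. CIFAR10 is used here only as an example image dataset.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_full = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="data", train=False, download=True, transform=transform)

# Allocate 10% of the training data for validation.
n_val = int(0.1 * len(train_full))
n_train = len(train_full) - n_val
train_set, val_set = random_split(
    train_full, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)
```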
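The Hardware Specification row reports alignment time averaged over 1000 random pairs on a single A100 GPU. One way such a measurement could be taken is sketched below; `align_fn` and the `models` list are hypothetical stand-ins, with `align_fn` assumed to wrap one feed-forward pass of the alignment network.

```python
# Sketch: average wall-clock alignment time over random pairs of trained
# networks on a GPU, synchronizing before and after each timed call.
import random
import time

import torch

def average_alignment_time(models, align_fn, n_pairs=1000):
    times = []
    for _ in range(n_pairs):
        net_a, net_b = random.sample(models, 2)    # pick a random pair of trained networks
        torch.cuda.synchronize()                   # ensure prior GPU work has finished
        start = time.perf_counter()
        align_fn(net_a, net_b)                     # e.g. one feed-forward pass of the aligner
        torch.cuda.synchronize()                   # wait for the alignment to complete
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```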
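The Experiment Setup row lists a 4-hidden-layer network with hidden dimension 64, a 128-dimensional output from the F_DWS block, and AdamW with learning rate 5e-4. A sketch of that optimizer configuration follows; `DeepAlignStub` is a placeholder module standing in for the actual equivariant architecture, which is available in the linked repository.

```python
# Sketch: optimizer setup matching the reported hyperparameters
# (4 hidden layers, hidden dim 64, 128-dim features, AdamW, lr 5e-4).
import torch
import torch.nn as nn

class DeepAlignStub(nn.Module):
    """Placeholder for the DEEP-ALIGN network; not the authors' architecture."""
    def __init__(self, in_dim=128, hidden_dim=64, out_dim=128, num_hidden=4):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(num_hidden):
            layers += [nn.Linear(d, hidden_dim), nn.ReLU()]
            d = hidden_dim
        layers.append(nn.Linear(d, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = DeepAlignStub()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # learning rate 5e-4 as reported
```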