Improved Generalization of Weight Space Networks via Augmentations

Authors: Aviv Shamsian, Aviv Navon, David W. Zhang, Yan Zhang, Ethan Fetaya, Gal Chechik, Haggai Maron

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on three types of INR datasets: grayscale images (FMNIST), color images (CIFAR10), and 3D shapes (ModelNet40). Our results indicate that data augmentation schemes, and specifically our proposed weight space MixUp variants, can enhance the accuracy of weight space models by up to 18%, equivalent to using 10 times more training data. (See the weight-space MixUp sketch below the table.)
Researcher Affiliation | Collaboration | 1 Bar-Ilan University; 2 University of Amsterdam; 3 Samsung SAIT AI Lab, Montreal; 4 NVIDIA Research; 5 Technion.
Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | To support future research and the reproducibility of our results, we made our source code and datasets publicly available at: https://github.com/AvivSham/deep-weight-space-augmentations.
Open Datasets | Yes | To address this issue, we present new INR classification benchmarks based on ModelNet40 (Wu et al., 2015), Fashion-MNIST (Xiao et al., 2017), and CIFAR10 (Krizhevsky et al., 2009) datasets.
Dataset Splits | Yes | We split the INRs dataset into train, validation, and test sets of sizes 55K, 5K, and 10K respectively. Additionally, we utilize the validation set for early stopping, i.e. selecting the best model w.r.t. validation accuracy. (See the early-stopping sketch below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions specific optimizers like AdamW (Loshchilov & Hutter, 2017) and uses the SIREN (Sitzmann et al., 2020) architecture, but it does not list general software dependencies (e.g., Python, PyTorch, CUDA) with specific version numbers.
Experiment Setup | Yes | In all experiments, we use a DWS (Navon et al., 2023b) network with 4 hidden layers and a hidden dimension of 128. We optimized the network using a 5e-3 learning rate with the AdamW (Loshchilov & Hutter, 2017) optimizer. For the GNN, we use the version of the Relation Transformer presented in (Zhang et al., 2023) with 4 hidden layers, node dimension of 64, and edge dimension of 32. We optimized the network using a 1e-3 learning rate with the AdamW (Loshchilov & Hutter, 2017) optimizer and a 1000-step warmup schedule. We optimized the weight space architecture for 250 epochs for ModelNet40, and 300 epochs for the FMNIST and CIFAR10 INR datasets. Additionally, we utilize the validation set for early stopping, i.e. selecting the best model w.r.t. validation accuracy. (See the optimizer-setup sketch below the table.)
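
Weight-space MixUp sketch (referenced from the Research Type row). This is a minimal illustration of the simplest variant: direct interpolation of two INRs' parameters and labels. The function name mixup_inr and the per-layer list representation are assumptions made for illustration only; the authors' released code at https://github.com/AvivSham/deep-weight-space-augmentations is the authoritative implementation.

# Hedged sketch: naive weight-space MixUp for INR classification.
# Each INR is assumed to be a list of per-layer parameter tensors plus a one-hot
# label; this data format is an illustrative assumption, not the paper's code.
import torch


def mixup_inr(weights_a, weights_b, label_a, label_b, alpha=0.2):
    """Convex-combine two INRs' parameters layer by layer and mix their labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed_weights = [lam * wa + (1 - lam) * wb for wa, wb in zip(weights_a, weights_b)]
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_weights, mixed_label

The paper's MixUp variants go beyond this direct form (e.g., accounting for weight-space symmetries before mixing), but the convex-combination core is the same.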
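Early-stopping sketch (referenced from the Dataset Splits row). A hedged sketch of validation-accuracy model selection over the 55K/5K/10K split; train_one_epoch and evaluate are hypothetical helpers supplied by the caller, not functions from the paper's code.

# Illustrative model selection by validation accuracy (not the authors' implementation).
import copy


def train_with_early_stopping(model, train_loader, val_loader, optimizer,
                              num_epochs, train_one_epoch, evaluate):
    """Return the model restored to the checkpoint with the best validation accuracy."""
    best_val_acc, best_state = float("-inf"), None
    for _ in range(num_epochs):
        train_one_epoch(model, train_loader, optimizer)
        val_acc = evaluate(model, val_loader)
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)  # "best model w.r.t. validation accuracy"
    return model, best_val_acc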
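Optimizer-setup sketch (referenced from the Experiment Setup row). A rough PyTorch configuration matching the quoted hyperparameters: AdamW with a 5e-3 learning rate for the DWS network, and 1e-3 with a 1000-step warmup for the GNN. The linear warmup shape and the helper name make_optimizer are assumptions; the quoted text only states "1000-step warmup schedule".

# Illustrative optimizer configuration; the linear ramp is an assumed warmup shape.
import torch
from torch.optim.lr_scheduler import LambdaLR


def make_optimizer(model, lr, warmup_steps=0):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    if warmup_steps == 0:
        return optimizer, None
    # Ramp the learning rate linearly from ~0 to `lr` over `warmup_steps` steps.
    scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))
    return optimizer, scheduler


# Assumed usage (model variables are placeholders):
# dws_opt, _ = make_optimizer(dws_model, lr=5e-3)                             # DWS network
# gnn_opt, gnn_sched = make_optimizer(gnn_model, lr=1e-3, warmup_steps=1000)  # GNN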