Improved Generalization of Weight Space Networks via Augmentations
Authors: Aviv Shamsian, Aviv Navon, David W. Zhang, Yan Zhang, Ethan Fetaya, Gal Chechik, Haggai Maron
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on three types of INR datasets: grayscale images (FMNIST), color images (CIFAR10), and 3D shapes (ModelNet40). Our results indicate that data augmentation schemes, and specifically our proposed weight space MixUp variants, can enhance the accuracy of weight space models by up to 18%, equivalent to using 10 times more training data. (A hedged sketch of weight-space MixUp appears after this table.) |
| Researcher Affiliation | Collaboration | Bar-Ilan University; University of Amsterdam; Samsung SAIT AI Lab, Montreal; NVIDIA Research; Technion. |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To support future research and the reproducibility of our results, we made our source code and datasets publicly available at: https://github.com/AvivSham/deep-weight-space-augmentations. |
| Open Datasets | Yes | To address this issue, we present new INR classification benchmarks based on ModelNet40 (Wu et al., 2015), Fashion-MNIST (Xiao et al., 2017), and CIFAR10 (Krizhevsky et al., 2009) datasets. |
| Dataset Splits | Yes | We split the INRs dataset into train, validation, and test sets of sizes 55K, 5K, and 10K respectively. Additionally, we utilize the validation set for early stopping, i.e., selecting the best model w.r.t. validation accuracy. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions specific optimizers like AdamW (Loshchilov & Hutter, 2017) and uses the SIREN (Sitzmann et al., 2020) architecture, but it does not list general software dependencies (e.g., Python, PyTorch, CUDA) with specific version numbers. |
| Experiment Setup | Yes | In all experiments, we use a DWS (Navon et al., 2023b) network with 4 hidden layers and a hidden dimension of 128. We optimized the network using a 5e-3 learning rate with the AdamW (Loshchilov & Hutter, 2017) optimizer. For the GNN, we use the version of the Relational Transformer presented in (Zhang et al., 2023) with 4 hidden layers, a node dimension of 64, and an edge dimension of 32. We optimized the network using a 1e-3 learning rate with the AdamW (Loshchilov & Hutter, 2017) optimizer and a 1000-step warmup schedule. We optimized the weight space architecture for 250 epochs for ModelNet40, and 300 epochs for the FMNIST and CIFAR10 INRs datasets. Additionally, we utilize the validation set for early stopping, i.e., selecting the best model w.r.t. validation accuracy. (A hedged configuration sketch of these settings appears after this table.) |
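
The quoted abstract refers to weight-space MixUp variants. Below is a minimal sketch of what a "direct" weight-space MixUp of two INRs could look like, assuming each INR is stored as a list of per-layer weight and bias tensors with matching shapes. The `weight_space_mixup` helper and the toy SIREN-like shapes are illustrative assumptions; the paper's actual variants (including alignment-based MixUp) are implemented in the linked repository.

```python
import torch

def weight_space_mixup(params_a, params_b, labels_a, labels_b, alpha=0.2):
    """Direct weight-space MixUp sketch: convexly combine two INRs' parameters
    (and their one-hot labels) using a single Beta-sampled coefficient."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    mixed_params = [lam * pa + (1 - lam) * pb for pa, pb in zip(params_a, params_b)]
    mixed_label = lam * labels_a + (1 - lam) * labels_b
    return mixed_params, mixed_label

# Toy usage with two 2-layer "INRs" (input dim 2, hidden dim 32, scalar output).
inr_a = [torch.randn(32, 2), torch.randn(32), torch.randn(1, 32), torch.randn(1)]
inr_b = [torch.randn(32, 2), torch.randn(32), torch.randn(1, 32), torch.randn(1)]
y_a, y_b = torch.eye(10)[3], torch.eye(10)[7]  # one-hot class labels
mixed_inr, mixed_y = weight_space_mixup(inr_a, inr_b, y_a, y_b)
```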
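
The reported optimization settings can also be summarized as a short configuration sketch. This is an assumption-laden reconstruction, not the authors' code: the `model` argument stands in for the DWS network or Relational Transformer cited above, and the linear warmup shape is a guess, since the paper only states the warmup length (1000 steps).

```python
import torch

def configure_optimization(model, use_gnn=False, warmup_steps=1000):
    """Sketch of the reported settings: AdamW with lr 5e-3 for the DWS network;
    lr 1e-3 plus a 1000-step warmup for the Relational Transformer (GNN)."""
    if use_gnn:
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
        # The warmup schedule's shape is an assumption; only its length is reported.
        scheduler = torch.optim.lr_scheduler.LinearLR(
            optimizer, start_factor=1e-2, end_factor=1.0, total_iters=warmup_steps
        )
    else:
        optimizer = torch.optim.AdamW(model.parameters(), lr=5e-3)
        scheduler = None
    return optimizer, scheduler

# Epoch budgets reported per INR dataset.
EPOCHS = {"modelnet40": 250, "fmnist": 300, "cifar10": 300}

# Example usage with a placeholder model.
opt, sched = configure_optimization(torch.nn.Linear(4, 2), use_gnn=True)
```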