Equivariant Architectures for Learning in Deep Weight Spaces
Authors: Aviv Navon, Aviv Shamsian, Idan Achituve, Ethan Fetaya, Gal Chechik, Haggai Maron
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DWSNets on two families of tasks. (1) First, we take input networks that represent data, such as INRs (Park et al., 2019; Sitzmann et al., 2020). Specifically, we train a model to classify INRs based on the class of the image they represent, or to predict continuous properties of the objects they represent. (2) Second, we take input networks that represent standard input-output mappings, such as image classifiers, and train a model that operates on these mappings and adapts them to new domains. We also perform additional experiments, for example predicting the generalization performance of an image classifier in Appendix K. Full experimental and technical details are discussed in Appendix J. (The first sketch after the table illustrates the kind of weight-space input involved.) |
| Researcher Affiliation | Collaboration | (1) Bar-Ilan University, Ramat Gan, Israel; (2) NVIDIA, Tel-Aviv, Israel. |
| Pseudocode | No | The paper describes operations and methods but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | To support future research and the reproducibility of our results, we made our source code and datasets publicly available at: https://github.com/AvivNavon/DWSNets. |
| Open Datasets | Yes | MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017). ... Using the CIFAR10 (Krizhevsky et al., 2009) dataset as the source domain... |
| Dataset Splits | Yes | We split each dataset into three data splits, namely train, test and validation sets. ... We use 800 INRs for training and 100/100 INRs for testing and validation. ... We split the INR dataset into train, validation and test sets of sizes 55K, 5K, 10K respectively. (The second sketch after the table mirrors these sizes.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions software components like the 'Adam optimizer', 'ReLU activations', 'Batch-Normalization (BN) layers', and 'AdamW' but does not specify their version numbers or the versions of broader software frameworks (e.g., Python, PyTorch). |
| Experiment Setup | Yes | Hyperparameter optimization and early stopping. For each learning setup and each method we search over the learning rate in {5e-3, 1e-3, 5e-4, 1e-4}. We select the best learning rate using the validation set. Additionally, we utilize the validation set for early stopping, i.e., we select the best model w.r.t. the validation metric. ... We train all methods using the AdamW (Loshchilov & Hutter, 2019) optimizer with a weight decay of 5e-4. We repeat all experiments using 3 random seeds... (The third sketch after the table outlines this protocol.) |
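
The research-type row describes feeding trained INRs into a weight-space model. The toy sketch below (ours, not the authors' code) only illustrates the kind of input such a model receives: the per-layer weight and bias tensors of a small MLP INR. The `make_inr` and `inr_to_input` helpers are hypothetical; DWSNets itself processes these tensors through permutation-equivariant layers.

```python
import torch.nn as nn

# Hypothetical stand-in for an INR: a small MLP mapping (x, y) coordinates
# to a pixel value. The weight-space model's input is this network's weights.
def make_inr(in_dim=2, hidden=32, out_dim=1):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

def inr_to_input(inr):
    # Collect per-layer weights and biases; an equivariant architecture
    # operates on this structured list rather than on a flat parameter vector.
    linears = [m for m in inr if isinstance(m, nn.Linear)]
    weights = [m.weight.detach() for m in linears]
    biases = [m.bias.detach() for m in linears]
    return weights, biases

weights, biases = inr_to_input(make_inr())
print([tuple(w.shape) for w in weights])  # [(32, 2), (32, 32), (1, 32)]
print([tuple(b.shape) for b in biases])   # [(32,), (32,), (1,)]
```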
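
The dataset-splits row reports a 55K / 5K / 10K train/validation/test split of the INR dataset. The minimal sketch below only mirrors those sizes with a deterministic split; the seed and the use of `random_split` are assumptions, since the released datasets ship with their own split files.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with 70K items, matching the reported total (55K + 5K + 10K).
dummy = TensorDataset(torch.zeros(70_000, 1))

# Assumed seed for determinism; the actual split is distributed with the data.
generator = torch.Generator().manual_seed(42)
train_set, val_set, test_set = random_split(dummy, [55_000, 5_000, 10_000], generator=generator)
print(len(train_set), len(val_set), len(test_set))  # 55000 5000 10000
```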
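
Finally, the experiment-setup row quotes the tuning protocol: a learning-rate sweep over {5e-3, 1e-3, 5e-4, 1e-4}, AdamW with weight decay 5e-4, early stopping on the validation metric, and 3 random seeds. The sketch below mirrors that protocol on a toy model and toy data; the model, data, and epoch count are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Reported protocol (quoted above); the model/data below are toy stand-ins.
LEARNING_RATES = [5e-3, 1e-3, 5e-4, 1e-4]
SEEDS = [0, 1, 2]          # 3 random seeds, as reported
WEIGHT_DECAY = 5e-4        # AdamW weight decay, as reported

x_train, y_train = torch.randn(256, 8), torch.randint(0, 2, (256,))
x_val, y_val = torch.randn(64, 8), torch.randint(0, 2, (64,))

def run(lr, seed, epochs=20):
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=WEIGHT_DECAY)
    best_val_acc = 0.0
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(x_train), y_train).backward()
        opt.step()
        with torch.no_grad():
            val_acc = (model(x_val).argmax(dim=1) == y_val).float().mean().item()
        # "Early stopping": keep the best model w.r.t. the validation metric.
        best_val_acc = max(best_val_acc, val_acc)
    return best_val_acc

# Select the learning rate by mean validation metric across the 3 seeds.
scores = {lr: sum(run(lr, s) for s in SEEDS) / len(SEEDS) for lr in LEARNING_RATES}
best_lr = max(scores, key=scores.get)
print(scores, "best lr:", best_lr)
```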