Equivariant Architectures for Learning in Deep Weight Spaces

Authors: Aviv Navon, Aviv Shamsian, Idan Achituve, Ethan Fetaya, Gal Chechik, Haggai Maron

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate DWSNets in two families of tasks. (1) First, taking input networks that represent data, like INRs (Park et al., 2019; Sitzmann et al., 2020). Specifically, we train a model to classify INRs based on the class of the image they represent or to predict continuous properties of the objects they represent. (2) Second, taking input networks that represent standard input-output mappings such as image classifiers. We train a model to operate on these mappings and adapt them to new domains. We also perform additional experiments, for example predicting the generalization performance of an image classifier in Appendix K. Full experimental and technical details are discussed in Appendix J. (See the INR-classification sketch below the table.)
Researcher Affiliation | Collaboration | 1 Bar-Ilan University, Ramat Gan, Israel; 2 Nvidia, Tel-Aviv, Israel.
Pseudocode | No | The paper describes operations and methods but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | To support future research and the reproducibility of our results, we made our source code and datasets publicly available at: https://github.com/AvivNavon/DWSNets.
Open Datasets | Yes | MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017). ... Using the CIFAR10 (Krizhevsky et al., 2009) dataset as the source domain...
Dataset Splits | Yes | We split each dataset into three data splits, namely train, test and validation sets. ... We use 800 INRs for training and 100, 100 INRs for testing and validation. ... We split the INR dataset into train, validation and test sets of sizes 55K, 5K, 10K respectively. (See the data-split sketch below the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processors, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'ReLU activations', 'Batch-Normalization (BN) layers', and 'AdamW' but does not specify their version numbers or the versions of broader software frameworks (e.g., Python, PyTorch).
Experiment Setup | Yes | Hyperparameter optimization and early stopping. For each learning setup and each method we search over the learning rate in {5e-3, 1e-3, 5e-4, 1e-4}. We select the best learning rate using the validation set. Additionally, we utilize the validation set for early stopping, i.e., select the best model w.r.t. validation metric. ... We train all methods using the AdamW (Loshchilov & Hutter, 2019) optimizer with a weight-decay of 5e-4. We repeat all experiments using 3 random seeds... (See the training-protocol sketch below the table.)
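
INR-classification sketch. The Research Type row quotes the paper's first task family: the input to the learned model is the weight vector of an INR, and the target is the class of the image that INR represents. The sketch below only illustrates this input/output interface with a naive flatten-and-MLP baseline; it is not the paper's permutation-equivariant DWSNet architecture, and the names `flatten_weights` and `WeightSpaceClassifier` are illustrative assumptions.

```python
import torch
import torch.nn as nn

def flatten_weights(state_dict):
    """Concatenate all INR parameters into one feature vector."""
    return torch.cat([p.flatten() for p in state_dict.values()])

class WeightSpaceClassifier(nn.Module):
    """Plain MLP over flattened INR weights (simplified stand-in, not DWSNets)."""
    def __init__(self, in_dim, num_classes=10, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, flat_weights):
        return self.net(flat_weights)

# Usage: classify a toy INR by the class of the image it encodes.
inr = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # toy INR stand-in
x = flatten_weights(inr.state_dict()).unsqueeze(0)                  # shape: (1, num_params)
logits = WeightSpaceClassifier(in_dim=x.shape[1])(x)                # shape: (1, 10)
```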
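Data-split sketch. The Dataset Splits row quotes an INR dataset split of 55K / 5K / 10K for train / validation / test (a smaller setting uses 800 / 100 / 100 for train / test / validation). A minimal sketch of such a split, assuming a hypothetical `inr_dataset` with one INR per example; the random seed is an assumption, not a value reported in the paper.

```python
import torch
from torch.utils.data import random_split

# 55K train / 5K validation / 10K test, as quoted above.
generator = torch.Generator().manual_seed(0)  # seed is an assumption
train_set, val_set, test_set = random_split(
    inr_dataset, [55_000, 5_000, 10_000], generator=generator
)
```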
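Training-protocol sketch. The Experiment Setup row quotes the training protocol: AdamW with weight decay 5e-4, a grid search over learning rates selected on the validation set, early stopping (keeping the best model w.r.t. the validation metric), and 3 random seeds. The sketch below follows those quoted choices; `build_model`, `train_one_epoch`, and `evaluate` are hypothetical helpers, and the epoch budget is an assumption.

```python
import itertools
import torch
from torch.optim import AdamW

LEARNING_RATES = [5e-3, 1e-3, 5e-4, 1e-4]  # grid quoted in the paper
WEIGHT_DECAY = 5e-4                        # quoted weight decay
SEEDS = [0, 1, 2]                          # 3 random seeds
MAX_EPOCHS = 100                           # assumption; task dependent

def train_one_run(lr, seed, train_loader, val_loader):
    torch.manual_seed(seed)
    model = build_model()                                  # hypothetical
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=WEIGHT_DECAY)
    best_val, best_state = float("-inf"), None
    for _ in range(MAX_EPOCHS):
        train_one_epoch(model, optimizer, train_loader)    # hypothetical
        val_metric = evaluate(model, val_loader)           # hypothetical
        if val_metric > best_val:                          # early stopping: keep best checkpoint
            best_val, best_state = val_metric, model.state_dict()
    return best_val, best_state

def select_learning_rate(train_loader, val_loader):
    scores = {lr: [] for lr in LEARNING_RATES}
    for lr, seed in itertools.product(LEARNING_RATES, SEEDS):
        best_val, _ = train_one_run(lr, seed, train_loader, val_loader)
        scores[lr].append(best_val)
    # pick the learning rate with the best mean validation metric across seeds
    return max(scores, key=lambda lr: sum(scores[lr]) / len(scores[lr]))
```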