Magnitude Invariant Parametrizations Improve Hypernetwork Learning
Authors: Jose Javier Gonzalez Ortiz, John Guttag, Adrian V. Dalca
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. |
| Researcher Affiliation | Academia | Jose Javier Gonzalez Ortiz, MIT CSAIL, Cambridge, MA (josejg@mit.edu); John Guttag, MIT CSAIL, Cambridge, MA (guttag@mit.edu); Adrian V. Dalca, MIT CSAIL & MGH, HMS, Cambridge, MA (adalca@mit.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our implementation as an open-source PyTorch library, HyperLight. ... Anonymized source code is available at https://github.com/anonresearcher8/hyperlight. |
| Open Datasets | Yes | MNIST. We train models on the MNIST digit classification task. ... Oxford Flowers-102. We use the Oxford Flowers-102 dataset, a fine-grained vision classification dataset with 8,189 examples from 102 flower categories (Nilsback & Zisserman, 2006). ... OASIS. We use a version of the open-access OASIS Brains dataset (Hoopes et al., 2022; Marcus et al., 2007). |
| Dataset Splits | Yes | For the MNIST database of handwritten digits, we use the official train-test split, and further divide the training split into training and validation using a stratified 80%-20% split. ... For OASIS, we use 64%, 16% and 20% splits for training, validation and test. (A hedged code sketch of these splits appears after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the library for their implementation ('our PyTorch hypernetwork framework') but does not specify its version or the versions of other software dependencies. |
| Experiment Setup | Yes | We use two popular choices of optimizer: SGD with Nesterov momentum, and Adam. We search over a range of initial learning rates and report the best performing models; further details are included in section B of the supplement. ... Unless specified otherwise, the hypernetwork architecture has two hidden layers with 16 and 128 neurons respectively. (A minimal sketch of this setup also appears after the table.) |
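
The split proportions quoted in the Dataset Splits row are concrete enough to illustrate. Below is a minimal sketch, assuming torchvision for MNIST and scikit-learn's `train_test_split` for the stratified 80%/20% division and the 64%/16%/20% OASIS-style split; the tooling, the seed, and the helper name `three_way_split` are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the reported splits; the paper does not name its
# tooling, so torchvision + scikit-learn here are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from torchvision import datasets

# MNIST: keep the official test set aside, then divide the official training
# split 80% / 20% into train / validation, stratified by digit label.
mnist_train = datasets.MNIST(root="data", train=True, download=True)
labels = mnist_train.targets.numpy()
train_idx, val_idx = train_test_split(
    np.arange(len(labels)),
    test_size=0.20,
    stratify=labels,
    random_state=0,  # the paper does not report a seed
)

# OASIS-style 64% / 16% / 20% split: hold out 20% for test, then split the
# remaining 80% again 80/20, which yields 64% train and 16% validation overall.
def three_way_split(indices, seed=0):
    trainval, test = train_test_split(indices, test_size=0.20, random_state=seed)
    train, val = train_test_split(trainval, test_size=0.20, random_state=seed)
    return train, val, test
```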
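
The Experiment Setup row likewise pins down two reproducible details: a hypernetwork with two hidden layers of 16 and 128 neurons, and SGD with Nesterov momentum or Adam as optimizers. The PyTorch sketch below reflects only those details; the input size, target parameter count, activation function, and learning rates are illustrative placeholders, since the quoted text defers those choices to the supplement.

```python
# Minimal PyTorch sketch of the reported setup: a hypernetwork MLP with two
# hidden layers of 16 and 128 units. Input/output sizes, activation, and
# learning rates are illustrative placeholders, not values from the paper.
import torch
import torch.nn as nn

def make_hypernetwork(input_dim: int, num_target_params: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(input_dim, 16),
        nn.ReLU(),
        nn.Linear(16, 128),
        nn.ReLU(),
        nn.Linear(128, num_target_params),  # emits the primary network's weights
    )

hypernet = make_hypernetwork(input_dim=1, num_target_params=10_000)

# The two optimizer families reported in the paper; the initial learning rates
# were searched over a range, so the values here are arbitrary examples.
sgd = torch.optim.SGD(hypernet.parameters(), lr=1e-2, momentum=0.9, nesterov=True)
adam = torch.optim.Adam(hypernet.parameters(), lr=1e-3)
```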