Magnitude Invariant Parametrizations Improve Hypernetwork Learning

Authors: Jose Javier Gonzalez Ortiz, John Guttag, Adrian V Dalca

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios.
Researcher Affiliation: Academia. Jose Javier Gonzalez Ortiz (MIT CSAIL, Cambridge, MA; josejg@mit.edu), John Guttag (MIT CSAIL, Cambridge, MA; guttag@mit.edu), Adrian V. Dalca (MIT CSAIL & MGH, HMS, Cambridge, MA; adalca@mit.edu).
Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: Yes. We release our implementation as an open-source PyTorch library, HyperLight. ... Anonymized source code is available at https://github.com/anonresearcher8/hyperlight. (A minimal plain-PyTorch hypernetwork sketch is included below.)
Open Datasets: Yes. MNIST: We train models on the MNIST digit classification task. ... Oxford Flowers-102: We use the Oxford Flowers-102 dataset, a fine-grained vision classification dataset with 8,189 examples from 102 flower categories (Nilsback & Zisserman, 2006). ... OASIS: We use a version of the open-access OASIS Brains dataset (Hoopes et al., 2022; Marcus et al., 2007).
Dataset Splits: Yes. For the MNIST database of handwritten digits, we use the official train-test split for training data, and further divide the training split into training and validation using a stratified 80%-20% split. ... For OASIS, we use 64%, 16% and 20% splits for training, validation and test. (See the split sketch below.)
Hardware Specification: No. The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies: No. The paper mentions 'PyTorch' as the library for their implementation ('our PyTorch hypernetwork framework') but does not specify its version number or versions of other software dependencies.
Experiment Setup: Yes. We use two popular choices of optimizer: SGD with Nesterov momentum, and Adam. We search over a range of initial learning rates and report the best performing models; further details are included in section B of the supplement. ... Unless specified otherwise, the hypernetwork architecture has two hidden layers with 16 and 128 neurons respectively. (See the setup sketch below.)
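
Since the released implementation is a PyTorch library (HyperLight), the following minimal sketch illustrates the general hypernetwork setup the assessment refers to: a small hypernetwork maps a low-dimensional conditioning input to the full weight vector of a primary network. This is plain PyTorch written for illustration, not the HyperLight API, and all class and variable names are assumptions.

```python
# Minimal hypernetwork sketch in plain PyTorch (illustrative only; this is NOT
# the HyperLight API from https://github.com/anonresearcher8/hyperlight).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MainNet(nn.Module):
    """Small primary network whose weights are produced by a hypernetwork."""

    def __init__(self, in_dim=784, hidden=32, out_dim=10):
        super().__init__()
        self.shapes = {
            "w1": (hidden, in_dim), "b1": (hidden,),
            "w2": (out_dim, hidden), "b2": (out_dim,),
        }
        self.num_params = sum(torch.Size(s).numel() for s in self.shapes.values())

    def forward(self, x, flat_weights):
        # Slice the flat weight vector produced by the hypernetwork into tensors.
        params, offset = {}, 0
        for name, shape in self.shapes.items():
            n = torch.Size(shape).numel()
            params[name] = flat_weights[offset:offset + n].view(shape)
            offset += n
        h = F.relu(F.linear(x, params["w1"], params["b1"]))
        return F.linear(h, params["w2"], params["b2"])


class HyperNet(nn.Module):
    """Maps a low-dimensional conditioning input (e.g., a hyperparameter value)
    to the full flat weight vector of the main network."""

    def __init__(self, num_target_params, in_dim=1, hidden=(16, 128)):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], num_target_params),
        )

    def forward(self, z):
        return self.mlp(z)


main_net = MainNet()
hyper_net = HyperNet(main_net.num_params)
z = torch.rand(1)                   # conditioning input, e.g., a regularization weight
x = torch.randn(8, 784)             # a batch of flattened MNIST-sized inputs
logits = main_net(x, hyper_net(z))  # forward pass through generated weights
```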
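
The reported split percentages can be reproduced with standard tooling. The sketch below is an assumption about tooling rather than code from the paper: it shows a stratified 80%/20% train/validation split of the official MNIST training set and a 64%/16%/20% split of a placeholder OASIS-style subject list, using scikit-learn.

```python
# Sketch of the reported splits (assumed tooling; not taken from the paper's code).
from sklearn.model_selection import train_test_split
from torchvision import datasets

# MNIST: official train split, further divided 80%/20% with label stratification.
mnist_train = datasets.MNIST(root="data", train=True, download=True)
indices = list(range(len(mnist_train)))
labels = mnist_train.targets.tolist()
train_idx, val_idx = train_test_split(
    indices, test_size=0.2, stratify=labels, random_state=0
)

# OASIS-style data: 64% / 16% / 20% for train / validation / test.
subjects = [f"subject_{i:03d}" for i in range(414)]  # placeholder subject IDs
trainval, test = train_test_split(subjects, test_size=0.20, random_state=0)
train, val = train_test_split(trainval, test_size=0.20, random_state=0)  # 0.2 * 0.8 = 0.16 of total
```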
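
The quoted experiment setup (a two-hidden-layer hypernetwork with 16 and 128 neurons, SGD with Nesterov momentum or Adam, and a search over initial learning rates) could be wired up roughly as follows. The learning-rate grid, momentum value, and output size are illustrative assumptions, not values from the paper.

```python
# Sketch of the described optimizer setup and learning-rate search
# (grid values, momentum, and output size are assumptions, not from the paper).
import torch
import torch.nn as nn


def make_hypernet(in_dim, num_target_params):
    # Two hidden layers with 16 and 128 neurons, as stated in the paper.
    return nn.Sequential(
        nn.Linear(in_dim, 16), nn.ReLU(),
        nn.Linear(16, 128), nn.ReLU(),
        nn.Linear(128, num_target_params),
    )


def make_optimizer(model, name, lr):
    if name == "sgd":
        # SGD with Nesterov momentum, one of the two optimizers used.
        return torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, nesterov=True)
    return torch.optim.Adam(model.parameters(), lr=lr)


# Search over a range of initial learning rates and keep the best-performing model.
for opt_name in ("sgd", "adam"):
    for lr in (1e-1, 1e-2, 1e-3, 1e-4):
        hypernet = make_hypernet(in_dim=1, num_target_params=10_000)
        optimizer = make_optimizer(hypernet, opt_name, lr)
        # train(hypernet, optimizer, ...)  # training loop omitted
```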