Graph Metanetworks for Processing Diverse Neural Architectures

Authors: Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical evaluations show that our approach solves a variety of metanet tasks with diverse neural architectures." "Empirically, we can process diverse neural architectures, including layers that appear in state-of-the-art models, and we outperform existing metanetwork baselines across all tasks that we evaluated." (Section 4: Experiments)
Researcher Affiliation | Collaboration | Derek Lim (MIT CSAIL, dereklim@mit.edu); Haggai Maron (Technion / NVIDIA, hmaron@nvidia.com); Marc T. Law (NVIDIA, marcl@nvidia.com); Jonathan Lorraine (NVIDIA, jlorraine@nvidia.com); James Lucas (NVIDIA, jalucas@nvidia.com)
Pseudocode | No | The paper describes algorithms and processes in text and mathematical formulations but does not contain a dedicated pseudocode or algorithm block.
Open Source Code | No | "We plan to release the code for constructing parameter graphs at a later date."
Open Datasets | Yes | "We consider image classification neural networks trained on the CIFAR-10 dataset (Krizhevsky, 2009)."
Dataset Splits | Yes | "Then 2000 random networks are selected for validation, and the rest are for testing."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instances used for its experiments.
Software Dependencies | No | The paper cites PyTorch, the FFCV library, and the Adam optimizer, but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | "We train all metanets with the Adam optimizer (Kingma & Ba, 2014). The training loss is a binary cross entropy loss between the predicted and true accuracy; each metanet has a sigmoid nonlinearity at the end to ensure that its input is within [0, 1]. We train the metanets for 50 000 iterations with a batch size of 32, using the Adam optimizer with .001 learning rate." (Appendix E.2) Table 5 additionally lists the learning rate, weight decay, label smoothing, and optimizer used to train the CIFAR-10 image classifiers.
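
As a rough illustration of the quoted experiment setup, the sketch below trains a placeholder accuracy-prediction metanet with the stated hyperparameters (Adam, learning rate 0.001, batch size 32, 50 000 iterations, binary cross entropy on a sigmoid output). The DummyMetanet module and the synthetic parameter/accuracy tensors are hypothetical stand-ins, not the paper's graph metanetwork or its data.

```python
import torch
import torch.nn as nn

# --- Hypothetical stand-ins (NOT the paper's graph metanetwork) ---
# A placeholder metanet mapping a flattened parameter vector to a scalar
# in [0, 1] via a final sigmoid, as in the quoted setup.
class DummyMetanet(nn.Module):
    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Synthetic data: random "network parameters" and their true accuracies.
in_dim = 512
params = torch.randn(10_000, in_dim)
true_acc = torch.rand(10_000)  # accuracies lie in [0, 1]

# Hyperparameters taken from the description quoted above (Appendix E.2).
model = DummyMetanet(in_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()  # binary cross entropy between predicted and true accuracy
batch_size, num_iterations = 32, 50_000

for step in range(num_iterations):
    idx = torch.randint(0, params.shape[0], (batch_size,))
    pred_acc = model(params[idx])           # sigmoid output in [0, 1]
    loss = loss_fn(pred_acc, true_acc[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The only details carried over from the paper are the optimizer, learning rate, batch size, iteration count, and the sigmoid-plus-BCE objective; everything else (architecture, data, tensor shapes) is illustrative.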