Graph Metanetworks for Processing Diverse Neural Architectures
Authors: Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations show that our approach solves a variety of metanet tasks with diverse neural architectures. Empirically, we can process diverse neural architectures, including layers that appear in state-of-the-art models, and we outperform existing metanetwork baselines across all tasks that we evaluated. |
| Researcher Affiliation | Collaboration | Derek Lim (MIT CSAIL, dereklim@mit.edu); Haggai Maron (Technion / NVIDIA, hmaron@nvidia.com); Marc T. Law (NVIDIA, marcl@nvidia.com); Jonathan Lorraine (NVIDIA, jlorraine@nvidia.com); James Lucas (NVIDIA, jalucas@nvidia.com) |
| Pseudocode | No | The paper describes algorithms and processes in text and mathematical formulations but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | No | We plan to release the code for constructing parameter graphs at a later date. |
| Open Datasets | Yes | We consider image classification neural networks trained on the CIFAR-10 dataset (Krizhevsky, 2009) |
| Dataset Splits | Yes | Then 2000 random networks are selected for validation, and the rest are for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud instances used for its experiments. |
| Software Dependencies | No | The paper mentions PyTorch, the FFCV library, and the Adam optimizer (with citations), but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train all metanets with the Adam optimizer (Kingma & Ba, 2014). The training loss is a binary cross entropy loss between the predicted and true accuracy; each metanet has a sigmoid nonlinearity at the end to ensure that its output is within [0, 1]. We train the metanets for 50,000 iterations with a batch size of 32, using the Adam optimizer with a 0.001 learning rate (from Appendix E.2). Table 5 lists the learning rate, weight decay, label smoothing, and optimizer used for the CIFAR-10 image classifiers. A minimal sketch of this training setup appears after the table. |
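
For concreteness, the quoted hyperparameters can be read as the training-loop sketch below. This is a hedged illustration, not the authors' released code: the `GraphMetanet` module and the dataloader are hypothetical placeholders, and only the loss (binary cross entropy between predicted and true accuracy, with a sigmoid output), the Adam optimizer, the 0.001 learning rate, the batch size of 32, and the 50,000 iterations come from the excerpt above.

```python
# Sketch of the metanet training setup quoted in the Experiment Setup row.
# GraphMetanet and the dataloader are hypothetical stand-ins; the optimizer,
# loss, and hyperparameters follow the excerpt from the paper.
import torch
import torch.nn as nn


class GraphMetanet(nn.Module):
    """Placeholder metanet: maps a flattened parameter representation to a scalar."""

    def __init__(self, in_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid at the end so the predicted accuracy lies in [0, 1].
        return torch.sigmoid(self.net(x)).squeeze(-1)


def train(model: GraphMetanet, loader, num_iters: int = 50_000) -> None:
    # Adam with learning rate 1e-3; batch size 32 is assumed to be set in the loader.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()  # binary cross entropy on predicted vs. true accuracy
    it = 0
    while it < num_iters:
        for params, true_acc in loader:  # true_acc is the network's accuracy in [0, 1]
            pred_acc = model(params)
            loss = loss_fn(pred_acc, true_acc)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if it >= num_iters:
                break
```

The loop counts optimizer steps rather than epochs, matching the "50,000 iterations" phrasing in the paper; the input representation (here a flat tensor of parameters) is a simplification of the paper's parameter-graph inputs.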