Towards Scalable and Versatile Weight Space Learning

Authors: Konstantin Schürholt, Michael W. Mahoney, Damian Borth

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical evaluation demonstrates that SANE matches or exceeds state-of-the-art performance on several weight representation learning benchmarks, particularly in initialization for new tasks and larger ResNet architectures.
Researcher Affiliation | Academia | (1) AIML Lab, University of St. Gallen, St. Gallen, Switzerland; (2) International Computer Science Institute, Berkeley, CA, USA; (3) Lawrence Berkeley National Laboratory, Berkeley, CA, USA; (4) Department of Statistics, University of California at Berkeley, CA, USA.
Pseudocode | Yes | Algorithm 1 (SANE pretraining), Algorithm 2 (SANE model embedding computation), Algorithm 3 (sampling models with SANE).
Open Source Code | Yes | Code is available at github.com/HSG-AIML/SANE.
Open Datasets | Yes | We pretrain SANE following Alg. 1 on several populations of trained NN models, from the model zoo dataset (Schürholt et al., 2022c). The MNIST and SVHN zoos contain LeNet-style models... The slightly larger CIFAR-10 and STL-10 zoos use the same architecture... We also use the CIFAR-10, CIFAR-100, and Tiny-ImageNet zoos containing ResNet-18 models (Schürholt et al., 2022c).
Dataset Splits | Yes | All zoos are split into training, validation, and test splits 70:15:15.
Hardware Specification | No | The paper mentions 'GPU memory load' and 'automatic mixed precision and flash attention (Dao et al., 2022) to enhance performance', which implies the use of GPUs. However, it does not specify particular GPU models, CPU types, or other detailed hardware specifications.
Software Dependencies | No | The paper states: 'We build SANE in PyTorch (Paszke et al., 2019), using automatic mixed precision and flash attention (Dao et al., 2022) to enhance performance. We use ray.tune (Liaw et al., 2018) for hyperparameter optimization.' It also mentions FFCV (Leclerc et al., 2023). While these libraries are named, version numbers for PyTorch, ray.tune, FFCV, or CUDA are not provided.
Experiment Setup | Yes | We train for 50 epochs using a OneCycle learning rate scheduler (Smith & Topin, 2018). Seeds are recorded to ensure reproducibility. Table 8 provides additional information on the training hyperparameters for SANE on populations of small CNNs as well as ResNet-18s. These values are the stable mean across all experiments; exact values can vary from population to population. Full experiment configurations are documented in the code.
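The 70:15:15 train/validation/test split reported above can be sketched in a few lines. This is an illustrative sketch only; the function name, the fixed seed, and the use of `random.shuffle` are assumptions, and the actual split logic lives in the SANE repository.

```python
import random

def split_zoo(model_ids, seed=42):
    """Split a population of trained models 70:15:15 into
    train/validation/test sets (illustrative sketch, not SANE's code)."""
    ids = list(model_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# e.g. a zoo of 100 models yields 70/15/15 models per split
train, val, test = split_zoo(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```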
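The performance features named in the software-dependencies row, automatic mixed precision and flash attention, map to standard PyTorch APIs: `torch.autocast` and the fused `torch.nn.functional.scaled_dot_product_attention` (PyTorch >= 2.0). The sketch below shows only how those two APIs combine; the tensor shapes and the module are placeholders, not SANE's actual model.

```python
import torch
import torch.nn.functional as F

# Placeholder attention inputs: (batch, heads, tokens, head_dim)
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# Automatic mixed precision: float16 on GPU, bfloat16 on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16

with torch.autocast(device_type=device, dtype=dtype):
    # PyTorch dispatches to a fused (flash-attention style) kernel
    # when one is available for the device and dtype
    out = F.scaled_dot_product_attention(
        q.to(device), k.to(device), v.to(device)
    )

print(out.shape)  # same shape as the inputs: (2, 4, 16, 32)
```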
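The experiment-setup row (50 epochs, OneCycle schedule, recorded seeds) corresponds to PyTorch's `OneCycleLR` scheduler plus manual seeding. A minimal sketch, assuming placeholder values for the model, `max_lr`, and `steps_per_epoch` (the paper's actual hyperparameters are in its Table 8 and the released code):

```python
import torch

torch.manual_seed(42)  # seeds are recorded to ensure reproducibility

# Placeholder model and optimizer, not SANE's architecture
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

epochs, steps_per_epoch = 50, 10  # steps_per_epoch is illustrative
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=steps_per_epoch
)

lrs = []
for _ in range(epochs * steps_per_epoch):
    optimizer.step()       # the real loss/backward pass is elided here
    scheduler.step()       # OneCycle steps once per batch, not per epoch
    lrs.append(scheduler.get_last_lr()[0])

# learning rate warms up to max_lr, then anneals to near zero
print(max(lrs) <= 1e-3 + 1e-9, lrs[-1] < lrs[0])
```

Note that `OneCycleLR` is stepped per batch rather than per epoch, which is an easy mistake to make when reproducing such setups.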