Towards Scalable and Versatile Weight Space Learning
Authors: Konstantin Schürholt, Michael W. Mahoney, Damian Borth
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluation demonstrates that SANE matches or exceeds state-of-the-art performance on several weight representation learning benchmarks, particularly in initialization for new tasks and larger ResNet architectures. We pretrain SANE following Alg. 1 on several populations of trained NN models from the model zoo dataset (Schürholt et al., 2022c). |
| Researcher Affiliation | Academia | 1 AIML Lab, University of St. Gallen, St. Gallen, Switzerland 2 International Computer Science Institute, Berkeley, CA, USA 3 Lawrence Berkeley National Laboratory, Berkeley, CA, USA 4 Department of Statistics, University of California at Berkeley, CA, USA. |
| Pseudocode | Yes | Algorithm 1 SANE pretraining, Algorithm 2 SANE model embedding computation, Algorithm 3 Sampling models with SANE |
| Open Source Code | Yes | Code is available at github.com/HSG-AIML/SANE. |
| Open Datasets | Yes | We pretrain SANE following Alg. 1 on several populations of trained NN models, from the model zoo dataset (Schürholt et al., 2022c). The MNIST and SVHN zoos contain LeNet-style models... The slightly larger CIFAR-10 and STL-10 zoos use the same architecture... We also use the CIFAR-10, CIFAR-100, and Tiny-ImageNet zoos containing ResNet-18 models (Schürholt et al., 2022c). |
| Dataset Splits | Yes | All zoos are split into training, validation, and test splits 70:15:15. |
| Hardware Specification | No | The paper mentions 'GPU memory load' and 'automatic mixed precision and flash attention (Dao et al., 2022) to enhance performance', which implies the use of GPUs. However, it does not specify any particular GPU models, CPU types, or other detailed hardware specifications. |
| Software Dependencies | No | The paper states: 'We build SANE in PyTorch (Paszke et al., 2019), using automatic mixed precision and flash attention (Dao et al., 2022) to enhance performance. We use ray.tune (Liaw et al., 2018) for hyperparameter optimization.' It also mentions 'FFCV (Leclerc et al., 2023)'. While libraries are named, specific version numbers for PyTorch, ray.tune, FFCV, or CUDA are not provided. (A hedged sketch of the mixed-precision and flash-attention usage follows the table.) |
| Experiment Setup | Yes | We train for 50 epochs using a OneCycle learning rate scheduler (Smith & Topin, 2018). Seeds are recorded to ensure reproducibility. In Table 8, we provide additional information on the training hyperparameters for SANE on populations of small CNNs as well as ResNet-18s. These values are the stable mean across all experiments; exact values can vary from population to population. Full experiment configurations are documented in the code. (A hedged sketch of the OneCycle schedule and seeding follows the table.) |
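
The paper credits automatic mixed precision and flash attention (via PyTorch) for performance but does not pin versions. The sketch below is not the authors' code; it only illustrates how these two features are commonly combined in PyTorch ≥ 2.0. The toy projection layer, tensor shapes, and learning rate are placeholder assumptions; the actual training loop is in the released repository (github.com/HSG-AIML/SANE).

```python
import torch
import torch.nn.functional as F

# Minimal sketch: automatic mixed precision + flash attention in PyTorch.
# All shapes and the toy projection layer are illustrative assumptions.
device = "cuda" if torch.cuda.is_available() else "cpu"

proj = torch.nn.Linear(64, 64).to(device)
opt = torch.optim.AdamW(proj.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # no-op on CPU

# (batch, heads, tokens, head_dim)
q = torch.randn(8, 4, 128, 64, device=device)
k = torch.randn(8, 4, 128, 64, device=device)
v = torch.randn(8, 4, 128, 64, device=device)

with torch.autocast(device_type=device):
    # scaled_dot_product_attention dispatches to a flash-attention kernel
    # on supported GPUs (PyTorch >= 2.0); otherwise it falls back to math.
    out = F.scaled_dot_product_attention(q, k, v)
    loss = proj(out).mean()

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```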
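Similarly, the reported setup (50 epochs, OneCycle learning-rate schedule, recorded seeds) can be sketched as follows. The model, max_lr, batch size, and steps per epoch are illustrative assumptions; the exact values are given in Table 8 of the paper and in the released configs.

```python
import torch

# Hedged sketch of the reported training setup: fixed seed, 50 epochs,
# OneCycle learning-rate schedule stepped once per batch.
torch.manual_seed(42)  # seeds are recorded in the paper's configs

model = torch.nn.Linear(128, 128)          # placeholder for SANE
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

epochs, steps_per_epoch = 50, 200          # steps_per_epoch is an assumption
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=steps_per_epoch
)

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        optimizer.zero_grad()
        loss = model(torch.randn(32, 128)).pow(2).mean()  # dummy objective
        loss.backward()
        optimizer.step()
        scheduler.step()                   # OneCycle advances per batch
```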