Function-space Parameterization of Neural Networks for Sequential Learning

Authors: Aidan Scannell, Riccardo Mereu, Paul Edmund Chang, Ella Tamir, Joni Pajarinen, Arno Solin

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that we can retain knowledge in continual learning and incorporate new data efficiently. We further show its strengths in uncertainty quantification and guiding exploration in model-based RL.
Researcher Affiliation | Academia | Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen & Arno Solin, Aalto University, Espoo, Finland, {aidan.scannell,riccardo.mereu}@aalto.fi
Pseudocode | Yes | Algorithm A1: Compute SFR's sparse dual parameters
Open Source Code | Yes | Further information and code is available on the project website: https://aaltoml.github.io/sfr
Open Datasets | Yes | We evaluate the effectiveness of SFR's sparse dual parameterization on eight UCI (Dua & Graff, 2017) classification tasks, two image classification tasks: Fashion-MNIST (FMNIST, Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009), and the large-scale House Electric data set. (See the data-loading sketch below the table.)
Dataset Splits | Yes | We used a two-layer MLP with width 50, tanh activation functions and a 70% (train) : 15% (validation) : 15% (test) data split. (See the split sketch below the table.)
Hardware Specification | Yes | We ran our experiments on a cluster and used a single GPU. The cluster is equipped with four AMD MI250X GPUs based on the 2nd Gen AMD CDNA architecture. An MI250X GPU is a multi-chip module (MCM) with two GPU dies, which AMD calls Graphics Compute Dies (GCDs). Each of these dies features 110 compute units (CUs) and has access to a 64 GB slice of HBM memory, for a total of 220 CUs and 128 GB of memory per MI250X module.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019), hamiltorch, the Laplace Redux library (Daxberger et al., 2021), the Mammoth framework (Buzzega et al., 2020), the FROMP codebase, and the S-FSVI codebase, but no specific version numbers for these software dependencies are provided.
Experiment Setup | Yes | We used a two-layer MLP with width 50, tanh activation functions, and a 70% (train) : 15% (validation) : 15% (test) data split. We trained the NN using Adam (Kingma & Ba, 2015) with a learning rate of 10⁻⁴ and a batch size of 128. Training was stopped when the validation loss stopped decreasing after 1000 steps. The checkpoint with the lowest validation loss was used as the NN MAP. Each experiment was run for 5 seeds. (See the training sketch below the table.)
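
The Open Datasets row lists publicly available benchmarks. As a hedged illustration of how the two image classification datasets can be fetched, here is a minimal torchvision sketch; the root path "data/", the ToTensor transform, and the shuffling choice are assumptions, and this is not the authors' data pipeline.

```python
# Minimal sketch (not the authors' code) of obtaining the two image
# classification datasets quoted in the "Open Datasets" row above.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # transform choice is an assumption

# Both datasets are downloaded on first use by torchvision.
fmnist_train = datasets.FashionMNIST("data/", train=True, download=True, transform=to_tensor)
cifar10_train = datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor)

# Batch size 128 follows the experiment-setup row; shuffling is an assumption.
fmnist_loader = torch.utils.data.DataLoader(fmnist_train, batch_size=128, shuffle=True)
```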
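
The Dataset Splits row quotes a 70% (train) : 15% (validation) : 15% (test) split. The sketch below shows one way to produce such a split with torch.utils.data.random_split; the placeholder `dataset` and the seed are assumptions, not taken from the paper.

```python
# Hedged sketch of a 70/15/15 split; `dataset` stands in for any of the
# UCI classification tasks above, and the seed value is an assumption.
import torch
from torch.utils.data import random_split

def split_70_15_15(dataset, seed=0):
    n = len(dataset)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    n_test = n - n_train - n_val  # remainder keeps the sizes summing to n
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```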
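
The Experiment Setup row specifies a two-layer MLP of width 50 with tanh activations, Adam with learning rate 10⁻⁴ and batch size 128, early stopping once the validation loss has not improved for 1000 steps, and keeping the best checkpoint as the NN MAP. The PyTorch sketch below mirrors that description under stated assumptions: the input/output dimensions, loss function, data loaders, validation frequency, and the reading of "two-layer" as two hidden layers are all placeholders or interpretations, not the authors' training code.

```python
# Hedged PyTorch reconstruction of the quoted setup; in_dim, out_dim,
# train_loader, val_loader and loss_fn are placeholders, not from the paper.
import copy
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim, width=50):
    # "Two-layer MLP with width 50" interpreted here as two tanh hidden layers.
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, out_dim),
    )

def train_map(model, train_loader, val_loader, loss_fn, patience=1000):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr = 10^-4
    best_val, best_state, steps_since_best = float("inf"), None, 0
    while steps_since_best < patience:
        for x, y in train_loader:  # batch size 128 is set in the loader
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            steps_since_best += 1

            # Check the validation loss (per-step checking is an assumption)
            # and keep the best checkpoint, later used as the NN MAP.
            with torch.no_grad():
                val_loss = sum(loss_fn(model(xv), yv).item() for xv, yv in val_loader)
            if val_loss < best_val:
                best_val, steps_since_best = val_loss, 0
                best_state = copy.deepcopy(model.state_dict())
            if steps_since_best >= patience:
                break
    model.load_state_dict(best_state)
    return model
```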