Function-space Parameterization of Neural Networks for Sequential Learning
Authors: Aidan Scannell, Riccardo Mereu, Paul Edmund Chang, Ella Tamir, Joni Pajarinen, Arno Solin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that we can retain knowledge in continual learning and incorporate new data efficiently. We further show its strengths in uncertainty quantification and guiding exploration in model-based RL. |
| Researcher Affiliation | Academia | Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen & Arno Solin, Aalto University, Espoo, Finland, {aidan.scannell,riccardo.mereu}@aalto.fi |
| Pseudocode | Yes | Algorithm A1: Compute SFR's sparse dual parameters |
| Open Source Code | Yes | Further information and code is available on the project website: https://aaltoml.github.io/sfr |
| Open Datasets | Yes | We evaluate the effectiveness of SFR's sparse dual parameterization on eight UCI (Dua & Graff, 2017) classification tasks, two image classification tasks: Fashion-MNIST (FMNIST, Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009), and the large-scale House Electric data set. (A hedged data-loading sketch follows the table.) |
| Dataset Splits | Yes | We used a two-layer MLP with width 50, tanh activation functions and a 70% (train) : 15% (validation) : 15% (test) data split. |
| Hardware Specification | Yes | We ran our experiments on a cluster and used a single GPU. The cluster is equipped with four AMD MI250X GPUs based on the 2nd-gen AMD CDNA architecture. An MI250X GPU is a multi-chip module (MCM) with two GPU dies, which AMD calls Graphics Compute Dies (GCDs). Each die features 110 compute units (CUs) and has access to a 64 GB slice of HBM memory, for a total of 220 CUs and 128 GB of memory per MI250X module. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019), hamiltorch, Laplace Redux library (Daxberger et al., 2021), Mammoth framework (Buzzega et al., 2020), FROMP codebase, and S-FSVI codebase, but no specific version numbers for these software dependencies are provided. |
| Experiment Setup | Yes | We used a two-layer MLP with width 50, tanh activation functions and a 70% (train) : 15% (validation) : 15% (test) data split. We trained the NN using Adam (Kingma & Ba, 2015) with a learning rate of 10⁻⁴ and a batch size of 128. Training was stopped when the validation loss stopped decreasing after 1000 steps. The checkpoint with the lowest validation loss was used as the NN MAP. Each experiment was run for 5 seeds. (A hedged code sketch of this setup follows the table.) |
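
For the Open Datasets row, the paper names the image-classification datasets but not its loading pipeline. The sketch below assumes torchvision, a common (but here unconfirmed) way to fetch Fashion-MNIST and CIFAR-10 in PyTorch; any preprocessing the authors applied is unspecified.

```python
# Hedged example only: torchvision and the bare ToTensor transform are
# assumptions, not confirmed by the paper.
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()  # any normalisation used by the authors is unspecified

fmnist_train = torchvision.datasets.FashionMNIST(
    root="./data", train=True, download=True, transform=transform
)
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
```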
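
The Dataset Splits and Experiment Setup rows quote enough detail to sketch the UCI protocol in code. The snippet below is a minimal PyTorch reconstruction under stated assumptions, not the authors' released code: a two-layer tanh MLP of width 50, a 70/15/15 split, Adam with learning rate 10⁻⁴ and batch size 128, and early stopping once the validation loss has not improved for 1000 steps, keeping the lowest-validation-loss checkpoint as the MAP network. The tensor inputs, the cross-entropy loss, and validating after every optimisation step are assumptions.

```python
# Hedged reconstruction of the quoted UCI experiment setup.
# Assumptions (not from the paper): TensorDataset inputs, cross-entropy loss,
# and per-step validation.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_mlp(in_dim: int, out_dim: int, width: int = 50) -> nn.Module:
    """Two-layer MLP with width 50 and tanh activations, as stated in the paper."""
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, out_dim),
    )


def train_map(X: torch.Tensor, y: torch.Tensor, num_classes: int, seed: int = 0):
    torch.manual_seed(seed)  # the paper runs each experiment for 5 seeds
    dataset = TensorDataset(X, y)
    n = len(dataset)
    n_train, n_val = int(0.7 * n), int(0.15 * n)  # 70% / 15% / 15% split
    train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n - n_train - n_val])
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=128)

    model = make_mlp(X.shape[-1], num_classes)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    best_val, best_state = float("inf"), None
    steps_since_best, patience = 0, 1000  # stop after 1000 steps without improvement
    while steps_since_best < patience:
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
            with torch.no_grad():
                val_loss = sum(loss_fn(model(xv), yv).item() for xv, yv in val_loader)
            if val_loss < best_val:
                best_val, best_state, steps_since_best = val_loss, copy.deepcopy(model.state_dict()), 0
            else:
                steps_since_best += 1
            if steps_since_best >= patience:
                break

    model.load_state_dict(best_state)  # lowest-validation-loss checkpoint used as the NN MAP
    return model, test_set
```

Calling `train_map` on one of the UCI tasks loaded as tensors would follow the quoted protocol up to the details the paper leaves unspecified.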