Function-space Parameterization of Neural Networks for Sequential Learning

Authors: Aidan Scannell, Riccardo Mereu, Paul Edmund Chang, Ella Tamir, Joni Pajarinen, Arno Solin

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that we can retain knowledge in continual learning and incorporate new data efficiently. We further show its strengths in uncertainty quantification and guiding exploration in model-based RL.
Researcher Affiliation | Academia | Aidan Scannell, Riccardo Mereu, Paul Chang, Ella Tamir, Joni Pajarinen & Arno Solin, Aalto University, Espoo, Finland, {aidan.scannell,riccardo.mereu}@aalto.fi
Pseudocode | Yes | Algorithm A1: Compute SFR's sparse dual parameters
Open Source Code | Yes | Further information and code is available on the project website: https://aaltoml.github.io/sfr
Open Datasets | Yes | We evaluate the effectiveness of SFR's sparse dual parameterization on eight UCI (Dua & Graff, 2017) classification tasks, two image classification tasks: Fashion-MNIST (FMNIST, Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009), and the large-scale House Electric data set. (See the data-loading sketch below the table.)
Dataset Splits | Yes | We used a two-layer MLP with width 50, tanh activation functions and a 70% (train) : 15% (validation) : 15% (test) data split. (See the split sketch below the table.)
Hardware Specification | Yes | We ran our experiments on a cluster and used a single GPU. The cluster is equipped with four AMD MI250X GPUs based on the 2nd Gen AMD CDNA architecture. An MI250X GPU is a multi-chip module (MCM) with two GPU dies, which AMD calls Graphics Compute Dies (GCDs). Each of these dies features 110 compute units (CUs) and has access to a 64 GB slice of HBM memory, for a total of 220 CUs and 128 GB of memory per MI250X module.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019), hamiltorch, the Laplace Redux library (Daxberger et al., 2021), the Mammoth framework (Buzzega et al., 2020), the FROMP codebase, and the S-FSVI codebase, but no specific version numbers for these software dependencies are provided.
Experiment Setup | Yes | We used a two-layer MLP with width 50, tanh activation functions, and a 70% (train) : 15% (validation) : 15% (test) data split. We trained the NN using Adam (Kingma & Ba, 2015) with a learning rate of 10⁻⁴ and a batch size of 128. Training was stopped when the validation loss stopped decreasing after 1000 steps. The checkpoint with the lowest validation loss was used as the NN MAP. Each experiment was run for 5 seeds. (See the training sketch below the table.)
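
The Open Datasets row lists publicly available benchmarks. As a hedged illustration of how the two image classification datasets can be fetched, here is a minimal torchvision sketch; the root path "data/", the ToTensor transform, and the shuffling choice are assumptions, and this is not the authors' data pipeline.

```python
# Minimal sketch (not the authors' code) of obtaining the two image
# classification datasets quoted in the "Open Datasets" row above.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # transform choice is an assumption

# Both datasets are downloaded on first use by torchvision.
fmnist_train = datasets.FashionMNIST("data/", train=True, download=True, transform=to_tensor)
cifar10_train = datasets.CIFAR10("data/", train=True, download=True, transform=to_tensor)

# Batch size 128 follows the experiment-setup row; shuffling is an assumption.
fmnist_loader = torch.utils.data.DataLoader(fmnist_train, batch_size=128, shuffle=True)
```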
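
The Dataset Splits row quotes a 70% (train) : 15% (validation) : 15% (test) split. The sketch below shows one way to produce such a split with torch.utils.data.random_split; the placeholder `dataset` and the seed are assumptions, not taken from the paper.

```python
# Hedged sketch of a 70/15/15 split; `dataset` stands in for any of the
# UCI classification tasks above, and the seed value is an assumption.
import torch
from torch.utils.data import random_split

def split_70_15_15(dataset, seed=0):
    n = len(dataset)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    n_test = n - n_train - n_val  # remainder keeps the sizes summing to n
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```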
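
The Experiment Setup row specifies a two-layer MLP of width 50 with tanh activations, Adam with learning rate 10⁻⁴ and batch size 128, early stopping once the validation loss has not improved for 1000 steps, and keeping the best checkpoint as the NN MAP. The PyTorch sketch below mirrors that description under stated assumptions: the input/output dimensions, loss function, data loaders, validation frequency, and the reading of "two-layer" as two hidden layers are all placeholders or interpretations, not the authors' training code.

```python
# Hedged PyTorch reconstruction of the quoted setup; in_dim, out_dim,
# train_loader, val_loader and loss_fn are placeholders, not from the paper.
import copy
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim, width=50):
    # "Two-layer MLP with width 50" interpreted here as two tanh hidden layers.
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, out_dim),
    )

def train_map(model, train_loader, val_loader, loss_fn, patience=1000):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr = 10^-4
    best_val, best_state, steps_since_best = float("inf"), None, 0
    while steps_since_best < patience:
        for x, y in train_loader:  # batch size 128 is set in the loader
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            steps_since_best += 1

            # Check the validation loss (per-step checking is an assumption)
            # and keep the best checkpoint, later used as the NN MAP.
            with torch.no_grad():
                val_loss = sum(loss_fn(model(xv), yv).item() for xv, yv in val_loader)
            if val_loss < best_val:
                best_val, steps_since_best = val_loss, 0
                best_state = copy.deepcopy(model.state_dict())
            if steps_since_best >= patience:
                break
    model.load_state_dict(best_state)
    return model
```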