Efficient Parametric Approximations of Neural Network Function Space Distance

Authors: Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Baker Grosse

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically assess the effectiveness of LAFTR (our idealized method) and BGLN (our practical algorithm) in approximating FSD, as well as their usefulness for downstream tasks: continual learning and influence function estimation."
Researcher Affiliation | Collaboration | Nikita Dhawan¹ ², Sicong Huang¹ ², Juhan Bae¹ ², Roger Grosse¹ ² ³ (¹University of Toronto, ²Vector Institute, ³Anthropic)
Pseudocode | Yes | "Algorithm 1: A stochastic version of BGLN (BGLN-S) ... Algorithm 2: BGLN-D ... Algorithm 3: BGLN-S (Conv)"
Open Source Code | No | The paper does not explicitly state that its source code is released, nor does it link to a code repository for the described methodology.
Open Datasets | Yes | "Datasets. Split MNIST consists of five binary prediction tasks to classify non-overlapping pairs of MNIST digits (Deng, 2012). Permuted MNIST is a sequence of ten tasks to classify ten digits, with a different fixed random permutation applied to the pixels of all training images for each task. Finally, Split CIFAR100 consists of six ten-way classification tasks, with the first being CIFAR10 (Krizhevsky et al., a), and subsequent ones containing ten non-overlapping classes each from the CIFAR100 dataset (Krizhevsky et al., b). ... We used Concrete, Energy, Housing, Kinetics, and Wine datasets from the UCI benchmark (Dua & Graff, 2017)."
Dataset Splits | No | The paper mentions using a 'validation loss' for hyperparameter selection but gives no percentages or counts for training, validation, or test splits. Although standard datasets are used, the splitting methodology and ratios are not detailed in the main text.
Hardware Specification | No | The paper acknowledges that 'Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute (www.vectorinstitute.ai/partners),' but gives no specifics on the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions an 'SGD optimizer' and general software concepts but specifies no versioned software dependencies (e.g., Python, PyTorch, or TensorFlow versions) that would be needed for reproducibility.
Experiment Setup | Yes | "Hyperparameters. We have performed a grid search over some key hyperparameters and used the ones that resulted in the best final average accuracy across all tasks. All hyperparameter search was done with random seed 42. ... For the learning rate, we used 0.001 for all CL experiments except the BGLN-S method for Split MNIST and BGLN-D method for Permuted MNIST, where we used 0.0001 instead. We used the same number of epochs on each CL task and the exact numbers are reported in Table 7. ... For each method and dataset, the scaling factor for FSD penalty, λ_FSD, is reported in Table 8. Similarly, batch size is reported in Table 9."
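The Split MNIST construction quoted in the Open Datasets row (five binary tasks over non-overlapping digit pairs) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the labels are synthetic stand-ins for real MNIST targets, and `make_split_tasks` is a hypothetical helper name.

```python
import numpy as np

# Synthetic stand-in for MNIST targets (digits 0-9); in practice these
# would come from an actual MNIST loader.
labels = np.random.default_rng(0).integers(0, 10, size=1000)

# Split MNIST: five binary tasks over non-overlapping digit pairs.
task_pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

def make_split_tasks(labels, task_pairs):
    """Return, per task, the example indices and binarized targets."""
    tasks = []
    for a, b in task_pairs:
        idx = np.where((labels == a) | (labels == b))[0]
        # Relabel: 0 for the first digit of the pair, 1 for the second.
        binary = (labels[idx] == b).astype(np.int64)
        tasks.append((idx, binary))
    return tasks

tasks = make_split_tasks(labels, task_pairs)
```

Because the pairs partition the ten digit classes, every example falls into exactly one task; Permuted MNIST would instead keep all ten classes per task and apply a fixed random pixel permutation per task.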
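The Experiment Setup row describes a training objective that combines a task loss with a scaled FSD penalty (λ_FSD, Table 8). A minimal sketch of that overall shape, under stated assumptions: `regularized_objective` and the mean-squared-gap FSD proxy are illustrative stand-ins, not the paper's BGLN estimators, and the numeric values here are arbitrary.

```python
import numpy as np

# Hedged sketch of the continual-learning objective from the quoted
# setup: task loss plus a scaled function-space-distance penalty.
# lambda_fsd plays the role of the paper's λ_FSD (Table 8); the helper
# name is hypothetical.
def regularized_objective(task_loss, fsd_estimate, lambda_fsd):
    return task_loss + lambda_fsd * fsd_estimate

# Crude FSD proxy: mean squared gap between the current and previous
# model's outputs on a batch. The paper's BGLN estimators approximate
# FSD far more efficiently via network linearization.
rng = np.random.default_rng(42)  # seed 42, as in the paper's search
f_new = rng.normal(size=(8, 10))   # current model outputs (synthetic)
f_old = f_new + 0.1                # previous model outputs (synthetic)
fsd = np.mean((f_new - f_old) ** 2)
loss = regularized_objective(task_loss=0.7, fsd_estimate=fsd, lambda_fsd=0.5)
```

With the synthetic 0.1 output shift, the FSD proxy is 0.01 and the combined loss is 0.705; in the actual experiments both terms would be computed on real model outputs and minimized with SGD at the learning rates quoted above.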