Function-Space Regularization in Neural Networks: A Probabilistic Perspective

Authors: Tim G. J. Rudner, Sanyam Kapoor, Shikai Qiu, Andrew Gordon Wilson

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate deterministic neural networks trained with the proposed regularized optimization objective on a broad range of standard classification, real-world domain adaptation, and machine learning safety benchmarking tasks. We find that the proposed method successfully biases neural network training dynamics towards solutions that reflect the inductive biases of prior distributions over neural network functions, which can yield improved predictive performance and leads to significantly improved uncertainty quantification vis-à-vis standard parameter-space regularization and state-of-the-art function-space regularization methods. In Section 5, we present an empirical evaluation in which we compare highly-tuned parameter- and function-space regularization baselines to neural networks trained with FS-EB regularization and find that FS-EB yields (i) near-perfect semantic shift detection, (ii) highly-calibrated predictive uncertainty estimates, (iii) successful task adaptation from pre-trained models, and (iv) improved generalization under covariate shift. (See the illustrative objective sketch below the table.)
Researcher Affiliation | Academia | Tim G. J. Rudner, Sanyam Kapoor, Shikai Qiu, Andrew Gordon Wilson (New York University, USA).
Pseudocode | No | The paper contains mathematical derivations and equations but no explicit blocks labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The code for our experiments can be accessed at: https://github.com/timrudner/function-space-empirical-bayes.
Open Datasets | Yes | We evaluate deterministic neural networks trained with the proposed regularized optimization objective on a broad range of standard classification, real-world domain adaptation, and machine learning safety benchmarking tasks... We evaluate empirical variational inference (FS-EB) along various dimensions: generalization... We use Fashion-MNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2010), KMNIST (Clanuwat et al., 2018), SVHN (Netzer et al., 2011), Corrupted CIFAR-10 (Hendrycks & Dietterich, 2019), and ImageNet (Russakovsky et al., 2014).
Dataset Splits | Yes | In Tables 8 and 9, we quantify the performance of FS-EB in the low-data regime. For various fractions (10%, 25%, 50%, 75%) of the full training dataset, we train both PS-MAP and FS-EB. In Table 7, we report the performance metrics for CIFAR-10 trained models evaluated on the CIFAR-10.1 test set. (See the subsampling sketch below the table.)
Hardware Specification | No | The paper mentions training models but does not provide specific hardware details such as GPU models, CPU types, or cloud computing specifications used for experiments.
Software Dependencies | No | The paper describes hyperparameter ranges and training routines (e.g., 'momentum SGD') but does not specify software dependencies with version numbers, such as 'PyTorch 1.9' or 'TensorFlow 2.x'.
Experiment Setup | Yes | In Table 5, we provide the key hyperparameters used with FS-EB. We operate over the search space using randomized grid search. In addition to the learning rate η, cosine scheduler α, and weight decay used by standard PS-MAP, we use two more hyperparameters: the prior variance τ_f^-1 and the number of Monte Carlo samples J. (Table 5: learning rate η ∈ [10^-10, 10^-1], scheduler α ∈ [0, 1], weight decay τ_θ^-1 ∈ [10^-10, 1], prior variance τ_f^-1 ∈ [10^-7, 5·10^4], Monte Carlo samples J ∈ {1, 2, 5, 10}.) (See the hyperparameter-search sketch below the table.)
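Illustrative objective sketch. The Research Type row describes training deterministic networks with a regularized objective that pulls the learned function towards a prior over functions. The snippet below is a minimal sketch of what such an objective can look like, assuming a PyTorch classifier, an iterator over context-point batches, and a zero-mean Gaussian function prior; the function name fs_regularized_loss, the quadratic penalty form, and the context-sampling scheme are illustrative assumptions, not the authors' exact FS-EB objective.

import torch.nn.functional as F

def fs_regularized_loss(model, x, y, context_batches, prior_variance, num_mc_samples):
    """Hypothetical FS-EB-style objective: data fit plus a function-space penalty.

    `context_batches` is assumed to be an iterator over input batches drawn from
    some context distribution (e.g. the training inputs or an unlabeled pool).
    """
    # Standard data-fit term on the labelled mini-batch.
    data_term = F.cross_entropy(model(x), y)

    # Monte Carlo estimate of the function-space penalty: network outputs at
    # context points are pulled towards a zero-mean Gaussian prior over functions.
    penalty = 0.0
    for _ in range(num_mc_samples):
        x_ctx = next(context_batches)
        logits = model(x_ctx)
        penalty = penalty + logits.pow(2).sum(dim=-1).mean() / (2.0 * prior_variance)
    penalty = penalty / num_mc_samples

    return data_term + penalty

With prior_variance set to τ_f^-1 and num_mc_samples set to J from Table 5, the penalty biases the network's outputs at context points towards the function prior, in the spirit of the function-space regularization the row describes.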
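Subsampling sketch. The Dataset Splits row mentions training on 10%, 25%, 50%, and 75% fractions of the full training set. Below is a minimal sketch of how such subsets could be formed with torchvision; uniform random subsampling without class balancing or fixed seeds is an assumption, since the quoted text does not specify the authors' procedure.

import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Full CIFAR-10 training set (used here purely as an example dataset).
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

# One random subset per low-data fraction.
subsets = {}
for fraction in (0.10, 0.25, 0.50, 0.75):
    n = int(fraction * len(full_train))
    indices = torch.randperm(len(full_train))[:n]
    subsets[fraction] = Subset(full_train, indices.tolist())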
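Hyperparameter-search sketch. The Experiment Setup row lists the Table 5 search ranges and states that randomized grid search was used. The snippet below sketches sampling configurations from those ranges; log-uniform sampling for the scale-like hyperparameters and the trial count of 50 are assumptions, as the quoted text does not specify the sampling distribution.

import random

def sample_config():
    # Ranges taken from Table 5 as quoted above; log-uniform sampling is assumed.
    return {
        "learning_rate": 10 ** random.uniform(-10, -1),   # eta in [1e-10, 1e-1]
        "scheduler_alpha": random.uniform(0.0, 1.0),       # cosine scheduler alpha in [0, 1]
        "weight_decay": 10 ** random.uniform(-10, 0),      # tau_theta^-1 in [1e-10, 1]
        "prior_variance": 10 ** random.uniform(-7, 4.7),   # tau_f^-1 in roughly [1e-7, 5e4]
        "mc_samples": random.choice([1, 2, 5, 10]),        # Monte Carlo samples J
    }

configs = [sample_config() for _ in range(50)]  # e.g. 50 random trials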