Should We Learn Most Likely Functions or Parameters?

Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew G. Wilson

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our analysis, which includes theoretical insights as well as experiments with both carefully designed tractable models and neural networks, paints a complex picture and unearths distinct benefits and drawbacks of learning most likely functions instead of parameters.
Researcher Affiliation | Academia | Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson (New York University)
Pseudocode | No | No section or figure explicitly labeled "Pseudocode" or "Algorithm" was found.
Open Source Code | Yes | Our code is available at https://github.com/activatedgeek/function-space-map.
Open Datasets | Yes | UCI Regression. In Table 1, we report normalized test RMSE on UCI datasets [1]... Image Classification. In Table 2, we compare L-MAP and PS-MAP on image classification using a ResNet-18 [9]... Fashion MNIST [31] (D = KMNIST [4]), and CIFAR-10 [15] (D = CIFAR-100).
Dataset Splits | Yes | For each dataset, we tune hyperparameters based on validation RMSE, where the validation set is constructed by holding out 10% of training data. (A sketch of such a holdout appears after the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers (Adam [13], SGD) but does not specify version numbers for programming languages, libraries, or other software components.
Experiment Setup | Yes | For both PS-MAP and FS-MAP, we train with the Adam [13] optimizer with a learning rate 0.1 for 2,500 steps... We use an MLP with 3 hidden layers, 256 units, and ReLU activations. We train it with the Adam optimizer for 10,000 steps with a learning rate of 10^-3... The parameter variance in the Laplacian estimator is fixed to β^2 = 10^-6... We use a learning rate of 0.1 with SGD and a cosine decay schedule over 50 epochs for Fashion MNIST and 200 epochs for CIFAR-10. The mini-batch size is fixed to 128. (See the setup sketches after the table.)
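
The 10% validation holdout described in the Dataset Splits row can be reproduced in a few lines. This is a minimal sketch: the dataset object, the seed, and the use of torch.utils.data.random_split are assumptions for illustration, not details taken from the paper or its code release.

```python
# Minimal sketch of a 10% validation holdout (assumed PyTorch-style dataset and seed).
import torch
from torch.utils.data import random_split

def make_train_val_split(train_dataset, val_fraction=0.1, seed=0):
    """Hold out `val_fraction` of the training data as a validation set."""
    n_total = len(train_dataset)
    n_val = int(val_fraction * n_total)
    n_train = n_total - n_val
    generator = torch.Generator().manual_seed(seed)  # fixed seed for a reproducible split
    return random_split(train_dataset, [n_train, n_val], generator=generator)
```

Hyperparameters would then be tuned against validation RMSE computed on the held-out split, as the quoted passage describes.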
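
The MLP configuration in the Experiment Setup row (3 hidden layers, 256 units, ReLU, Adam at 10^-3 for 10,000 steps) maps onto a standard regression training loop like the one below. The input/output dimensions, MSE loss, and random placeholder batches are assumptions, and this shows a plain parameter-space (PS-MAP-style) loop, not the paper's function-space objective.

```python
# Sketch of the MLP regression setup: 3 hidden layers of 256 ReLU units,
# trained with Adam at lr 1e-3 for 10,000 steps. Dimensions, loss, and data
# are placeholders, not values from the paper.
import torch
import torch.nn as nn

def make_mlp(in_dim, out_dim, width=256, depth=3):
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

model = make_mlp(in_dim=10, out_dim=1)                    # placeholder dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(10_000):
    x, y = torch.randn(128, 10), torch.randn(128, 1)      # placeholder mini-batch
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```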
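
Likewise, the image-classification configuration (ResNet-18, SGD with learning rate 0.1, cosine decay, mini-batch size 128, 50 or 200 epochs) corresponds to a standard setup such as the sketch below. The torchvision ResNet-18, the momentum value, and the CIFAR-10 loader are assumptions for illustration; the loop shown is an ordinary cross-entropy training loop, not the paper's L-MAP objective.

```python
# Sketch of the image-classification setup: SGD (lr 0.1) with cosine decay,
# batch size 128, 200 epochs for CIFAR-10 (50 for Fashion MNIST per the paper).
# The torchvision model, momentum value, and transforms are assumptions.
import torch
import torchvision
from torchvision import transforms

epochs = 200
model = torchvision.models.resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),  # data augmentation omitted for brevity
)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(epochs):
    for x, y in train_loader:
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine decay stepped once per epoch
```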