Should We Learn Most Likely Functions or Parameters?
Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew G. Wilson
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis, which includes theoretical insights as well as experiments with both carefully designed tractable models and neural networks, paints a complex picture and unearths distinct benefits and drawbacks of learning most likely functions instead of parameters. |
| Researcher Affiliation | Academia | Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson — New York University |
| Pseudocode | No | No section or figure explicitly labeled "Pseudocode" or "Algorithm" was found. |
| Open Source Code | Yes | Our code is available at https://github.com/activatedgeek/function-space-map. |
| Open Datasets | Yes | UCI Regression. In Table 1, we report normalized test RMSE on UCI datasets [1]... Image Classification. In Table 2, we compare L-MAP and PS-MAP on image classification using a ResNet-18 [9]... Fashion MNIST [31] (D = KMNIST [4]), and CIFAR-10 [15] (D = CIFAR-100). |
| Dataset Splits | Yes | For each dataset, we tune hyperparameters based on validation RMSE, where the validation set is constructed by holding out 10% of training data. |
| Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers (Adam [13], SGD) but does not specify version numbers for programming languages, libraries, or other software components. |
| Experiment Setup | Yes | For both PS-MAP and FS-MAP, we train with the Adam [13] optimizer with a learning rate of 0.1 for 2,500 steps... We use an MLP with 3 hidden layers, 256 units, and ReLU activations. We train it with the Adam optimizer for 10,000 steps with a learning rate of 10⁻³... The parameter variance in the Laplacian estimator is fixed to β² = 10⁻⁶... We use a learning rate of 0.1 with SGD and a cosine decay schedule over 50 epochs for Fashion MNIST and 200 epochs for CIFAR-10. The mini-batch size is fixed to 128. |
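The hyperparameters quoted in the Experiment Setup row map onto a fairly standard training configuration. The sketch below is a minimal illustration of that setup, assuming a PyTorch implementation (the framework, input/output dimensions, and SGD momentum are assumptions not stated in the quoted text; they are not the authors' code from the linked repository).

```python
# Minimal sketch of the quoted experiment setup, assuming PyTorch.
# Only the quoted hyperparameters (MLP width/depth, Adam lr 1e-3, 10,000 steps,
# SGD lr 0.1 with cosine decay, batch size 128) come from the paper's text.
import torch
import torch.nn as nn


def make_mlp(in_dim: int, out_dim: int) -> nn.Module:
    # "an MLP with 3 hidden layers, 256 units, and ReLU activations"
    width = 256
    dims = [in_dim, width, width, width]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ReLU()]
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)


# UCI-style regression setting: Adam with lr 1e-3 for 10,000 steps.
model = make_mlp(in_dim=8, out_dim=1)  # in_dim/out_dim are placeholders
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
num_steps = 10_000

# Image-classification setting: SGD with lr 0.1, cosine decay over
# 50 (Fashion MNIST) or 200 (CIFAR-10) epochs, mini-batch size 128.
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(sgd, T_max=200)
batch_size = 128
```

This only reconstructs the optimization configuration; the FS-MAP objective itself (including the Laplacian estimator with β² = 10⁻⁶) is defined in the authors' repository linked in the Open Source Code row.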