Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Bayesian Model Selection, the Marginal Likelihood, and Generalization
Authors: Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate the correlation between the log marginal likelihood (LML) and generalization in the context of image classification using the CIFAR-10 and CIFAR-100 datasets. |
| Researcher Affiliation | Academia | Sanae Lotfi¹, Pavel Izmailov¹, Gregory Benton¹, Micah Goldblum¹, Andrew Gordon Wilson¹. ¹New York University. Correspondence to: Sanae Lotfi <EMAIL>, Andrew Gordon Wilson <EMAIL>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We investigate the correlation between the log marginal likelihood (LML) and generalization in the context of image classification using the CIFAR-10 and CIFAR-100 datasets. [...] In UCI regression tasks, we examine the performance of LML vs CLML in terms of test performance when training with limited amounts of training data. [...] we train on the Omniglot dataset and test on the EMNIST dataset. |
| Dataset Splits | Yes | We train a model on 80% of the training data, and fit the LA approximation on the same subset of the data. [...] We choose the value of T that achieves the highest BMA accuracy (average over 20 samples) on 5% of the training data. [...] The CLML is computed using an 80%/20% split of the training data as described in detail in Section D. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions general software components like 'SGD' and 'Adam optimizer' but does not specify their versions or the versions of other key software libraries or dependencies. |
| Experiment Setup | Yes | All models were trained for 250 epochs with an SGD optimizer and an initial learning rate of 0.01. The batch-size was fixed to 128. For experiments where the prior precision was optimized, we used online optimization where the prior precision was updated every 5 epochs for 100 iterations using an Adam optimizer with an initial learning rate equal to 1.0. |
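
The values quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. This is an illustration only and is not the authors' code: the names `ExperimentConfig`, `split_indices`, and `prior_update_epochs` are hypothetical, and the random seed, the shuffling strategy, and the choice to update the prior precision at the end of every fifth epoch (1-indexed) are assumptions the table does not confirm. The Adam value of 1.0 is read here as a learning rate.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Values quoted in the Experiment Setup row.
    epochs: int = 250
    batch_size: int = 128
    sgd_lr: float = 0.01
    prior_update_every: int = 5    # epochs between prior-precision updates
    prior_update_iters: int = 100  # Adam iterations per update
    prior_adam_lr: float = 1.0     # read as the initial Adam learning rate
    # Value quoted in the Dataset Splits row.
    train_fraction: float = 0.8


def split_indices(n_examples, train_fraction, seed=0):
    """Shuffle example indices and split them into (fit, held_out) lists.

    The seed and the uniform shuffle are assumptions made for this sketch.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_examples * train_fraction))
    return indices[:cut], indices[cut:]


def prior_update_epochs(config):
    """Return the 1-indexed epochs after which the prior precision is updated.

    Assumes an update at the end of every `prior_update_every`-th epoch.
    """
    return [
        epoch
        for epoch in range(1, config.epochs + 1)
        if epoch % config.prior_update_every == 0
    ]


config = ExperimentConfig()
# The CIFAR-10 training set has 50,000 images.
fit_idx, held_out_idx = split_indices(50_000, config.train_fraction)
update_epochs = prior_update_epochs(config)
```

Under these assumptions, the 80%/20% split of the CIFAR-10 training set yields 40,000 and 10,000 examples, and the prior precision would be updated 50 times over the 250 epochs.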