Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Measuring Stochastic Data Complexity with Boltzmann Influence Functions
Authors: Nathan Hoyen Ng, Roger Baker Grosse, Marzyeh Ghassemi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate IF-COMP on uncertainty calibration, mislabel detection, and OOD detection tasks, where it consistently matches or beats strong baseline methods. |
| Researcher Affiliation | Academia | (1) Massachusetts Institute of Technology, (2) University of Toronto, (3) Vector Institute. |
| Pseudocode | No | The paper describes the methodology using prose and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to the open-source code for the methodology described. |
| Open Datasets | Yes | To verify that IF-COMP can accurately approximate the ground truth pNML parametric complexity on both in-distribution (ID) and out-of-distribution (OOD) samples, we fine-tune a CIFAR-10 (Krizhevsky, 2009) pre-trained ResNet-18 (He et al., 2016) model with the BPBO (12) on 20 random test images each from CIFAR-10, CIFAR-100, and MNIST (Deng, 2012). |
| Dataset Splits | Yes | For both CIFAR-10 and CIFAR-100 datasets we use a ResNet-18 model trained with the standard training procedure detailed in the section above, with early stopping calculated on a clean validation set. |
| Hardware Specification | Yes | All experiments were implemented in PyTorch and were run on single RTX6000 or A40 GPUs. |
| Software Dependencies | No | The paper states that experiments were 'implemented in PyTorch' but does not specify version numbers for PyTorch or any other software libraries used, which is necessary for a reproducible description of software dependencies. |
| Experiment Setup | Yes | CIFAR-10 ensemble models were trained with following standard training procedures using SGD with momentum of 0.9, weight decay of 0.0005, and a learning rate of 0.1 that decays by a factor of 5 at epochs 60, 120, and 160. |
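
The Experiment Setup row quotes a step-decay learning-rate schedule: start at 0.1 and divide by 5 at epochs 60, 120, and 160. As a rough illustration only, the sketch below computes that schedule in plain Python. The function name `lr_at_epoch`, the constant names, the zero-based epoch indexing, and the choice to apply each decay at the start of the milestone epoch are assumptions made here, not details confirmed by the paper. The total number of training epochs is not stated in the quoted text, so none is assumed.

```python
from bisect import bisect_right

# Hyperparameters quoted in the Experiment Setup row.
BASE_LR = 0.1
MOMENTUM = 0.9          # listed for completeness; not used by the schedule itself
WEIGHT_DECAY = 0.0005   # listed for completeness; not used by the schedule itself
DECAY_FACTOR = 5.0      # "decays by a factor of 5"
MILESTONES = (60, 120, 160)


def lr_at_epoch(epoch, base_lr=BASE_LR, factor=DECAY_FACTOR, milestones=MILESTONES):
    """Return the learning rate used during `epoch` (zero-based).

    Assumption: the decay takes effect at the start of each milestone epoch,
    so epoch 59 still uses the previous rate and epoch 60 uses the reduced one.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Count how many milestones have been reached at or before this epoch.
    n_decays = bisect_right(milestones, epoch)
    return base_lr / (factor ** n_decays)


if __name__ == "__main__":
    for e in (0, 59, 60, 119, 120, 159, 160):
        print(f"epoch {e:3d}: lr = {lr_at_epoch(e):.6f}")
```

In PyTorch, a schedule of this shape is typically expressed with `torch.optim.SGD` plus `torch.optim.lr_scheduler.MultiStepLR` using `gamma=0.2`; whether the authors configured it that way is not stated in the quoted text.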