Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Information-theoretic Generalization Analysis for Expected Calibration Error
Authors: Futoshi Futami, Masahiro Fujisawa
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments using deep learning models show that our bounds are nonvacuous thanks to this information-theoretic generalization analysis approach. |
| Researcher Affiliation | Collaboration | Futoshi Futami (Osaka University / RIKEN AIP); Masahiro Fujisawa (RIKEN AIP) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides theoretical analyses and mathematical derivations. |
| Open Source Code | Yes | We submitted our source codes through OpenReview. |
| Open Datasets | Yes | We further conducted two binary classification tasks on MNIST [25] using a convolutional neural network (CNN) and on CIFAR-10 [21] using ResNet. |
| Dataset Splits | No | The paper describes the use of training and test datasets but does not explicitly mention a validation set split percentage or count for experiments. |
| Hardware Specification | Yes | We used NVIDIA GPUs with 32GB memory (NVIDIA DGX-1 with Tesla V100 and DGX-2) for MNIST (SGLD) and CIFAR-10 experiments. We also used CPU (Apple M1) with 16GB memory for the other experiments. |
| Software Dependencies | No | The paper mentions using the 'sklearn.feature_selection.mutual_info_classif' function but does not provide version numbers for this or any other software library or dependency. It states that code was adapted from previous work but does not list that code's dependencies or their versions. |
| Experiment Setup | Yes | Optimizer: Adam with 0.001 learning rate and β1 = 0.9, or SGLD with 0.004 learning rate (decaying by a factor of 0.9 after every 100 iterations); Batch size: 128 (for Adam) or 100 (for SGLD); Num. of training samples: [75, 250, 1000, 4000]; Num. of epochs: 200 |
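
The experiment setup in the last row can be written down as a small configuration, which may help when trying to reproduce the runs. The following is a minimal sketch in plain Python, not the authors' code: the class `OptimizerSetup`, its field names, and the constants `ADAM`, `SGLD`, `TRAIN_SIZES`, and `NUM_EPOCHS` are hypothetical names introduced here for illustration. It also assumes that "decaying by a factor 0.9 after each 100 iterations" means a staircase schedule, where the learning rate is multiplied by 0.9 once per completed block of 100 iterations; the paper may implement the decay differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizerSetup:
    """Hypothetical container for the hyperparameters quoted in the table."""

    name: str
    base_lr: float
    batch_size: int
    decay_factor: float = 1.0  # 1.0 means the learning rate never changes
    decay_every: int = 0       # iterations between decays; 0 disables decay

    def lr_at(self, iteration: int) -> float:
        """Learning rate at a 0-based iteration (assumed staircase decay)."""
        if self.decay_every <= 0:
            return self.base_lr
        completed_blocks = iteration // self.decay_every
        return self.base_lr * self.decay_factor ** completed_blocks


# Values taken from the "Experiment Setup" row; beta1 = 0.9 for Adam is
# reported there as well but is not needed for the schedule itself.
ADAM = OptimizerSetup(name="adam", base_lr=0.001, batch_size=128)
SGLD = OptimizerSetup(
    name="sgld", base_lr=0.004, batch_size=100, decay_factor=0.9, decay_every=100
)
TRAIN_SIZES = (75, 250, 1000, 4000)
NUM_EPOCHS = 200

if __name__ == "__main__":
    for it in (0, 99, 100, 250):
        print(f"SGLD iteration {it}: lr = {SGLD.lr_at(it):.6f}")
```

Under this reading, the SGLD learning rate stays at 0.004 for iterations 0 through 99 and drops to 0.9 times that value at iteration 100, while the Adam learning rate stays fixed at 0.001 throughout.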