Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability
Authors: Maciej Falkiewicz, Naoya Takeishi, Imahn Shekhzadeh, Antoine Wehenkel, Arnaud Delaunoy, Gilles Louppe, Alexandros Kalousis
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show on six benchmark problems that the proposed method achieves competitive or better results in terms of coverage and expected posterior density than the previously existing approaches. |
| Researcher Affiliation | Academia | (1) Computer Science Department, University of Geneva; (2) HES-SO/HEG Genève; (3) The University of Tokyo; (4) RIKEN; (5) University of Liège |
| Pseudocode | Yes | Algorithm 1 Computing the regularizer loss with calibration objective. |
| Open Source Code | Yes | The code is available at https://github.com/DMML-Geneva/calibrated-posterior. |
| Open Datasets | Yes | In our experiments, we basically follow the experimental protocol introduced in Hermans et al. [20] for evaluating SBI methods. We focus on two prevailing amortized neural inference methods, i.e. NRE approximating the likelihood-to-evidence ratio and NPE using conditional NF as the underlying model. |
| Dataset Splits | No | The paper discusses training on 'training instances' and evaluating on 'test instances' but does not explicitly mention validation sets or specific train/validation/test splits with percentages or counts. |
| Hardware Specification | No | The computations were performed at the University of Geneva on 'Baobab' and 'Yggdrasil' HPC clusters. |
| Software Dependencies | No | The paper mentions software such as PyTorch, the torchsort library, and the AdamW optimizer, but does not provide version numbers for any of these components, which are needed for a reproducible description. |
| Experiment Setup | Yes | In the main experiments, we set the weight of the regularizer λ to 5, and the number of samples L to 16 for all benchmarks. |