Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability
Authors: Maciej Falkiewicz, Naoya Takeishi, Imahn Shekhzadeh, Antoine Wehenkel, Arnaud Delaunoy, Gilles Louppe, Alexandros Kalousis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show on six benchmark problems that the proposed method achieves results that are competitive with or better than previously existing approaches in terms of coverage and expected posterior density. |
| Researcher Affiliation | Academia | Computer Science Department, University of Geneva; HES-SO/HEG Genève; The University of Tokyo; RIKEN; University of Liège |
| Pseudocode | Yes | Algorithm 1: Computing the regularizer loss with calibration objective. (A hedged code sketch of such a regularizer follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/DMML-Geneva/calibrated-posterior. |
| Open Datasets | Yes | In our experiments, we basically follow the experimental protocol introduced in Hermans et al. [20] for evaluating SBI methods. We focus on two prevailing amortized neural inference methods, i.e., NRE, which approximates the likelihood-to-evidence ratio, and NPE, which uses a conditional NF as the underlying model. |
| Dataset Splits | No | The paper discusses training on 'training instances' and evaluating on 'test instances', but does not explicitly mention a validation set or specific train/validation/test splits with percentages or counts. |
| Hardware Specification | No | The paper states only that the computations were performed at the University of Geneva on the 'Baobab' and 'Yggdrasil' HPC clusters; it does not specify CPU/GPU models, memory, or other hardware details. |
| Software Dependencies | No | The paper mentions software such as PyTorch, the torchsort library, and the AdamW optimizer, but does not provide version numbers for any of these components, which are needed to reproduce the software environment. |
| Experiment Setup | Yes | In the main experiments, we set the weight of the regularizer λ to 5, and the number of samples L to 16 for all benchmarks. |
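
For concreteness, here is a minimal PyTorch sketch of what a differentiable coverage-based regularizer in the spirit of the paper's Algorithm 1 could look like. It is not a reproduction of the authors' algorithm: the function name `calibration_regularizer`, the Cramér-von-Mises-style distance to the uniform CDF, and the `reg_strength` argument are illustrative assumptions. Only the use of differentiable (soft) ranking via the torchsort library and the reported settings λ = 5 and L = 16 come from the paper.

```python
import torch
import torchsort


def calibration_regularizer(log_q_true, log_q_samples, reg_strength=0.1):
    """Hypothetical differentiable calibration penalty (illustrative, not Algorithm 1).

    log_q_true:    (N,)   log q(theta*_i | x_i) of the N ground-truth parameters
    log_q_samples: (N, L) log q(theta_l | x_i) of L posterior samples per observation
    """
    N, L = log_q_samples.shape
    # Soft rank of the true parameter's log-density among the L sampled
    # log-densities, so that gradients flow through the ranking operation.
    all_log_q = torch.cat([log_q_true.unsqueeze(1), log_q_samples], dim=1)  # (N, L+1)
    soft_ranks = torchsort.soft_rank(all_log_q, regularization_strength=reg_strength)
    # Normalized rank: approximately Uniform(0, 1) when q is well calibrated.
    alpha = soft_ranks[:, 0] / (L + 1)  # (N,)
    # Compare the sorted alphas against uniform order statistics (identity CDF);
    # the mean squared deviation is a Cramér-von-Mises-style distance.
    sorted_alpha = torchsort.soft_sort(
        alpha.unsqueeze(0), regularization_strength=reg_strength
    ).squeeze(0)
    uniform = torch.arange(1, N + 1, dtype=alpha.dtype, device=alpha.device) / (N + 1)
    return torch.mean((sorted_alpha - uniform) ** 2)


# Illustrative composition with the settings reported in the paper
# (regularizer weight lambda = 5, number of samples L = 16 per observation):
# loss = nll_loss + 5.0 * calibration_regularizer(log_q_true, log_q_samples)
```

This sketch assumes an amortized posterior estimator whose log-density is differentiable with respect to its parameters, as is the case for the NPE and NRE models the paper evaluates.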