Contrastive Neural Ratio Estimation

Authors: Benjamin K Miller, Christoph Weniger, Patrick Forré

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate NRE-B and NRE-C in a fair comparison in several training regimes in Section 3. We perform a hyperparameter search on three simulators with tractable likelihood by benchmarking the behavior when (a) jointly drawn pairs (θ, x) are unlimited or when jointly drawn pairs (θ, x) are fixed but we (b) can draw from the prior p(θ) without limit or (c) are restricted to the initial pairs. (The three simulation budgets are sketched after this table.)
Researcher Affiliation | Academia | Benjamin Kurt Miller (University of Amsterdam, b.k.miller@uva.nl); Christoph Weniger (University of Amsterdam, c.weniger@uva.nl); Patrick Forré (University of Amsterdam, p.d.forre@uva.nl)
Pseudocode | No | The paper includes mathematical formulations for loss functions and optimization but does not provide a distinct pseudocode block or algorithm section.
Open Source Code | Yes | The code for our project can be found at https://github.com/bkmi/cnre under the Apache License 2.0.
Open Datasets | Yes | On all hyperparameter searches we consider three simulators from the simulation-based inference benchmark, namely SLCP, Two Moons, and Gaussian Mixture [44].
Dataset Splits | No | The paper mentions using a "validation set" and that the metric is "estimated over the validation set versus training epochs" but does not provide specific details on the split percentages or exact counts for this set.
Hardware Specification | Yes | We thank the DAS-5 computing cluster for access to their Titan X GPUs.
Software Dependencies | No | This work uses numpy [26], scipy [72], seaborn [73], matplotlib [32], pandas [52, 74], pytorch [57], and jupyter [38].
Experiment Setup | Yes | Our surrogate models are parameterized by one of these architectures: Small NN is like the benchmark with 50 hidden units and two residual blocks. Large NN has 128 hidden units and three residual blocks. We use batch normalization, unlike the benchmark. We compare their performance on a grid of γ and K values. ... We generally use residual networks [28] with batch normalization [33] and train them using adam [37]. ... We applied the largest number of computationally practical contrastive parameters, namely K = 99, and set γ = 1. (A rough architecture sketch follows the table.)
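As a rough illustration of the three simulation budgets quoted in the Research Type row, the sketch below shows how a training batch and its contrastive parameters might be assembled in each regime. This is a minimal sketch, not the authors' implementation: the `sample_batch` name, the `simulator`/`prior` objects, and the batching details are assumptions made for illustration.

```python
import torch


def sample_batch(regime, simulator, prior, stored_theta, stored_x, batch_size, K):
    """Illustrative assembly of jointly drawn pairs plus K contrastive parameters
    under the three simulation budgets described in the paper (assumed layout)."""
    if regime == "a":  # (a) unlimited simulations: draw fresh joint pairs every batch
        theta = prior.sample((batch_size,))
        x = simulator(theta)
    else:              # (b) and (c): reuse a fixed set of previously simulated pairs
        idx = torch.randint(len(stored_x), (batch_size,))
        theta, x = stored_theta[idx], stored_x[idx]

    if regime in ("a", "b"):  # prior can be sampled without limit
        contrastive = prior.sample((batch_size, K))
    else:                     # (c) contrastive parameters must come from the stored set
        idx = torch.randint(len(stored_theta), (batch_size, K))
        contrastive = stored_theta[idx]

    return theta, x, contrastive
```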
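The Experiment Setup row quotes the two classifier sizes (50 hidden units with two residual blocks, and 128 hidden units with three), batch normalization, and Adam. The PyTorch sketch below assembles a network of that shape; the exact residual-block layout, activations, input dimensions, and the `ResidualBlock`/`RatioClassifier` names are assumptions, since the quoted text does not specify them.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One residual block with batch normalization (layout assumed)."""

    def __init__(self, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.net(h)


class RatioClassifier(nn.Module):
    """Maps a (theta, x) pair to a scalar logit used for ratio estimation."""

    def __init__(self, theta_dim: int, x_dim: int, hidden: int, blocks: int):
        super().__init__()
        self.embed = nn.Linear(theta_dim + x_dim, hidden)
        self.blocks = nn.Sequential(*[ResidualBlock(hidden) for _ in range(blocks)])
        self.head = nn.Linear(hidden, 1)

    def forward(self, theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        h = self.embed(torch.cat([theta, x], dim=-1))
        return self.head(self.blocks(h)).squeeze(-1)


# Illustrative input dimensions; the paper's simulators vary.
small = RatioClassifier(theta_dim=5, x_dim=8, hidden=50, blocks=2)   # "Small NN"
large = RatioClassifier(theta_dim=5, x_dim=8, hidden=128, blocks=3)  # "Large NN"
optimizer = torch.optim.Adam(small.parameters())
```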