Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting
Authors: Niccolò Dalmasso, Rafael Izbicki, Ann B. Lee
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of ACORE with both theoretical and empirical results. Our implementation is available on GitHub. In Section 4, we show empirical results connecting the power of the constructed hypothesis tests to the performance of the classifier. We consider two examples where the true likelihood is known. First, we investigate how the power of ACORE and the size of the derived confidence sets depend on the performance of the classifier used in the odds ratio estimation (Section 3.1). We consider three classifiers: multilayer perceptron (MLP), nearest neighbor (NN) and quadratic discriminant analysis (QDA). For different values of B (the sample size for estimating odds ratios), we compute the binary cross entropy (a measure of classifier performance), the power as a function of θ, and the size of the constructed confidence set. Table 2 summarizes results based on 100 repetitions. (A minimal sketch of this classifier-based odds estimation appears after the table.) |
| Researcher Affiliation | Academia | Niccolò Dalmasso¹, Rafael Izbicki², Ann B. Lee¹. ¹Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, USA; ²Department of Statistics, Federal University of São Carlos, São Paulo, Brazil. Correspondence to: Niccolò Dalmasso <ndalmass@stat.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1: Estimate the critical value C for a level-α test of the composite hypotheses H0: θ ∈ Θ0 vs. H1: θ ∈ Θ1. Algorithm 2 [Many Simple Null Hypotheses]: Estimate the critical values Cθ0 for level-α tests of H0,θ0: θ = θ0 vs. H1,θ0: θ ≠ θ0 for all θ0 ∈ Θ simultaneously. |
| Open Source Code | Yes | Our implementation is available on Github. |
| Open Datasets | No | The paper describes using a "stochastic forward simulator" (Fθ) to generate data for its examples (Poisson, GMM, HEP model), and states that it uses a "labeled training sample TB" and a "training sample T′B′". However, it does not refer to or provide access information for any pre-existing publicly available dataset in the conventional sense that would require a link or formal citation. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | Yes | More specifically, an 8-core Intel Xeon 3.33GHz X5680 CPU. |
| Software Dependencies | No | The paper mentions software like PyTorch and scikit-learn in the references, implying their use. However, it does not specify version numbers for these or any other software dependencies needed for replication. |
| Experiment Setup | Yes | We consider three classifiers: multilayer perceptron (MLP), nearest neighbor (NN) and quadratic discriminant analysis (QDA). For different values of B (the sample size for estimating odds ratios), we compute the binary cross entropy. To compute the critical values in Algorithm 2, we use quantile gradient boosted trees and a large enough sample size B′ = 5000. For all 18 settings, the computation of one ACORE confidence set takes between 10 and 30 seconds on a single CPU. We use n = 10. We use a 5-layer deep neural network with B = 100,000. (A sketch of the quantile-regression critical-value step and the resulting confidence set appears after the table.) |
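
To make the classifier-based odds estimation quoted above concrete, here is a minimal sketch. It is not the authors' released implementation: the `simulator` and `reference` functions, the MLP hyperparameters, and the parameter grid are hypothetical stand-ins, and any of the three classifiers from the paper (MLP, NN, QDA) could be swapped in for the MLP used here.

```python
# Minimal sketch of classifier-based odds estimation (in the spirit of ACORE's
# Section 3.1). All concrete choices below are assumptions for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def simulator(theta, n):
    # Hypothetical stand-in for the stochastic forward simulator F_theta
    # (the paper's examples include Poisson, GMM, and a HEP model).
    return rng.normal(loc=theta, scale=1.0, size=(n, 1))

def reference(n):
    # Hypothetical reference distribution G used to define the odds.
    return rng.uniform(-5.0, 5.0, size=(n, 1))

def train_odds_classifier(theta_grid, B):
    """Build a labeled sample of size B and fit a probabilistic classifier.

    Each row pairs a parameter value theta with a data point x and a label y:
    y = 1 if x was drawn from F_theta, y = 0 if x came from the reference G.
    The classifier's probabilities estimate the odds
    O(x; theta) = P(y = 1 | theta, x) / P(y = 0 | theta, x).
    """
    thetas = rng.choice(theta_grid, size=B)
    y = rng.integers(0, 2, size=B)
    x = np.vstack([simulator(t, 1) if label == 1 else reference(1)
                   for t, label in zip(thetas, y)])
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    clf.fit(np.column_stack([thetas, x]), y)
    return clf

def estimated_odds(clf, theta, x_obs):
    """Estimated odds at (theta, x) for each observation in x_obs."""
    feats = np.column_stack([np.full(len(x_obs), theta), x_obs])
    p = np.clip(clf.predict_proba(feats)[:, 1], 1e-6, 1 - 1e-6)
    return p / (1.0 - p)
```

The binary cross entropy of such a classifier on held-out labeled pairs is the same diagnostic the paper reports alongside the power of the tests and the size of the confidence sets, which is how the Research Type row connects classifier performance to test power.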
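
The critical-value step (Algorithm 2) and the inversion of the resulting tests into a confidence set could then look like the sketch below. The quantile gradient boosted trees mirror the choice stated in the Experiment Setup row, but the `test_statistic` placeholder (standing in for the ACORE log-odds statistic), the hyperparameters, and the grid handling are assumptions.

```python
# Sketch of Algorithm 2-style critical values via quantile regression, plus the
# test inversion that yields a confidence set. Assumes `simulator(theta, n)` and
# `test_statistic(theta0, x)` are provided; both are placeholders here.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def estimate_critical_values(theta_grid, test_statistic, simulator, alpha, B_prime, n):
    """Learn the alpha-quantile of the test statistic as a function of theta.

    For each of B_prime draws, sample theta0 from the grid, simulate n
    observations from F_theta0, and evaluate the statistic; a quantile
    regression of the statistic on theta then returns a cutoff C_theta0
    for every theta0 on the grid at once.
    """
    thetas = rng.choice(theta_grid, size=B_prime)
    stats = np.array([test_statistic(t, simulator(t, n)) for t in thetas])
    qreg = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    qreg.fit(thetas.reshape(-1, 1), stats)
    return qreg

def acore_confidence_set(theta_grid, x_obs, test_statistic, qreg):
    """Invert the family of level-alpha tests: keep every theta0 whose observed
    statistic is at least its estimated cutoff (i.e., theta0 is not rejected)."""
    grid = np.asarray(theta_grid)
    cutoffs = qreg.predict(grid.reshape(-1, 1))
    observed = np.array([test_statistic(t, x_obs) for t in grid])
    return grid[observed >= cutoffs]
```

Because the regression learns the cutoff as a function of θ, a single fit of size B′ serves every null hypothesis on the grid, matching the "for all θ0 ∈ Θ simultaneously" framing of Algorithm 2 in the Pseudocode row.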