Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CD-split and HPD-split: Efficient Conformal Regions in High Dimensions
Authors: Rafael Izbicki, Gilson Shimizu, Rafael B. Stern
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we provide new insights on CD-split by exploring its theoretical properties. In particular, we show that CD-split converges asymptotically to the oracle highest predictive density set and satisfies local and asymptotic conditional validity. We also present simulations that show how to tune CD-split. Finally, we introduce HPD-split, a variation of CD-split that requires less tuning, and show that it shares the same theoretical guarantees as CD-split. In a wide variety of our simulations, CD-split and HPD-split have better conditional coverage and yield smaller prediction regions than other methods. ... Section 5 uses several new experiments based on simulations to show how to tune CD-split and to compare CD-split and HPD-split to other existing methods. Section 6 demonstrates the performance of CD-split and HPD-split in an application of photometric redshift prediction. |
| Researcher Affiliation | Academia | Rafael Izbicki, Gilson Shimizu, Rafael B. Stern. Departamento de Estatística, Universidade Federal de São Carlos, São Carlos, SP 18052-780, Brazil |
| Pseudocode | Yes | Algorithm 1 CD-split Input: Data (x_i, y_i), i = 1,...,n, coverage level 1 − α ∈ (0,1), algorithm B for fitting conditional density function, a partition of ℝ⁺, I. Output: Prediction band for x_{n+1} ∈ ℝ^d ... Algorithm 2 HPD-split Input: Data (x_i, Y_i), i = 1,...,n, coverage level 1 − α ∈ (0,1), algorithm B for fitting conditional density function. Output: Prediction band for x_{n+1} ∈ ℝ^d ... Algorithm 3 CD-split+ Input: Data (x_i, Y_i), i = 1,...,n, coverage level 1 − α ∈ (0,1), algorithm B for fitting conditional density function, number of elements of the partition J. Output: Prediction band for x_{n+1} ∈ ℝ^d |
| Open Source Code | Yes | R code for implementing CD-split, CD-split+ and HPD-split is available at: https://github.com/rizbicki/predictionBands |
| Open Datasets | Yes | CD-split+ and HPD-split are also applied to the MNIST data set (LeCun et al., 1995). ... Here we construct prediction bands for redshifts using the Happy A dataset (Beck et al., 2017), which is designed to compare photometric redshift prediction algorithms. |
| Dataset Splits | Yes | The data is divided in three sets: 9% as potential future samples, 70% to estimate P(y\|x), and 21% to calculate split residuals. ... We use 64,950 galaxies as the training set, 5,000 as the prediction set, and 5,000 for comparing the performance of conformal methods. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models. It mentions models like random forests and neural networks, but not the hardware on which they were run. |
| Software Dependencies | No | The paper mentions software components and methods like "FlexCode (Izbicki and Lee, 2017; Dalmasso et al., 2020)", "random forests (Breiman, 2001)", "k-means++ (Arthur and Vassilvitskii, 2007)", and "R code". However, it does not provide specific version numbers for these software dependencies or for the R language itself, which are required for reproducibility. |
| Experiment Setup | Yes | Each scenario runs 5,000 times and each predictive method uses a coverage level of 1 − α = 90%. ... the feature space is divided in a partition of size n/100 ... The conditional density, P(Y = y\|x), is estimated using a convolutional neural network. ... the conformal score is fit using a Gaussian mixture density neural network (Bishop, 1994) with three components, one hidden layer and 500 neurons. All methods use a marginal coverage level of 1 − α = 80%. |
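
The 9% / 70% / 21% division quoted under Dataset Splits can be sketched as below. This is a minimal illustration and not the authors' R code: the function name `three_way_split`, the fixed seed, and the rounding rule are assumptions introduced here, and the paper may assign observations to the three sets differently.

```python
import numpy as np


def three_way_split(n, fractions=(0.09, 0.70, 0.21), seed=0):
    """Shuffle indices 0..n-1 and cut them into three disjoint index arrays.

    The default fractions follow the split quoted in the table:
    future samples, density-estimation set, and split-residual set.
    The name, seed, and rounding rule are hypothetical choices.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_future = int(round(fractions[0] * n))
    n_train = int(round(fractions[1] * n))
    future = perm[:n_future]                     # held out as future samples
    train = perm[n_future:n_future + n_train]    # used to estimate P(y|x)
    calib = perm[n_future + n_train:]            # used for split residuals
    return future, train, calib


future_idx, train_idx, calib_idx = three_way_split(1000)
```

The last set takes whatever remains after the first two are cut, so the three index arrays always cover every observation exactly once even when `n` is not divisible by 100.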