Computing Full Conformal Prediction Set with Approximate Homotopy

Authors: Eugene Ndiaye, Ichiro Takeuchi

NeurIPS 2019

Reproducibility variables, each listed with its result and the LLM's supporting response:
Research Type: Experimental
"We illustrate the approximation of a full conformal prediction set for both linear and non-linear regression problems, using synthetic and real datasets that are publicly available in sklearn. We illustrate in Table 1 the computational cost of our proposed homotopy for Lasso regression, using vanilla coordinate descent (CD) optimization solvers in sklearn [19]."
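As a rough illustration of the tooling quoted above (not the authors' code), the sketch below fits a Lasso with sklearn's coordinate-descent solver on synthetic data and reuses the previous solution as a warm start, which is the mechanism the homotopy exploits when the candidate label changes slightly:

```python
# Minimal sketch, assuming synthetic data: sklearn's vanilla
# coordinate-descent Lasso solver with warm starts.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(100)

# warm_start=True keeps coef_ between fit() calls, so refitting after a
# small change to the targets converges in few CD passes.
model = Lasso(alpha=0.1, warm_start=True, tol=1e-6)
model.fit(X, y)
model.fit(X, y + 0.01)  # cheap refit thanks to the warm start
```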
Researcher Affiliation: Academia
Eugene Ndiaye, RIKEN Center for Advanced Intelligence Project, eugene.ndiaye@riken.jp; Ichiro Takeuchi, Nagoya Institute of Technology, takeuchi.ichiro@nitech.ac.jp
Pseudocode: Yes
Algorithm 1 (ϵ-online_homotopy)
Input: D_n = {(x_1, y_1), ..., (x_n, y_n)}, x_{n+1}, [y_min, y_max], ϵ_0 < ϵ
Initialization: z_{t_0} = x_{n+1}^T β, where β is an ϵ_0-solution of problem (1) using only D_n
repeat
  z_{t_{k+1}} = z_{t_k} ± s_ϵ, where s_ϵ = sqrt(2(ϵ − ϵ_0)/ν) if the loss is ν-smooth
  Get β(z_{t_{k+1}}) by minimizing P_{z_{t_{k+1}}} up to accuracy ϵ_0 (warm started with β(z_{t_k}))
until [y_min, y_max] is covered
Return: {z_{t_k}, β(z_{t_k})} for k in [T_ϵ]
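A compact Python sketch of this stepping scheme is given below, assuming a ν-smooth loss. The helper `solve_eps0` is hypothetical: it stands in for any ϵ_0-approximate solver of the perturbed problem P_z (e.g., a warm-started coordinate-descent Lasso solve).

```python
# Sketch of Algorithm 1's grid construction (hypothetical helper names).
# From z_t0 = x_{n+1}^T beta, step by s_eps = sqrt(2*(eps - eps0)/nu) in
# both directions until [y_min, y_max] is covered, warm-starting each solve.
import numpy as np

def eps_online_homotopy(solve_eps0, x_new, beta0, y_min, y_max, eps, eps0, nu):
    """solve_eps0(z, beta_init) -> eps0-approximate solution of P_z (assumed)."""
    s_eps = np.sqrt(2.0 * (eps - eps0) / nu)  # step size for a nu-smooth loss
    z0 = float(x_new @ beta0)
    grid = {z0: beta0}
    for direction in (+1.0, -1.0):            # sweep up, then down, from z0
        z, beta = z0, beta0
        while y_min <= z <= y_max:            # one extra step past each end
            z = z + direction * s_eps
            beta = solve_eps0(z, beta)        # warm start with previous beta
            grid[z] = beta
    return grid                               # {z_{t_k}: beta(z_{t_k})}
```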
Open Source Code: Yes
"For reproducibility, our implementation is available at https://github.com/EugeneNdiaye/homotopy_conformal_prediction"
Open Datasets: Yes
"We illustrate the approximation of a full conformal prediction set for both linear and non-linear regression problems, using synthetic and real datasets that are publicly available in sklearn." The experiments include: computing a conformal set for a Lasso regression problem on the NCEP/NCAR Reanalysis climate dataset [11]; logcosh regression with squared ℓ2 regularization on the Boston dataset (n = 506, p = 13); and linex regression on the Boston (resp. Diabetes) dataset with n = 506 observations and p = 13 features (resp. n = 442 and p = 10).
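For reference, the sklearn-bundled dataset can be loaded as below. Note one caveat for anyone reproducing today: the Boston housing set was removed from scikit-learn in version 1.2, so an older version or an external source is needed for those experiments.

```python
# Sketch of loading one of the public datasets cited above.
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
print(X.shape)  # (442, 10), matching the n = 442, p = 10 quoted above
```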
Dataset Splits: Yes
"All experiments were conducted with a coverage level of 0.9 (α = 0.1) and a regularization parameter selected by cross-validation on a randomly separated training set (for real data, we used 33% of the data)." Results are "averaged over 100 randomly held-out validation data sets."
Hardware Specification: No
The paper does not provide specific hardware details (exact GPU/CPU models, processor speeds, memory amounts, or other machine specifications) for its experiments; it only implies a general computing environment through its use of libraries such as sklearn.
Software Dependencies: No
The paper mentions using sklearn and its "vanilla coordinate descent (CD) optimization solvers" [19], but does not give version numbers for these software components, which would be required for exact reproducibility.
Experiment Setup: Yes
"All experiments were conducted with a coverage level of 0.9 (α = 0.1) and a regularization parameter selected by cross-validation on a randomly separated training set (for real data, we used 33% of the data)."
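The sketch below mirrors this quoted setup under stated assumptions (a Lasso model and the Diabetes dataset; the paper also uses other losses and datasets): 33% of the data for training, the regularization parameter chosen by cross-validation on that portion only, and a miscoverage level α = 0.1.

```python
# Sketch of the quoted setup, assuming a Lasso model on Diabetes.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

alpha_miscoverage = 0.1  # coverage level 1 - alpha = 0.9

X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, train_size=0.33, random_state=0)

# Cross-validate the regularization parameter on the training portion only.
cv_model = LassoCV(cv=5, random_state=0).fit(X_train, y_train)
lam = cv_model.alpha_  # regularization used downstream by the conformal set
```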