Large-scale probabilistic predictors with and without guarantees of validity

Authors: Vladimir Vovk, Ivan Petej, Valentina Fedorova

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper studies theoretically and empirically a method of turning machine-learning algorithms into probabilistic predictors that automatically enjoys a property of validity (perfect calibration) and is computationally efficient. When these imprecise probabilities are merged into precise probabilities, the resulting predictors, while losing the theoretical property of perfect calibration, are consistently more accurate than the existing methods in empirical studies.
Researcher Affiliation | Collaboration | Department of Computer Science, Royal Holloway, University of London, UK; Yandex, Moscow, Russia; {volodya.vovk,ivan.petej,alushaf}@gmail.com
Pseudocode | Yes | Algorithm 1 CVAP(T, x) // cross-Venn-Abers predictor for training set T
  1: split the training set T into K folds T_1, ..., T_K
  2: for k ∈ {1, ..., K}
  3:   (p_0^k, p_1^k) := IVAP(T \ T_k, T_k, x)
  4: return GM(p_1) / (GM(1 − p_0) + GM(p_1))
(A sketch of the merging formula in step 4 follows the table.)
Open Source Code | No | The paper states: 'our code being publicly available [9]'. However, reference [9] is an arXiv technical report ('arXiv.org e-Print archive, November 2015. A full version of this paper.'), not a direct link to a code repository such as GitHub, GitLab, or Bitbucket, and the paper does not state that the code is provided as supplementary material.
Open Datasets | Yes | For illustrating our results in this paper we use the adult data set available from the UCI repository [18] (this is the main data set used in [6] and one of the data sets used in [8]).
Dataset Splits | Yes | We use the original split of the data set into a training set of Ntrain = 32,561 observations and a test set of Ntest = 16,281 observations. In the case of CVAPs, the training set is split into K equal (or as close to equal as possible) contiguous folds: the first ⌈Ntrain/K⌉ training observations are included in the first fold, the next ⌈Ntrain/K⌉ (or ⌊Ntrain/K⌋) in the second fold, etc. (⌈·⌉ is used first and then ⌊·⌋, unless Ntrain is divisible by K). In the case of the other calibration methods, the first (K − 1)/K · Ntrain training observations are used as the proper training set (for training the scoring algorithm) and the rest of the training observations are used as the calibration set. (A sketch of both splitting schemes follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It describes the software and datasets but not the underlying computational resources.
Software Dependencies | No | The paper mentions the software used, such as 'Weka [17]', 'MATLAB's Statistics toolbox', and the 'R package fdrtool (namely, the function monoreg)'. However, it does not provide specific version numbers for these software components.
Experiment Setup | Yes | For each of the standard prediction algorithms within Weka that we use, we optimise the parameters by minimising the Brier loss on the calibration set, apart from the column labelled 'all'. Most of the parameters are set to their default values, and the only parameters that are optimised are C (pruning confidence) for J48 and J48 bagging, R (ridge) for logistic regression, L (learning rate) and M (momentum) for neural networks (Multilayer Perceptron), and C (complexity constant) for SVM (SMO, with the linear kernel); naïve Bayes does not involve any parameters. (A Brier-loss tuning sketch follows the table.)
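
To make the Pseudocode row concrete, here is a minimal Python sketch of the CVAP merging step only, i.e. the formula GM(p_1) / (GM(1 − p_0) + GM(p_1)) from Algorithm 1. The `cvap_merge` and `gm` names are illustrative; the per-fold pairs are assumed to come from an IVAP applied to (T \ T_k, T_k, x), which is not implemented here.

```python
import numpy as np

def gm(values):
    """Geometric mean of a sequence of positive numbers."""
    return float(np.exp(np.mean(np.log(values))))

def cvap_merge(pairs):
    """Merge K per-fold multiprobability pairs (p_0^k, p_1^k) into a single
    probability via GM(p_1) / (GM(1 - p_0) + GM(p_1)), as in Algorithm 1.
    `pairs` is a list of (p0, p1) tuples, one per fold."""
    p0 = np.array([p0 for p0, _ in pairs])
    p1 = np.array([p1 for _, p1 in pairs])
    return gm(p1) / (gm(1.0 - p0) + gm(p1))

# Example: five folds returning slightly different imprecise probabilities.
print(cvap_merge([(0.58, 0.66), (0.60, 0.69), (0.62, 0.70), (0.57, 0.65), (0.61, 0.68)]))
```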
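The Dataset Splits row describes two schemes; the sketch below (an illustration, not the authors' code) builds the contiguous K-fold split used for CVAPs, with the earlier folds taking the extra observations when Ntrain is not divisible by K, and the proper-training/calibration split used for the other methods. The function names are hypothetical.

```python
def contiguous_folds(n_train, k):
    """Split indices 0..n_train-1 into K contiguous folds of near-equal size;
    earlier folds get ceil(n_train/k) observations, later ones floor(n_train/k)."""
    base, extra = divmod(n_train, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(range(start, start + size))
        start += size
    return folds

def proper_and_calibration_split(n_train, k):
    """The first (K-1)/K * n_train observations form the proper training set
    (for training the scoring algorithm); the rest form the calibration set."""
    cut = (k - 1) * n_train // k
    return range(cut), range(cut, n_train)

folds = contiguous_folds(32561, 5)                     # sizes 6513, 6512, 6512, 6512, 6512
proper, calib = proper_and_calibration_split(32561, 5) # 26048 proper, 6513 calibration
```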
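The Experiment Setup row says each algorithm's parameters are optimised by minimising Brier loss on the calibration set; the paper does this within Weka. As a rough analogue only (not the authors' setup), the sketch below tunes the regularisation of a scikit-learn logistic regression the same way; the grid values and function name are chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def tune_ridge_by_brier(X_proper, y_proper, X_calib, y_calib,
                        grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Fit one model per candidate value and keep the one with the lowest
    Brier loss on the calibration set. scikit-learn's C is the inverse of the
    ridge penalty, so this only mimics tuning Weka's R parameter."""
    best_c, best_loss = None, np.inf
    for c in grid:
        model = LogisticRegression(C=c, max_iter=1000).fit(X_proper, y_proper)
        p = model.predict_proba(X_calib)[:, 1]   # predicted P(y = 1)
        loss = brier_score_loss(y_calib, p)
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c, best_loss
```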