Large-scale probabilistic predictors with and without guarantees of validity

Authors: Vladimir Vovk, Ivan Petej, Valentina Fedorova

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper studies theoretically and empirically a method of turning machine-learning algorithms into probabilistic predictors that automatically enjoys a property of validity (perfect calibration) and is computationally efficient. When these imprecise probabilities are merged into precise probabilities, the resulting predictors, while losing the theoretical property of perfect calibration, are consistently more accurate than the existing methods in empirical studies.
Researcher Affiliation | Collaboration | Department of Computer Science, Royal Holloway, University of London, UK; Yandex, Moscow, Russia; {volodya.vovk,ivan.petej,alushaf}@gmail.com
Pseudocode | Yes | Algorithm 1 CVAP(T, x) // cross-Venn-Abers predictor for training set T
  1: split the training set T into K folds T_1, ..., T_K
  2: for k ∈ {1, ..., K}
  3:   (p_0^k, p_1^k) := IVAP(T \ T_k, T_k, x)
  4: return GM(p_1) / (GM(1 − p_0) + GM(p_1))
(A sketch of the merging formula in step 4 follows the table.)
Open Source Code | No | The paper states: 'our code being publicly available [9]'. However, reference [9] is an arXiv technical report ('arXiv.org e-Print archive, November 2015. A full version of this paper.'), not a direct link to a code repository such as GitHub, GitLab, or Bitbucket, and the paper does not state that the code is provided as supplementary material.
Open Datasets | Yes | For illustrating our results in this paper we use the adult data set available from the UCI repository [18] (this is the main data set used in [6] and one of the data sets used in [8]).
Dataset Splits | Yes | We use the original split of the data set into a training set of Ntrain = 32,561 observations and a test set of Ntest = 16,281 observations. In the case of CVAPs, the training set is split into K equal (or as close to equal as possible) contiguous folds: the first ⌈Ntrain/K⌉ training observations are included in the first fold, the next ⌈Ntrain/K⌉ (or ⌊Ntrain/K⌋) in the second fold, etc. (⌈·⌉ is used first and then ⌊·⌋, unless Ntrain is divisible by K). In the case of the other calibration methods, the first (K − 1)/K · Ntrain training observations are used as the proper training set (for training the scoring algorithm) and the rest of the training observations are used as the calibration set. (A sketch of both splitting schemes follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It describes the software and datasets but not the underlying computational resources.
Software Dependencies | No | The paper mentions the software used, such as 'Weka [17]', 'MATLAB's Statistics toolbox', and the 'R package fdrtool (namely, the function monoreg)'. However, it does not provide specific version numbers for these software components.
Experiment Setup | Yes | For each of the standard prediction algorithms within Weka that we use, we optimise the parameters by minimising the Brier loss on the calibration set, apart from the column labelled 'all'. Most of the parameters are set to their default values, and the only parameters that are optimised are C (pruning confidence) for J48 and J48 bagging, R (ridge) for logistic regression, L (learning rate) and M (momentum) for neural networks (Multilayer Perceptron), and C (complexity constant) for SVM (SMO, with the linear kernel); naïve Bayes does not involve any parameters. (A Brier-loss tuning sketch follows the table.)
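
To make the Pseudocode row concrete, here is a minimal Python sketch of the CVAP merging step only, i.e. the formula GM(p_1) / (GM(1 − p_0) + GM(p_1)) from Algorithm 1. The `cvap_merge` and `gm` names are illustrative; the per-fold pairs are assumed to come from an IVAP applied to (T \ T_k, T_k, x), which is not implemented here.

```python
import numpy as np

def gm(values):
    """Geometric mean of a sequence of positive numbers."""
    return float(np.exp(np.mean(np.log(values))))

def cvap_merge(pairs):
    """Merge K per-fold multiprobability pairs (p_0^k, p_1^k) into a single
    probability via GM(p_1) / (GM(1 - p_0) + GM(p_1)), as in Algorithm 1.
    `pairs` is a list of (p0, p1) tuples, one per fold."""
    p0 = np.array([p0 for p0, _ in pairs])
    p1 = np.array([p1 for _, p1 in pairs])
    return gm(p1) / (gm(1.0 - p0) + gm(p1))

# Example: five folds returning slightly different imprecise probabilities.
print(cvap_merge([(0.58, 0.66), (0.60, 0.69), (0.62, 0.70), (0.57, 0.65), (0.61, 0.68)]))
```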
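The Dataset Splits row describes two schemes; the sketch below (an illustration, not the authors' code) builds the contiguous K-fold split used for CVAPs, with the earlier folds taking the extra observations when Ntrain is not divisible by K, and the proper-training/calibration split used for the other methods. The function names are hypothetical.

```python
def contiguous_folds(n_train, k):
    """Split indices 0..n_train-1 into K contiguous folds of near-equal size;
    earlier folds get ceil(n_train/k) observations, later ones floor(n_train/k)."""
    base, extra = divmod(n_train, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(range(start, start + size))
        start += size
    return folds

def proper_and_calibration_split(n_train, k):
    """The first (K-1)/K * n_train observations form the proper training set
    (for training the scoring algorithm); the rest form the calibration set."""
    cut = (k - 1) * n_train // k
    return range(cut), range(cut, n_train)

folds = contiguous_folds(32561, 5)                     # sizes 6513, 6512, 6512, 6512, 6512
proper, calib = proper_and_calibration_split(32561, 5) # 26048 proper, 6513 calibration
```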
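The Experiment Setup row says each algorithm's parameters are optimised by minimising Brier loss on the calibration set; the paper does this within Weka. As a rough analogue only (not the authors' setup), the sketch below tunes the regularisation of a scikit-learn logistic regression the same way; the grid values and function name are chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

def tune_ridge_by_brier(X_proper, y_proper, X_calib, y_calib,
                        grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Fit one model per candidate value and keep the one with the lowest
    Brier loss on the calibration set. scikit-learn's C is the inverse of the
    ridge penalty, so this only mimics tuning Weka's R parameter."""
    best_c, best_loss = None, np.inf
    for c in grid:
        model = LogisticRegression(C=c, max_iter=1000).fit(X_proper, y_proper)
        p = model.predict_proba(X_calib)[:, 1]   # predicted P(y = 1)
        loss = brier_score_loss(y_calib, p)
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c, best_loss
```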