Obtaining Well Calibrated Probabilities Using Bayesian Binning

Authors: Mahdi Pakdaman Naeini, Gregory Cooper, Milos Hauskrecht

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The method is computationally tractable and empirically accurate, as evidenced by the set of experiments reported here on both real and simulated datasets. This section describes the set of experiments that we performed to evaluate the performance of the proposed calibration method in comparison to other commonly used calibration methods: histogram binning, Platt's method, and isotonic regression.
Researcher Affiliation | Academia | Mahdi Pakdaman Naeini (1), Gregory F. Cooper (1, 2), and Milos Hauskrecht (1, 3). (1) Intelligent Systems Program, University of Pittsburgh, PA, USA; (2) Department of Biomedical Informatics, University of Pittsburgh, PA, USA; (3) Computer Science Department, University of Pittsburgh, PA, USA
Pseudocode | No | The paper describes the mathematical formulation of the BBQ method and evaluation measures but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | An implementation of the BBQ method can be found at the following address: https://github.com/pakdaman/calibration.git
Open Datasets | Yes | In terms of real data, we used 30 different real-world binary classification data sets from the UCI and LIBSVM repositories (Bache and Lichman 2013; Chang and Lin 2011).
Dataset Splits | No | The data were divided into 1000 instances for training and calibrating the prediction model, and 1000 instances for testing the models. The paper mentions training and testing data splits but does not explicitly define a separate validation split.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | The paper mentions using classifiers such as logistic regression, SVM, and naive Bayes, and refers to the UCI and LIBSVM repositories for datasets. However, it does not specify any software names or version numbers for the libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | We define the range of possible values of the number of bins as $B \in \{\sqrt[3]{N}/C, \ldots, C\sqrt[3]{N}\}$, where C is a constant that controls the number of binning models (C = 10 in our experiments). We set N′ = 2 in our experiments. We fix a small number ρ > 0 (ρ = 0.001 in our experiments). In computing these measures, the predictions are sorted and partitioned into a fixed number K of bins (K = 10 in our experiments).
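The bin-count range in the Experiment Setup row can be made concrete with a short sketch. It assumes N is the number of instances used to fit the calibration model (the Dataset Splits row quotes 1000 training instances), that C is an integer, and that every integer between the two bounds is a candidate B. The function name `candidate_bin_counts` is hypothetical and is not taken from the authors' repository. Integer comparisons replace floating-point cube roots because `1000 ** (1/3)` evaluates to slightly less than 10 in IEEE arithmetic, which would silently drop B = 100.

```python
def candidate_bin_counts(n_train: int, c: int = 10) -> list[int]:
    """Integer bin counts B with N^(1/3) / C <= B <= C * N^(1/3).

    Hypothetical helper: assumes every integer between the bounds is used.
    """
    if n_train < 1 or c < 1:
        raise ValueError("n_train and c must be positive integers")
    # Smallest B >= 1 with B >= N^(1/3) / C, i.e. (C * B)^3 >= N.
    low = 1
    while (c * low) ** 3 < n_train:
        low += 1
    # Largest B with B <= C * N^(1/3), i.e. B^3 <= C^3 * N.
    high = 0
    while (high + 1) ** 3 <= c ** 3 * n_train:
        high += 1
    # Empty list if no integer lies between the bounds.
    return list(range(low, high + 1))


if __name__ == "__main__":
    grid = candidate_bin_counts(1000)  # C = 10, as in the quoted setup
    print(len(grid), grid[0], grid[-1])  # 100 1 100
```

Under these assumptions, N = 1000 and C = 10 give B = 1, ..., 100, so 100 binning models would enter the average.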
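The last sentence of the Experiment Setup row refers to the paper's calibration measures. The following is a minimal sketch that assumes these are the expected and maximum calibration error (ECE and MCE), computed over K bins of roughly equal size taken from the sorted predictions. ECE weights each bin's gap |o_k − e_k|, between the observed fraction of positives o_k and the mean predicted probability e_k, by the fraction of instances in that bin. MCE takes the largest gap. The helper name `calibration_errors` and the rule for splitting the sorted predictions into K contiguous chunks are illustrative assumptions, not the authors' code.

```python
def calibration_errors(probs: list[float], labels: list[int], k: int = 10) -> tuple[float, float]:
    """Return (ECE, MCE) using k equal-frequency bins over sorted predictions.

    Hypothetical helper for illustration; bins are contiguous chunks of the
    predictions sorted in increasing order.
    """
    n = len(probs)
    if n == 0 or n != len(labels) or k < 1:
        raise ValueError("need equally long, non-empty inputs and k >= 1")
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0])
    ece = 0.0
    mce = 0.0
    for b in range(k):
        # Bin b holds sorted positions [b*n//k, (b+1)*n//k); sizes differ by at most 1.
        chunk = pairs[b * n // k:(b + 1) * n // k]
        if not chunk:  # possible only when k > n
            continue
        e_k = sum(p for p, _ in chunk) / len(chunk)  # mean predicted probability
        o_k = sum(y for _, y in chunk) / len(chunk)  # observed fraction of positives
        gap = abs(o_k - e_k)
        ece += len(chunk) / n * gap  # weight by the bin's share of instances
        mce = max(mce, gap)
    return ece, mce


if __name__ == "__main__":
    preds = [0.2, 0.4, 0.6, 0.8]
    y = [0, 0, 1, 1]
    ece, mce = calibration_errors(preds, y, k=2)
    print(round(ece, 3), round(mce, 3))  # 0.3 0.3
```

The default `k=10` mirrors the K = 10 setting in the Experiment Setup row. Equal-frequency bins keep every bin populated when k ≤ n, which fixed-width bins on [0, 1] do not guarantee.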