On discriminative learning of prediction uncertainty

Authors: Vojtech Franc, Daniel Prusa

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimented with two methods to learn the classifier h(x) and two baselines to compute the uncertainty measure: i) logistic regression (LR), estimating p̂(y | x) so that the plug-in Bayes classifier and the conditional risk can be computed; ii) SVMs, representing discriminative methods without probabilistic output, in which case the maximal classification score was used as the uncertainty measure. On top of the LR and SVM classifiers we then learned the uncertainty measure by minimizing the proposed loss. We implemented both variants, based on stochastic gradient approximation and on regularized empirical risk minimization. The methods were evaluated on 10 classification problems differing in size, number of features and number of classes. The goal was to minimize the classification error, hence we used the 0/1-loss ℓ(y, y′) = [[y ≠ y′]].
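As an editorial aid (not code from the paper), the sketch below illustrates the two baseline uncertainty measures quoted above: the plug-in conditional risk derived from logistic-regression posteriors, and the negated maximal SVM classification score. The use of scikit-learn estimators is an assumption of this sketch.

# Hypothetical sketch of the two baseline uncertainty measures; scikit-learn
# estimators are stand-ins chosen for illustration, not tools named by the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def lr_conditional_risk(lr: LogisticRegression, X):
    # Plug-in conditional risk under the 0/1-loss: 1 - max_y p_hat(y | x).
    posteriors = lr.predict_proba(X)        # shape (n_samples, n_classes)
    return 1.0 - posteriors.max(axis=1)     # larger value = more uncertain

def svm_score_uncertainty(svm: LinearSVC, X):
    # Non-probabilistic baseline: negated maximal classification score.
    scores = svm.decision_function(X)
    if scores.ndim == 1:                    # binary case returns a single margin
        scores = np.column_stack([-scores, scores])
    return -scores.max(axis=1)              # lower maximal score = more uncertain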
Researcher Affiliation | Academia | Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague.
Pseudocode | No | The paper describes algorithmic steps in paragraph form, but it does not contain any structured pseudocode or algorithm blocks (e.g., a labeled Algorithm 1).
Open Source Code | No | The paper does not provide any statement about releasing source code, nor does it include any links to a code repository.
Open Datasets | Yes | We selected 10 classification problems from the UCI repository (Dua & Taniskidou, 2017) and the LIBSVM datasets (Chang & Lin, 2011). The datasets are summarized in Table 1.
Dataset Splits | Yes | Each dataset was randomly split 5 times into 5 subsets, Trn1/Val1/Trn2/Val2/Tst, in ratio 30/10/30/10/20 (except for COVTYPE, with ratio 28/20/2/20/30).
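A minimal sketch of the quoted splitting protocol, assuming NumPy and a generic index-based split; the seeds, the sample count, and the rounding of subset sizes are assumptions of this illustration, not details given by the paper.

# Hypothetical illustration of one random 30/10/30/10/20 split into
# Trn1/Val1/Trn2/Val2/Tst, repeated 5 times with different seeds.
import numpy as np

def split_indices(n_samples, seed, fractions=(0.30, 0.10, 0.30, 0.10, 0.20)):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    bounds = np.cumsum([int(round(f * n_samples)) for f in fractions[:-1]])
    return np.split(perm, bounds)           # [Trn1, Val1, Trn2, Val2, Tst]

splits = [split_indices(n_samples=1000, seed=s) for s in range(5)]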
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software tools and methods such as Logistic-Regression, SVM, MLP, ADAM (Kingma & Ba, 2015), and batch normalization (Ioffe & Szegedy, 2015), but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | The optimal C was selected from {1, 10, 100, 1000} based on the validation classification error. (...) The number of batches was P = 5, and the regularization constant C was selected from {1, 10, 100, 1000} based on the proposed loss (12) evaluated on the validation set. Finally, we used an MLP with 1, 5, or 10 layers (the optimal number was selected based on the validation set), each having the same number of neurons as the input dimension d. The ReLU was used as the transfer function. (...) The parameters θ were learned by ADAM (Kingma & Ba, 2015) optimizing the convex loss function (20). To speed up convergence, we used batch normalization (Ioffe & Szegedy, 2015) placed after each fully-connected layer.
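To make the quoted architecture concrete, here is a hedged sketch of an MLP with the stated shape: 1, 5, or 10 fully-connected layers of width d, ReLU activations, batch normalization after each fully-connected layer, and ADAM as the optimizer. PyTorch, the scalar output head, and the default ADAM hyperparameters are assumptions of this sketch; the paper's convex loss (Eq. 20) is not reproduced here.

# Hypothetical PyTorch sketch of the MLP described in the quoted setup;
# the framework and the output dimensionality are assumptions, not paper details.
import torch
import torch.nn as nn

def make_mlp(d: int, n_layers: int) -> nn.Sequential:
    layers = []
    for _ in range(n_layers):
        # Batch normalization placed after each fully-connected layer, then ReLU.
        layers += [nn.Linear(d, d), nn.BatchNorm1d(d), nn.ReLU()]
    layers.append(nn.Linear(d, 1))            # scalar uncertainty score (assumption)
    return nn.Sequential(*layers)

model = make_mlp(d=20, n_layers=5)            # layer width equals input dimension d
optimizer = torch.optim.Adam(model.parameters())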