Minimax Classification with 0-1 Loss and Performance Guarantees

Authors: Santiago Mazuelas, Andrea Zanoni, Aritz Pérez

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also present MRCs finite-sample generalization bounds in terms of training size and smallest minimax risk, and show their competitive classification performance w.r.t. state-of-the-art techniques using benchmark datasets.
Researcher Affiliation | Academia | Santiago Mazuelas, BCAM-Basque Center for Applied Mathematics and IKERBASQUE-Basque Foundation for Science, Bilbao, Spain (smazuelas@bcamath.org); Andrea Zanoni, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland (andrea.zanoni@epfl.ch); Aritz Pérez, BCAM-Basque Center for Applied Mathematics, Bilbao, Spain (aperez@bcamath.org)
Pseudocode | Yes | Algorithm 1: Pseudocode for MRC learning
Open Source Code | Yes | Python code with the proposed MRC is provided in https://github.com/MachineLearningBCAM/Minimax-risk-classifiers-NeurIPS-2020 with the settings used in these experimental results.
Open Datasets | Yes | In this section we show numerical results for MRCs using 8 UCI datasets for multi-class classification. [...] In the first set of experimental results, we use Adult and Magic data sets from the UCI repository. [...] In the second set of experimental results, we use 6 data sets from the UCI repository (first column of Table 1).
Dataset Splits | Yes | The errors and standard deviations in Table 1 have been estimated using paired and stratified 10-fold cross validation. (A sketch of this protocol follows the table.)
Hardware Specification | No | The paper does not provide hardware details such as the GPU/CPU models or memory used to run its experiments.
Software Dependencies | No | The paper mentions the CVX and scikit-learn packages without version numbers, and refers to publicly available code for the AMC and MEM implementations without detailing its software dependencies or versions.
Experiment Setup | Yes | We obtain up to k = 200/|Y| thresholds using one-dimensional decision trees (decision stumps) so that the feature mapping has up to m = 200 + |Y| components, and we solve the optimization problems at learning with the constraints corresponding to the r = n matrices Φ_i = Φ_{x_i}, i = 1, 2, ..., n, obtained from the n training instances. For all datasets, interval estimates for feature mapping expectations were obtained using (2) with λ_i = 0.25 for i = 1, 2, ..., m. (Sketches of the threshold construction and interval estimates follow the table.)
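As referenced in the Dataset Splits row, the paired and stratified 10-fold cross validation behind Table 1 can be sketched with scikit-learn's StratifiedKFold: every method is evaluated on identical folds, so per-fold errors are directly comparable. The dataset and classifiers below are placeholders, not the paper's benchmarks or methods; this is a minimal sketch of the protocol, not the authors' evaluation script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Paired comparison: every method sees exactly the same 10 stratified folds,
# so per-fold errors are directly comparable before averaging.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
methods = {
    "logreg": LogisticRegression(max_iter=1000),      # placeholder baseline
    "tree": DecisionTreeClassifier(random_state=0),   # placeholder baseline
}
errors = {name: [] for name in methods}

for train_idx, test_idx in skf.split(X, y):
    for name, clf in methods.items():
        clf.fit(X[train_idx], y[train_idx])
        errors[name].append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))

# Report error and standard deviation per method, as in the paper's Table 1.
for name, errs in errors.items():
    print(f"{name}: error {np.mean(errs):.3f} +/- {np.std(errs):.3f}")
```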
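For the Experiment Setup row, the thresholds come from one-dimensional decision trees (decision stumps) fit per input dimension. A hedged sketch, assuming scikit-learn trees and a per-dimension cap for illustration (the quoted setup caps the total at k = 200/|Y|; the exact allocation across dimensions follows the paper):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def stump_thresholds(X, y, k):
    """Per-dimension thresholds from one-dimensional decision trees (sketch).

    Fits a tree on each single input dimension and collects its split points;
    with at most k + 1 leaves, a tree contributes at most k thresholds.
    """
    per_dim = []
    for j in range(X.shape[1]):
        tree = DecisionTreeClassifier(max_leaf_nodes=k + 1, random_state=0)
        tree.fit(X[:, [j]], y)
        t = tree.tree_.threshold
        per_dim.append(np.sort(t[t != -2]))  # -2 marks leaves in sklearn trees
    return per_dim

X, y = load_iris(return_X_y=True)          # placeholder dataset
k = 200 // len(np.unique(y))               # k = 200/|Y|, as in the quoted setup
for j, t in enumerate(stump_thresholds(X, y, k)):
    print(f"dim {j}: {len(t)} thresholds")
```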
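The interval estimates of equation (2) with λ_i = 0.25 can be sketched as well. The sample-mean ± λ·(sample standard deviation)/√n construction below is an assumption about the exact form of (2) and should be checked against the paper; Phi and expectation_intervals are illustrative names, not the authors' API.

```python
import numpy as np

def expectation_intervals(Phi, lam=0.25):
    """Interval estimates [a, b] for each feature-mapping expectation (sketch).

    Phi: (n, m) array of feature-mapping evaluations on the n training pairs.
    lam: per-component width multiplier; the quoted setup uses lambda_i = 0.25.

    NOTE: assumes the sample-mean +/- lam * std / sqrt(n) construction; the
    exact form of the paper's equation (2) should be checked against the text.
    """
    n = Phi.shape[0]
    tau = Phi.mean(axis=0)           # sample mean of each component
    s = Phi.std(axis=0)              # sample standard deviation of each component
    a = tau - lam * s / np.sqrt(n)   # lower endpoints
    b = tau + lam * s / np.sqrt(n)   # upper endpoints
    return a, b

# Toy usage with random feature evaluations (n = 100 instances, m = 5 components).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 5))
a, b = expectation_intervals(Phi)
print(np.round(a, 3), np.round(b, 3))
```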