Optimizing F-Measures by Cost-Sensitive Classification
Authors: Shameem Puthiya Parambath, Nicolas Usunier, Yves Grandvalet
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F-measure optimization tasks. [...] 4 Experiments The goal of this section is to give illustration of the algorithms suggested by the theory. First, our results suggest that cost-sensitive classification algorithms may be preferable to the more usual probability thresholding method. We compare cost-sensitive classification, as implemented by SVMs with asymmetric costs, to thresholded logistic regression, with linear classifiers. Besides, the structured SVM approach to F1-measure maximization SVMperf [11] provides another baseline. [...] Table 1: (macro-)F1-measures (in %). Options: T stands for thresholded, CS for cost-sensitive and CS&T for cost-sensitive and thresholded. |
| Researcher Affiliation | Academia | Shameem A. Puthiya Parambath, Nicolas Usunier, Yves Grandvalet. Université de Technologie de Compiègne, CNRS, Heudiasyc UMR 7253, Compiègne, France. {sputhiya,nusunier,grandval}@utc.fr |
| Pseudocode | No | The paper describes a 'meta-algorithm' conceptually but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the third-party library 'LIBLINEAR [9]' but does not provide access to its own source code for the described methodology. |
| Open Datasets | Yes | The datasets we use are Adult, Letter, News20, and Siam. All datasets except for News20 and Siam are obtained from the UCI repository (https://archive.ics.uci.edu/ml/datasets.html). News20: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#news20. Siam: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html#siam-competition2007. |
| Dataset Splits | Yes | For each experiment, the training set was split at random, keeping 1/3 for the validation set used to select all hyper-parameters, based on the maximization of the F1-measure on this set. For datasets that do not come with a separate test set, the data was first split to keep 1/4 for test. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, processor types, memory amounts) used for running experiments are provided. The paper only mentions running times without specifying the hardware used. |
| Software Dependencies | No | The paper mentions using 'LIBLINEAR [9]' but does not provide its version number or any other software dependencies with specific version numbers. |
| Experiment Setup | Yes | The algorithms have from one to three hyper-parameters: (i) all algorithms are run with L2 regularization, with a regularization parameter C ∈ {2^-6, 2^-5, ..., 2^6}; (ii) for the cost-sensitive algorithms, the cost for false negatives is chosen in {(1 + β^2 - t)/t, t ∈ {0.1, 0.2, ..., 1.9}} of Proposition 6; (iii) for the thresholded algorithms, the threshold is chosen among all the scores of the validation examples. The maximum number of iterations for SVMs was set to 50,000 instead of the default 1,000. |
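The split protocol quoted in the Dataset Splits row (keep 1/4 of the data for test when no separate test set ships with the dataset, then hold out 1/3 of the remaining training data for hyper-parameter validation) can be sketched as below. The function name, seed handling, and use of scikit-learn are illustrative assumptions, not the authors' code.

```python
from sklearn.model_selection import train_test_split

def split_dataset(X, y, has_test_set=False, seed=0):
    """Sketch of the paper's protocol: 1/4 for test if no test set is
    provided, then 1/3 of the remaining training data for validation."""
    if not has_test_set:
        # First split off 1/4 of the data as the test set.
        X, X_test, y, y_test = train_test_split(X, y, test_size=0.25, random_state=seed)
    else:
        X_test, y_test = None, None  # dataset ships with its own test set
    # Hold out 1/3 of the (remaining) training data for hyper-parameter selection.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1 / 3, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```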
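Items (i) and (ii) of the Experiment Setup row describe an L2-regularized linear SVM with asymmetric costs, searched over C ∈ {2^-6, ..., 2^6} and a false-negative cost of (1 + β^2 - t)/t for t ∈ {0.1, ..., 1.9}, with the model selected by F1 on the validation set. A minimal sketch follows, assuming scikit-learn's LinearSVC (which wraps LIBLINEAR) and 0/1 labels with 1 as the positive class; this is a reconstruction, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

beta = 1.0  # F1-measure
C_grid = [2.0 ** k for k in range(-6, 7)]                      # {2^-6, ..., 2^6}
t_grid = np.arange(0.1, 2.0, 0.1)                              # {0.1, 0.2, ..., 1.9}
fn_cost_grid = [(1.0 + beta ** 2 - t) / t for t in t_grid]     # false-negative costs

def select_cost_sensitive_svm(X_train, y_train, X_val, y_val):
    """Grid-search C and the false-negative cost; keep the model with the
    best F1 on the validation set. class_weight rescales C per class, which
    implements the asymmetric costs."""
    best_f1, best_model = -1.0, None
    for C in C_grid:
        for a in fn_cost_grid:
            clf = LinearSVC(C=C, class_weight={1: a, 0: 1.0}, max_iter=50_000)
            clf.fit(X_train, y_train)
            f1 = f1_score(y_val, clf.predict(X_val))
            if f1 > best_f1:
                best_f1, best_model = f1, clf
    return best_model, best_f1
```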
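Item (iii), choosing the threshold among all scores of the validation examples, amounts to trying each validation score as a cutoff and keeping the one that maximizes validation F1. Again an illustrative reconstruction with hypothetical names, not the paper's code.

```python
import numpy as np
from sklearn.metrics import f1_score

def select_threshold(scores_val, y_val):
    """Try every distinct validation score as a threshold and return the one
    maximizing F1 on the validation set (labels assumed 0/1, 1 = positive)."""
    scores_val = np.asarray(scores_val)
    best_f1, best_thr = -1.0, None
    for thr in np.unique(scores_val):
        f1 = f1_score(y_val, (scores_val >= thr).astype(int))
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return best_thr, best_f1
```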