Precision-Recall-Gain Curves: PR Analysis Done Right
Authors: Peter Flach, Meelis Kull
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give experimental evidence in Section 4 that this matters by demonstrating that the area under traditional Precision-Recall curves can easily favour models with lower expected F1 score than others. |
| Researcher Affiliation | Academia | Peter A. Flach, Intelligent Systems Laboratory, University of Bristol, United Kingdom (Peter.Flach@bristol.ac.uk); Meelis Kull, Intelligent Systems Laboratory, University of Bristol, United Kingdom (Meelis.Kull@bristol.ac.uk) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To assist practitioners we have made R, Matlab and Java code to calculate AUPRG and PRG curves available at http://www.cs.bris.ac.uk/~flach/PRGcurves/. |
| Open Datasets | Yes | Using the OpenML platform [17] we took all those binary classification tasks which have 10-fold cross-validated predictions using at least 30 models from different learning methods (these are called flows in OpenML). In each of the obtained 886 tasks (covering 426 different datasets) we applied the following procedure. |
| Dataset Splits | Yes | Using the OpenML platform [17] we took all those binary classification tasks which have 10-fold cross-validated predictions using at least 30 models from different learning methods |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'R, Matlab and Java code' but does not specify version numbers for these or any associated libraries. |
| Experiment Setup | No | The paper describes its analytical procedure for evaluating existing models on OpenML tasks, but it does not specify hyperparameters or system-level training settings for the models themselves, as it did not train them. |
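For context on the quantities the released code computes, the paper maps precision and recall onto a "gain" scale relative to the always-positive baseline, whose precision equals the positive-class prevalence π. A minimal Python sketch of those two transformations, written here from the paper's published definitions (this is an illustration, not the authors' released R/Matlab/Java code):

```python
# Illustrative sketch of the Precision-Recall-Gain transformations from
# Flach & Kull (NeurIPS 2015). pi is the proportion of positives in the data;
# prec and rec are ordinary precision and recall. Not the authors' code.

def precision_gain(prec: float, pi: float) -> float:
    # precG = (prec - pi) / ((1 - pi) * prec)
    # 0 at the always-positive baseline (prec = pi), 1 at perfect precision.
    return (prec - pi) / ((1 - pi) * prec)

def recall_gain(rec: float, pi: float) -> float:
    # recG = (rec - pi) / ((1 - pi) * rec)
    # 0 when rec = pi, 1 at perfect recall.
    return (rec - pi) / ((1 - pi) * rec)

if __name__ == "__main__":
    pi = 0.2
    print(precision_gain(0.2, pi))  # baseline classifier -> gain 0.0
    print(recall_gain(1.0, pi))     # perfect recall -> gain 1.0
```

Note that the gains can be negative for models worse than the baseline; the paper's PRG curves plot recall gain against precision gain on the unit square, which is what makes the area under the curve (AUPRG) behave consistently with expected F1.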