Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bipartite Ranking: a Risk-Theoretic Perspective
Authors: Aditya Krishna Menon, Robert C. Williamson
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments that assess the efficacy of several proper composite losses proposed in the previous section for the problem of maximising accuracy at the head of the ranked list. The aim of our experiments is not to position the new losses as a superior alternative to the existing p-classification and p-norm push approaches. Rather, we wish to demonstrate that the proper composite interpretation gives one way of generating a family of losses for this problem, with the p-classification loss being but one example of this family. An attraction of these losses is that they are simple to optimise using gradient-based methods, with complexity linear in the number of training examples (as opposed to methods that operate on pairs of examples). To clarify the effect of the choice of loss and choice of risk, we consider all combinations of the three risk types considered in this paper (proper composite, Equation 14; bipartite, Equation 17; and p-norm push, Equation 56) and the loss functions of interest. On the one hand, one expects the p-norm push risk to perform best when combined with a loss suitable for ranking the best. On the other hand, our analysis in the previous section indicates that there is promise in the minimisation of a suitable proper composite risk. For our losses, we experiment with the standard logistic and exponential losses, as well as the p-classification loss. Based on our hybrid loss proposal in Lemma 61, we consider the following: (i) the proper composite loss with weight w(c) = 1/(c(1−c)^(2−1/(p+1))) and sigmoid link, which we term the Log-p-classification Hybrid; (ii) the proper composite loss with weight a hybrid of 1/(c(1−c)) and 1/(2c^(3/2)(1−c)^(3/2)) about threshold 1/(p+1), and sigmoid link, which we term the Log-Exp Hybrid; (iii) the proper composite loss with weight a hybrid of 4 and 1/(2c^(3/2)(1−c)^(3/2)) about threshold 1/(p+1), and link a hybrid of the identity and sigmoid links, which we term the Square-Exp Hybrid. We compare these methods on four UCI data sets: ionosphere, housing, german and car. |
| Researcher Affiliation | Academia | Aditya Krishna Menon and Robert C. Williamson, Data61 and the Australian National University, Canberra, ACT, Australia |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We compare these methods on four UCI data sets: ionosphere, housing, german and car. |
| Dataset Splits | Yes | For each data set, we created 5 random train-test splits in the ratio 2:1. For each split, we performed 5-fold cross-validation on the training set to tune the strength of regularisation λ ∈ {10^-6, 10^-5, ..., 10^2}, and where appropriate the constant p ∈ {1, 2, 4, 8, 16, 32, 64}. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | Each method was trained with a regularised linear model, where the training objective was minimised using L-BFGS (Nocedal and Wright, 2006, pg. 177). The paper mentions L-BFGS but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | Each method was trained with a regularised linear model, where the training objective was minimised using L-BFGS (Nocedal and Wright, 2006, pg. 177). For each data set, we created 5 random train-test splits in the ratio 2:1. For each split, we performed 5-fold cross-validation on the training set to tune the strength of regularisation λ ∈ {10^-6, 10^-5, ..., 10^2}, and where appropriate the constant p ∈ {1, 2, 4, 8, 16, 32, 64}. We then evaluated performance on the test set, and report the average across all splits. As performance measures, we used the AUC, ARR, DCG, AP, and PTop (Agarwal, 2011; Boyd et al., 2012). For all measures, a higher score is better. Parameter tuning was done based on the AP on the test folds. |
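The Experiment Setup row describes a standard protocol: a regularised linear model trained by minimising a proper composite loss with L-BFGS, with the regularisation strength λ tuned over a logarithmic grid. A minimal sketch of that protocol, using the logistic loss as the example and synthetic data in place of the UCI sets; it uses a single held-out split rather than the paper's 5-fold cross-validation with AP-based tuning, and all variable and function names here are illustrative, not from the paper:

```python
# Sketch of the reported protocol: regularised linear model, logistic
# loss, L-BFGS optimisation, lambda tuned over {1e-6, ..., 1e2}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def logistic_risk(w, X, y, lam):
    """Mean logistic loss log(1 + exp(-y * Xw)) plus lam * ||w||^2."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.dot(w, w)

def fit(X, y, lam):
    """Minimise the training objective with L-BFGS, as in the paper."""
    res = minimize(logistic_risk, np.zeros(X.shape[1]),
                   args=(X, y, lam), method="L-BFGS-B")
    return res.x

# Synthetic stand-in for a UCI data set, split 2:1 into train/test.
X = rng.normal(size=(300, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=300))
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

# Tune lambda over the paper's grid by held-out unregularised risk
# (the paper instead uses 5-fold CV and average precision).
grid = [10.0 ** k for k in range(-6, 3)]
best_lam = min(grid, key=lambda lam: logistic_risk(fit(X_tr, y_tr, lam),
                                                   X_te, y_te, 0.0))
w = fit(X_tr, y_tr, best_lam)
test_acc = np.mean(np.sign(X_te @ w) == y_te)
```

Complexity is linear in the number of training examples per L-BFGS iteration, which is the attraction the quoted passage notes relative to pairwise ranking methods.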