Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bipartite Ranking: a Risk-Theoretic Perspective
Authors: Aditya Krishna Menon, Robert C. Williamson
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments that assess the efficacy of several proper composite losses proposed in the previous section for the problem of maximising accuracy at the head of the ranked list. The aim of our experiments is not to position the new losses as a superior alternative to the existing p-classification and p-norm push approaches. Rather, we wish to demonstrate that the proper composite interpretation gives one way of generating a family of losses for this problem, with the p-classification loss being but one example of this family. An attraction of these losses is that they are simple to optimise using gradient-based methods, with complexity linear in the number of training examples (as opposed to methods that operate on pairs of examples). To clarify the effect of the choice of loss and choice of risk, we consider all combinations of the three risk types considered in this paper (proper composite, Equation 14; bipartite, Equation 17; and p-norm push, Equation 56) and the loss functions of interest. On the one hand, one expects the p-norm push risk to perform best when combined with a loss suitable for ranking the best. On the other hand, our analysis in the previous section indicates that there is promise in the minimisation of a suitable proper composite risk. For our losses, we experiment with the standard logistic and exponential losses, as well as the p-classification loss. Based on our hybrid loss proposal in Lemma 61, we consider the following: (i) the proper composite loss with weight w(c) = 1/(c(1−c)^(2−1/(p+1))) and sigmoid link, which we term the Log-p-classification Hybrid; (ii) the proper composite loss with weight a hybrid of 1/(c(1−c)) and 1/(2c^(3/2)(1−c)^(3/2)) about threshold 1/(p+1), and sigmoid link, which we term the Log-Exp Hybrid; (iii) the proper composite loss with weight a hybrid of 4 and 1/(2c^(3/2)(1−c)^(3/2)) about threshold 1/(p+1), and link a hybrid of the identity and sigmoid links, which we term the Square-Exp Hybrid. We compare these methods on four UCI data sets: ionosphere, housing, german and car. |
| Researcher Affiliation | Academia | Aditya Krishna Menon and Robert C. Williamson, Data61 and the Australian National University, Canberra, ACT, Australia |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We compare these methods on four UCI data sets: ionosphere, housing, german and car. |
| Dataset Splits | Yes | For each data set, we created 5 random train-test splits in the ratio 2:1. For each split, we performed 5-fold cross-validation on the training set to tune the strength of regularisation λ ∈ {10^-6, 10^-5, ..., 10^2}, and where appropriate the constant p ∈ {1, 2, 4, 8, 16, 32, 64}. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | Each method was trained with a regularised linear model, where the training objective was minimised using L-BFGS (Nocedal and Wright, 2006, pg. 177). The paper mentions L-BFGS but does not provide a specific version number for it or other software dependencies. |
| Experiment Setup | Yes | Each method was trained with a regularised linear model, where the training objective was minimised using L-BFGS (Nocedal and Wright, 2006, pg. 177). For each data set, we created 5 random train-test splits in the ratio 2:1. For each split, we performed 5-fold cross-validation on the training set to tune the strength of regularisation λ ∈ {10^-6, 10^-5, ..., 10^2}, and where appropriate the constant p ∈ {1, 2, 4, 8, 16, 32, 64}. We then evaluated performance on the test set, and report the average across all splits. As performance measures, we used the AUC, ARR, DCG, AP, and PTop (Agarwal, 2011; Boyd et al., 2012). For all measures, a higher score is better. Parameter tuning was done based on the AP on the test folds. |
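The Experiment Setup row describes a standard protocol: a regularised linear model trained by minimising a proper composite loss with L-BFGS, with the regularisation strength λ tuned over a logarithmic grid. A minimal sketch of that protocol, using the logistic loss as the example and synthetic data in place of the UCI sets; it uses a single held-out split rather than the paper's 5-fold cross-validation with AP-based tuning, and all variable and function names here are illustrative, not from the paper:

```python
# Sketch of the reported protocol: regularised linear model, logistic
# loss, L-BFGS optimisation, lambda tuned over {1e-6, ..., 1e2}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def logistic_risk(w, X, y, lam):
    """Mean logistic loss log(1 + exp(-y * Xw)) plus lam * ||w||^2."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + lam * np.dot(w, w)

def fit(X, y, lam):
    """Minimise the training objective with L-BFGS, as in the paper."""
    res = minimize(logistic_risk, np.zeros(X.shape[1]),
                   args=(X, y, lam), method="L-BFGS-B")
    return res.x

# Synthetic stand-in for a UCI data set, split 2:1 into train/test.
X = rng.normal(size=(300, 5))
y = np.sign(X @ rng.normal(size=5) + 0.1 * rng.normal(size=300))
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

# Tune lambda over the paper's grid by held-out unregularised risk
# (the paper instead uses 5-fold CV and average precision).
grid = [10.0 ** k for k in range(-6, 3)]
best_lam = min(grid, key=lambda lam: logistic_risk(fit(X_tr, y_tr, lam),
                                                   X_te, y_te, 0.0))
w = fit(X_tr, y_tr, best_lam)
test_acc = np.mean(np.sign(X_te @ w) == y_te)
```

Complexity is linear in the number of training examples per L-BFGS iteration, which is the attraction the quoted passage notes relative to pairwise ranking methods.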