PiRank: Scalable Learning To Rank via Differentiable Sorting

Authors: Robin Swezey, Aditya Grover, Bruno Charron, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Empirically, we benchmark PiRank against 5 competing methods on two of the largest publicly available LTR datasets: MSLR-WEB30K [20] and Yahoo! C14. We find that PiRank is superior or competitive on 13 out of 16 ranking metrics and their variants, including 9 on which it is significantly superior to all baselines, and that it is able to scale to very large item lists. We also provide several ablation experiments to understand the impact of various factors on performance."

Researcher Affiliation | Collaboration | Robin Swezey (Amazon), Aditya Grover (University of California, Los Angeles; Facebook AI Research), Bruno Charron (Amazon), Stefano Ermon (Stanford University)

Pseudocode | No | The paper describes algorithmic strategies but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.

Open Source Code | Yes | "Finally, we provide an open-source implementation based on TensorFlow Ranking [21]." Repository: https://github.com/ermongroup/pirank

Open Datasets | Yes | "To empirically test PiRank, we consider two of the largest open-source benchmarks for LTR: the MSLR-WEB30K and the Yahoo! LTR dataset C14. Both datasets have relevance scores on a 5-point scale of 0 to 4, with 0 denoting complete irrelevance and 4 denoting perfect relevance. We give extensive details on the datasets and experimental protocol in Appendix C."

Dataset Splits | No | The paper mentions "at validation" and refers to Appendix C for experimental details, but the main text does not specify exact training/validation/test split percentages or sample counts.

Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models.

Software Dependencies | No | The paper mentions TensorFlow Ranking [21] but does not specify its version or the versions of other software dependencies.

Experiment Setup | Yes | "All approaches use the same 3-layer fully connected network architecture with ReLU activations to compute the scores ŷ for all (query, item) pairs, trained for 100,000 iterations. The maximum list size for each group of items to score and rank is fixed to 200, for both training and testing."
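The scoring architecture described in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation (which uses TensorFlow Ranking): the hidden-layer widths and initialization are assumptions, since the paper's main text specifies only the depth (3 fully connected layers), the ReLU activations, and the maximum list size of 200.

```python
import numpy as np

def init_mlp(in_dim, hidden=(256, 128), rng=None):
    """Initialize a 3-layer MLP ending in a scalar score.

    Hidden widths are illustrative assumptions, not taken from the paper.
    """
    rng = rng or np.random.default_rng(0)
    dims = [in_dim, *hidden, 1]
    return [
        (rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
         np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]

def score(params, x):
    """Compute scalar scores for a batch of (query, item) feature vectors."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    return (h @ w + b).squeeze(-1)      # one score per item

# Score a padded list of up to 200 items, the paper's maximum list size.
# 136 is the MSLR-WEB30K feature dimension.
params = init_mlp(in_dim=136)
scores = score(params, np.zeros((200, 136)))
print(scores.shape)  # (200,)
```

In the paper's setting, the resulting score vector for each query's item list would then be fed to the differentiable sorting relaxation to produce a ranking loss.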