Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Two-sided fairness in rankings via Lorenz dominance
Authors: Virginie Do, Sam Corbett-Davies, Jamal Atif, Nicolas Usunier
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments also show that it increases the utility of the worse-off at lower costs in terms of overall utility. We report experimental results on music and friend recommendation tasks, where we analyze the trade-offs obtained by different methods by looking at different points of their Lorenz curves. Our welfare approach generates a wide variety of trade-offs, and is, in particular, more effective at improving the utility of worse-off users than the baselines. Section 5 Experiments |
| Researcher Affiliation | Collaboration | ¹Facebook AI; ²LAMSADE, Université PSL, Université Paris Dauphine, CNRS, France |
| Pseudocode | No | The paper describes the Frank-Wolfe algorithm and its steps in text (Section 4) but does not provide a formal pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We present here our experiments with the Lastfm-2k dataset [9, 47], which contains the music listening histories of 1.9k users. We present in App. F.3 results using the MovieLens-20m dataset [24]. We generate an artificial task based on the Higgs Twitter dataset [15]. |
| Dataset Splits | No | The paper states: 'We split the data in two parts: 80% for training and 20% for testing' (Appendix F.1). However, it does not mention a separate validation split, only training and testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using a 'matrix factorization algorithm' but does not specify any software libraries, frameworks, or their version numbers used in the implementation. |
| Experiment Setup | No | The paper mentions aspects of the experimental protocol such as dataset selection and splitting (e.g., 'We select the top 2500 items most listened to, and estimate preferences with a matrix factorization algorithm using a random sample of 80% of the data.'), but it does not specify concrete hyperparameters (e.g., learning rate, batch size, optimizer settings) or other low-level configuration details of the models or algorithms used. |