Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

PeerReview4All: Fair and Accurate Reviewer Assignment in Peer Review

Authors: Ivan Stelmakh, Nihar Shah, Aarti Singh

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our fifth and final contribution comprises empirical evaluations. We designed and conducted an experiment on the Amazon Mechanical Turk crowdsourcing platform to objectively compare the performance of different reviewer-assignment algorithms. The design of the experiment is done carefully to circumvent the challenge posed by the absence of a ground truth in peer review settings, so that we can evaluate accuracy objectively. In addition to the MTurk experiment, we provide an extensive evaluation of our algorithm on synthetic data, provide an evaluation on a reconstructed similarity matrix from the ICLR 2018 conference, and report the results of the experiment on real conference data conducted by Kobren et al. (2019)."
Researcher Affiliation | Academia | "Ivan Stelmakh (EMAIL), Nihar Shah (EMAIL), Aarti Singh (EMAIL); School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213"
Pseudocode | Yes | "Algorithm 1: PeerReview4All Algorithm. Input: λ ∈ [n], number of reviewers required per paper; S ∈ [0,1]^{n×m}, similarity matrix; µ ∈ [m], reviewers' maximum load; f, transformation of similarities. Output: reviewer assignment A^{PR4A}_f"
Open Source Code | Yes | "The data set pertaining to the MTurk experiment, as well as the code for our PeerReview4All algorithm, are available on the first author's website."
Open Datasets | Yes | "The data set pertaining to the MTurk experiment, as well as the code for our PeerReview4All algorithm, are available on the first author's website."
Dataset Splits | Yes | "In each of the 6 regions, we first split the 10 questions into two sets: a gold standard set of 8 questions chosen uniformly at random and an unresolved set comprising the 2 remaining questions."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies | No | The paper mentions other tools like the Toronto Paper Matching System (TPMS) and its open-source code for constructing a similarity matrix, but it does not specify version numbers for any software dependencies used in their own experimental setup or for the PeerReview4All algorithm itself.
Experiment Setup | Yes | "We consider the instance of the reviewer assignment problem with m = n = 100 and λ = µ = 4. [...] In each of these assignments, every question was answered by λ = 3 workers and every worker answered at most µ = 2 questions."
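The dataset split quoted above (per region, a gold-standard set of 8 of the 10 questions drawn uniformly at random, with the remaining 2 left unresolved) can be sketched as follows. This is an illustrative reconstruction; the function and variable names are not taken from the paper's released code.

```python
import random

def split_region_questions(questions, gold_size=8, seed=None):
    """Split one region's questions into a gold-standard set (chosen
    uniformly at random) and an unresolved set of the remainder."""
    if seed is not None:
        random.seed(seed)
    gold = random.sample(questions, gold_size)
    unresolved = [q for q in questions if q not in gold]
    return gold, unresolved

# Example: one region with 10 questions, as in the MTurk experiment.
questions = [f"q{i}" for i in range(10)]
gold, unresolved = split_region_questions(questions, gold_size=8, seed=0)
assert len(gold) == 8 and len(unresolved) == 2
```

Repeating this split independently for each of the 6 regions reproduces the setup described in the quote.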
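The load constraints that appear throughout the table (λ reviewers per paper, at most µ papers per reviewer) can be checked mechanically. The sketch below is a hedged validity check for an assignment, not the PeerReview4All algorithm itself; the helper name and the toy round-robin instance are illustrative assumptions.

```python
def is_valid_assignment(assignment, n_papers, lam, mu):
    """Check that every paper receives exactly `lam` distinct reviewers
    and no reviewer is assigned more than `mu` papers.

    `assignment` maps paper index -> list of reviewer indices.
    """
    load = {}
    for paper in range(n_papers):
        reviewers = assignment.get(paper, [])
        if len(reviewers) != lam or len(set(reviewers)) != lam:
            return False  # wrong count or duplicate reviewer on a paper
        for r in reviewers:
            load[r] = load.get(r, 0) + 1
    return all(count <= mu for count in load.values())

# Toy instance mirroring the synthetic setup's shape (m = n, lam = mu):
# 4 papers, 4 reviewers, lam = mu = 2, assigned round-robin.
assignment = {p: [(p + k) % 4 for k in range(2)] for p in range(4)}
assert is_valid_assignment(assignment, n_papers=4, lam=2, mu=2)
```

With m = n and λ = µ, as in the paper's synthetic instance, a feasible assignment always exists; the check above only verifies feasibility of a given assignment, it does not construct one.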