On Testing for Biases in Peer Review

Authors: Ivan Stelmakh, Nihar Shah, Aarti Singh

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present two sets of results in this paper. The first set of results is negative, and pertains to the statistical tests and the experimental setup used in the work of Tomkins et al. ... Our second set of results is positive, in that we present a general framework for testing for biases in (single vs. double blind) peer review. We then present a hypothesis test with guaranteed control over false alarm probability and non-trivial power even under conditions (a)-(c). Conditions (d) and (e) are more fundamental problems that are tied to the experimental setup and not necessarily related to the test. See also Figure 1: Synthetic simulations evaluating performance of the test in Tomkins et al. [33] (previous work) and the test proposed in this paper (DISAGREEMENT test).
Researcher Affiliation | Academia | Ivan Stelmakh, Nihar B. Shah, and Aarti Singh; School of Computer Science, Carnegie Mellon University; {stiv,nihars,aarti}@cs.cmu.edu
Pseudocode | Yes | Test 1 (DISAGREEMENT). Input: significance level α ∈ (0, 1); set of tuples T, where each t ∈ T is of the form (j_t, Y_{j_t}, X_{j_t}, w_{j_t}) for some paper j ∈ [n]. 1. Initialize U and V to be empty arrays. 2. For each tuple t ∈ T, if Y_{j_t} ≠ X_{j_t}, append Y_{j_t} to U if w_{j_t} = 1 and to V if w_{j_t} = −1. 3. Run the permutation test [12] at level α to test whether the entries of U and V are exchangeable random variables, using the test statistic defined in the paper. 4. Reject the null if and only if the permutation test rejects the null. (If either of the arrays U and V is empty, the test keeps the null.) A runnable sketch of this test is given after the table.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code for the described methodology.
Open Datasets | No | The paper describes generating 'synthetic simulations' and 'synthetic data' for its experiments rather than using a publicly available dataset. Details of the simulation setup are provided in Section 3 and Appendix A.
Dataset Splits | No | The paper uses synthetic data generated based on various parameters (e.g., number of papers, reviewer load, correlation) rather than pre-defined dataset splits. It does not provide specific train/validation/test split percentages or sample counts in the traditional sense.
Hardware Specification | No | The paper discusses theoretical analysis and synthetic simulations without providing any specific details on the hardware used to conduct these simulations.
Software Dependencies | No | The paper does not provide specific details or version numbers for any software dependencies or libraries used to implement the tests or run the simulations.
Experiment Setup | No | The paper describes the parameters of its synthetic data generation (e.g., correlation coefficient, reviewer load, number of papers) in Section 3 and Appendix A, but it does not provide typical experimental setup details such as hyperparameter values, learning rates, or optimizer settings for a trained model. A simulation sketch in this spirit is given after the table.
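
The pseudocode quoted in the table is detailed enough for a minimal sketch. The Python below is one reading of Test 1 (DISAGREEMENT), not the authors' code: the function name disagreement_test, the permutation count, and the use of the absolute difference of group means as the test statistic are assumptions (the paper defines its own statistic, which is not reproduced in the pseudocode row above).

```python
"""Hedged sketch of the DISAGREEMENT test (Test 1), written from the
pseudocode quoted in the table above. The test statistic used here
(absolute difference of group means) is an assumption."""

import random


def disagreement_test(tuples, alpha=0.05, n_permutations=10_000, seed=0):
    """Permutation test for bias on reviewer-pair disagreements.

    Each element of `tuples` is (paper_id, Y, X, w), where Y is the decision
    of the reviewer in the condition under test, X is the decision of the
    other reviewer, and w in {+1, -1} indicates whether the paper has the
    property being tested for bias. Returns True iff the null (no bias) is
    rejected at level alpha.
    """
    # Step 2: keep only papers where the two reviewers disagree, splitting
    # the condition-under-test decisions by the property indicator w.
    U = [Y for (_, Y, X, w) in tuples if Y != X and w == 1]
    V = [Y for (_, Y, X, w) in tuples if Y != X and w == -1]

    # Step 4 (parenthetical): if either array is empty, keep the null.
    if not U or not V:
        return False

    def statistic(u, v):
        # Assumed statistic: absolute difference of the group means.
        return abs(sum(u) / len(u) - sum(v) / len(v))

    observed = statistic(U, V)

    # Step 3: permutation test for exchangeability of the entries of U and V.
    rng = random.Random(seed)
    pooled = U + V
    count_at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        u_perm, v_perm = pooled[:len(U)], pooled[len(U):]
        if statistic(u_perm, v_perm) >= observed:
            count_at_least_as_extreme += 1

    # Standard permutation p-value with the +1 correction for validity.
    p_value = (count_at_least_as_extreme + 1) / (n_permutations + 1)

    # Step 4: reject the null iff the permutation test rejects at level alpha.
    return p_value <= alpha
```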
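
A minimal synthetic-simulation driver in the spirit of Section 3 can also be sketched, assuming the disagreement_test function from the previous block. The decision model (a latent paper quality plus an additive bias for papers with w = +1), the parameter names, and all default values are illustrative assumptions, not the paper's exact setup from Section 3 and Appendix A.

```python
"""Hedged sketch of a synthetic simulation: generate fake accept/reject
decisions for two reviewers per paper, optionally biasing the reviewer in
the tested condition toward papers with w = +1, and estimate how often the
DISAGREEMENT test rejects. The decision model is an illustrative assumption."""

import random


def simulate_rejection_rate(n_papers=1000, bias=0.0, n_trials=200,
                            alpha=0.05, seed=0):
    """Fraction of trials in which the DISAGREEMENT test rejects the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        tuples = []
        for j in range(n_papers):
            w = rng.choice([1, -1])        # property indicator for paper j
            quality = rng.random()         # latent paper quality in [0, 1]
            # Unbiased reviewer: accepts with probability equal to quality.
            X = int(rng.random() < quality)
            # Reviewer in the tested condition: acceptance probability is
            # shifted by `bias` for papers with the property (w = +1).
            p = min(max(quality + bias * (w == 1), 0.0), 1.0)
            Y = int(rng.random() < p)
            tuples.append((j, Y, X, w))
        if disagreement_test(tuples, alpha=alpha, n_permutations=2000,
                             seed=rng.randrange(10**6)):
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    # bias=0 estimates the false alarm probability; bias>0 estimates power.
    print("false alarm rate:", simulate_rejection_rate(bias=0.0))
    print("power at bias=0.2:", simulate_rejection_rate(bias=0.2))
```

With bias = 0 the rejection rate should stay near alpha, and increasing the bias traces out a power curve, mirroring the kind of comparison reported in Figure 1 of the paper.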