On Testing for Biases in Peer Review

Authors: Ivan Stelmakh, Nihar Shah, Aarti Singh

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present two sets of results in this paper. The first set of results is negative, and pertains to the statistical tests and the experimental setup used in the work of Tomkins et al. ... Our second set of results is positive, in that we present a general framework for testing for biases in (single vs. double blind) peer review. We then present a hypothesis test with guaranteed control over false alarm probability and non-trivial power even under conditions (a)-(c). Conditions (d) and (e) are more fundamental problems that are tied to the experimental setup and not necessarily related to the test. See also Figure 1: Synthetic simulations evaluating performance of the test in Tomkins et al. [33] (previous work) and the test proposed in this paper (DISAGREEMENT test).
Researcher Affiliation | Academia | Ivan Stelmakh, Nihar B. Shah, and Aarti Singh; School of Computer Science, Carnegie Mellon University; {stiv,nihars,aarti}@cs.cmu.edu
Pseudocode | Yes | Test 1 (DISAGREEMENT). Input: significance level α ∈ (0, 1); set of tuples T, where each t ∈ T is of the form (j_t, Y_{j_t}, X_{j_t}, w_{j_t}) for some paper j ∈ [n]. 1. Initialize U and V to be empty arrays. 2. For each tuple t ∈ T, if Y_{j_t} ≠ X_{j_t}, append Y_{j_t} to U if w_{j_t} = 1 and to V if w_{j_t} = −1. 3. Run the permutation test [12] at level α to test whether the entries of U and V are exchangeable random variables, using the test statistic defined in the paper. 4. Reject the null if and only if the permutation test rejects the null. (If either of the arrays U and V is empty, the test keeps the null.) A runnable sketch of this test is given after the table.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code for the described methodology.
Open Datasets | No | The paper describes generating 'synthetic simulations' and 'synthetic data' for its experiments rather than using a publicly available dataset. Details of the simulation setup are provided in Section 3 and Appendix A.
Dataset Splits | No | The paper uses synthetic data generated based on various parameters (e.g., number of papers, reviewer load, correlation) rather than pre-defined dataset splits. It does not provide specific train/validation/test split percentages or sample counts in the traditional sense.
Hardware Specification | No | The paper discusses theoretical analysis and synthetic simulations without providing any specific details on the hardware used to conduct these simulations.
Software Dependencies | No | The paper does not provide specific details or version numbers for any software dependencies or libraries used to implement the tests or run the simulations.
Experiment Setup | No | The paper describes the parameters of its synthetic data generation (e.g., correlation coefficient, reviewer load, number of papers) in Section 3 and Appendix A, but it does not provide typical experimental setup details such as hyperparameter values, learning rates, or optimizer settings for a trained model. A simulation sketch in this spirit is given after the table.
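
The pseudocode quoted in the table is detailed enough for a minimal sketch. The Python below is one reading of Test 1 (DISAGREEMENT), not the authors' code: the function name disagreement_test, the permutation count, and the use of the absolute difference of group means as the test statistic are assumptions (the paper defines its own statistic, which is not reproduced in the pseudocode row above).

```python
"""Hedged sketch of the DISAGREEMENT test (Test 1), written from the
pseudocode quoted in the table above. The test statistic used here
(absolute difference of group means) is an assumption."""

import random


def disagreement_test(tuples, alpha=0.05, n_permutations=10_000, seed=0):
    """Permutation test for bias on reviewer-pair disagreements.

    Each element of `tuples` is (paper_id, Y, X, w), where Y is the decision
    of the reviewer in the condition under test, X is the decision of the
    other reviewer, and w in {+1, -1} indicates whether the paper has the
    property being tested for bias. Returns True iff the null (no bias) is
    rejected at level alpha.
    """
    # Step 2: keep only papers where the two reviewers disagree, splitting
    # the condition-under-test decisions by the property indicator w.
    U = [Y for (_, Y, X, w) in tuples if Y != X and w == 1]
    V = [Y for (_, Y, X, w) in tuples if Y != X and w == -1]

    # Step 4 (parenthetical): if either array is empty, keep the null.
    if not U or not V:
        return False

    def statistic(u, v):
        # Assumed statistic: absolute difference of the group means.
        return abs(sum(u) / len(u) - sum(v) / len(v))

    observed = statistic(U, V)

    # Step 3: permutation test for exchangeability of the entries of U and V.
    rng = random.Random(seed)
    pooled = U + V
    count_at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        u_perm, v_perm = pooled[:len(U)], pooled[len(U):]
        if statistic(u_perm, v_perm) >= observed:
            count_at_least_as_extreme += 1

    # Standard permutation p-value with the +1 correction for validity.
    p_value = (count_at_least_as_extreme + 1) / (n_permutations + 1)

    # Step 4: reject the null iff the permutation test rejects at level alpha.
    return p_value <= alpha
```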
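
A minimal synthetic-simulation driver in the spirit of Section 3 can also be sketched, assuming the disagreement_test function from the previous block. The decision model (a latent paper quality plus an additive bias for papers with w = +1), the parameter names, and all default values are illustrative assumptions, not the paper's exact setup from Section 3 and Appendix A.

```python
"""Hedged sketch of a synthetic simulation: generate fake accept/reject
decisions for two reviewers per paper, optionally biasing the reviewer in
the tested condition toward papers with w = +1, and estimate how often the
DISAGREEMENT test rejects. The decision model is an illustrative assumption."""

import random


def simulate_rejection_rate(n_papers=1000, bias=0.0, n_trials=200,
                            alpha=0.05, seed=0):
    """Fraction of trials in which the DISAGREEMENT test rejects the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        tuples = []
        for j in range(n_papers):
            w = rng.choice([1, -1])        # property indicator for paper j
            quality = rng.random()         # latent paper quality in [0, 1]
            # Unbiased reviewer: accepts with probability equal to quality.
            X = int(rng.random() < quality)
            # Reviewer in the tested condition: acceptance probability is
            # shifted by `bias` for papers with the property (w = +1).
            p = min(max(quality + bias * (w == 1), 0.0), 1.0)
            Y = int(rng.random() < p)
            tuples.append((j, Y, X, w))
        if disagreement_test(tuples, alpha=alpha, n_permutations=2000,
                             seed=rng.randrange(10**6)):
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    # bias=0 estimates the false alarm probability; bias>0 estimates power.
    print("false alarm rate:", simulate_rejection_rate(bias=0.0))
    print("power at bias=0.2:", simulate_rejection_rate(bias=0.2))
```

With bias = 0 the rejection rate should stay near alpha, and increasing the bias traces out a power curve, mirroring the kind of comparison reported in Figure 1 of the paper.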