Uncovering Latent Biases in Text: Method and Application to Peer Review

Authors: Emaad Manzoor, Nihar B. Shah

AAAI 2021 (pp. 4767-4775) | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our framework to quantify biases in the text of peer reviews from a reputed machine-learning conference before and after the conference adopted a double-blind reviewing policy. We show evidence of biases in the review ratings that serves as ground truth, and show that our proposed framework accurately detects the presence (and absence) of these biases from the review text without having access to the review ratings.
Researcher Affiliation | Academia | Emaad Manzoor, Nihar B. Shah (Carnegie Mellon University); emaad@cmu.edu, nihars@cs.cmu.edu
Pseudocode | No | The paper describes its methodology in prose and mathematical equations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Reproducibility: We make our code and data publicly available at http://emaadmanzoor.com/biases-in-text/.
Open Datasets | Yes | We assemble a dataset of 16,880 peer reviews from the Open Review platform for all the 5,638 papers submitted to the International Conference on Learning Representations (ICLR) from 2017 to 2020. [...] Reproducibility: We make our code and data publicly available at http://emaadmanzoor.com/biases-in-text/.
Dataset Splits | Yes | We estimate the value of perf(f; t) and perf(g; t) using k-fold cross-validation. To eliminate any dependence on the choice of cross-validation folds, we repeat the bias estimation procedure many times with the data belonging to each fold randomized uniformly in each iteration. [...] We report results with multinomial Naive Bayes classifiers for f(·) and g(·) and the AUC as our chosen measure of classification performance. We estimate the value of perf(f; t) and perf(g; t) using 10-fold cross-validation.
Hardware Specification | No | The paper does not specify any hardware used for running the experiments, such as CPU or GPU models, or cloud computing resources.
Software Dependencies | No | The paper mentions using 'multinomial Naive Bayes classifiers' and the 'gender' package (via a GitHub link), but does not provide specific version numbers for any software dependencies such as Python, scikit-learn, or other libraries.
Experiment Setup | Yes | We use multinomial Naive Bayes classifiers with add-one smoothing for f(·) and g(·) on frequencies of unigrams and bigrams in the review and abstract text respectively. We use the area under the ROC curve (AUC) for both perf(f; t) and perf(g; t), estimated using 10-fold cross-validation. We downsample the reviews and abstracts in year t_DB to equalize the sample sizes and subgroup proportions in t_SB and t_DB, as described in Section . We repeat the bias estimation procedure 1,000 times with downsampling and the cross-validation folds randomized uniformly in each iteration. (See the code sketches after the table.)
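
The Dataset Splits and Experiment Setup rows describe how perf(f; t) and perf(g; t) are estimated: a multinomial Naive Bayes classifier with add-one smoothing over unigram and bigram counts, scored by AUC under 10-fold cross-validation. The following is a minimal sketch of that estimate, assuming a Python/scikit-learn stack (this report does not state which toolkit the released code uses); the helper name cv_auc and the inputs texts and labels are hypothetical placeholders, not names taken from the paper.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def cv_auc(texts, labels, seed):
    # Estimate classification performance (AUC) via 10-fold cross-validation,
    # mirroring the perf(f; t) / perf(g; t) estimates quoted above.
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram frequencies
        MultinomialNB(alpha=1.0),             # add-one (Laplace) smoothing
    )
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(model, texts, np.asarray(labels),
                             cv=folds, scoring="roc_auc")
    return scores.mean()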
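
The Experiment Setup row additionally downsamples year t_DB to match the sample sizes and subgroup proportions of year t_SB and repeats the whole estimate 1,000 times, randomizing the downsampling and the cross-validation folds each iteration. Below is a sketch of that outer loop under the same assumptions as above; downsample_to_match, reviews_db, labels_db, and target_counts are hypothetical names, and the abstract-text classifier g(·) and the single-blind year would be handled in the same way.

def downsample_to_match(texts, labels, target_counts, rng):
    # Randomly subsample one year's texts so that each subgroup's count matches
    # target_counts, equalizing sample sizes and subgroup proportions.
    labels = np.asarray(labels)
    keep = []
    for group, n in target_counts.items():
        members = np.flatnonzero(labels == group)
        keep.extend(rng.choice(members, size=n, replace=False))
    keep = np.asarray(keep)
    return [texts[i] for i in keep], labels[keep]

# Repeat the bias-estimation procedure 1,000 times, randomizing both the
# downsampling and the cross-validation folds in each iteration.
review_aucs = []
for seed in range(1000):
    rng = np.random.default_rng(seed)
    texts_t, labels_t = downsample_to_match(reviews_db, labels_db,
                                            target_counts, rng)
    review_aucs.append(cv_auc(texts_t, labels_t, seed))  # perf(f; t_DB)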