Uncovering Latent Biases in Text: Method and Application to Peer Review
Authors: Emaad Manzoor, Nihar B. Shah
AAAI 2021, pp. 4767–4775 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our framework to quantify biases in the text of peer reviews from a reputed machine-learning conference before and after the conference adopted a double-blind reviewing policy. We show evidence of biases in the review ratings that serve as the ground truth, and show that our proposed framework accurately detects the presence (and absence) of these biases from the review text without having access to the review ratings. |
| Researcher Affiliation | Academia | Emaad Manzoor, Nihar B. Shah; Carnegie Mellon University; emaad@cmu.edu, nihars@cs.mu.edu |
| Pseudocode | No | The paper describes its methodology in prose and mathematical equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Reproducibility: We make our code and data publicly available at http://emaadmanzoor.com/biases-in-text/. |
| Open Datasets | Yes | We assemble a dataset of 16,880 peer reviews from the Open Review platform for all the 5,638 papers submitted to the International Conference on Learning Representations (ICLR) from 2017 to 2020. [...] Reproducibility: We make our code and data publicly available at http://emaadmanzoor.com/biases-in-text/. |
| Dataset Splits | Yes | We estimate the value of perf(f; t) and perf(g; t) using k-fold cross-validation. To eliminate any dependence on the choice of cross-validation folds, we repeat the bias estimation procedure many times with the data belonging to each fold randomized uniformly in each iteration. [...] We report results with multinomial Naive Bayes classifiers for f(·) and g(·) and the AUC as our chosen measure of classification performance. We estimate the value of perf(f; t) and perf(g; t) using 10-fold cross-validation. (A minimal cross-validation sketch follows the table.) |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments, such as CPU or GPU models, or cloud computing resources. |
| Software Dependencies | No | The paper mentions using 'multinomial Naive Bayes classifiers' and 'gender package' (from a GitHub link), but does not provide specific version numbers for any software dependencies like Python, scikit-learn, or other libraries. |
| Experiment Setup | Yes | We use multinomial Naive Bayes classifiers with add-one smoothing for f(·) and g(·) on frequencies of unigrams and bigrams in the review and abstract text respectively. We use the area under the ROC curve (AUC) for both perf(f; t) and perf(g; t), estimated using 10-fold cross-validation. We downsample the reviews and abstracts in year t_DB to equalize the sample sizes and subgroup proportions in t_SB and t_DB, as described in Section . We repeat the bias estimation procedure 1,000 times with downsampling and the cross-validation folds randomized uniformly in each iteration. (A sketch of this repeated downsampling loop also follows the table.) |
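
To make the reported setup concrete, below is a minimal sketch of how perf(f; t) or perf(g; t) could be estimated: a multinomial Naive Bayes classifier with add-one smoothing over unigram and bigram counts, scored by AUC under 10-fold cross-validation. The use of scikit-learn and the function name `estimate_perf` are assumptions for illustration; the paper does not name its toolkit, and the authors' released code at http://emaadmanzoor.com/biases-in-text/ is the authoritative implementation.

```python
# Hedged sketch (not the authors' released code): estimate perf(·; t) as the
# mean AUC of predicting a binary subgroup label (0/1) from text, using a
# multinomial Naive Bayes classifier with add-one smoothing on unigram and
# bigram counts and 10-fold cross-validation. scikit-learn is an assumed
# dependency; the paper does not specify its software stack.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import MultinomialNB


def estimate_perf(texts, subgroup_labels, n_splits=10, seed=0):
    """Mean test-fold AUC of predicting the subgroup label from text."""
    X = CountVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    y = np.asarray(subgroup_labels)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in folds.split(X, y):
        clf = MultinomialNB(alpha=1.0)  # alpha=1.0 -> add-one (Laplace) smoothing
        clf.fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]  # probability of class 1
        aucs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aucs))
```

Called on review text this would correspond to perf(f; t), and called on abstract text to perf(g; t), matching the "Dataset Splits" and "Experiment Setup" rows above.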
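
The "Experiment Setup" row also describes repeating the bias estimation 1,000 times with downsampling and freshly randomized folds. The sketch below, which reuses `estimate_perf` from the previous block, shows one plausible shape of that loop; `downsample_to_match` and `target_counts` are hypothetical names, and the paper's actual bias statistic and significance test are defined in the paper itself.

```python
# Hedged sketch of the repeated estimation loop: in each of 1,000 iterations,
# the double-blind-year (t_DB) data are downsampled so that per-subgroup
# counts match the single-blind year (t_SB), and performance is re-estimated
# with newly randomized cross-validation folds. Helper names are illustrative.
import numpy as np

rng = np.random.default_rng(0)


def downsample_to_match(texts, labels, target_counts):
    """Keep, per subgroup, as many t_DB items as t_SB has (target_counts)."""
    texts, labels = np.asarray(texts, dtype=object), np.asarray(labels)
    keep = []
    for subgroup, n_target in target_counts.items():
        idx = np.flatnonzero(labels == subgroup)
        keep.extend(rng.choice(idx, size=n_target, replace=False))
    keep = np.asarray(keep)
    return texts[keep].tolist(), labels[keep].tolist()


def repeated_estimates(texts_db, labels_db, target_counts, n_iter=1000):
    """Collect 1,000 downsampled, fold-randomized performance estimates."""
    estimates = []
    for it in range(n_iter):
        texts, labels = downsample_to_match(texts_db, labels_db, target_counts)
        estimates.append(estimate_perf(texts, labels, seed=it))  # from the sketch above
    return estimates
```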