Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Group Fairness in Peer Review
Authors: Haris Aziz, Evi Micha, Nisarg Shah
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use real data from CVPR and ICLR conferences to compare our algorithm to existing reviewing assignment algorithms on a number of metrics. |
| Researcher Affiliation | Academia | Haris Aziz (UNSW Sydney, EMAIL); Evi Micha (University of Toronto, EMAIL); Nisarg Shah (University of Toronto, EMAIL) |
| Pseudocode | Yes | ALGORITHM 1: CoBRA. Input: N, P, σ, ka, kp. Output: R |
| Open Source Code | No | The paper mentions systems like the Toronto Paper Matching System and OpenReview as existing tools but does not provide a link to open-source code for its own proposed method (CoBRA). |
| Open Datasets | Yes | We use three conference datasets: from the Conference on Computer Vision and Pattern Recognition (CVPR) in 2017 and 2018, which were both used by Kobren et al. [16], and from the International Conference on Learning Representations (ICLR) in 2018, which was used by Xu et al. [25]. |
| Dataset Splits | No | The paper mentions subsampling 100 papers for computing the core violation factor, but it does not provide specific training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit standard splits for model training and evaluation). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We use ka = kp = 3 in these experiments. (...) we subsample 100 papers from each dataset in each run, and report results averaged over 100 runs. |
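The experiment setup in the table (ka = kp = 3 reviewers per paper and papers per reviewer, 100 subsampled papers per run, results averaged over 100 runs) can be sketched as a small evaluation loop. This is a hedged illustration only: `assign_reviewers` below is a hypothetical round-robin stand-in, not the paper's CoBRA algorithm, and the coverage metric is an invented placeholder for the paper's actual metrics.

```python
import random

def assign_reviewers(papers, reviewers, k_a=3, k_p=3):
    """Toy stand-in assignment (NOT CoBRA): each paper gets up to k_p
    reviewers, and each reviewer handles at most k_a papers."""
    load = {r: 0 for r in reviewers}
    assignment = {}
    for p in papers:
        # pick the k_p least-loaded reviewers that still have capacity
        eligible = sorted((r for r in reviewers if load[r] < k_a),
                          key=lambda r: load[r])[:k_p]
        for r in eligible:
            load[r] += 1
        assignment[p] = eligible
    return assignment

def run_experiment(all_papers, reviewers, n_runs=100, sample_size=100, seed=0):
    """Subsample `sample_size` papers per run and average a metric
    over `n_runs` runs, mirroring the protocol in the table."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        papers = rng.sample(all_papers, min(sample_size, len(all_papers)))
        assignment = assign_reviewers(papers, reviewers)
        # placeholder metric: fraction of papers with a full 3-reviewer panel
        full = sum(1 for revs in assignment.values() if len(revs) == 3)
        scores.append(full / len(papers))
    return sum(scores) / len(scores)
```

In the actual experiments the assignment step would be CoBRA or one of the baseline assignment algorithms, and the metric would be one of the paper's comparison metrics (e.g. the core violation factor mentioned above).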