Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models
Authors: Yang Liu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the publicly available datasets StereoSet (SS) and CrowS-Pairs (CP) show that our proposed measures are significantly more robust and interpretable than those proposed previously. |
| Researcher Affiliation | Academia | Tianjin University, lauyon@tju.edu.cn |
| Pseudocode | No | The paper describes its methods using mathematical formulas and textual explanations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All experiments were conducted on a GeForce RTX 3070 GPU and the code is available on GitHub. |
| Open Datasets | Yes | Our experiments use the publicly available StereoSet (SS; Nadeem, Bethke, and Reddy 2021) and CrowS-Pairs (CP; Nangia et al. 2020) datasets. |
| Dataset Splits | Yes | Because the test set part of the SS dataset is not publicly available, we use its development set. |
| Hardware Specification | Yes | All experiments were conducted on a GeForce RTX 3070 GPU and the code is available on GitHub. |
| Software Dependencies | No | The paper mentions models like BERT, RoBERTa, and ALBERT, but it does not specify versions for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or other relevant libraries. |
| Experiment Setup | No | The paper mentions the language models used (BERT, RoBERTa, ALBERT) and the datasets, but it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or optimizer settings. |
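
The rows above name the masked language models (BERT, RoBERTa, ALBERT) and the StereoSet/CrowS-Pairs datasets but report no loading or scoring code. As a hedged illustration only, the following minimal sketch shows the standard pseudo-log-likelihood scoring that underlies CrowS-Pairs-style bias evaluation of masked language models; the model name and example sentences are assumptions, and this is not the authors' released GitHub code.

```python
# Minimal sketch (assumed setup, not the authors' code): score a sentence with a
# masked LM via pseudo-log-likelihood, the usual basis for CrowS-Pairs-style evaluation.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"  # assumption; the paper also evaluates RoBERTa and ALBERT
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum the log-probability of each token when it is masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        # Skip the special [CLS] and [SEP] tokens at the ends.
        for pos in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[pos] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, pos]
            log_probs = torch.log_softmax(logits, dim=-1)
            total += log_probs[input_ids[pos]].item()
    return total

# Illustrative sentence pair only (not taken from the datasets): a bias measure
# compares scores of stereotypical vs. anti-stereotypical variants.
print(pseudo_log_likelihood("The nurse prepared her notes."))
print(pseudo_log_likelihood("The nurse prepared his notes."))
```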