Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models

Authors: Yang Liu

AAAI 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results on the publicly available datasets StereoSet (SS) and CrowS-Pairs (CP) show that our proposed measures are significantly more robust and interpretable than those proposed previously." |
| Researcher Affiliation | Academia | Tianjin University |
| Pseudocode | No | The paper describes its methods using mathematical formulas and textual explanations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "All experiments were conducted on a GeForce RTX 3070 GPU and the code is available on GitHub." |
| Open Datasets | Yes | "Our experiments use publicly available StereoSet (SS; Nadeem, Bethke, and Reddy 2021) and CrowS-Pairs (CP; Nangia et al. 2020) datasets." |
| Dataset Splits | Yes | "Because the test set part of the SS dataset is not publicly available, we use its development set." |
| Hardware Specification | Yes | "All experiments were conducted on a GeForce RTX 3070 GPU and the code is available on GitHub." |
| Software Dependencies | No | The paper mentions models like BERT, RoBERTa, and ALBERT, but it does not specify versions for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or other relevant libraries. |
| Experiment Setup | No | The paper names the language models used (BERT, RoBERTa, ALBERT) and the datasets, but it does not provide specific setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or optimizer settings. |
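For context on the evaluation referenced above: the standard CrowS-Pairs metric reports the percentage of sentence pairs for which a masked language model assigns a higher (pseudo-)likelihood to the stereotypical sentence than to its anti-stereotypical counterpart, with 50% indicating no measured preference. The sketch below assumes precomputed per-sentence scores (e.g., pseudo-log-likelihoods from BERT); the scoring function and toy numbers are illustrative stand-ins, not the paper's proposed robust measures.

```python
def bias_score(pairs):
    """CrowS-Pairs-style bias metric.

    pairs: list of (stereo_score, antistereo_score) tuples, where each
    score is a (pseudo-)log-likelihood assigned by a masked LM.
    Returns the percentage of pairs where the stereotypical sentence
    scored higher; 50.0 means no measured preference.
    """
    if not pairs:
        raise ValueError("no pairs given")
    higher = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * higher / len(pairs)


# Toy scores (hypothetical, not from a real model run):
example = [(-41.2, -43.5), (-38.0, -37.1), (-50.3, -52.9), (-44.4, -44.9)]
print(bias_score(example))  # 75.0
```

The paper's contribution is to replace such raw preference percentages with measures argued to be more robust and interpretable; this sketch only shows the baseline quantity being improved upon.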