Automatically Neutralizing Subjective Bias in Text

Authors: Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, Diyi Yang

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Large-scale human evaluation across four domains (encyclopedias, news headlines, books, and political speeches) suggests that these algorithms are a first step towards the automatic identification and reduction of bias.
Researcher Affiliation | Academia | 1 Stanford University {rpryzant, rdm, ndass, jurafsky}@stanford.edu; 2 Kyoto University kuro@i.kyoto-u.ac.jp; 3 Georgia Institute of Technology diyi.yang@cc.gatech.edu
Pseudocode | No | The paper describes algorithms using text and diagrams (Figures 2 and 3) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | We release data and code to the public. https://github.com/rpryzant/neutralizing-bias
Open Datasets | Yes | We introduce the Wiki Neutrality Corpus (WNC). This is a new parallel corpus of 180,000 biased and neutralized sentence pairs along with contextual sentences and metadata. The corpus was harvested from Wikipedia edits... We release data and code to the public. https://github.com/rpryzant/neutralizing-bias
Dataset Splits | Yes | This yielded 53,803 training pairs (about a quarter of the WNC), from which we sampled 700 development and 1,000 test pairs. (A split sketch follows the table.)
Hardware Specification | Yes | All computations were performed on a single NVIDIA TITAN X GPU; training the full system took approximately 10 hours.
Software Dependencies | No | The paper mentions PyTorch, the Adam optimizer, and the 'bert-base-uncased' BERT model, but does not give version numbers for these dependencies, which would be needed for full reproducibility.
Experiment Setup | Yes | We implemented nonlinear models with Pytorch (Paszke et al. 2017) and optimized using Adam (Kingma and Ba 2014) as configured in (Devlin et al. 2019) with a learning rate of 5e-5. We used a batch size of 16. All vectors were of length h = 512 unless otherwise specified. We use gradient clipping with a maximum gradient norm of 3 and a dropout probability of 0.2 on the inputs of each LSTM cell (Srivastava et al. 2014). We pre-trained the tagging module for 4 epochs. We pretrained the editing module on the neutral portion of our WNC for 4 epochs. The joint system was trained on the same data as the tagger for 25,000 steps (about 7 epochs). We perform inference using beam search and a beam width of 4. (A configuration sketch follows the table.)
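The Dataset Splits row reports a single-word-edit subset of 53,803 pairs, from which 700 development and 1,000 test pairs were sampled. The sketch below shows one way to reproduce such a split; the file name wnc_one_word.tsv and the two-column tab-separated layout are assumptions for illustration, not the documented format of the released corpus (see the repository linked above for that).

```python
# Hypothetical split sketch. Assumes a TSV with one (biased, neutralized) pair per line;
# the actual WNC release defines its own file layout and official splits.
import csv
import random

def load_pairs(path):
    """Read (biased, neutralized) sentence pairs from a tab-separated file."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:
                pairs.append((row[0], row[1]))  # biased sentence, neutralized sentence
    return pairs

random.seed(0)
pairs = load_pairs("wnc_one_word.tsv")          # ~53,803 single-word edit pairs (assumed filename)
random.shuffle(pairs)
dev, test, train = pairs[:700], pairs[700:1700], pairs[1700:]
print(len(train), len(dev), len(test))
```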
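The Experiment Setup row pins down concrete hyperparameters: Adam at a learning rate of 5e-5, batch size 16, hidden size 512, gradient clipping at max norm 3, dropout 0.2 on LSTM inputs, and beam width 4 at inference. The PyTorch sketch below wires those values together; the ToyEditor model and train_step helper are hypothetical stand-ins for illustration, not the authors' tagger/editor architecture.

```python
# Minimal sketch of the reported training configuration, not the authors' code.
import torch
import torch.nn as nn

HIDDEN = 512        # "all vectors were of length h = 512"
BATCH_SIZE = 16
LR = 5e-5           # Adam, as configured in Devlin et al. (2019)
MAX_GRAD_NORM = 3.0
DROPOUT = 0.2       # applied to the inputs of each LSTM cell
BEAM_WIDTH = 4      # beam search width used at inference time

class ToyEditor(nn.Module):
    """Hypothetical stand-in for the paper's LSTM-based editing module."""
    def __init__(self, vocab_size=30522):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, HIDDEN)
        self.drop = nn.Dropout(DROPOUT)                   # dropout on LSTM inputs
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.drop(self.embed(tokens)))
        return self.out(h)

model = ToyEditor()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

def train_step(tokens, targets):
    """One optimization step with gradient clipping at max norm 3."""
    optimizer.zero_grad()
    logits = model(tokens)
    loss = nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1)
    )
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    optimizer.step()
    return loss.item()
```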