Reducing Sentiment Bias in Pre-trained Sentiment Classification via Adaptive Gumbel Attack

Authors: Jiachen Tian, Shizhan Chen, Xiaowang Zhang, Xin Wang, Zhiyong Feng

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results illustrate that our method significantly reduced sentiment bias and improved the performance of sentiment classification.
Researcher Affiliation | Academia | Jiachen Tian, Shizhan Chen, Xiaowang Zhang*, Xin Wang, Zhiyong Feng. College of Intelligence and Computing, Tianjin University, Tianjin, China; Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China. {jiachen6677, shizhan, xiaowangzhang, wangx, zyfeng}@tju.edu.cn
Pseudocode | No | The paper describes its methods in detail with equations and prose, but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, or a link to a code repository.
Open Datasets | Yes | We conducted experiments on seven datasets. IMDb is a binary film review dataset, which is widely used as a benchmark for sentiment classification (Maas et al. 2011). SST-2 is the Stanford Sentiment Treebank (SST), which consists of sentences from movie reviews and human annotations of their sentiments (Socher et al. 2013). YELP-2 and YELP-5 are subsets of Yelp's businesses, reviews, and user data (Xie et al. 2020). Amazon-2 and Amazon-5 are Amazon review datasets from the Stanford Network Analysis Project (Xie et al. 2020). SemEval is an English aspect-level sentiment classification dataset with 4 pre-defined aspect categories and 4 sentiment polarities (Yang et al. 2021). (A loading sketch for the public benchmarks follows the table.)
Dataset Splits | No | The paper states 'We tune the number of epochs on the validation set of each dataset,' implying the use of a validation set, but it does not provide specific split percentages or sample counts for training, validation, and test sets to reproduce the data partitioning. Table 1 only lists 'Train Samples' and 'Test Samples'. (A split-reconstruction sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments. It only mentions using 'PLMs' without further specification of the underlying hardware.
Software Dependencies | Yes | We used the NLTK version of the part-of-speech tagging tool to randomly select 400 entities from each dataset... (An entity-sampling sketch follows the table.)
Experiment Setup | Yes | The only variations made involve tuning the initial learning rate from 1e-5 to 5e-5 for each dataset and adjusting the threshold of the average confidence λ from 0.6 to 0.8. Besides, the number of experts H is set to 7, and the batch size is set to 32. The classifier has a hidden layer of size 50. We use Adam as the basic parameter-updating algorithm with β1 = 0.9, β2 = 0.999. We tune the number of epochs on the validation set of each dataset. ...We used slanted triangular learning rates (Howard and Ruder 2018), that is, we set different learning rates for each layer of g and g′. ...Concretely, we adopt 0.9, 0.85 and 0.8 for SemEval, IMDb and SST-2, 0.75 and 0.7 for YELP-2 and YELP-5, 0.55 and 0.6 for Amazon-2 and Amazon-5, respectively. (An optimizer sketch follows the table.)
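
All seven benchmarks are public. A minimal loading sketch using the Hugging Face datasets library might look like the following; the paper itself does not mention this library, and the Hub IDs are our assumption:

    from datasets import load_dataset

    # Hypothetical Hub IDs for the public benchmarks. Amazon-5 and the
    # SemEval aspect-level data are distributed separately and are omitted.
    benchmarks = {
        "IMDb": load_dataset("imdb"),
        "SST-2": load_dataset("glue", "sst2"),
        "YELP-2": load_dataset("yelp_polarity"),
        "YELP-5": load_dataset("yelp_review_full"),
        "Amazon-2": load_dataset("amazon_polarity"),
    }

    # Print the split sizes so they can be checked against Table 1.
    for name, ds in benchmarks.items():
        print(name, {split: len(ds[split]) for split in ds})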
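Because the paper reports only train/test counts, a reproducer has to reconstruct the validation set themselves. One possible sketch, where the 10% fraction and the seed are assumptions rather than values from the paper:

    from datasets import load_dataset

    # Carve a validation set out of the official training split.
    # test_size=0.1 and seed=42 are our assumptions; the paper does not
    # specify how its validation sets were built.
    imdb = load_dataset("imdb")
    split = imdb["train"].train_test_split(test_size=0.1, seed=42)
    train_set, valid_set = split["train"], split["test"]
    print(len(train_set), len(valid_set))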
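The entity-selection step is described only in prose. A sketch of how 400 entities per dataset could be sampled with NLTK's POS tagger; treating noun-like tokens as entities is our assumption, since the paper does not state which tags qualify:

    import random
    import nltk

    # One-time downloads for the tokenizer and POS-tagger models.
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    def sample_entities(texts, k=400, seed=0):
        # Tag every token and keep noun-like tags (NN, NNS, NNP, NNPS)
        # as candidate entities; which tags count is an assumption.
        candidates = set()
        for text in texts:
            for token, tag in nltk.pos_tag(nltk.word_tokenize(text)):
                if tag.startswith("NN"):
                    candidates.add(token.lower())
        random.seed(seed)
        return random.sample(sorted(candidates), k)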
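The optimizer settings are concrete enough to sketch in PyTorch. Below, per-layer learning rates are combined with a warmup-then-linear-decay schedule that approximates slanted triangular learning rates (Howard and Ruder 2018); the backbone, base learning rate, step count, and warmup fraction are assumptions, and this is not the authors' released code:

    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        get_linear_schedule_with_warmup,
    )

    # Assumed backbone; the paper only says it fine-tunes PLMs.
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )

    # Per-layer learning rates: each layer's LR is the base LR scaled by
    # decay ** (distance from the top layer). The per-dataset decay factors
    # (0.9 SemEval, 0.85 IMDb, 0.8 SST-2, ...) come from the paper; the
    # exact grouping scheme here is an assumption.
    base_lr, decay = 2e-5, 0.85  # base LR tuned in [1e-5, 5e-5] per the paper
    layers = [model.bert.embeddings] + list(model.bert.encoder.layer)
    param_groups = [
        {"params": layer.parameters(),
         "lr": base_lr * decay ** (len(layers) - 1 - i)}
        for i, layer in enumerate(layers)
    ]
    param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})

    optimizer = torch.optim.Adam(param_groups, betas=(0.9, 0.999))

    # Slanted triangular shape: short linear warmup, long linear decay.
    # The step counts are assumptions.
    total_steps = 10_000
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),
        num_training_steps=total_steps,
    )

Calling scheduler.step() after each optimizer.step() in the training loop reproduces the triangular shape; get_linear_schedule_with_warmup is a close stand-in for the original STLR schedule rather than an exact match.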