Reducing Sentiment Bias in Pre-trained Sentiment Classification via Adaptive Gumbel Attack
Authors: Jiachen Tian, Shizhan Chen, Xiaowang Zhang, Xin Wang, Zhiyong Feng
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results illustrate that our method significantly reduced sentiment bias and improved the performance of sentiment classification. |
| Researcher Affiliation | Academia | Jiachen Tian, Shizhan Chen, Xiaowang Zhang*, Xin Wang, Zhiyong Feng College of Intelligence and Computing, Tianjin University, Tianjin, China Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin, China {jiachen6677, shizhan, xiaowangzhang, wangx, zyfeng}@tju.edu.cn |
| Pseudocode | No | The paper describes its methods in detail with equations and prose, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We conducted experiments on seven datasets. IMDb is a binary film review dataset, which is widely used as a benchmark for sentiment classification (Maas et al. 2011). SST-2 is the Stanford Sentiment Treebank (SST), which consists of sentences from movie reviews and human annotations of their sentiments (Socher et al. 2013). YELP-2 and YELP-5 are subsets of Yelp's businesses, reviews, and user data (Xie et al. 2020). Amazon-2 and Amazon-5 are Amazon review datasets from the Stanford Network Analysis Project (Xie et al. 2020). SemEval is an English aspect-level sentiment classification dataset with 4 pre-defined aspect categories and 4 sentiment polarities (Yang et al. 2021). |
| Dataset Splits | No | The paper states 'We tune the number of epochs on the validation set of each dataset,' implying the use of a validation set, but it does not provide specific split percentages or sample counts for training, validation, and test sets to reproduce the data partitioning. Table 1 only lists 'Train Samples' and 'Test Samples'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments. It only mentions using 'PLMs' without further specification of the underlying hardware. |
| Software Dependencies | Yes | We used the NLTK part-of-speech tagging tool to randomly select 400 entities from each dataset... (a hedged sketch of this step follows the table) |
| Experiment Setup | Yes | The only variations made involve tuning the initial learning rate from 1e-5 to 5e-5 for each dataset and adjusting the threshold of the average confidence λ from 0.6 to 0.8. Besides, the number of experts H is set to 7, and the batch size is set to 32. The classifier has a hidden layer of size 50. We use Adam as the basic parameter-updating algorithm with β1 = 0.9, β2 = 0.999. We tune the number of epochs on the validation set of each dataset. ...We used slanted triangular learning rates (Howard and Ruder 2018), that is, we set different learning rates for each layer of g and g′. ...Concretely, we adopt 0.9, 0.85 and 0.8 for SemEval, IMDb and SST-2, 0.75 and 0.7 for YELP-2 and YELP-5, 0.55 and 0.6 for Amazon-2 and Amazon-5, respectively. (A hedged sketch of this configuration also follows the table.) |
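
The entity-selection step quoted in the Software Dependencies row is simple to sketch. Below is a minimal illustration, assuming NLTK's standard tokenizer and perceptron tagger, and treating "entities" as noun tokens (the paper does not define the term precisely); the function and variable names here are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the entity-selection step: POS-tag each text
# with NLTK, collect noun tokens, and randomly sample 400 of them.
import random

import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)


def sample_entities(texts, n_entities=400, seed=0):
    """Collect noun tokens via POS tagging and sample up to n_entities."""
    nouns = set()
    for text in texts:
        tokens = nltk.word_tokenize(text)
        for token, tag in nltk.pos_tag(tokens):
            if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
                nouns.add(token.lower())
    random.seed(seed)
    return random.sample(sorted(nouns), min(n_entities, len(nouns)))


reviews = ["The acting was superb but the plot dragged.",
           "A director with real vision and a great cast."]
print(sample_entities(reviews, n_entities=5))
```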
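
The Experiment Setup row likewise maps onto a short configuration sketch. Here is a minimal sketch, assuming PyTorch and a BERT-style encoder from Hugging Face transformers: Adam with β1 = 0.9, β2 = 0.999, a base learning rate inside the reported 1e-5 to 5e-5 range, and per-layer rates in the spirit of slanted triangular learning rates (Howard and Ruder 2018). The decay factor, schedule, and parameter-group layout are assumptions for illustration, not the authors' released code.

```python
# A minimal sketch (assumptions noted above) of layer-wise learning
# rates plus a triangular warmup/decay schedule for fine-tuning a PLM.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

base_lr = 2e-5      # the paper tunes this per dataset within 1e-5 to 5e-5
layer_decay = 0.8   # e.g. the value the paper reports for SST-2

# Give each encoder layer its own learning rate, shrinking toward the
# input layers so lower layers are updated more conservatively.
layers = [model.embeddings] + list(model.encoder.layer)
param_groups = []
for depth, layer in enumerate(reversed(layers)):  # top layer first
    param_groups.append({
        "params": layer.parameters(),
        "lr": base_lr * (layer_decay ** depth),
    })

optimizer = torch.optim.Adam(param_groups, betas=(0.9, 0.999))

# A triangular (short warmup, then linear decay) schedule on top of the
# per-layer rates approximates the slanted triangular schedule of
# Howard and Ruder (2018). total_steps is illustrative.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=[g["lr"] for g in param_groups],
    total_steps=10_000, pct_start=0.1, anneal_strategy="linear",
)
```

The per-dataset decay values quoted in the table (0.55 to 0.9) would take the place of `layer_decay` here; everything else in this sketch is a generic fine-tuning recipe rather than the paper's specific implementation.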