Structured Probabilistic Coding

Authors: Dou Hu, Lingwei Wei, Yaxin Liu, Wei Zhou, Songlin Hu

AAAI 2024

Reproducibility checklist (each item lists the variable, the result, and the LLM response with supporting evidence from the paper):
Research Type: Experimental. "Experimental results on 12 natural language understanding tasks demonstrate that our SPC effectively improves the performance of pre-trained language models for classification and regression. Extensive experiments show that SPC can enhance the generalization capability, robustness to label noise, and clustering quality of output representations."
Researcher Affiliation: Academia. Dou Hu (1,2), Lingwei Wei (1), Yaxin Liu (1,2), Wei Zhou (1,*), Songlin Hu (1,2,*). Affiliations: (1) Institute of Information Engineering, Chinese Academy of Sciences; (2) School of Cyber Security, University of Chinese Academy of Sciences. Emails: {hudou, weilingwei, liuyaxin, zhouwei, husonglin}@iie.ac.cn.
Pseudocode: No. No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code: Yes. "The code is available at https://github.com/zerohd4869/SPC."
Open Datasets: Yes. "We conduct experiments on various classification and regression tasks. Concretely, following Barbieri et al. (2020), we experiment on 7 classification tasks about tweet analysis on social media, i.e., EmojiEval (Barbieri et al. 2018), EmotionEval (Mohammad et al. 2018), HateEval (Basile et al. 2019), IronyEval (Hee, Lefever, and Hoste 2018), OffensEval (Zampieri et al. 2019), SentiEval (Rosenthal, Farra, and Nakov 2017), and StanceEval (Mohammad et al. 2016). To better evaluate the generalization of the method for cross-domain scenes, we also experiment on 3 emotion-related datasets from different domains, i.e., ISEAR (Scherer and Wallbott 1994), MELD (Poria et al. 2019), and GoEmotions (Demszky et al. 2020). Besides, we experiment on 2 regression benchmarks, i.e., STS-B (Cer et al. 2017) and CLAIRE (Roth, Anthonio, and Sauer 2022)."
Dataset Splits: Yes. "Due to the lack of a predefined split in the original dataset, we randomly split the dataset into train/valid/test set in a ratio of 4:1:5 based on the label distribution. The validation sets are used to tune hyperparameters and choose the optimal model." Table 6 of the paper provides detailed statistics, including # Train, # Val, and # Test counts for all datasets.
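As an illustration of the splitting protocol quoted above, the following is a minimal sketch of a label-stratified 4:1:5 train/valid/test split. It is a hypothetical re-implementation for clarity, not the authors' released code (which is at the GitHub link above); the function name and signature are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(examples, labels, ratios=(4, 1, 5), seed=42):
    """Split (example, label) pairs into train/valid/test sets in the
    given ratio while preserving the label distribution, as described
    in the paper. Illustrative sketch only."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append(x)

    total = sum(ratios)
    train, valid, test = [], [], []
    # Split each label group separately so class proportions carry over.
    for y, items in sorted(by_label.items()):
        rng.shuffle(items)
        n = len(items)
        n_train = round(n * ratios[0] / total)
        n_valid = round(n * ratios[1] / total)
        train += [(x, y) for x in items[:n_train]]
        valid += [(x, y) for x in items[n_train:n_train + n_valid]]
        test += [(x, y) for x in items[n_train + n_valid:]]
    return train, valid, test
```

With 100 examples split 4:1:5 this yields 40/10/50 items, and each split keeps the original class balance.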
Hardware Specification: Yes. "All experiments are conducted on a single NVIDIA Tesla A100 80GB card."
Software Dependencies: No. The paper mentions specific models such as BERT and RoBERTa and the Adamax optimizer, but it does not give version numbers for software dependencies such as Python, PyTorch, or other libraries (e.g., "PyTorch 1.9").
Experiment Setup: Yes. "All experiments are conducted on a single NVIDIA Tesla A100 80GB card. For each method, we run five random seeds and report the average result of the test sets. Besides, we conduct experiments using an epoch number of 20, a total batch size of 128, and a maximum token length of 128. The maximum patience for early stopping is set to 5 epochs. The network parameters are optimized by using the Adamax optimizer (Kingma and Ba 2015) with a learning rate of 5e-5 and a weight decay coefficient of {0, 0.01, 0.001}. For SPC, the trade-off parameters β and γ are searched from {0.001, 0.01, 0.1, 1, 10} respectively." Detailed hyperparameter settings are listed in Table 7 and Table 8 of the Appendix.
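The search space quoted above can be enumerated as a simple grid. This is a hypothetical sketch of that grid, not the authors' tuning script; the dictionary keys and helper name are assumptions.

```python
from itertools import product

# Hyperparameter grid as described in the paper's experiment setup:
# fixed learning rate, three weight-decay options, and beta/gamma
# each searched over {0.001, 0.01, 0.1, 1, 10}.
search_space = {
    "learning_rate": [5e-5],
    "weight_decay": [0.0, 0.01, 0.001],
    "beta": [0.001, 0.01, 0.1, 1, 10],
    "gamma": [0.001, 0.01, 0.1, 1, 10],
}

def grid(space):
    """Yield one dict per point in the Cartesian product of the space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(search_space))
# 1 * 3 * 5 * 5 = 75 candidate configurations
```

Each of the 75 configurations would then be trained for up to 20 epochs (early stopping patience 5) and selected on the validation set, per the quoted setup.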