Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CASA: Conversational Aspect Sentiment Analysis for Dialogue Understanding

Authors: Linfeng Song, Chunlei Xin, Shaopeng Lai, Ante Wang, Jinsong Su, Kun Xu

JAIR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental To aid the training and evaluation of data-driven methods, we annotate 3,000 chit-chat dialogues (27,198 sentences) with fine-grained sentiment information, including all sentiment expressions, their polarities and the corresponding target mentions. We also annotate an out-of-domain test set of 200 dialogues for robustness evaluation. Besides, we develop multiple baselines based on either pretrained BERT or self-attention for preliminary study. Experimental results show that our BERT-based model has strong performances for both in-domain and out-of-domain datasets, and thorough analysis indicates several potential directions for further improvements.
Researcher Affiliation Collaboration Linfeng Song (EMAIL), Tencent AI Lab, Bellevue, WA, USA 98004; Chunlei Xin (EMAIL), Shaopeng Lai (EMAIL), Ante Wang (EMAIL), Xiamen University, Xiamen, Fujian, China 361005; Jinsong Su (EMAIL), Xiamen University, Xiamen, Fujian, China 361005 and Pengcheng Lab, Shenzhen, Guangdong, China 518066; Kun Xu (EMAIL), Tencent AI Lab, Bellevue, WA, USA 98004
Pseudocode No The paper describes the models (Encoder SE and Encoder ME) using equations and figures (Figure 1), and textual descriptions of the steps, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code Yes We introduce strong baseline models for a preliminary study of this task and release our code and datasets at https://github.com/freesunshine0316/lab-conv-asa.
Open Datasets Yes We manually create a dataset of 3,000 dialogues (with 27,198 sentences) for training and evaluating data-driven methods. This can boost the research for sentiment analysis as well. ... We introduce strong baseline models for a preliminary study of this task and release our code and datasets at https://github.com/freesunshine0316/lab-conv-asa.
Dataset Splits Yes In the subsequent experiments, we split DuConv into training, development and test sets, with each containing 80%, 10% and 10% of the whole data, respectively. The whole News Dialogue dataset is used as an out-of-domain test set.
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies No For BERT-based models, we adopt the huggingface-based RoBERTa-wwm-ext model (Cui et al., 2020). ... Adam (Kingma and Ba, 2014) with learning rate 10^-5 is adopted to train all systems for 50 epochs.
Experiment Setup Yes All models are trained for 20 epochs using Adam (Kingma and Ba, 2014) with learning rate 10^-5. ... For both baselines and our model, the encoder and decoder take 4 multi-head self-attention layers, each layer taking 512 hidden units and 8 heads. Adam (Kingma and Ba, 2014) with learning rate 10^-5 is adopted to train all systems for 50 epochs. The batch size is set to 16 for all systems as well.
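The dataset-split and experiment-setup rows above can be summarized in a short sketch. This is not the authors' code (their implementation is at the linked repository); it is a minimal illustration of the reported configuration, assuming the quoted hyperparameters (Adam, learning rate 10^-5, batch size 16, 4 self-attention layers with 512 hidden units and 8 heads) and the 80/10/10 DuConv split. The `CONFIG` dict structure and the `split_dialogues` helper are hypothetical names introduced here for clarity.

```python
import random

# Hyperparameters quoted in the "Experiment Setup" row above;
# the dict structure itself is illustrative, not from the paper's code.
CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-5,
    "batch_size": 16,
    "encoder_layers": 4,
    "hidden_units": 512,
    "attention_heads": 8,
    "epochs": 50,  # the paper also reports 20 epochs for some models
}


def split_dialogues(dialogues, seed=42):
    """Split dialogues 80/10/10 into train/dev/test sets,
    mirroring the DuConv split described in the report.
    The seed and shuffling strategy are assumptions."""
    rng = random.Random(seed)
    shuffled = list(dialogues)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_dev = int(n * 0.1)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test
```

For the 3,000 annotated dialogues, this split yields 2,400 training, 300 development, and 300 test dialogues, with the separate 200-dialogue News Dialogue set reserved for out-of-domain evaluation.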