It Takes Two to Empathize: One to Seek and One to Provide

Authors: Mahshid Hosseini, Cornelia Caragea (pp. 13018-13026)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce IEMPATHIZE, a dataset compiled from an online cancer survivors network annotated with perceived fine-grained empathy. To our knowledge, we are the first to create a dataset labeled with fine-grained empathy. We thus take one step further in the detection of empathy and identify the direction of empathetic support, seeking versus providing, from the reader's perspective. Our dataset, which is available online,1 contains 5,007 sentences, each annotated with a fine-grained empathy label: seeking-empathy, providing-empathy, or none. Our contributions are threefold: (1) We propose the task of fine-grained empathy direction detection by constructing and analyzing IEMPATHIZE, the first dataset on fine-grained empathy direction detection; (2) We establish strong baselines for the fine-grained empathy direction detection task using the pre-trained language model BERT (Devlin et al. 2019). To our knowledge, this is the first work on automatically detecting the direction of empathetic support, i.e., whether a message aims to provide empathy versus seek empathy. Moreover, we incorporate underlying inductive biases into BERT via domain-adaptive pre-training, which results in better performance when integrating data from relevant domains; (3) We show that, in general, messages that provide empathy have the capacity to make a positive shift in the sentiment of participants who seek empathy. ... Empathy Modeling: We now turn to modeling empathy direction detection in IEMPATHIZE. We create classifiers in two settings: a binary setting, for identifying sentences seeking empathetic support and sentences providing empathetic support, and a multi-class setting, for identifying sentences that are seeking, providing, or none (neither seeking nor providing), so that performance can be compared across the two settings. To create the providing-classifier, we group the classes none and seeking-empathy as negative samples and keep providing-empathy as positive samples. Similarly, to create the seeking-classifier, we group the classes none and providing-empathy as negative samples and keep seeking-empathy as positive samples. For the second setting, we consider all three classes. We then perform a 60/20/20 split to build the train, validation, and test sets. Table 4 shows the splits. Below, we discuss our models. ... Table 5 presents the classification results. As we can see from the table, BERT outperforms the other models in both settings. (A minimal sketch of this label grouping appears after the table below.)
Researcher Affiliation | Academia | Mahshid Hosseini and Cornelia Caragea, Computer Science Department, University of Illinois at Chicago; mhosse4@uic.edu, cornelia@uic.edu
Pseudocode | No | The paper describes the models and methods used (e.g., CNN, LSTM, BERT) but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | Our dataset, which is available online,1 contains 5,007 sentences, each annotated with a fine-grained empathy label, seeking-empathy, providing-empathy, or none. (Footnote 1: https://github.com/Mahhos/Empathy). The paper explicitly states that the *dataset* is available online, but does not explicitly state that the *code for the methodology* (e.g., the BERT model implementation or training scripts) is also provided at this link or elsewhere.
Open Datasets | Yes | In this paper, we introduce IEMPATHIZE, a dataset compiled from an online cancer survivors network annotated with perceived fine-grained empathy. ... Our dataset, which is available online,1 contains 5,007 sentences, each annotated with a fine-grained empathy label, seeking-empathy, providing-empathy, or none. (Footnote 1: https://github.com/Mahhos/Empathy)
Dataset Splits | Yes | We then perform a 60/20/20 split to build the train, validation, and test sets. Table 4 shows the splits. ... Table 4: Dataset splits (p/n denotes positive vs. negative examples): seek: train 624/2,379, valid 199/802, test 223/780; provide: train 584/2,419, valid 182/821, test 200/801; total: train 1,208/4,798, valid 381/1,623, test 423/1,581. (A hedged reconstruction of such a split appears in the sketch after this table.)
Hardware Specification | No | The paper mentions fine-tuning BERT, which implies the use of computational hardware, but it does not specify any particular GPU models, CPU models, memory sizes, or types of computing clusters used for the experiments.
Software Dependencies | No | The paper mentions using specific tools such as the "Hugging Face Transformers library (Wolf et al. 2020)" and the "Stanford sentiment toolkit (Manning et al. 2014)" and models such as "BERT (Devlin et al. 2019)". However, it does not provide version numbers for these libraries or for underlying software dependencies such as Python, PyTorch, or TensorFlow, which are crucial for reproducibility. (A model-loading sketch with clearly marked assumptions appears after this table.)
Experiment Setup | No | The paper states: "We estimate hyper-parameters on the validation set" and describes some general settings for BERT's pre-training tasks (MLM/NSP masking details). However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) for the main fine-tuning experiments, which are essential for reproducing the results. (A hedged fine-tuning sketch appears after this table.)
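To make the binary settings quoted under Research Type concrete, the following is a minimal Python sketch of the label grouping described in the paper; the label strings, example sentences, and list-of-dicts layout are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the binary label grouping described in the paper:
# the providing-classifier treats {none, seeking-empathy} as negative and
# providing-empathy as positive; the seeking-classifier is symmetric.
# Label strings and the data layout below are illustrative assumptions.

def to_binary(examples, positive_label):
    """Map the three fine-grained empathy labels onto a binary task."""
    return [
        {"text": ex["text"], "label": 1 if ex["label"] == positive_label else 0}
        for ex in examples
    ]

examples = [
    {"text": "I just got my diagnosis and I am scared.", "label": "seeking-empathy"},
    {"text": "Sending you strength; you are not alone.", "label": "providing-empathy"},
    {"text": "My next scan is scheduled for Tuesday.", "label": "none"},
]

providing_task = to_binary(examples, "providing-empathy")
seeking_task = to_binary(examples, "seeking-empathy")
```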
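The Dataset Splits row reports a 60/20/20 split, but the paper does not say how it was drawn. The sketch below is one plausible reconstruction using scikit-learn; stratifying by label and fixing the random seed are assumptions, not details confirmed by the paper.

```python
from sklearn.model_selection import train_test_split

def split_60_20_20(texts, labels, seed=42):
    """60/20/20 train/validation/test split; stratification and the seed are assumptions."""
    # Hold out 40% of the data for validation + test.
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.4, stratify=labels, random_state=seed
    )
    # Split that 40% in half: 20% validation, 20% test.
    x_valid, x_test, y_valid, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_valid, y_valid), (x_test, y_test)
```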
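Because the Software Dependencies row notes that no library versions are reported, anyone re-running the experiments has to choose them independently. The minimal loading sketch below uses the Hugging Face Transformers API named in the paper; the bert-base-uncased checkpoint is an assumption, since the paper only refers to BERT (Devlin et al. 2019).

```python
# Loading a BERT classifier with Hugging Face Transformers. The paper names the
# library but not a version, and the exact checkpoint is not quoted above, so
# "bert-base-uncased" is an assumption; pin transformers/torch versions yourself.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # seeking-empathy / providing-empathy / none (multi-class setting)
)
```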
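Finally, since the Experiment Setup row notes that the fine-tuning hyperparameters are not reported, the values below are only common BERT fine-tuning defaults in the spirit of Devlin et al. (2019), offered as a hedged starting point rather than the authors' actual configuration.

```python
from transformers import Trainer, TrainingArguments

# All numeric values are assumptions: the paper states only that hyper-parameters
# were estimated on the validation set, without reporting the chosen values.
training_args = TrainingArguments(
    output_dir="iempathize-bert",
    learning_rate=2e-5,               # assumed; a typical BERT fine-tuning rate
    per_device_train_batch_size=16,   # assumed; not reported in the paper
    num_train_epochs=3,               # assumed; not reported in the paper
    weight_decay=0.01,                # assumed
)
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=valid_ds)  # placeholder datasets
# trainer.train()
```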