It Takes Two to Empathize: One to Seek and One to Provide

Authors: Mahshid Hosseini, Cornelia Caragea (pp. 13018-13026)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce IEMPATHIZE, a dataset compiled from an online cancer survivors network annotated with perceived fine-grained empathy. To our knowledge, we are the first to create a dataset labeled with fine-grained empathy. We thus take one step further in the detection of empathy and identify the direction of empathetic support, seeking versus providing, from the reader's perspective. Our dataset, which is available online,1 contains 5,007 sentences, each annotated with a fine-grained empathy label: seeking-empathy, providing-empathy, or none. Our contributions are threefold: (1) We propose the task of fine-grained empathy direction detection by constructing and analyzing IEMPATHIZE, the first dataset on fine-grained empathy direction detection; (2) We establish strong baselines for the fine-grained empathy direction detection task using the pre-trained language model BERT (Devlin et al. 2019). To our knowledge, this is the first work on automatically detecting the direction of empathetic support, i.e., whether a message aims to provide empathy versus seek empathy. Moreover, we incorporate underlying inductive biases into BERT via domain-adaptive pre-training, which results in better performance when integrating data from relevant domains; (3) We show that, in general, messages that provide empathy have the capacity to make a positive shift in the sentiment of participants who seek empathy. ... Empathy Modeling: We now turn to modeling empathy direction detection in IEMPATHIZE. We create classifiers in two settings: a binary setting, for identifying sentences seeking empathetic support and sentences providing empathetic support, and a multi-class setting, for identifying sentences that are seeking, providing, or none (neither seeking nor providing), so that performance can be compared across the two settings. To create the providing-classifier, we group the classes none and seeking-empathy as negative samples and keep providing-empathy as positive samples. Similarly, to create the seeking-classifier, we group the classes none and providing-empathy as negative samples and keep seeking-empathy as positive samples. For the second setting, we consider all three classes. We then perform a 60/20/20 split to build the train, validation, and test sets. Table 4 shows the splits. Below, we discuss our models. ... Table 5 presents the classification results. As we can see from the table, BERT outperforms the other models in both settings. (A minimal sketch of this label grouping appears after the table below.)
Researcher Affiliation | Academia | Mahshid Hosseini and Cornelia Caragea, Computer Science Department, University of Illinois at Chicago; mhosse4@uic.edu, cornelia@uic.edu
Pseudocode | No | The paper describes the models and methods used (e.g., CNN, LSTM, BERT) but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | Our dataset, which is available online,1 contains 5,007 sentences, each annotated with a fine-grained empathy label, seeking-empathy, providing-empathy, or none. (Footnote 1: https://github.com/Mahhos/Empathy). The paper explicitly states that the *dataset* is available online, but does not explicitly state that the *code for the methodology* (e.g., the BERT model implementation or training scripts) is also provided at this link or elsewhere.
Open Datasets | Yes | In this paper, we introduce IEMPATHIZE, a dataset compiled from an online cancer survivors network annotated with perceived fine-grained empathy. ... Our dataset, which is available online,1 contains 5,007 sentences, each annotated with a fine-grained empathy label, seeking-empathy, providing-empathy, or none. (Footnote 1: https://github.com/Mahhos/Empathy)
Dataset Splits | Yes | We then perform a 60/20/20 split to build the train, validation, and test sets. Table 4 shows the splits. ... Table 4: Dataset splits (p/n denotes positive vs. negative examples): seek: train 624/2,379, valid 199/802, test 223/780; provide: train 584/2,419, valid 182/821, test 200/801; total: train 1,208/4,798, valid 381/1,623, test 423/1,581. (A hedged reconstruction of such a split appears in the sketch after this table.)
Hardware Specification | No | The paper mentions fine-tuning BERT, which implies the use of computational hardware, but it does not specify any particular GPU models, CPU models, memory sizes, or types of computing clusters used for the experiments.
Software Dependencies | No | The paper mentions using specific tools such as the "Hugging Face Transformers library (Wolf et al. 2020)" and the "Stanford sentiment toolkit (Manning et al. 2014)" and models such as "BERT (Devlin et al. 2019)". However, it does not provide version numbers for these libraries or for underlying software dependencies such as Python, PyTorch, or TensorFlow, which are crucial for reproducibility. (A model-loading sketch with clearly marked assumptions appears after this table.)
Experiment Setup | No | The paper states: "We estimate hyper-parameters on the validation set" and describes some general settings for BERT's pre-training tasks (MLM/NSP masking details). However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) for the main fine-tuning experiments, which are essential for reproducing the results. (A hedged fine-tuning sketch appears after this table.)
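To make the binary settings quoted under Research Type concrete, the following is a minimal Python sketch of the label grouping described in the paper; the label strings, example sentences, and list-of-dicts layout are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the binary label grouping described in the paper:
# the providing-classifier treats {none, seeking-empathy} as negative and
# providing-empathy as positive; the seeking-classifier is symmetric.
# Label strings and the data layout below are illustrative assumptions.

def to_binary(examples, positive_label):
    """Map the three fine-grained empathy labels onto a binary task."""
    return [
        {"text": ex["text"], "label": 1 if ex["label"] == positive_label else 0}
        for ex in examples
    ]

examples = [
    {"text": "I just got my diagnosis and I am scared.", "label": "seeking-empathy"},
    {"text": "Sending you strength; you are not alone.", "label": "providing-empathy"},
    {"text": "My next scan is scheduled for Tuesday.", "label": "none"},
]

providing_task = to_binary(examples, "providing-empathy")
seeking_task = to_binary(examples, "seeking-empathy")
```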
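The Dataset Splits row reports a 60/20/20 split, but the paper does not say how it was drawn. The sketch below is one plausible reconstruction using scikit-learn; stratifying by label and fixing the random seed are assumptions, not details confirmed by the paper.

```python
from sklearn.model_selection import train_test_split

def split_60_20_20(texts, labels, seed=42):
    """60/20/20 train/validation/test split; stratification and the seed are assumptions."""
    # Hold out 40% of the data for validation + test.
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.4, stratify=labels, random_state=seed
    )
    # Split that 40% in half: 20% validation, 20% test.
    x_valid, x_test, y_valid, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_valid, y_valid), (x_test, y_test)
```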
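Because the Software Dependencies row notes that no library versions are reported, anyone re-running the experiments has to choose them independently. The minimal loading sketch below uses the Hugging Face Transformers API named in the paper; the bert-base-uncased checkpoint is an assumption, since the paper only refers to BERT (Devlin et al. 2019).

```python
# Loading a BERT classifier with Hugging Face Transformers. The paper names the
# library but not a version, and the exact checkpoint is not quoted above, so
# "bert-base-uncased" is an assumption; pin transformers/torch versions yourself.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # seeking-empathy / providing-empathy / none (multi-class setting)
)
```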
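Finally, since the Experiment Setup row notes that the fine-tuning hyperparameters are not reported, the values below are only common BERT fine-tuning defaults in the spirit of Devlin et al. (2019), offered as a hedged starting point rather than the authors' actual configuration.

```python
from transformers import Trainer, TrainingArguments

# All numeric values are assumptions: the paper states only that hyper-parameters
# were estimated on the validation set, without reporting the chosen values.
training_args = TrainingArguments(
    output_dir="iempathize-bert",
    learning_rate=2e-5,               # assumed; a typical BERT fine-tuning rate
    per_device_train_batch_size=16,   # assumed; not reported in the paper
    num_train_epochs=3,               # assumed; not reported in the paper
    weight_decay=0.01,                # assumed
)
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=valid_ds)  # placeholder datasets
# trainer.train()
```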