Explaining (Sarcastic) Utterances to Enhance Affect Understanding in Multimodal Dialogues

Authors: Shivani Kumar, Ishani Mondal, Md Shad Akhtar, Tanmoy Chakraborty

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our evaluation shows that MOSES outperforms the state-of-the-art system for SED by an average of 2% on different evaluation metrics, such as ROUGE, BLEU, and METEOR. Further, we observe that leveraging the generated explanation advances three downstream tasks for affect classification: an average improvement of 14% F1-score in the sarcasm detection task and 2% in the humour identification and emotion recognition tasks. We also perform extensive analyses to assess the quality of the results. Experiment and Results: This section illustrates the feature extraction strategy we use and the baseline systems to which we compare our model, followed by the results we obtain for the SED task. We use the standard generative metrics ROUGE-1/2/L (Lin 2004), BLEU-1/2/3/4 (Papineni et al. 2002), and METEOR (Denkowski and Lavie 2014) to capture the syntactic and semantic performance of our system.
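For context, the generative metrics named above (ROUGE, BLEU, METEOR) can be computed with off-the-shelf tooling. The following is a minimal sketch, not the authors' evaluation script; it assumes the Hugging Face `evaluate` package is installed, and the prediction/reference strings are invented placeholders.

# Minimal sketch (assumption: Hugging Face `evaluate` is installed); the example
# explanation strings are placeholders, not drawn from the WITS dataset.
import evaluate

rouge = evaluate.load("rouge")    # ROUGE-1/2/L
bleu = evaluate.load("bleu")      # BLEU with n-grams up to 4
meteor = evaluate.load("meteor")  # METEOR

predictions = ["Maya sarcastically praises Sahil's cooking to mock it."]
references = ["Maya mocks Sahil's cooking by complimenting it insincerely."]

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(meteor.compute(predictions=predictions, references=references))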
Researcher Affiliation Academia Shivani Kumar (1), Ishani Mondal (2), Md Shad Akhtar (1), Tanmoy Chakraborty (3); (1) Indraprastha Institute of Information Technology Delhi, India; (2) University of Maryland, College Park; (3) Indian Institute of Technology Delhi, India. shivaniku@iiitd.ac.in, ishani340@gmail.com, shad.akhtar@iiitd.ac.in, tanchak@iitd.ac.in
Pseudocode No The paper does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code Yes Reproducibility: The source code for MOSES and the execution instructions are present here: https://github.com/LCS2-IIITD/MOSES.git.
Open Datasets Yes Due to the prevalence of code-mixing in today's world, we consider the WITS dataset (Kumar et al. 2022), which contains code-mixed dialogues (English-Hindi) from an Indian TV series.
Dataset Splits Yes Table 1: Statistics of the sarcasm, humour, and emotion (Ntrl: Neutral, Ang: Anger) datasets in consideration (number of dialogue instances marked as sarcastic (#S), non-sarcastic (#NS), non-humorous (#NH), and humorous (#H)).
Train: 1792 1669 1792 2795 995 1590 1147 623 429
Val:   224  213  224  362  112 196  133  87  57
Test:  224  218  224  367  106 195  141  70  67
Total: 2240 2100 2240 3524 1213 1981 1421 780 553
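The reported splits follow a roughly 80/10/10 ratio; the first (sarcasm) column, for example, can be checked directly from the numbers above. A small sanity-check sketch, using only figures quoted from Table 1:

# Sanity check of the sarcasm-column split sizes quoted from Table 1.
splits = {"Train": 1792, "Val": 224, "Test": 224}
total = sum(splits.values())
assert total == 2240  # matches the reported Total
for name, count in splits.items():
    print(f"{name}: {count} ({count / total:.0%})")  # ~80% / 10% / 10%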
Hardware Specification No The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies No The paper mentions software components like 'Python's gTTS library' and the 'eGeMAPS model', and models like 'BART' and 'RoBERTa', but it does not specify exact version numbers for these components or any other software dependencies needed for reproducibility.
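Although versions are unspecified, the named components map onto standard Python packages. The sketch below illustrates one plausible feature-extraction pipeline with gTTS, the openSMILE eGeMAPS wrapper, and a RoBERTa encoder; the checkpoint, language code, utterance, and file names are assumptions, not details confirmed by the paper.

# Illustrative only: the paper names gTTS, eGeMAPS, and RoBERTa without versions;
# the checkpoint, language code, and file names below are assumptions.
from gtts import gTTS                 # text-to-speech for synthetic utterance audio
import opensmile                      # openSMILE wrapper exposing eGeMAPS features
import torch
from transformers import AutoModel, AutoTokenizer

utterance = "Waah, kya khana banaya hai!"   # placeholder code-mixed utterance

# 1) Synthesise audio for the utterance (Hindi voice assumed).
gTTS(utterance, lang="hi").save("utterance.mp3")

# 2) Extract eGeMAPS functionals (mp3 decoding needs ffmpeg; otherwise convert to WAV first).
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
acoustic_features = smile.process_file("utterance.mp3")

# 3) Encode the utterance text with a RoBERTa checkpoint (checkpoint assumed).
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")
with torch.no_grad():
    text_features = encoder(**tokenizer(utterance, return_tensors="pt")).last_hidden_state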
Experiment Setup No The paper states, 'Details about the execution process and the hyperparameters used are mentioned in the supplementary.' However, it does not include these specific details (e.g., learning rates, batch sizes, optimizer settings) within the main text of the paper.
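Since those details live only in the supplementary, anyone re-running the model must choose their own settings. The sketch below shows a generic Hugging Face seq2seq fine-tuning configuration; every value is a placeholder and none of them comes from the paper or its supplementary material.

# Placeholder configuration only: none of these hyperparameter values come from
# the paper or its supplementary material.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

model_name = "facebook/bart-base"       # base checkpoint assumed, not confirmed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="sed-finetune",          # hypothetical output directory
    learning_rate=5e-5,                 # placeholder
    per_device_train_batch_size=8,      # placeholder
    num_train_epochs=10,                # placeholder
    predict_with_generate=True,         # generate explanations for ROUGE/BLEU/METEOR
)

# trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer,
#                          train_dataset=train_ds, eval_dataset=val_ds)  # hypothetical datasets
# trainer.train()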