Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains
Authors: Zhaopei Huang, Jinming Zhao, Qin Jin
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the RECCON-DD dataset [Poria et al., 2021]. This dataset supplements causal utterance annotations for each non-neutral utterance in the conversations of the Daily Dialog dataset [Li et al., 2017]. Extensive experimental results over various settings demonstrate the effectiveness of our method for predicting emotion-cause utterances and performing explainable emotion-cause reasoning. |
| Researcher Affiliation | Academia | Zhaopei Huang¹, Jinming Zhao², Qin Jin¹ (¹Renmin University of China; ²Independent Researcher) EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It uses diagrams and textual descriptions to explain procedures. |
| Open Source Code | Yes | Our code, data and more details are at https://github.com/hzp3517/ECR-Chain. |
| Open Datasets | Yes | We conduct experiments on the RECCON-DD dataset [Poria et al., 2021]. This dataset supplements causal utterance annotations for each non-neutral utterance in the conversations of the Daily Dialog dataset [Li et al., 2017]. |
| Dataset Splits | Yes | Table 1: Dataset statistics. We consider each target utterance as a sample. A conversation may contain several target utterances, forming several samples. Samples: Train 4,562; Valid 200; Test 1,099. |
| Hardware Specification | No | The paper mentions using 'ChatGPT (gpt-3.5-turbo-0613)' and 'Vicuna-7B-v1.3' but does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | Yes | We utilize ChatGPT (gpt-3.5-turbo-0613) as our LLM... For the smaller language model, we opt for Vicuna-7B-v1.3... applied LoRA fine-tuning [Hu et al., 2021] |
| Experiment Setup | Yes | Our total training batch is set to 256 (with gradient accumulation) and the learning rate is set to 1e-3. We train 10 epochs and pick the model that performed best on the validation set to evaluate on the test set. |
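The reported setup fixes only the effective batch (256, reached with gradient accumulation), the learning rate (1e-3), the epoch count (10), and best-on-validation model selection. It does not report the per-device micro-batch or the number of devices. The sketch below shows how those pieces relate. It is not the authors' code. The helper names are hypothetical, and the micro-batch and device counts used in the usage note are assumptions.

```python
from typing import Sequence

# Values taken from the Experiment Setup row above.
TOTAL_BATCH = 256
LEARNING_RATE = 1e-3
NUM_EPOCHS = 10


def accumulation_steps(total_batch: int, micro_batch: int, num_devices: int = 1) -> int:
    """Forward/backward passes to accumulate before each optimizer step (hypothetical helper)."""
    per_pass = micro_batch * num_devices
    if total_batch % per_pass != 0:
        raise ValueError("total_batch must be divisible by micro_batch * num_devices")
    return total_batch // per_pass


def optimizer_steps_per_epoch(num_samples: int, total_batch: int) -> int:
    """Optimizer updates per epoch, assuming a final partial batch is kept (ceiling division)."""
    return -(-num_samples // total_batch)


def best_epoch(val_scores: Sequence[float]) -> int:
    """1-based epoch with the highest validation score; the earliest epoch wins ties."""
    if not val_scores:
        raise ValueError("need at least one validation score")
    best_index = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best_index + 1
```

As an illustration, assuming a hypothetical micro-batch of 8 on 4 devices, `accumulation_steps(TOTAL_BATCH, 8, 4)` gives 8 accumulated passes per update. With the 4,562 training samples from Table 1, `optimizer_steps_per_epoch(4562, TOTAL_BATCH)` gives 18 updates per epoch, or 180 over `NUM_EPOCHS`. That count would be 17 per epoch if the data loader drops the last partial batch. The report does not say which behavior was used.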