Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Empirical Analysis of Dialogue Relation Extraction with Large Language Models
Authors: Guozheng Li, Zijie Xu, Ziyu Shang, Jiajun Liu, Ke Ji, Yikai Guo
IJCAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the two versions (V1 and V2) of DialogRE dataset [Yu et al., 2020], the first human-annotated DRE dataset, originating from the complete transcripts of the series Friends. We compare the performances of generation-based methods with previous sequence-based and graph-based methods, and conduct extensive experiments to provide valuable insights and guide further exploration. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Southeast University 2Beijing Institute of Computer Technology and Application |
| Pseudocode | No | The paper describes methods and processes but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/1iguozheng/Landre |
| Open Datasets | Yes | We conduct experiments on the two versions (V1 and V2) of DialogRE dataset [Yu et al., 2020], the first human-annotated DRE dataset, originating from the complete transcripts of the series Friends. https://dataset.org/dialogre/ |
| Dataset Splits | Yes | DialogRE statistics (Train / Dev / Test): # Conversations: 1,073 / 358 / 357; Average dialogue length: 229.5 / 224.1 / 214.2; # Argument pairs: 5,963 / 1,928 / 1,858; Average # of turns: 13.1 / 13.1 / 12.4; Average # of speakers: 3.3 / 3.2 / 3.3 |
| Hardware Specification | Yes | all experiments are conducted on a single Geforce GTX 3090 GPU. |
| Software Dependencies | No | The paper mentions various models and optimizers (e.g., GPT-2, BART, T5, BLOOM, LLaMA, AdamW, LoRA) but does not provide specific version numbers for the programming languages, libraries, or frameworks used (e.g., Python version, PyTorch version, Hugging Face Transformers version). |
| Experiment Setup | Yes | We set the rank r of the LoRA parameters to 8 and the merging ratio α to 32. The model is optimized with AdamW [Loshchilov and Hutter, 2019] using learning rate 1e-4 with a linear warm up [Goyal et al., 2017] for the first 6% steps followed by a linear decay to 0. We train Landre for 5 epochs with batch size 4. |
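
The learning-rate schedule quoted in the Experiment Setup row (linear warm up over the first 6% of steps, then linear decay to 0, with a peak of 1e-4) can be illustrated with a short sketch. This is not taken from the authors' repository: the names `REPORTED_SETUP` and `lr_at_step` are hypothetical, the dictionary keys are chosen only for illustration, and the total number of optimizer steps is not stated in the excerpt, so it is left as a parameter.

```python
# Hyperparameters as quoted in the Experiment Setup row.
# The key names are illustrative assumptions, not the authors' config schema.
REPORTED_SETUP = {
    "lora_rank": 8,
    "lora_alpha": 32,
    "learning_rate": 1e-4,
    "warmup_fraction": 0.06,
    "epochs": 5,
    "batch_size": 4,
}


def lr_at_step(step, total_steps, base_lr=1e-4, warmup_fraction=0.06):
    """Return the learning rate at a 0-indexed optimizer step.

    Linear warm up from 0 to base_lr over the first warmup_fraction of
    steps, then linear decay from base_lr to 0 at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warm-up phase: scale linearly with the step index.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly with the number of steps remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # 1000 total steps is an arbitrary value chosen only for this demo.
    for s in (0, 30, 60, 530, 1000):
        print(s, lr_at_step(s, 1000))
```

Under these assumptions the rate starts at 0, reaches its peak when warm up ends (step 60 of 1000 in the demo), and returns to 0 at the final step. In an actual training run the same shape would normally come from the scheduler of whichever framework is used; the excerpt does not say which one the authors chose.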