Unsupervised Explanation Generation via Correct Instantiations

Authors: Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong

AAAI 2023

Reproducibility Assessment

Each variable below is listed with its result, followed by the supporting excerpt (LLM response).
Research Type: Experimental
"We conduct extensive experiments on two standard explanation benchmarks, i.e., ComVE and e-SNLI. According to both automatic and human evaluations, NEON outperforms baselines, even for those with human-annotated instantiations."
Researcher Affiliation: Collaboration
Sijie Cheng (1,2), Zhiyong Wu (1), Jiangjie Chen (2), Zhixing Li (3), Yang Liu (5,6), Lingpeng Kong (1,4)
1. Shanghai Artificial Intelligence Laboratory; 2. Fudan University; 3. Full Truck Alliance; 4. The University of Hong Kong; 5. Institute for AI Industry Research, Tsinghua University; 6. Department of Computer Science and Technology, Tsinghua University
Pseudocode: No
The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: Yes
"The resources of NEON are available at: https://github.com/Shark-NLP/Neon."
Open Datasets: Yes
"Our experiments are conducted on the two important explanation benchmarks, ComVE (Wang et al. 2020) and e-SNLI (Camburu et al. 2018)."
Dataset Splits: Yes
"Then they divide all these annotated instances into train/dev/test datasets with 10,000/997/1,000 instances. As for the e-SNLI task, the c_n and s_n can be seen as entailment and contradiction statements, respectively. Filtering the odd instances with only an entailment or contradiction statement, our obtained train/dev/test is 5,189/3,280/2,640."
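The e-SNLI filtering step quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' script: it assumes the public HuggingFace "esnli" dataset, and it interprets "odd instances" as premises that lack either an entailment or a contradiction hypothesis.

```python
# Hypothetical sketch of the e-SNLI filtering described above:
# keep only premises that have BOTH an entailment (c_n) and a
# contradiction (s_n) hypothesis. Field names follow the public
# HuggingFace "esnli" dataset; the grouping logic is an assumption.
from collections import defaultdict
from datasets import load_dataset

dataset = load_dataset("esnli", split="test")

groups = defaultdict(set)
for ex in dataset:
    # Label convention: 0 = entailment, 1 = neutral, 2 = contradiction.
    groups[ex["premise"]].add(ex["label"])

# A premise survives filtering only if both label 0 and label 2 occur.
kept = [p for p, labels in groups.items() if {0, 2} <= labels]
print(f"{len(kept)} premises have both an entailment and a contradiction")
```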
Hardware Specification: Yes
"Our experiments are conducted with 8 A100 GPUs."
Software Dependencies: No
The paper mentions using "OPT-175B", "GPT2-large", and "RoBERTa-large" models, but does not provide specific version numbers for underlying software libraries, frameworks, or programming languages (e.g., PyTorch version, Python version).
Experiment Setup: Yes
"In the first phase, to fix the max length of the context window (n_ctx = 2048), we set the number of examples as K = 16. Moreover, the max length of generated instantiations is 25 for ComVE and 40 for e-SNLI. In the second phase, the max length of generated explanations is 30 for both tasks. The hyper-parameter of Top-p is 0.9, and the temperature is 0 for all generation models."
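These decoding settings map directly onto a standard autoregressive generation call. The sketch below is an illustration under stated assumptions, using the HuggingFace transformers API with a small OPT checkpoint as a stand-in for OPT-175B (which requires multi-GPU serving); note that temperature = 0 amounts to greedy decoding, so top-p has no effect and do_sample=False reproduces the reported setting.

```python
# Minimal sketch of the generation settings quoted above. The model
# name and prompt are placeholders, not the paper's actual artifacts.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # small stand-in for OPT-175B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The prompt would carry K = 16 in-context examples, truncated so the
# whole context fits within n_ctx = 2048 tokens.
prompt = "..."
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)

# temperature = 0 corresponds to greedy decoding, so sampling is off.
outputs = model.generate(
    **inputs,
    max_new_tokens=25,  # 25 for ComVE, 40 for e-SNLI instantiations;
                        # 30 for explanations in the second phase
    do_sample=False,
)
# Strip the prompt tokens and decode only the generated continuation.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```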