Learning Robust Rationales for Model Explainability: A Guidance-Based Approach

Authors: Shuaibo Hu, Kui Yu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on two synthetic settings prove that our method is robust to the rationalization degeneration and failure problems, while the results on two real datasets show its effectiveness in providing rationales in line with human judgments.
Researcher Affiliation | Academia | Shuaibo Hu, Kui Yu*, School of Computer and Information, Hefei University of Technology; shuaibohu@mail.hfut.edu.cn, yukui@hfut.edu.cn
Pseudocode | No | The paper describes the proposed method using textual descriptions and mathematical equations, but it does not include pseudocode or an algorithm block.
Open Source Code | Yes | The source code is available at https://github.com/shuaibo919/g-rat.
Open Datasets | Yes | Following the work of Huang et al. (2021) and Liu et al. (2022), we consider two widely used datasets for selective rationalization. 1) Beer Advocate (McAuley, Leskovec, and Jurafsky 2012) contains more than 220,000 beer reviews... 2) Hotel Review (Wang, Lu, and Zhai 2010) is another multi-aspect dataset similar to Beer Advocate.
Dataset Splits | No | The paper mentions following the settings of previous works and states that 'The Appendix has pre-processing settings details', but it does not explicitly provide specific dataset split information (percentages, sample counts, or direct links to splits) in the provided text.
Hardware Specification | Yes | Experiments are all conducted on a single Tesla A100 GPU.
Software Dependencies | No | The paper mentions using 'GloVe' for embeddings, 'GRU' as the encoder, and 'Adam' as the optimizer. However, it does not provide specific version numbers for any libraries or software dependencies such as Python or PyTorch. (A minimal sketch of this setup appears after the table.)
Experiment Setup | Yes | More detailed settings on training and hyperparameters can be found in Appendix. In the previous setup, we set λ_guide = 5.0 and λ_match = 1.0. This is an empirical choice because these two regularizers can have a similar scale as the task loss L_task. ... we predefined the sparsity with {15%, 10%, 10%} respectively. ... We linearly decay this coefficient τ until it reaches 0 in training. (A sketch of this loss combination and schedule appears after the table.)
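
For concreteness, here is a minimal, hypothetical PyTorch sketch of the components named in the Software Dependencies row (pre-trained GloVe embeddings, a GRU encoder, and the Adam optimizer). The dimensions, variable names, and learning rate are illustrative assumptions, not values taken from the paper or its repository.

```python
import torch
import torch.nn as nn

class GRUEncoder(nn.Module):
    """GRU encoder over pre-trained GloVe embeddings (illustrative sketch)."""

    def __init__(self, glove_weights, hidden_size=200):
        super().__init__()
        # glove_weights: (vocab_size, emb_dim) tensor of pre-trained GloVe vectors
        self.embedding = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.gru = nn.GRU(glove_weights.size(1), hidden_size,
                          batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)   # (batch, seq_len, emb_dim)
        hidden, _ = self.gru(emb)         # (batch, seq_len, 2 * hidden_size)
        return hidden

# Placeholder GloVe matrix; in practice this would be loaded from a GloVe file.
encoder = GRUEncoder(torch.randn(10000, 100))
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
```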
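
Likewise, the Experiment Setup row can be read as the following training sketch: the guidance and matching regularizers are added to the task loss with coefficients λ_guide = 5.0 and λ_match = 1.0, and a coefficient τ is decayed linearly to 0 over training. The function and variable names, the initial value of τ, and the total number of steps are assumptions for illustration only.

```python
LAMBDA_GUIDE = 5.0    # λ_guide, as reported in the paper
LAMBDA_MATCH = 1.0    # λ_match, as reported in the paper
TOTAL_STEPS = 10_000  # assumed training length (not given in the excerpt)

def tau_schedule(step, tau_init=1.0):
    """Linearly decay the coefficient τ from tau_init to 0 over training."""
    return max(0.0, tau_init * (1.0 - step / TOTAL_STEPS))

def total_loss(loss_task, loss_guide, loss_match):
    """Combine the task loss with the guidance and matching regularizers."""
    return loss_task + LAMBDA_GUIDE * loss_guide + LAMBDA_MATCH * loss_match
```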