Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic

Authors: Zijun Wu, Zi Xuan Zhang, Atharva Naik, Zhijian Mei, Mauajama Firdaus, Lili Mou

ICLR 2023

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response.
Research Type: Experimental
LLM response: "In our experiments, we developed a comprehensive methodology (data annotation and evaluation metrics) to quantitatively evaluate phrasal reasoning performance, which has not been accomplished in previous work. We extend previous studies and obtain plausible baseline models. Results show that our EPR yields a much more meaningful explanation regarding F scores against human annotation."
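
For concreteness, the sketch below shows one way phrase-level predictions could be scored with an F-measure against human annotation. The function name and the (phrase pair, label) set representation are illustrative assumptions, not the paper's evaluation code.

```python
# Minimal sketch (not the authors' code): precision/recall/F1 between
# predicted phrase-level judgments and human annotations, each modeled
# as a set of (phrase_pair, label) items. This representation is assumed.

def f1_against_annotation(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: two of three predicted judgments agree with the annotation.
pred = {(("a man", "a person"), "entail"),
        (("sleeps", "runs"), "contradict"),
        (("outside", "indoors"), "neutral")}
gold = {(("a man", "a person"), "entail"),
        (("sleeps", "runs"), "contradict"),
        (("outside", "indoors"), "contradict")}
print(f"F1 = {f1_against_annotation(pred, gold):.3f}")  # F1 = 0.667
```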

Researcher Affiliation: Academia
LLM response: Zijun Wu (1), Zi Xuan Zhang (1), Atharva Naik (2), Zhijian Mei (1), Mauajama Firdaus (1), Lili Mou (1); (1) Dept. of Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta; (2) Carnegie Mellon University.

Pseudocode: No
LLM response: The paper describes its methods with text and diagrams but does not include structured pseudocode blocks or sections explicitly labeled "Algorithm" or "Pseudocode".

Open Source Code: Yes
LLM response: "Our code and annotated data are released for future studies." Code and resources are available at https://github.com/MANGA-UOFA/EPR.

Open Datasets: Yes
LLM response: "The main dataset we used in our experiments is the Stanford Natural Language Inference (SNLI) dataset (Bowman et al., 2015)... Textual explanation generation was evaluated on the e-SNLI dataset (Camburu et al., 2018)... we provide additional results on the matched section of the MNLI dataset (Williams et al., 2018)."

Dataset Splits: Yes
LLM response: "The main dataset we used in our experiments is the Stanford Natural Language Inference (SNLI) dataset (Bowman et al., 2015), which consists of 550K training samples, 10K validation samples, and another 10K test samples."
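
These splits can be fetched programmatically for reproduction. The sketch below assumes the Hugging Face `datasets` library and its `snli` dataset name, neither of which is stated in the paper.

```python
# Minimal sketch (an assumption, not the paper's pipeline): load the SNLI
# splits with the Hugging Face `datasets` library and check their sizes
# against the figures quoted above (~550K train, 10K validation, 10K test).
from datasets import load_dataset

snli = load_dataset("snli")
for split in ("train", "validation", "test"):
    print(split, len(snli[split]))
# Expected output is approximately: train 550152, validation 10000, test 10000.
```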

Hardware Specification: No
LLM response: The paper does not explicitly state the hardware used for the experiments (e.g., GPU/CPU models, RAM); it mentions "our machine" only in a general sense, without specifications.

Software Dependencies: No
LLM response: The paper mentions Sentence-BERT, SpaCy, and a T5-small model but does not specify their version numbers, which are needed for reproducible software dependencies.
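
A reproducer can at least record the versions actually installed at run time. The sketch below assumes these tools are distributed as the `sentence-transformers`, `spacy`, `transformers`, and `torch` packages; the paper itself names none of these package identifiers.

```python
# Minimal sketch: print pinned version strings for the key dependencies so a
# run can be reproduced later. Package names are assumptions based on the
# tools the paper mentions (Sentence-BERT, spaCy, T5-small).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("sentence-transformers", "spacy", "transformers", "torch"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"# {pkg} is not installed")
```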

Experiment Setup: Yes
LLM response: "We trained the model with a batch size of 256. We used Adam (Kingma & Ba, 2015) with a learning rate of 5e-5, β1 = 0.9, β2 = 0.999, learning rate warm-up over the first 10 percent of the total steps, and linear decay of the learning rate. The model was trained up to 3 epochs. We chose the coefficient for the global feature in Eq. (1) from a candidate set of {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}."
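
This setup maps directly onto standard tooling. The sketch below assumes PyTorch with the Hugging Face `transformers` scheduler; the model, step count, and training loop body are placeholders, not the authors' actual code.

```python
# Minimal sketch of the reported optimization setup: Adam at lr 5e-5 with
# betas (0.9, 0.999), warm-up over the first 10% of steps, linear decay,
# batch size 256, up to 3 epochs. The model below is a stand-in.
import torch
from transformers import get_linear_schedule_with_warmup

EPOCHS = 3
STEPS_PER_EPOCH = 2150                 # placeholder: ~550K samples / batch size 256
total_steps = EPOCHS * STEPS_PER_EPOCH
warmup_steps = int(0.1 * total_steps)  # warm-up over the first 10% of steps

model = torch.nn.Linear(768, 3)        # stand-in for the paper's model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, betas=(0.9, 0.999))
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

for step in range(total_steps):
    # ... forward pass and loss.backward() on a batch of 256 samples ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

The global-feature coefficient would then be selected by repeating this training run for each value in the candidate set {0.0, 0.2, 0.4, 0.6, 0.8, 1.0} and choosing the best on the validation split.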