Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic

Authors: Zijun Wu, Zi Xuan Zhang, Atharva Naik, Zhijian Mei, Mauajama Firdaus, Lili Mou

ICLR 2023

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response.
Research Type: Experimental
LLM response: "In our experiments, we developed a comprehensive methodology (data annotation and evaluation metrics) to quantitatively evaluate phrasal reasoning performance, which has not been accomplished in previous work. We extend previous studies and obtain plausible baseline models. Results show that our EPR yields a much more meaningful explanation regarding F scores against human annotation."
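
For concreteness, the sketch below shows one way phrase-level predictions could be scored with an F-measure against human annotation. The function name and the (phrase pair, label) set representation are illustrative assumptions, not the paper's evaluation code.

```python
# Minimal sketch (not the authors' code): precision/recall/F1 between
# predicted phrase-level judgments and human annotations, each modeled
# as a set of (phrase_pair, label) items. This representation is assumed.

def f1_against_annotation(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: two of three predicted judgments agree with the annotation.
pred = {(("a man", "a person"), "entail"),
        (("sleeps", "runs"), "contradict"),
        (("outside", "indoors"), "neutral")}
gold = {(("a man", "a person"), "entail"),
        (("sleeps", "runs"), "contradict"),
        (("outside", "indoors"), "contradict")}
print(f"F1 = {f1_against_annotation(pred, gold):.3f}")  # F1 = 0.667
```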

Researcher Affiliation: Academia
LLM response: Zijun Wu (1), Zi Xuan Zhang (1), Atharva Naik (2), Zhijian Mei (1), Mauajama Firdaus (1), Lili Mou (1); (1) Dept. of Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta; (2) Carnegie Mellon University.

Pseudocode: No
LLM response: The paper describes its methods with text and diagrams but does not include structured pseudocode blocks or sections explicitly labeled "Algorithm" or "Pseudocode".

Open Source Code: Yes
LLM response: "Our code and annotated data are released for future studies." Code and resources are available at https://github.com/MANGA-UOFA/EPR.

Open Datasets: Yes
LLM response: "The main dataset we used in our experiments is the Stanford Natural Language Inference (SNLI) dataset (Bowman et al., 2015)... Textual explanation generation was evaluated on the e-SNLI dataset (Camburu et al., 2018)... we provide additional results on the matched section of the MNLI dataset (Williams et al., 2018)."

Dataset Splits: Yes
LLM response: "The main dataset we used in our experiments is the Stanford Natural Language Inference (SNLI) dataset (Bowman et al., 2015), which consists of 550K training samples, 10K validation samples, and another 10K test samples."
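
These splits can be fetched programmatically for reproduction. The sketch below assumes the Hugging Face `datasets` library and its `snli` dataset name, neither of which is stated in the paper.

```python
# Minimal sketch (an assumption, not the paper's pipeline): load the SNLI
# splits with the Hugging Face `datasets` library and check their sizes
# against the figures quoted above (~550K train, 10K validation, 10K test).
from datasets import load_dataset

snli = load_dataset("snli")
for split in ("train", "validation", "test"):
    print(split, len(snli[split]))
# Expected output is approximately: train 550152, validation 10000, test 10000.
```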

Hardware Specification: No
LLM response: The paper does not explicitly state the hardware used for the experiments (e.g., GPU/CPU models, RAM); it mentions "our machine" only in a general sense, without specifications.

Software Dependencies: No
LLM response: The paper mentions Sentence-BERT, SpaCy, and a T5-small model but does not specify their version numbers, which are needed for reproducible software dependencies.
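
A reproducer can at least record the versions actually installed at run time. The sketch below assumes these tools are distributed as the `sentence-transformers`, `spacy`, `transformers`, and `torch` packages; the paper itself names none of these package identifiers.

```python
# Minimal sketch: print pinned version strings for the key dependencies so a
# run can be reproduced later. Package names are assumptions based on the
# tools the paper mentions (Sentence-BERT, spaCy, T5-small).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("sentence-transformers", "spacy", "transformers", "torch"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"# {pkg} is not installed")
```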

Experiment Setup: Yes
LLM response: "We trained the model with a batch size of 256. We used Adam (Kingma & Ba, 2015) with a learning rate of 5e-5, β1 = 0.9, β2 = 0.999, learning rate warm-up over the first 10 percent of the total steps, and linear decay of the learning rate. The model was trained up to 3 epochs. We chose the coefficient for the global feature in Eq. (1) from a candidate set of {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}."
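
This setup maps directly onto standard tooling. The sketch below assumes PyTorch with the Hugging Face `transformers` scheduler; the model, step count, and training loop body are placeholders, not the authors' actual code.

```python
# Minimal sketch of the reported optimization setup: Adam at lr 5e-5 with
# betas (0.9, 0.999), warm-up over the first 10% of steps, linear decay,
# batch size 256, up to 3 epochs. The model below is a stand-in.
import torch
from transformers import get_linear_schedule_with_warmup

EPOCHS = 3
STEPS_PER_EPOCH = 2150                 # placeholder: ~550K samples / batch size 256
total_steps = EPOCHS * STEPS_PER_EPOCH
warmup_steps = int(0.1 * total_steps)  # warm-up over the first 10% of steps

model = torch.nn.Linear(768, 3)        # stand-in for the paper's model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, betas=(0.9, 0.999))
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

for step in range(total_steps):
    # ... forward pass and loss.backward() on a batch of 256 samples ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

The global-feature coefficient would then be selected by repeating this training run for each value in the candidate set {0.0, 0.2, 0.4, 0.6, 0.8, 1.0} and choosing the best on the validation split.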