Resolving Word Vagueness with Scenario-guided Adapter for Natural Language Inference

Authors: Yonghao Liu, Mengyu Li, Di Liang, Ximing Li, Fausto Giunchiglia, Lan Huang, Xiaoyue Feng, Renchu Guan

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive benchmark experiments demonstrate that our proposed ScenaFuse, a scenario-guided approach, consistently boosts NLI performance. We conduct extensive experiments on benchmarks to evaluate the effectiveness of our approach. Our empirical results demonstrate the superiority of ScenaFuse compared to other competitive baselines.
Researcher Affiliation | Academia | Yonghao Liu1, Mengyu Li1, Di Liang2, Ximing Li1, Fausto Giunchiglia3, Lan Huang1, Xiaoyue Feng1 and Renchu Guan1. 1Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University; 2Fudan University; 3University of Trento
Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We employ three datasets widely used by previous methods to validate the effectiveness of our proposed scenario-guided adapter. (I) SNLI [Bowman et al., 2015] is a large-scale dataset designed for NLI tasks, where premises are summarized from photo captions in Flickr30k and hypotheses are generated by humans. (II) SNLI-hard [Gururangan et al., 2018] is built upon SNLI but excludes examples from the original test set that have annotation artifacts. (III) SNLI-lexical [Glockner et al., 2018] is also based on SNLI. Note that SNLI-hard and SNLI-lexical share the same training and validation sets as SNLI.
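As a starting point for reproduction, the sketch below loads the SNLI splits with the HuggingFace `datasets` library. The loader choice and the label-filtering step are assumptions of this report; the paper does not state how the data were obtained.

```python
# Minimal sketch: obtain the SNLI splits via the HuggingFace `datasets`
# library (an assumption; the paper does not say how the data were sourced).
# SNLI-hard and SNLI-lexical reuse SNLI's training/validation splits and
# differ only in their test sets.
from datasets import load_dataset

snli = load_dataset("snli")                 # splits: train / validation / test
train, val = snli["train"], snli["validation"]

# SNLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
# Examples with label -1 lack a gold annotation and are usually dropped.
train = train.filter(lambda ex: ex["label"] != -1)
print(train[0]["premise"], "||", train[0]["hypothesis"], "||", train[0]["label"])
```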
Dataset Splits | Yes | SNLI-hard and SNLI-lexical have the same training and validation sets as SNLI and differ only in their test sets, so the standard SNLI train/validation/test split applies across all three benchmarks.
Hardware Specification | No | The paper mentions 'Our limited computational resources' when discussing LLMs, but does not provide specific details about the GPUs, CPUs, or other hardware used for the experiments.
Software Dependencies | No | The paper mentions using 'BERT [Devlin et al., 2019] or RoBERTa [Liu et al., 2019b]' and 'ResNet-50' but does not specify exact version numbers for these or other software dependencies, such as Python, PyTorch, or TensorFlow versions.
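Because no versions are pinned, a reproduction attempt has to record its own environment. A minimal sketch follows; the package names are inferred from the models the paper mentions (BERT/RoBERTa via transformers, ResNet-50 via torchvision) and are not stated by the authors.

```python
# Record the exact library versions used in a reproduction attempt,
# since the paper itself pins none. The package choices are assumptions.
import torch
import torchvision
import transformers

for mod in (torch, torchvision, transformers):
    print(f"{mod.__name__}=={mod.__version__}")
```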
Experiment Setup | Yes | In the default experimental setting, we use a pre-trained ResNet-50 as the image encoder to initialize the visual representation Xvis, which is later frozen during training. The BERT-base model is fine-tuned during training. Moreover, we use the AdamW optimizer with learning rate values of {1e-5, 2e-5, 3e-5, 5e-5}. The warm-up and weight decay are set as 0.1 and 1e-8, respectively. The batch size is determined by grid search in {16, 32, 64}. Additionally, the dropout is within the range of {0.1, 0.2, 0.3}. Meanwhile, we apply gradient clipping within {7.0, 10.0, 15.0} to prevent gradient explosion.
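The sketch below assembles the stated setup in PyTorch: a frozen ResNet-50 image encoder, a fine-tuned BERT-base text encoder, AdamW with a 0.1 warm-up ratio and 1e-8 weight decay, and gradient clipping. It is a sketch under assumptions, not the authors' implementation: the scenario-guided fusion module is omitted, and the model identifiers, step count, and the particular grid values picked here are illustrative.

```python
# Hedged sketch of the reported training setup (not the authors' code).
import torch
from torchvision.models import resnet50
from transformers import BertModel, get_linear_schedule_with_warmup

image_encoder = resnet50(weights="IMAGENET1K_V1")
for p in image_encoder.parameters():        # ResNet-50 is frozen during training
    p.requires_grad = False

text_encoder = BertModel.from_pretrained("bert-base-uncased")  # fine-tuned

# Learning rate searched over {1e-5, 2e-5, 3e-5, 5e-5}; weight decay 1e-8.
optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=2e-5, weight_decay=1e-8)

num_training_steps = 10_000                 # placeholder; not reported
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # warm-up ratio 0.1
    num_training_steps=num_training_steps,
)

# Inside the training loop, after loss.backward():
#   torch.nn.utils.clip_grad_norm_(text_encoder.parameters(), max_norm=7.0)
#   (clipping threshold searched over {7.0, 10.0, 15.0})
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

Batch size ({16, 32, 64}) and dropout ({0.1, 0.2, 0.3}) would be set in the data loader and model configuration, respectively; the paper reports only the search grids, not the selected values.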