Natural Language Inference over Interaction Space
Authors: Yichen Gong, Heng Luo, Jian Zhang
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that an interaction tensor (attention weight) contains semantic information to solve natural language inference, and a denser interaction tensor contains richer semantic information. One instance of such architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and a large-scale NLI-alike corpus. It is noteworthy that DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI; Williams et al. 2017) dataset with respect to the strongest published system. |
| Researcher Affiliation | Collaboration | Yichen Gong, Heng Luo, Jian Zhang; New York University, New York, USA; Horizon Robotics, Inc., Beijing, China; yichen.gong@nyu.edu, {heng.luo, jian.zhang}@hobot.cc |
| Pseudocode | No | The paper describes the architecture and its components in text and with a diagram (Figure 1), but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is open sourced at https://github.com/YichenGong/Densely-Interactive-Inference-Network |
| Open Datasets | Yes | The Stanford Natural Language Inference corpus (SNLI; Bowman et al. 2015) has 570k human-annotated sentence pairs. The Multi-Genre NLI Corpus (MultiNLI; Williams et al. 2017) has 433k sentence pairs... The Quora question pair dataset contains over 400k real-world question pairs selected from Quora.com. |
| Dataset Splits | Yes | We use the same data split as in Bowman et al. (2015) for SNLI, and the same data split as provided by Williams et al. (2017) for MultiNLI. Half of the selected genres appear in the training set while the rest do not, creating in-domain (matched) and cross-domain (mismatched) development/test sets (for MultiNLI). We select the parameters by the best run of development accuracy. |
| Hardware Specification | No | The paper describes the software framework (TensorFlow) and optimization details but does not specify any hardware components like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper states, 'We implement our algorithm with TensorFlow(Abadi et al., 2016) framework,' but it does not specify the version number of TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | An Adadelta optimizer (Zeiler, 2012) with ρ = 0.95 and ϵ = 1e-8 is used to optimize all the trainable weights. The initial learning rate is set to 0.5 and the batch size to 70. When the model does not improve the best in-domain performance for 30,000 steps, an SGD optimizer with a learning rate of 3e-4 is used to help the model find a better local optimum. Dropout layers are applied before all linear layers and after the word-embedding layer. We use an exponentially decayed keep rate during training, where the initial keep rate is 1.0 and the decay rate is 0.977 for every 10,000 steps. The sequence length is set as a hard cutoff in all experiments: 48 for MultiNLI, 32 for SNLI, and 24 for the Quora Question Pair Dataset. (A hedged sketch of this schedule follows the table.) |
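
To make the quoted training schedule concrete, below is a minimal sketch in TensorFlow 1.x (the paper names TensorFlow but not a version, so the 1.x API is an assumption based on the 2018 publication date). The hyperparameter values are the ones quoted above; the model graph, the dropout placement, and the plateau check that triggers the switch to SGD are not described in enough detail to reproduce and are only stubbed here.

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Hyperparameters quoted in the paper's experiment setup.
INITIAL_LR = 0.5          # Adadelta initial learning rate
RHO = 0.95                # Adadelta decay constant
EPSILON = 1e-8            # Adadelta numerical-stability term
SGD_LR = 3e-4             # fallback SGD learning rate after a 30,000-step plateau
KEEP_RATE_DECAY = 0.977   # dropout keep-rate decay applied every 10,000 steps
DECAY_EVERY = 10000

global_step = tf.train.get_or_create_global_step()

# Exponentially decayed dropout keep rate: starts at 1.0 and is multiplied by
# 0.977 every 10,000 steps. The paper does not say whether the decay is smooth
# or staircase; smooth decay is assumed here.
keep_rate = tf.train.exponential_decay(
    learning_rate=1.0,        # reused here as the initial keep rate
    global_step=global_step,
    decay_steps=DECAY_EVERY,
    decay_rate=KEEP_RATE_DECAY,
)

# Two optimizers: Adadelta for the main phase, plain SGD once the model has
# not improved its best in-domain accuracy for 30,000 steps.
adadelta = tf.train.AdadeltaOptimizer(learning_rate=INITIAL_LR, rho=RHO, epsilon=EPSILON)
sgd = tf.train.GradientDescentOptimizer(learning_rate=SGD_LR)

def training_op(loss, use_sgd):
    """Builds the update op. `use_sgd` is a Python bool that the training loop
    would flip after its plateau check; that bookkeeping is not shown here."""
    opt = sgd if use_sgd else adadelta
    return opt.minimize(loss, global_step=global_step)
```

The batch size of 70 and the per-dataset sequence-length cutoffs (48 / 32 / 24) would be applied in the input pipeline, which the paper does not specify further.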