Natural Language Inference over Interaction Space
Authors: Yichen Gong, Heng Luo, Jian Zhang
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that an interaction tensor (attention weight) contains semantic information to solve natural language inference, and a denser interaction tensor contains richer semantic information. One instance of such architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and a large-scale NLI-alike corpus. It is noteworthy that DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI; Williams et al. 2017) dataset with respect to the strongest published system. |
| Researcher Affiliation | Collaboration | Yichen Gong, Heng Luo, Jian Zhang; New York University, New York, USA; Horizon Robotics, Inc., Beijing, China; yichen.gong@nyu.edu, {heng.luo, jian.zhang}@hobot.cc |
| Pseudocode | No | The paper describes the architecture and its components in text and with a diagram (Figure 1), but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is open sourced at https://github.com/YichenGong/Densely-Interactive-Inference-Network |
| Open Datasets | Yes | The Stanford Natural Language Inference corpus (SNLI; Bowman et al. 2015) has 570k human-annotated sentence pairs. The Multi-Genre NLI Corpus (MultiNLI; Williams et al. 2017) has 433k sentence pairs... The Quora question pair dataset contains over 400k real-world question pairs selected from Quora.com. |
| Dataset Splits | Yes | We use the same data split as in Bowman et al. (2015) for SNLI, and the same data split as provided by Williams et al. (2017) for MultiNLI. Half of the selected genres appear in the training set while the rest do not, creating in-domain (matched) and cross-domain (mismatched) development/test sets (for MultiNLI). We select the parameters by the best run of development accuracy. |
| Hardware Specification | No | The paper describes the software framework (TensorFlow) and optimization details but does not specify any hardware components like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper states, 'We implement our algorithm with TensorFlow(Abadi et al., 2016) framework,' but it does not specify the version number of TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | An Adadelta optimizer (Zeiler, 2012) with ρ = 0.95 and ϵ = 1e-8 is used to optimize all the trainable weights. The initial learning rate is set to 0.5 and the batch size to 70. When the model does not improve the best in-domain performance for 30,000 steps, an SGD optimizer with a learning rate of 3e-4 is used to help the model find a better local optimum. Dropout layers are applied before all linear layers and after the word-embedding layer. We use an exponentially decayed keep rate during training, where the initial keep rate is 1.0 and the decay rate is 0.977 for every 10,000 steps. The sequence length is set as a hard cutoff in all experiments: 48 for MultiNLI, 32 for SNLI, and 24 for the Quora Question Pair Dataset. (A hedged sketch of this schedule follows the table.) |
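
To make the quoted training schedule concrete, below is a minimal sketch in TensorFlow 1.x (the paper names TensorFlow but not a version, so the 1.x API is an assumption based on the 2018 publication date). The hyperparameter values are the ones quoted above; the model graph, the dropout placement, and the plateau check that triggers the switch to SGD are not described in enough detail to reproduce and are only stubbed here.

```python
import tensorflow as tf  # assumes TensorFlow 1.x

# Hyperparameters quoted in the paper's experiment setup.
INITIAL_LR = 0.5          # Adadelta initial learning rate
RHO = 0.95                # Adadelta decay constant
EPSILON = 1e-8            # Adadelta numerical-stability term
SGD_LR = 3e-4             # fallback SGD learning rate after a 30,000-step plateau
KEEP_RATE_DECAY = 0.977   # dropout keep-rate decay applied every 10,000 steps
DECAY_EVERY = 10000

global_step = tf.train.get_or_create_global_step()

# Exponentially decayed dropout keep rate: starts at 1.0 and is multiplied by
# 0.977 every 10,000 steps. The paper does not say whether the decay is smooth
# or staircase; smooth decay is assumed here.
keep_rate = tf.train.exponential_decay(
    learning_rate=1.0,        # reused here as the initial keep rate
    global_step=global_step,
    decay_steps=DECAY_EVERY,
    decay_rate=KEEP_RATE_DECAY,
)

# Two optimizers: Adadelta for the main phase, plain SGD once the model has
# not improved its best in-domain accuracy for 30,000 steps.
adadelta = tf.train.AdadeltaOptimizer(learning_rate=INITIAL_LR, rho=RHO, epsilon=EPSILON)
sgd = tf.train.GradientDescentOptimizer(learning_rate=SGD_LR)

def training_op(loss, use_sgd):
    """Builds the update op. `use_sgd` is a Python bool that the training loop
    would flip after its plateau check; that bookkeeping is not shown here."""
    opt = sgd if use_sgd else adadelta
    return opt.minimize(loss, global_step=global_step)
```

The batch size of 70 and the per-dataset sequence-length cutoffs (48 / 32 / 24) would be applied in the input pipeline, which the paper does not specify further.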