Attention-Fused Deep Matching Network for Natural Language Inference
Authors: Chaoqun Duan, Lei Cui, Xinchi Chen, Furu Wei, Conghui Zhu, Tiejun Zhao
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that AF-DMN achieves state-of-the-art performance and outperforms strong baselines on the Stanford Natural Language Inference (SNLI), Multi-Genre Natural Language Inference (MultiNLI), and Quora duplicate questions datasets. |
| Researcher Affiliation | Collaboration | Harbin Institute of Technology, Harbin, China; Microsoft Research Asia, Beijing, China; School of Computer Science, Fudan University, Shanghai, China |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations but does not contain a dedicated 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper mentions that the code for ESIM is available at 'https://github.com/lukecq1231/nli', but there is no explicit statement or link providing the open-source code for the AF-DMN methodology described in this paper. |
| Open Datasets | Yes | We evaluate our model on three datasets: the Stanford Natural Language Inference (SNLI) corpus, the Multi-Genre NLI Corpus (MultiNLI), and Quora duplicate questions (Quora, https://data.quora.com/First-Quora-Dataset-Release-QuestionPairs). SNLI: the SNLI corpus [Bowman et al., 2015] contains 570,152 sentence pairs. MultiNLI: the MultiNLI corpus [Williams et al., 2017] is a new dataset for NLI, which contains 433k sentence pairs. Quora: the Quora corpus contains over 400,000 question pairs. |
| Dataset Splits | Yes | Train / Dev / Test sizes, with average sentence lengths (Avg.L, premise/hypothesis) and vocabulary size: SNLI 549K / 9.8K / 9.8K, Avg.L 14/8, Vocab 36K; MultiNLI (matched) 392K / 9.8K / 9.8K, Avg.L 22/11, Vocab 85K; MultiNLI (mismatched) – / 9.8K / 9.8K, Avg.L 22/11, Vocab 85K; Quora 384K / 10K / 10K, Avg.L 12/12, Vocab 107K |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Adam as an optimizer and GloVe vectors for initialization but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In our model, word embeddings and all hidden states of LSTMs and MLPs are 300 dimensions. For the SNLI dataset, there are 3 computational blocks in the deep matching layer, while there are 2 for the MultiNLI and Quora datasets. We employ Adam [Kingma and Ba, 2014] for training, with its hyper-parameters β1 and β2 set to 0.9 and 0.999, respectively. The initial learning rate of Adam is set to 0.0002. The learning rate is halved when the accuracy on the development set drops. We also employ a dropout strategy [Srivastava et al., 2014] on word embeddings and all MLPs to avoid over-fitting. The dropout rate is set to 0.2. The batch size is set to 64. We set the maximum length of sentences to 200. |
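
For concreteness, the sketch below wires up the reported training configuration. It is an assumption-laden illustration, not the authors' code: PyTorch is assumed (the paper does not name its framework), the model is a placeholder dropout-plus-linear layer rather than AF-DMN, and `dev_accuracy` with its synthetic data is a hypothetical stand-in for real evaluation on the development set.

```python
# Minimal sketch of the reported training setup: Adam (lr=0.0002,
# beta1=0.9, beta2=0.999), dropout 0.2, batch size 64, and halving
# the learning rate when dev accuracy drops. Model and data are
# placeholders, not the AF-DMN architecture.
import torch
import torch.nn as nn

EMBED_DIM = 300   # word embeddings and all LSTM/MLP hidden states
DROPOUT = 0.2     # dropout on word embeddings and all MLPs
BATCH_SIZE = 64
NUM_CLASSES = 3   # entailment / contradiction / neutral

# Placeholder model standing in for AF-DMN.
model = nn.Sequential(
    nn.Dropout(DROPOUT),
    nn.Linear(EMBED_DIM, NUM_CLASSES),
)

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
loss_fn = nn.CrossEntropyLoss()

def dev_accuracy(model: nn.Module) -> float:
    """Stub evaluation on synthetic data; replace with the real dev set."""
    with torch.no_grad():
        x = torch.randn(BATCH_SIZE, EMBED_DIM)
        y = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,))
        return (model(x).argmax(dim=1) == y).float().mean().item()

prev_dev_acc = 0.0
for epoch in range(5):
    # One synthetic training step per "epoch", for illustration only.
    x = torch.randn(BATCH_SIZE, EMBED_DIM)
    y = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

    acc = dev_accuracy(model)
    if acc < prev_dev_acc:
        # Halve the learning rate when dev accuracy drops, per the paper.
        for group in optimizer.param_groups:
            group["lr"] /= 2.0
    prev_dev_acc = acc
```

The learning-rate schedule is implemented by hand here because the paper describes a dev-accuracy trigger rather than a fixed decay; `torch.optim.lr_scheduler.ReduceLROnPlateau` with `mode="max"` and `factor=0.5` would be an idiomatic built-in alternative.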