Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Gaussian Transformer: A Lightweight Approach for Natural Language Inference
Authors: Maosheng Guo, Yu Zhang, Ting Liu (pp. 6489-6496)
AAAI 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our model achieves new state-of-the-art performance on both SNLI and Multi NLI benchmarks with significantly fewer parameters and considerably less training time. |
| Researcher Affiliation | Academia | Maosheng Guo, Yu Zhang, Ting Liu Research Center for Social Computing and Information Retrieval Harbin Institute of Technology, China EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions implementing the model with TensorFlow and Tensor2Tensor, and refers to publicly available code for ESIM, but provides no link to, or explicit statement about releasing, code for its own methodology. |
| Open Datasets | Yes | We conduct experiments on SNLI and Multi NLI datasets, which consists of 570k / 433k English sentence pairs, to train and evaluate the proposed model. |
| Dataset Splits | No | The paper mentions using "Multi NLI validation datasets" and "development datasets" but does not specify the exact percentages or counts for training/test/validation splits needed for reproduction. |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia Titan Xp GPU. |
| Software Dependencies | No | The paper states: "We implement our model using Tensorflow (Abadi et al. 2016), with the library tensor2tensor (Vaswani et al. 2018)." While it mentions software by name and provides citations, it does not specify exact version numbers for reproducibility (e.g., TensorFlow 1.x or Tensor2Tensor 1.x). |
| Experiment Setup | Yes | The best performing individual model consists of M = 3 encoding blocks, N = 2 interaction blocks, using H = 4 attention heads with d_model = 120, d_w = 300, d_c = 30. Word embeddings are initialized from the pretrained fastText word vectors (Bojanowski et al. 2016), while character-level 5-gram embeddings are randomly initialized, and all embeddings remain fixed during training. We share the parameters of encoding and interaction blocks between premise and hypothesis, where the parameters at various depths, however, are different. Dropout (Srivastava et al. 2014) (rate = 0.1) is applied to all sub-layers. We employ the AdamWR algorithm (Loshchilov and Hutter 2017) to train our model on SNLI and MultiNLI separately, with batch size 64, learning rate range [4e-5, 3e-4], normalized weight decay w_norm = 1/600, restart period T = 10. |
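The reported setup can be collected into a small sketch. The configuration values below come from the quoted row; the dict keys, the epoch-based interpretation of the restart period T, and the cosine-with-warm-restarts schedule (the SGDR form that AdamWR builds on) are assumptions for illustration, since the authors released no code.

```python
import math

# Hedged sketch of the reported hyperparameters. Key names are illustrative,
# not taken from the authors' (unreleased) implementation.
CONFIG = {
    "encoding_blocks": 3,       # M
    "interaction_blocks": 2,    # N
    "attention_heads": 4,       # H
    "d_model": 120,
    "d_word": 300,              # pretrained fastText word vectors, frozen
    "d_char": 30,               # randomly initialized char 5-grams, frozen
    "dropout": 0.1,
    "batch_size": 64,
    "lr_min": 4e-5,
    "lr_max": 3e-4,
    "weight_decay_norm": 1 / 600,
    "restart_period": 10,       # T; assumed to be measured in epochs
}

def adamwr_lr(epoch: float, lr_min: float, lr_max: float, period: int) -> float:
    """Cosine annealing with warm restarts, the schedule AdamWR uses.

    The rate starts at lr_max at each restart, decays along a half cosine
    toward lr_min, and resets every `period` epochs.
    """
    t = epoch % period  # position within the current restart cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))
```

For example, with the values above the schedule yields 3e-4 at epoch 0, roughly 1.7e-4 halfway through a cycle, and jumps back to 3e-4 at each restart (epochs 10, 20, ...).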