A Compare-Aggregate Model for Matching Text Sequences

Authors: Shuohang Wang, Jing Jiang

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our model on four different datasets representing different tasks. The first three datasets are question answering tasks while the last one is on textual entailment. The statistics of the four datasets are shown in Table 2. We will first introduce the task settings and the way we customize the compare-aggregate structure to each task. Then we will show the baselines for the different datasets. Finally, we discuss the experiment results shown in Table 3 and the ablation study shown in Table 4.
Researcher Affiliation | Academia | Shuohang Wang, School of Information Systems, Singapore Management University (shwang.2014@phdis.smu.edu.sg); Jing Jiang, School of Information Systems, Singapore Management University (jingjiang@smu.edu.sg)
Pseudocode | No | The paper describes the model architecture and components using text and mathematical equations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have also made our code available online. (Footnote 1: https://github.com/shuohangwang/SeqMatchSeq)
Open Datasets | Yes | We present a model that follows this general framework and test it on four different datasets, namely, MovieQA, InsuranceQA, WikiQA and SNLI. ... For the machine comprehension task MovieQA (Tapaswi et al., 2016)... For the SNLI (Bowman et al., 2015) dataset... For the InsuranceQA (Feng et al., 2015) dataset... For the WikiQA (Yang et al., 2015) dataset...
Dataset Splits | Yes | The statistics of the four datasets are shown in Table 2. ... Table 2: The statistics of different datasets. Q: question/hypothesis, C: candidate answers for each question, A: answer/hypothesis, P: plot, w: word (average). (Includes columns for 'train', 'dev', and 'test' for each dataset.)
Hardware Specification | No | The paper describes hyper-parameters and software dependencies but does not specify any details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using GloVe embeddings and the ADAMAX optimizer with specific coefficients but does not provide version numbers for any programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | The implementation details of the models are as follows. The word embeddings are initialized from GloVe (Pennington et al., 2014). During training, they are not updated. The word embeddings not found in GloVe are initialized with zero. The dimensionality l of the hidden layers is set to 150. We use ADAMAX (Kingma & Ba, 2015) with the coefficients β1 = 0.9 and β2 = 0.999 to optimize the model. We do not use L2 regularization. The main parameter we tuned is the dropout on the embedding layer. For WikiQA, which is a relatively small dataset, we also tune the learning rate and the batch size. For the others, we set the batch size to 30 and the learning rate to 0.002.
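
To make the reported configuration concrete, below is a minimal PyTorch sketch of the embedding and optimizer setup quoted above. It is an illustrative reconstruction, not the authors' released code: the class name EmbeddingLayer, the stand-in linear layer, and the dropout rate of 0.5 are assumptions, since the paper only states that the embedding-layer dropout was the main tuned parameter.

    import torch
    import torch.nn as nn

    # Reported hyper-parameters. EMB_DROPOUT is an assumed placeholder:
    # the paper only says the embedding-layer dropout was tuned.
    HIDDEN_DIM = 150        # dimensionality l of the hidden layers
    BATCH_SIZE = 30         # used everywhere except WikiQA, where it was tuned
    LEARNING_RATE = 0.002   # likewise tuned only for WikiQA
    EMB_DROPOUT = 0.5       # assumed value

    class EmbeddingLayer(nn.Module):
        # Frozen GloVe vectors with dropout on top, as the setup describes.
        def __init__(self, glove_weights):
            super().__init__()
            # Words not found in GloVe are assumed to already be zero rows in
            # glove_weights; freeze=True keeps the embeddings fixed in training.
            self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=True)
            self.dropout = nn.Dropout(EMB_DROPOUT)

        def forward(self, token_ids):
            return self.dropout(self.embed(token_ids))

    # Toy 10k-word vocabulary of 300-d vectors; a stand-in linear layer takes
    # the place of the full compare-aggregate model, which is not sketched here.
    embedding = EmbeddingLayer(torch.zeros(10000, 300))
    model = nn.Sequential(embedding, nn.Linear(300, HIDDEN_DIM))

    # ADAMAX with beta1 = 0.9, beta2 = 0.999 and no L2 regularization
    # (weight_decay defaults to 0 in PyTorch). The frozen embedding weights
    # are filtered out so only trainable parameters are optimized.
    optimizer = torch.optim.Adamax(
        [p for p in model.parameters() if p.requires_grad],
        lr=LEARNING_RATE, betas=(0.9, 0.999))

Under this setup the only regularizer is the dropout applied to the embedding layer, which is consistent with the paper's statement that this was the main parameter tuned.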