A Compare-Aggregate Model for Matching Text Sequences

Authors: Shuohang Wang, Jing Jiang

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our model on four different datasets representing different tasks. The first three datasets are question answering tasks while the last one is on textual entailment. The statistics of the four datasets are shown in Table 2. We will first introduce the task settings and the way we customize the compare-aggregate structure to each task. Then we will show the baselines for the different datasets. Finally, we discuss the experiment results shown in Table 3 and the ablation study shown in Table 4.
Researcher Affiliation | Academia | Shuohang Wang, School of Information Systems, Singapore Management University (shwang.2014@phdis.smu.edu.sg); Jing Jiang, School of Information Systems, Singapore Management University (jingjiang@smu.edu.sg)
Pseudocode | No | The paper describes the model architecture and components using text and mathematical equations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We have also made our code available online. (Footnote 1: https://github.com/shuohangwang/SeqMatchSeq)
Open Datasets | Yes | We present a model that follows this general framework and test it on four different datasets, namely, MovieQA, InsuranceQA, WikiQA and SNLI. ... For the machine comprehension task MovieQA (Tapaswi et al., 2016)... For the SNLI (Bowman et al., 2015) dataset... For the InsuranceQA (Feng et al., 2015) dataset... For the WikiQA (Yang et al., 2015) dataset...
Dataset Splits | Yes | The statistics of the four datasets are shown in Table 2. ... Table 2: The statistics of different datasets. Q: question/hypothesis, C: candidate answers for each question, A: answer/hypothesis, P: plot, w: word (average). (Includes columns for 'train', 'dev', and 'test' for each dataset.)
Hardware Specification | No | The paper describes hyper-parameters and software dependencies but does not specify any details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using GloVe embeddings and the ADAMAX optimizer with specific coefficients but does not provide version numbers for any programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | The implementation details of the models are as follows. The word embeddings are initialized from GloVe (Pennington et al., 2014). During training, they are not updated. The word embeddings not found in GloVe are initialized with zero. The dimensionality l of the hidden layers is set to 150. We use ADAMAX (Kingma & Ba, 2015) with the coefficients β1 = 0.9 and β2 = 0.999 to optimize the model. We do not use L2 regularization. The main parameter we tuned is the dropout on the embedding layer. For WikiQA, which is a relatively small dataset, we also tune the learning rate and the batch size. For the others, we set the batch size to 30 and the learning rate to 0.002.
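
To make the reported configuration concrete, below is a minimal PyTorch sketch of the embedding and optimizer setup quoted above. It is an illustrative reconstruction, not the authors' released code: the class name EmbeddingLayer, the stand-in linear layer, and the dropout rate of 0.5 are assumptions, since the paper only states that the embedding-layer dropout was the main tuned parameter.

    import torch
    import torch.nn as nn

    # Reported hyper-parameters. EMB_DROPOUT is an assumed placeholder:
    # the paper only says the embedding-layer dropout was tuned.
    HIDDEN_DIM = 150        # dimensionality l of the hidden layers
    BATCH_SIZE = 30         # used everywhere except WikiQA, where it was tuned
    LEARNING_RATE = 0.002   # likewise tuned only for WikiQA
    EMB_DROPOUT = 0.5       # assumed value

    class EmbeddingLayer(nn.Module):
        # Frozen GloVe vectors with dropout on top, as the setup describes.
        def __init__(self, glove_weights):
            super().__init__()
            # Words not found in GloVe are assumed to already be zero rows in
            # glove_weights; freeze=True keeps the embeddings fixed in training.
            self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=True)
            self.dropout = nn.Dropout(EMB_DROPOUT)

        def forward(self, token_ids):
            return self.dropout(self.embed(token_ids))

    # Toy 10k-word vocabulary of 300-d vectors; a stand-in linear layer takes
    # the place of the full compare-aggregate model, which is not sketched here.
    embedding = EmbeddingLayer(torch.zeros(10000, 300))
    model = nn.Sequential(embedding, nn.Linear(300, HIDDEN_DIM))

    # ADAMAX with beta1 = 0.9, beta2 = 0.999 and no L2 regularization
    # (weight_decay defaults to 0 in PyTorch). The frozen embedding weights
    # are filtered out so only trainable parameters are optimized.
    optimizer = torch.optim.Adamax(
        [p for p in model.parameters() if p.requires_grad],
        lr=LEARNING_RATE, betas=(0.9, 0.999))

Under this setup the only regularizer is the dropout applied to the embedding layer, which is consistent with the paper's statement that this was the main parameter tuned.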