Hierarchical Attention Networks for Sentence Ordering

Authors: Tianming Wang, Xiaojun Wan

AAAI 2019, pp. 7184-7191 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed model on three datasets, the arXiv dataset (Chen, Qiu, and Huang 2016), the VIST dataset (Huang et al. 2016) and the ROCStory dataset (Mostafazadeh et al. 2016). (...) Table 1 shows the results of all methods on three datasets. Among all prior methods, LSTM+Set2Seq has the best performance. We can see that our model strongly outperforms it and achieves the state-of-the-art scores on all datasets.
Researcher Affiliation | Academia | Tianming Wang, Xiaojun Wan; Institute of Computer Science and Technology, Peking University; The MOE Key Laboratory of Computational Linguistics, Peking University; {wangtm,wanxiaojun}@pku.edu.cn
Pseudocode | No | The paper provides detailed descriptions of its model components and mathematical formulations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain any statement about releasing the source code for the methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate the proposed model on three datasets, the arXiv dataset (Chen, Qiu, and Huang 2016), the VIST dataset (Huang et al. 2016) and the ROCStory dataset (Mostafazadeh et al. 2016).
Dataset Splits | Yes | The arXiv dataset is a very large dataset for sentence ordering, which contains 884912 training abstracts, 110614 validation abstracts and 110615 testing abstracts of papers on the arXiv website. The VIST dataset (...) includes 40155 training stories, 4990 validation stories and 5055 testing stories. The ROCStory dataset (...) We randomly split the dataset by 8:1:1 to get the training, validation and testing datasets of 78529, 9816 and 9817 stories respectively.
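The 8:1:1 ROCStory split is described but no split script is released; the sketch below is an assumption for illustration only (function name, seed, and rounding are hypothetical, not the authors' code).

```python
# Hypothetical sketch of an 8:1:1 random split like the one described for ROCStory.
import random

def split_8_1_1(stories, seed=0):
    """Shuffle the stories and split them into train/val/test at an 8:1:1 ratio."""
    rng = random.Random(seed)
    shuffled = list(stories)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# For the 98,162 ROCStory stories this truncating split gives 78,529 / 9,816 / 9,817,
# consistent with the counts reported in the paper (the assignment itself depends on the seed).
```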
Hardware Specification | No | The paper does not specify any particular hardware components such as CPU or GPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions 'We use TensorFlow to implement our model' and 'initialize all models (include baselines) with 300-dimensional GloVe word vectors' but does not provide specific version numbers for these software components.
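Since the paper only states that TensorFlow and 300-dimensional GloVe vectors were used, the sketch below shows one common way to build a GloVe-initialized embedding matrix; the file name, vocabulary handling, and fallback initialization are assumptions, not the authors' code.

```python
# Hypothetical GloVe-initialization helper; not taken from the paper.
import numpy as np

def load_glove_matrix(glove_path, vocab, dim=300):
    """Return a |vocab| x dim matrix, copying GloVe vectors for in-vocabulary words."""
    matrix = np.random.uniform(-0.05, 0.05, (len(vocab), dim)).astype(np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype=np.float32)
    return matrix

# Usage (hypothetical vocabulary and file name):
# vocab = {"<pad>": 0, "<unk>": 1, "the": 2}
# embeddings = load_glove_matrix("glove.840B.300d.txt", vocab)
```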
Experiment Setup | Yes | We use the Adam Optimizer with an initial learning rate of 10^-4, momentum β1 = 0.9, β2 = 0.98 and weight decay ϵ = 10^-9. We set batch size to 64 and stop training when the metric Kendall's τ on the validation set does not improve for 3 epochs. The size of hidden states of LSTM is set to 300 and dimension d is set to 600. The head of attention H is set to 4 and the number of layers M is set to 3. We apply dropout to the output of each multi-head attention sub-layer. We use a rate P_drop = 0.05 for the arXiv dataset and P_drop = 0.15 for the VIST and the ROCStory datasets.
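To make the quoted hyperparameters concrete, here is a minimal configuration sketch in TensorFlow 2.x / Keras; the paper does not state a TensorFlow version or release code, so the API choices and variable names below are assumptions, not the authors' setup.

```python
import tensorflow as tf
from scipy.stats import kendalltau

# Adam as reported: lr 1e-4, beta_1 = 0.9, beta_2 = 0.98. The paper labels
# epsilon = 1e-9 as "weight decay", but the value matches Adam's epsilon term.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4, beta_1=0.9, beta_2=0.98, epsilon=1e-9)

BATCH_SIZE = 64      # batch size
HIDDEN_SIZE = 300    # LSTM hidden state size
MODEL_DIM = 600      # dimension d
NUM_HEADS = 4        # attention heads H
NUM_LAYERS = 3       # layers M
P_DROP = {"arxiv": 0.05, "vist": 0.15, "rocstory": 0.15}  # dropout per dataset

# Early stopping uses validation Kendall's tau with a patience of 3 epochs;
# tau between a predicted and a gold sentence order can be computed as:
tau, _ = kendalltau([0, 2, 1, 3, 4], [0, 1, 2, 3, 4])  # dummy orders for illustration
```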