Distraction-Based Neural Networks for Modeling Documents

Authors: Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Without engineering any features, we train the models on two large datasets. The models achieve the state-of-the-art performance, and they significantly benefit from the distraction modeling, particularly when input documents are long.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, Hefei, China; 2 York University, Canada; 3 iFLYTEK Research, Hefei, China
Pseudocode | Yes | Algorithm 1: Beam search with distraction (see the illustrative sketch below the table)
Open Source Code | Yes | We make our code publicly available. Our implementation uses Python and is based on the Theano library [Bergstra et al., 2010]. (Footnote 2: Our code is available at https://github.com/lukecq1231/nats)
Open Datasets | Yes | We experiment with our summarization models on two publicly available corpora with different document lengths and in different languages: a CNN news collection [Hermann et al., 2015] and a Chinese corpus made available more recently in [Hu et al., 2015].
Dataset Splits | Yes | We used the original training/testing split mentioned in [Hu et al., 2015], but additionally randomly sampled a small part of the training data as our validation set. Table 1 (excerpt, # Doc. row): CNN: Train 81,824, Valid 1,184, Test 1,093; LCSTS: Train 2,400,000, Valid 591, Test 725. (See the hold-out split sketch below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions Python and the Theano library but does not specify their versions or list other software dependencies with version numbers.
Experiment Setup | Yes | We used mini-batch stochastic gradient descent (SGD) to optimize log-likelihood, and Adadelta [Zeiler, 2012] to automatically adapt the learning rate of parameters (ε = 10^-6 and ρ = 0.95). For the CNN dataset, training was performed with shuffled mini-batches of size 64... We limit our vocabulary to include the top 25,000 most frequent words... we set embedding dimension to be 120, the vector length in hidden layers to be 500 for uni-GRU and 600 for bi-GRU. An end-of-sentence token was inserted between every sentence, and an end-of-document token was added at the end. The beam size of decoder was set to be 5. (See the hyperparameter summary below the table.)
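
The Pseudocode row cites the paper's Algorithm 1, beam search with distraction. The exact formulation is not reproduced here; what follows is a minimal illustrative sketch, assuming the distraction term is a penalty on the similarity between the attention vector used at the current decoding step and attention used at earlier steps. The helper names (decoder_step, cosine, LAMBDA, MAX_LEN) and the penalty form are placeholders, not the authors' code.

```python
# Illustrative sketch only: a generic beam search whose candidate score is the
# running log-probability minus a "distraction" penalty that discourages the
# decoder from re-attending to content it has already used.
import numpy as np

BEAM_SIZE = 5   # beam width reported in the paper
MAX_LEN = 120   # assumed maximum output length
LAMBDA = 0.1    # assumed weight of the distraction penalty

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def beam_search_with_distraction(decoder_step, init_state, eos_id):
    # Each hypothesis: (tokens, decoder state, past attention vectors, score).
    beams = [([], init_state, [], 0.0)]
    finished = []
    for _ in range(MAX_LEN):
        candidates = []
        for tokens, state, history, score in beams:
            # decoder_step is assumed to return log-probabilities over the
            # vocabulary, the next decoder state, and the attention vector used.
            log_probs, new_state, attention = decoder_step(tokens, state)
            # Distraction penalty: similarity to previously used attention.
            penalty = max((cosine(attention, h) for h in history), default=0.0)
            for tok in np.argsort(log_probs)[-BEAM_SIZE:]:
                new_score = score + log_probs[tok] - LAMBDA * penalty
                candidates.append((tokens + [int(tok)], new_state,
                                   history + [attention], new_score))
        # Keep the best BEAM_SIZE partial hypotheses; move finished ones aside.
        candidates.sort(key=lambda c: c[-1], reverse=True)
        beams = []
        for cand in candidates[:BEAM_SIZE]:
            (finished if cand[0][-1] == eos_id else beams).append(cand)
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[-1])[0]
```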
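
The Dataset Splits row notes that the validation set was obtained by randomly sampling a small part of the official training data. A minimal sketch of that kind of hold-out split, assuming in-memory lists of documents; the function name, seed, and the example size are illustrative, not taken from the authors' code.

```python
# Carve a validation set out of an existing training split by random sampling.
import random

def split_validation(train_examples, valid_size, seed=1234):
    rng = random.Random(seed)           # fixed seed for a reproducible split
    indices = list(range(len(train_examples)))
    rng.shuffle(indices)
    valid = [train_examples[i] for i in indices[:valid_size]]
    train = [train_examples[i] for i in indices[valid_size:]]
    return train, valid

# e.g. hold out 591 documents, the LCSTS validation size listed in Table 1:
# train_docs, valid_docs = split_validation(lcsts_train_docs, valid_size=591)
```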
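
The Experiment Setup row lists the reported training hyperparameters; below they are collected into a single Python config dict for readability. The key names and the two special-token symbols are assumptions; the numeric values are the ones quoted from the paper.

```python
# Reported hyperparameters gathered into one place (key names are illustrative).
CONFIG = {
    "optimizer": "adadelta",       # mini-batch SGD with Adadelta-adapted rates
    "adadelta_epsilon": 1e-6,      # ε = 10^-6
    "adadelta_rho": 0.95,          # ρ = 0.95
    "batch_size": 64,              # shuffled mini-batches (CNN dataset)
    "vocab_size": 25000,           # top-25,000 most frequent words
    "embedding_dim": 120,
    "hidden_dim_uni_gru": 500,
    "hidden_dim_bi_gru": 600,
    "eos_token": "</s>",           # inserted between sentences (symbol assumed)
    "eod_token": "</d>",           # appended at document end (symbol assumed)
    "beam_size": 5,
}
```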