Discourse Level Factors for Sentence Deletion in Text Simplification

Authors: Yang Zhong, Chao Jiang, Wei Xu, Junyi Jessy Li (pp. 9709-9716)

AAAI 2020

Reproducibility assessment. Each entry below lists the variable, the result, and the LLM's supporting response:
Research Type: Experimental. This paper presents a data-driven study focused on analyzing and predicting sentence deletion, a prevalent but understudied phenomenon in document simplification, on a large English text simplification corpus. We inspect various document and discourse factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. Evaluated on our manually annotated data, our best models reached F1 scores of 65.2 and 59.7 for this task at the elementary and middle school levels, respectively.
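
For concreteness, a minimal sketch of how such per-level F1 scores can be computed, assuming the task is scored as binary F1 on the deleted class; the toy labels and variable names below are invented for illustration and are not from the paper:

```python
# Minimal sketch: sentence deletion framed as binary classification
# (deleted vs. kept) per target grade level, scored with F1.
# The labels below are toy values, not data from the paper.
from sklearn.metrics import f1_score

# gold[i] / pred[i] = 1 if sentence i is deleted when simplifying to this level
gold_elementary = [1, 0, 1, 1, 0, 1]
pred_elementary = [1, 0, 0, 1, 0, 1]

# F1 on the positive (deleted) class; the 65.2 / 59.7 figures above are
# the same metric reported as percentages.
print(f1_score(gold_elementary, pred_elementary))
```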
Researcher Affiliation: Academia. Yang Zhong (1), Chao Jiang (1), Wei Xu (1), Junyi Jessy Li (2). (1) Department of Computer Science and Engineering, The Ohio State University; (2) Department of Linguistics, The University of Texas at Austin.
Pseudocode: No. No pseudocode or algorithm blocks are present in the paper.
Open Source Code: No. The paper states: 'To request our data, please first obtain access to the Newsela corpus at: https://newsela.com/data/, then contact the authors.' This refers to data access, not to a release of their source code, and no other statement about releasing the code for their method appears in the paper.
Open Datasets: Yes. 'We use the Newsela text simplification corpus (Xu, Callison-Burch, and Napoles 2015) of 936 news articles.'
Dataset Splits: Yes. 'We use 15 of the manually aligned articles as the validation set and the other 35 articles as test set.'
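
A minimal sketch of that article-level split; the placeholder IDs and fixed shuffle seed are assumptions, since the paper does not say how the 15 validation articles were chosen:

```python
# Sketch: split the 50 manually aligned articles into 15 validation
# and 35 test articles, at the article (not sentence) level.
import random

article_ids = list(range(50))          # placeholder IDs for the 50 articles
random.Random(0).shuffle(article_ids)  # assumed: a fixed-seed shuffle
val_ids, test_ids = article_ids[:15], article_ids[15:]
assert len(val_ids) == 15 and len(test_ids) == 35
```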
Hardware Specification: No. The acknowledgments section states: 'We thank NVIDIA and Texas Advanced Computing Center at UT Austin for providing GPU computing resources', but it does not specify exact GPU models, CPU types, or other hardware details.
Software Dependencies: No. The paper mentions software such as PyTorch and Scikit-learn but gives no version numbers for these or any other key components, which a reproducible description requires.
Experiment Setup: Yes. 'We use Adam (Kingma and Ba 2015) for optimization and also apply a dropout of 0.5 to prevent overfitting. We set the learning rate to 1e-5 and 2e-5 for experiments in Tables 9 and 10 respectively. We set the batch size to 64. We followed (Maddela and Xu 2018) and set the number of bins k to 10 and the adjustable fraction γ to 0.2 for the Gaussian feature vectorization layer.'
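
A hedged PyTorch sketch of this configuration: only the stated hyperparameters (Adam, dropout 0.5, learning rate 1e-5 or 2e-5, batch size 64, k = 10 bins, γ = 0.2) come from the paper; the module shapes, the exact form of the Gaussian layer, and all names below are assumptions.

```python
# Hedged sketch of the quoted training setup, not the authors' code.
import torch
import torch.nn as nn

class GaussianVectorizer(nn.Module):
    """One plausible reading of the Gaussian feature vectorization layer of
    Maddela and Xu (2018): project a scalar feature onto k Gaussian bins
    spanning its value range, with widths set by the fraction gamma."""
    def __init__(self, lo: float, hi: float, k: int = 10, gamma: float = 0.2):
        super().__init__()
        self.register_buffer("centers", torch.linspace(lo, hi, k))  # k = 10 bins
        self.sigma = gamma * (hi - lo)                               # gamma = 0.2

    def forward(self, x):
        # x: (batch, 1) scalar feature -> (batch, k) soft bin activations
        return torch.exp(-((x - self.centers) ** 2) / (2 * self.sigma ** 2))

class DeletionClassifier(nn.Module):
    """Placeholder classifier head; the paper's actual architecture differs."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)   # dropout of 0.5, per the paper
        self.out = nn.Linear(in_dim, 2)    # kept vs. deleted

    def forward(self, feats):
        return self.out(self.dropout(feats))

model = DeletionClassifier(in_dim=10)      # feature size is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # 2e-5 for Table 10
BATCH_SIZE = 64                            # batch size of 64, per the paper
```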