Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Combining Lexical and Syntactic Features for Detecting Content-Dense Texts in News

Authors: Yinfei Yang, Ani Nenkova

JAIR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we empirically test this assumption on news articles from the business, U.S. international relations, sports and science journalism domains. Our findings clearly indicate that about half of the news texts in our study are in fact not content-dense and motivate the development of a supervised content-density detector. We heuristically label a large training corpus for the task and train a two-layer classifying model based on lexical and unlexicalized syntactic features. On manually annotated data, we compare the performance of domain-specific classifiers, trained on data only from a given news domain, and a general classifier in which data from all four domains is pooled together. Our annotation and prediction experiments demonstrate that the concept of content density varies depending on the domain and that naive annotators provide judgement biased toward the stereotypical domain label.
Researcher Affiliation | Collaboration | Yinfei Yang, EMAIL, 1600 Amphitheatre Pkwy, Mountain View, CA 94043; Ani Nenkova, EMAIL, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19103, USA
Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | All data for the work presented in this paper and the domain-dependent and general classifiers will be made publicly available with the publication of this article.
Open Datasets | Yes | The data for our experiments comes from the New York Times (NYT) annotated corpus (LDC Catalog No. LDC2008T19). The corpus contains 20 years' worth of NYT editions, along with rich meta-data about the newspaper section in which the article appeared and summaries produced by information scientists for many of the articles. The leads of articles are explicitly marked in the corpus, so extracting the relevant text for further analysis is straightforward.
Dataset Splits | Yes | We perform 10-fold cross-validation experiments on the entire heuristically labeled data. The entire dataset is split into 10 partitions. At each run, five partitions are used for training first-stage classifiers and the feature-level combination classifier. Four partitions are used for training the second-stage combination classifier, which uses only the probabilities of the content-dense class from the first-stage classifiers. One partition is used for testing the classifiers.
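The 5/4/1 partition scheme described above can be sketched in plain Python. The rotation of partition indices across runs is an illustrative assumption; the paper does not specify how partitions are assigned to roles in each fold.

```python
# Sketch of the paper's 10-fold split: in each run, of the 10 partitions,
# 5 train the first-stage classifiers (and the feature-level combination),
# 4 train the second-stage combination classifier, and 1 is held out for testing.
# The rotation scheme below is a hypothetical assignment for illustration.
def fold_roles(run, n_parts=10):
    """Return (stage1_train, stage2_train, test) partition indices for one run."""
    order = [(run + i) % n_parts for i in range(n_parts)]
    return order[:5], order[5:9], order[9:]

for run in range(10):
    s1, s2, test = fold_roles(run)
    assert len(s1) == 5 and len(s2) == 4 and len(test) == 1
```

Over the 10 runs, every partition serves exactly once as the test set, matching standard 10-fold cross-validation.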
Hardware Specification | No | The paper does not explicitly describe the hardware used for running the experiments. It focuses on the models and data without mentioning specific CPUs, GPUs, or other hardware components.
Software Dependencies | Yes | In the feature-level combination system, we train the binary classifier using LibLinear (R.E. Fan & Lin, 2008) with the L2-regularized logistic regression model setting. In the decision-level combination experiments, we first train binary classifiers based on each feature representation using LibLinear with the same settings. Using the probability outputs (for the content-dense class) of the first-stage classifiers as features, we then train a final binary classifier using LibSVM (Chang & Lin, 2011) with a linear kernel. Grid search is used on the training and development sets to find the best hyper-parameters in all models. ... The Stanford CoreNLP package (Manning, Surdeanu, Bauer, Finkel, Bethard, & McClosky, 2014) is used to extract production rules.
Experiment Setup | Yes | In the feature-level combination system, we train the binary classifier using LibLinear (R.E. Fan & Lin, 2008) with the L2-regularized logistic regression model setting. In the decision-level combination experiments, we first train binary classifiers based on each feature representation using LibLinear with the same settings. Using the probability outputs (for the content-dense class) of the first-stage classifiers as features, we then train a final binary classifier using LibSVM (Chang & Lin, 2011) with a linear kernel. Grid search is used on the training and development sets to find the best hyper-parameters in all models.
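The two-stage (decision-level) combination quoted above can be sketched with scikit-learn, whose `LogisticRegression(solver="liblinear")` wraps Liblinear and whose `SVC(kernel="linear")` wraps LibSVM. The synthetic data, feature names, and lack of grid search are assumptions for illustration, not the paper's actual features or hyper-parameters.

```python
# Hedged sketch of a two-stage combination: per-representation L2 logistic
# regression (Liblinear-backed), then a linear-kernel SVM (LibSVM-backed)
# over the positive-class probabilities. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
# Two synthetic "feature representations" (stand-ins for lexical vs. syntactic).
X_lex = rng.normal(size=(n, 5))
X_syn = rng.normal(size=(n, 3))
y = (X_lex[:, 0] + X_syn[:, 0] > 0).astype(int)

# Stage 1: one L2-regularized logistic regression per representation.
stage1 = [
    LogisticRegression(penalty="l2", solver="liblinear").fit(X, y)
    for X in (X_lex, X_syn)
]

# Stage 2: probabilities of the positive ("content-dense") class become
# the features for a linear-kernel SVM.
P = np.column_stack(
    [clf.predict_proba(X)[:, 1] for clf, X in zip(stage1, (X_lex, X_syn))]
)
stage2 = SVC(kernel="linear").fit(P, y)
preds = stage2.predict(P)
```

In the paper's setup the two stages are trained on disjoint partitions and hyper-parameters are tuned by grid search; this sketch trains both on the same data purely to show the data flow.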