Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Detecting Information-Dense Texts in Multiple News Domains
Authors: Yinfei Yang, Ani Nenkova
AAAI 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a classifier based on lexical, discourse and unlexicalized syntactic features and test its performance on a set of manually annotated articles from business, U.S. international relations, sports and science domains. Our results indicate that the task is feasible and that both syntactic and lexical features are highly predictive for the distinction. We observe considerable variation of prediction accuracy across domains and find that domain-specific models are more accurate. |
| Researcher Affiliation | Collaboration | Yinfei Yang, Amazon Inc. (EMAIL); Ani Nenkova, University of Pennsylvania (EMAIL) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | The data for our experiments comes from the New York Times (NYT) corpus (LDC2008T19). This corpus contains 20 years worth of NYT, along with metadata about the newspaper section in which the article appeared and manual summaries for many of the articles. |
| Dataset Splits | Yes | We perform 10-fold cross-validation on the automatically labeled data with all features combined, but also analyze the performance when only a given class of features is used. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We trained a binary classifier using LibSVM (R.-E. Fan and Lin 2008) with linear kernel and default parameter settings. |
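The experiment setup and dataset-split rows above describe a linear-kernel SVM with default parameters, evaluated by 10-fold cross-validation. A minimal sketch of that protocol, assuming scikit-learn's `SVC` (which wraps LibSVM) as a stand-in and synthetic placeholder features, since the paper's lexical, discourse, and syntactic feature vectors are not released:

```python
# Hedged sketch, not the authors' code: approximates the reported setup
# (LibSVM, linear kernel, default parameters, 10-fold cross-validation)
# using scikit-learn's SVC, which is built on LibSVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder for the paper's automatically labeled feature vectors
# and binary informative/non-informative labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = SVC(kernel="linear")  # linear kernel, default parameters
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```

The paper additionally analyzes performance per feature class and per domain; that would correspond to repeating this loop on feature subsets and on domain-specific partitions of the data.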