Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Ask, and Shall You Receive? Understanding Desire Fulfillment in Natural Language Text
Authors: Snigdha Chaturvedi, Dan Goldwasser, Hal Daumé III
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with two different datasets demonstrate the importance of understanding the narrative and discourse structure to address this task. Table 4 compares the test set performances using F1 score of the positive (desire fulfilled) class for various models. |
| Researcher Affiliation | Academia | University of Maryland, College Park; Purdue University |
| Pseudocode | Yes | Algorithm 1 Training algorithm for LSNM |
| Open Source Code | No | No explicit statement about releasing the source code for their proposed methodology was found. The paper mentions "BIUTEE: A modular open-source system for recognizing textual entailment" which is a third-party tool they used, not their own code. |
| Open Datasets | Yes | We have used two manually annotated datasets for our experiments: MCTest and Simple Wiki. Both the datasets are available on the first author's webpage. |
| Dataset Splits | Yes | Also, the number of latent states, H, was set to be 2 and 15 for the MCTest and Simple Wiki datasets respectively using cross-validation. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. The paper mentions tools like "Stanford Core NLP coreference resolution system" and "BIUTEE" (Stern and Dagan 2012; Magnini et al. 2014) but without version details. |
| Experiment Setup | No | No specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or optimizer settings are provided in the main text. |
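The Research Type row quotes the paper's headline metric: the F1 score of the positive (desire fulfilled) class. The following is a minimal sketch of how that metric is computed, assuming binary labels where 1 marks a fulfilled desire. The function name `positive_class_f1` and the toy labels are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Sequence


def positive_class_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class only (hypothetical helper; 1 = desire fulfilled by assumption)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
gold = [1, 0, 1, 1, 0]
pred = [1, 1, 0, 1, 0]
print(round(positive_class_f1(gold, pred), 3))  # 0.667
```

Scoring only the positive class, rather than averaging over both classes, keeps the metric focused on detecting fulfilled desires, which matters when that class is the minority.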