Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Semantic Proto-Role Labeling
Authors: Adam Teichert, Adam Poliak, Benjamin Van Durme, Matthew Gormley
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve a proto-role micro-averaged F1 of 81.7 using gold syntax and explore joint and conditional models of proto-roles and categorical roles. [...] Our experiments use several datasets. [...] Table 2 shows that our SRL model performs well compared to published work on the English CoNLL-2009 task... |
| Researcher Affiliation | Academia | Adam Teichert, Johns Hopkins University; Adam Poliak, Johns Hopkins University; Benjamin Van Durme, Johns Hopkins University; Matthew R. Gormley, Carnegie Mellon University |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper mentions: "Our implementation uses the Pacaya library," with a footnote pointing to https://github.com/mgormley/pacaya. This refers to a third-party library used in their implementation, not the source code for the methodology described in the paper itself. |
| Open Datasets | Yes | PropBank adds semantic role labels to the syntactic annotations available on the Wall Street Journal (WSJ) portion of the Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993). [...] OntoNotes 5 (Weischedel et al. 2013; Bonial, Stowe, and Palmer 2013) [...] CoNLL09 is the English SRL data from the CoNLL-2009 shared task (Hajič et al. 2009; Surdeanu et al. 2008). |
| Dataset Splits | Yes | We used our evaluation objective (e.g. Labeled SPRL F1) on the dev data for early stopping. [...] With the exception of CoNLL09, we split the datasets on WSJ section boundaries as follows: train (0-18), dev (19-21), test (22-24). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions using the "Pacaya library" and "LibLinear" but does not specify their version numbers, which are needed for reproducibility. |
| Experiment Setup | Yes | We train our models using stochastic gradient descent (SGD) with the AdaGrad adaptive learning rate and a composite mirror descent objective with ℓ2 regularization following Duchi, Hazan, and Singer (2011). [...] For each random configuration, hyper-parameters were independently selected from the following ranges: adaGradEta [5e-4, 1.0], L2Lambda [1e-10, 10], featCountCutoff {1,2,3,4}, sgdAutoSelectLr {True, False}. Continuous parameters were sampled on a log scale and then rounded to 2 significant digits. |
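The Research Type row quotes a proto-role micro-averaged F1 of 81.7. As a rough illustration of what micro-averaging means, the Python sketch below pools true positives, false positives, and false negatives over every (argument, property) decision before computing a single F1, rather than averaging per-property scores. The tuple encoding and the property names in the usage example are illustrative assumptions; the paper's exact scoring procedure (for example, how proto-role judgments are binarized) is not described in this report.

```python
from typing import Iterable, Tuple

# Hypothetical encoding: each positive decision is (argument id, proto-role property).
Label = Tuple[str, str]


def micro_f1(gold: Iterable[Label], pred: Iterable[Label]) -> float:
    """Micro-averaged F1: pool counts over all labels, then compute one F1."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predicted and correct
    fp = len(pred_set - gold_set)  # predicted but not in gold
    fn = len(gold_set - pred_set)  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative usage with made-up annotations.
gold = {("a1", "volition"), ("a1", "awareness"), ("a2", "change_of_state")}
pred = {("a1", "volition"), ("a2", "change_of_state"), ("a2", "awareness")}
print(micro_f1(gold, pred))  # 2 TP, 1 FP, 1 FN, so precision = recall = F1 = 2/3
```

Pooling counts means frequent properties dominate the score; a macro-average over properties would weight each property equally instead.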
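The Dataset Splits row reports a split on WSJ section boundaries: sections 0-18 for training, 19-21 for development, and 22-24 for testing, with CoNLL09 excluded from this scheme. A minimal sketch of that mapping follows. The file-name pattern `wsj_SSNN` (a two-digit section followed by a two-digit file number) reflects the usual Penn Treebank layout and is an assumption here, as are the hypothetical helper names `wsj_split` and `section_from_filename`.

```python
import os
import re

# Assumed Penn Treebank naming: wsj_SSNN.*, where SS is the section number.
_WSJ_FILE = re.compile(r"wsj_(\d{2})(\d{2})")


def section_from_filename(path: str) -> int:
    """Extract the WSJ section number from a file name such as 'wsj_2201.mrg'."""
    match = _WSJ_FILE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"not a WSJ file name: {path}")
    return int(match.group(1))


def wsj_split(section: int) -> str:
    """Map a WSJ section (0-24) to train (0-18), dev (19-21), or test (22-24)."""
    if not 0 <= section <= 24:
        raise ValueError(f"WSJ section out of range: {section}")
    if section <= 18:
        return "train"
    if section <= 21:
        return "dev"
    return "test"


print(wsj_split(section_from_filename("wsj_2201.mrg")))  # section 22 -> "test"
```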
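The Experiment Setup row describes a random hyperparameter search in which continuous parameters are sampled on a log scale and rounded to two significant digits. The sketch below is one reading of that procedure, assuming "log scale" means log-uniform sampling in base 10 between the quoted bounds and that rounding is applied after sampling. The dictionary keys reuse the quoted parameter names (with extraction spaces removed) but are not claimed to be Pacaya's configuration API; `sample_config`, `sample_log_uniform`, and `round_sig` are hypothetical helpers.

```python
import math
import random


def round_sig(x: float, digits: int = 2) -> float:
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Assumption: 'log scale' means uniform in log10 space between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def sample_config(rng: random.Random) -> dict:
    """Draw one random configuration from the ranges quoted in the report."""
    return {
        # Continuous parameters: log-uniform draw, then 2 significant digits.
        "adaGradEta": round_sig(sample_log_uniform(rng, 5e-4, 1.0)),
        "L2Lambda": round_sig(sample_log_uniform(rng, 1e-10, 10.0)),
        # Discrete parameters: uniform choice over the listed values.
        "featCountCutoff": rng.choice([1, 2, 3, 4]),
        "sgdAutoSelectLr": rng.choice([True, False]),
    }


rng = random.Random(0)  # fixed seed so the draws are repeatable
for config in (sample_config(rng) for _ in range(3)):
    print(config)
```

Rounding to two significant digits makes nearby draws collapse onto the same value, so a search driver built on this sketch might deduplicate configurations before training.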