Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
A Primal Dual Formulation For Deep Learning With Constraints
Authors: Yatin Nandwani, Abhishek Pathak, Mausam, Parag Singla
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on the tasks of Semantic Role Labeling (SRL), Named Entity Recognition (NER) tagging, and fine-grained entity typing and show that our constraints not only significantly reduce the number of constraint violations, but can also result in state-of-the-art performance. |
| Researcher Affiliation | Academia | Yatin Nandwani, Abhishek Pathak, Mausam and Parag Singla Department of Computer Science and Engineering Indian Institute of Technology Delhi |
| Pseudocode | Yes | Algorithm 1 presents the pseudocode for our learning algorithm. |
| Open Source Code | Yes | We have made all our code publicly available at: https://github.com/dair-iitd/dl-with-constraints for future research. |
| Open Datasets | Yes | We use English Ontonotes 5.0 dataset (http://cemantix.org/data/ontonotes.html) using the CONLL 2011/12 shared task format (Pradhan et al. [2012]) as the training data. We use the publicly available GMB dataset (Bos et al. [2017]; https://gmb.let.rug.nl/data.php) in our experiments. We work with Typenet (Murty et al. [2017]; https://github.com/iesl/TypeNet), a publicly available dataset of hierarchical entity types for extremely fine-grained entity typing. |
| Dataset Splits | Yes | We use the standard train/dev/test split and use the official Perl script to compute span based F1-scores. We randomly split it into 60/20/20 train/dev/test sets respectively. We use the original splits of 90%, 5% and 5% for training, validation and testing, respectively (Murty et al. [2018]). |
| Hardware Specification | No | The paper mentions "IIT Delhi HPC facility" for computational resources but does not provide specific hardware details such as GPU models, CPU specifications, or memory sizes used for the experiments. |
| Software Dependencies | No | The paper mentions "implemented in https://allennlp.org/models#semantic-role-labeling" and refers to "software environments" in the supplement but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | No | The paper states: "The specific details of software environments and hyperparameters are mentioned in the supplement." However, these details are not provided in the main text of the paper. |