Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Primal Dual Formulation For Deep Learning With Constraints

Authors: Yatin Nandwani, Abhishek Pathak, Mausam, Parag Singla

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We experiment on the tasks of Semantic Role Labeling (SRL), Named Entity Recognition (NER) tagging, and fine-grained entity typing and show that our constraints not only significantly reduce the number of constraint violations, but can also result in state-of-the-art performance.
Researcher Affiliation	Academia	Yatin Nandwani, Abhishek Pathak, Mausam and Parag Singla Department of Computer Science and Engineering Indian Institute of Technology Delhi
Pseudocode	Yes	Algorithm 1 presents the pseudocode for our learning algorithm.
Open Source Code	Yes	We have made our all our code publicly available at: https://github.com/dair-iitd/dl-with-constraints for future research.
Open Datasets	Yes	We use English Ontonotes 5.0 dataset¹ using the CONLL 2011/12 shared task format (Pradhan et al. [2012]) as the training data. ¹ http://cemantix.org/data/ontonotes.html We use the publicly available GMB⁴ dataset (Bos et al. [2017]) in our experiments. ⁴ https://gmb.let.rug.nl/data.php We work with Typenet⁵ (Murty et al. [2017]), a publicly available dataset of hierarchical entity types for extremely fine-grained entity typing. ⁵ https://github.com/iesl/TypeNet
Dataset Splits	Yes	We use the standard train/dev/test split and use the official Perl script to compute span based F1-scores. We randomly split it into 60/20/20 train/dev/test sets respectively. We use the original splits of 90%, 5% and 5% for training, validation and testing, respectively (Murty et al. [2018]).
Hardware Specification	No	The paper mentions "IIT Delhi HPC facility" for computational resources but does not provide specific hardware details such as GPU models, CPU specifications, or memory sizes used for the experiments.
Software Dependencies	No	The paper mentions "implemented in https://allennlp.org/models#semantic-role-labeling" and refers to "software environments" in the supplement but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup	No	The paper states: "The specific details of software environments and hyperparameters are mentioned in the supplement." However, these details are not provided in the main text of the paper.