Gradient-Based Inference for Networks with Output Constraints
Authors: Jay Yoon Lee, Sanket Vaibhav Mehta, Michael Wick, Jean-Baptiste Tristan, Jaime Carbonell
AAAI 2019, pp. 4147-4154 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the efficacy of GBI on three tasks with hard constraints: semantic role labeling, syntactic parsing, and sequence transduction. In each case, the algorithm not only satisfies constraints, but improves accuracy, even when the underlying network is state-of-the-art. |
| Researcher Affiliation | Collaboration | 1School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 2Oracle Labs, Burlington, MA |
| Pseudocode | Yes | Algorithm 1: Constrained inference for neural nets (a hedged sketch of this loop follows the table) |
| Open Source Code | No | The paper mentions using external tools like 'AllenNLP' but does not provide a statement or link to open-source code for the proposed Gradient-Based Inference (GBI) framework or its implementation. |
| Open Datasets | Yes | For data we use OntoNotes v5.0, which has ground-truth for both SRL and syntactic parsing (Pradhan et al. 2013). We transform the Wall Street Journal (WSJ) portion of the Penn Treebank (PTB) into shift-reduce commands... (Marcus et al. 1999). |
| Dataset Splits | Yes | We employ the traditional split of the data with section 22 for dev, section 23 for test, and remaining sections 01-21 for training. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software components like 'AllenNLP' and 'ELMo embeddings', and architectures like 'GNMT', but does not provide specific version numbers for these or for any other software dependencies required for reproducibility. |
| Experiment Setup | Yes | In total, we train five networks Net1-5 for this study... We train our two best baseline models (Net1,2) using a highly competitive seq2seq architecture for machine translation, GNMT (Wu et al. 2016)... And, to study a wider range of accuracies, we train a simpler architecture with different hyper-parameters and obtain nets (Net3-5). For all models, we employ Glorot initialization, and basic attention (Bahdanau, Cho, and Bengio 2014). See Table 2 for a summary of the networks, hyper-parameters, and their performance. |
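
The paper's Algorithm 1 performs constrained inference by taking gradient steps on an instance-specific copy of the trained weights until the decoded output satisfies the hard output constraints. Since no code is released, the sketch below is only one plausible PyTorch-style instantiation of that loop, not the authors' implementation: `decode`, `violation`, the loss `g * log_prob`, and the proximity weight `alpha` are placeholders/assumptions, and the paper's exact objective and stopping criteria may differ.

```python
# Hedged sketch of a GBI-style constrained-inference loop (not the authors' code).
import copy
import torch

def gradient_based_inference(model, x, violation, decode,
                             alpha=1.0, lr=1e-2, max_iters=100):
    """Nudge an instance-specific copy of the trained weights at test time
    until the decoded output satisfies the task's hard constraints.

    Hypothetical helpers (assumptions, not from the paper):
      violation(y) -> float          non-negative; 0 iff y satisfies all constraints
      decode(m, x) -> (y, log_prob)  prediction and its differentiable log-probability
    """
    inst = copy.deepcopy(model)                      # instance-specific weights W'
    frozen = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.SGD(inst.parameters(), lr=lr)

    y, log_prob = decode(inst, x)
    for _ in range(max_iters):
        g = violation(y)
        if g == 0:                                   # constraints satisfied: stop early
            break
        opt.zero_grad()
        # Push probability mass away from the constraint-violating output;
        # the L2 proximity term (an assumption here) keeps W' close to the original W.
        loss = g * log_prob + alpha * sum(
            torch.sum((p - p0) ** 2)
            for p, p0 in zip(inst.parameters(), frozen))
        loss.backward()
        opt.step()
        y, log_prob = decode(inst, x)                # re-decode with updated weights
    return y
```

Because the loop edits a deep copy, the base network's weights are left untouched for subsequent inputs, matching the instance-specific character of the inference procedure described in the paper.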