Structured Prediction as Translation between Augmented Natural Languages

Authors: Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. ... Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). ... 5 EXPERIMENTS In this section, we show that our TANL framework, with the augmented natural languages outlined in Section 4, can effectively solve the structured prediction tasks considered and exceeds the previous state of the art on multiple datasets. (An illustrative sketch of the augmented-language format follows the table.)
Researcher Affiliation | Industry | Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto, Amazon Web Services, {paoling,benathi,kronej,jieman,aachille,ranubhai,cicnog,bxiang,soattos}@amazon.com
Pseudocode | No | The paper describes steps for decoding structured objects but does not contain a formal 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The code is available at https://github.com/amazon-research/tanl.
Open Datasets | Yes | Datasets. We experiment on the following datasets: CoNLL04 (Roth & Yih, 2004), ADE (Gurulingappa et al., 2012), NYT (Riedel et al., 2010), and ACE2005 (Walker et al., 2006).
Dataset Splits | Yes | The CoNLL04 dataset... we use the training (922 sentences), validation (231 sentences), and test (288 sentences) split by Gupta et al. (2016). ... The NYT dataset ... It consists of 56,195 sentences for training, 5,000 for validation, and 5,000 for testing. ... The English OntoNotes dataset... consists of 59,924 sentences for training, 8,528 for validation, and 8,262 for testing.
Hardware Specification | Yes | We use: 8 V100 GPUs with a batch size of 8 per GPU;
Software Dependencies | No | The paper mentions 'a pre-trained T5-base model (Raffel et al., 2019)' and 'the implementation of Hugging Face's Transformers library (Wolf et al., 2019)' but does not provide specific version numbers for these software components or any other libraries.
Experiment Setup | Yes | To keep our framework as simple as possible, hyperparameters are the same across the majority of our experiments. We use: 8 V100 GPUs with a batch size of 8 per GPU; the AdamW optimizer (Kingma & Ba, 2015; Loshchilov & Hutter, 2019); linear learning rate decay starting from 0.0005; maximum input/output sequence length equal to 256 tokens at training time (longer sequences are truncated), except for relation classification, coreference resolution, and dialogue state tracking (see below). The number of fine-tuning epochs is adjusted depending on the size of the dataset, as described later. (A configuration sketch follows the table.)
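Referenced from the Research Type row above: a minimal sketch, in Python, of how a joint entity and relation extraction example might be rendered as a TANL-style input/output text pair. The bracket markers and the `make_target` helper are illustrative assumptions based on the paper's description of augmented natural languages, not the authors' released code.

```python
# Illustrative sketch of a TANL-style "augmented natural language" target for
# joint entity and relation extraction. The marker syntax is an assumption here.

sentence = "Tolkien's epic novel The Lord of the Rings was published in 1954-1955."

entities = [
    {"span": "Tolkien", "type": "person"},
    {"span": "The Lord of the Rings", "type": "book"},
]
relations = [
    {"head": "Tolkien", "relation": "author", "tail": "The Lord of the Rings"},
]

def make_target(sentence, entities, relations):
    """Wrap each entity span as [ span | type | relation = tail ] in the sentence,
    splicing right to left over character offsets so earlier edits do not shift later ones."""
    spans = []
    for ent in entities:
        start = sentence.find(ent["span"])
        spans.append((start, start + len(ent["span"]), ent))
    target = sentence
    for start, end, ent in sorted(spans, key=lambda s: s[0], reverse=True):
        rels = [f"{r['relation']} = {r['tail']}" for r in relations if r["head"] == ent["span"]]
        fields = [ent["span"], ent["type"]] + rels
        target = target[:start] + "[ " + " | ".join(fields) + " ]" + target[end:]
    return target

print(make_target(sentence, entities, relations))
# [ Tolkien | person | author = The Lord of the Rings ]'s epic novel
# [ The Lord of the Rings | book ] was published in 1954-1955.
```

The model is then trained, as a plain translation task, to map the unannotated sentence to the annotated one, and the structured objects are recovered by parsing the brackets out of the decoded text.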
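Referenced from the Experiment Setup row: a rough sketch of how the stated hyperparameters (T5-base, AdamW, linear decay from 0.0005, 256-token truncation, batch size 8 per GPU) could be wired together with Hugging Face's Transformers library. The epoch count, output directory, and dataset pipeline are placeholders, not values from the released implementation.

```python
# Minimal fine-tuning configuration mirroring the reported hyperparameters.
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

args = Seq2SeqTrainingArguments(
    output_dir="tanl-output",        # placeholder path
    per_device_train_batch_size=8,   # batch size 8 per GPU (8 V100s in the paper)
    learning_rate=5e-4,              # linear decay starting from 0.0005
    lr_scheduler_type="linear",
    num_train_epochs=10,             # placeholder: the paper adjusts this per dataset
)

def preprocess(example):
    """Tokenize source/target text, truncating to 256 tokens as in the paper."""
    inputs = tokenizer(example["source"], max_length=256, truncation=True)
    labels = tokenizer(example["target"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

# trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=...)  # dataset omitted
# trainer.train()
```

The Trainer defaults to AdamW with a linear schedule, so only the initial learning rate and batch size need to be set explicitly to match the description above.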