Multi-Task Learning For Parsing The Alexa Meaning Representation Language

Authors: Vittorio Perera, Tagyoung Chung, Thomas Kollar, Emma Strubell

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed models, which use the linearized AMRL parse, multi-task learning, residual connections, and embeddings from SLU, decrease the error rate in predicting the full AMRL parse by 3.56% absolute. Four metrics are considered to evaluate the models, and Table 1 shows the results for the different model architectures. (See the architecture sketch after this table.)
Researcher Affiliation | Collaboration | Vittorio Perera (Carnegie Mellon University, vdperera@cs.cmu.edu); Tagyoung Chung (Amazon Inc., tagyoung@amazon.com); Thomas Kollar (Amazon Inc., kollart@amazon.com); Emma Strubell (University of Massachusetts Amherst, strubell@cs.umass.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or an explicit statement of code release) for the source code of the methodology described.
Open Datasets | No | The two datasets used in the experiments are (1) a large corpus collected for spoken language understanding (SLU) and (2) a smaller corpus of linearized AMRL parses. The SLU corpus comprises a total of 2.8M unique sentences. The AMRL corpus is significantly smaller than the SLU corpus and contains only around 350k unique sentences in the linearized representation. The paper describes using these datasets but does not provide access information (a link, DOI, or citation for public availability).
Dataset Splits | Yes | The development set and test set contain around 48k sentences annotated using AMRL.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Each model is trained until it fully converges on the training set; typically this takes around 60 epochs. Training uses a fixed learning rate of 0.0005 with an L2 penalty of 1e-8 and a batch size of 128 sentences. The output of each bi-LSTM layer is connected to a dropout component with a retention rate of 80%. Decoding uses a beam of size 3 and a minimum probability of 10^-7. (See the configuration sketch after this table.)
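
The Research Type evidence above names the ingredients of the model: a bi-LSTM encoder over the utterance, residual connections between layers, embeddings reused from SLU, and multi-task learning on top of the linearized AMRL parse. The sketch below is a minimal PyTorch illustration of that pattern only; the class name, dimensions, and the two task heads are assumptions made for clarity, not the authors' implementation, which decodes the linearized AMRL parse with a beam-search decoder.

```python
import torch
import torch.nn as nn


class MultiTaskAMRLSketch(nn.Module):
    """Illustrative bi-LSTM encoder with residual connections and two task heads:
    per-token scores over a linearized AMRL vocabulary and an utterance-level
    SLU-style intent label. Dimensions and head design are assumptions."""

    def __init__(self, vocab_size, amrl_vocab_size, num_intents,
                 emb_dim=128, hidden_dim=256, num_layers=2, dropout=0.2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Stack of bi-LSTM layers; residual connections are added manually below.
        self.layers = nn.ModuleList([
            nn.LSTM(emb_dim if i == 0 else 2 * hidden_dim, hidden_dim,
                    batch_first=True, bidirectional=True)
            for i in range(num_layers)
        ])
        # The paper reports an 80% retention rate, i.e. a dropout probability of 0.2.
        self.dropout = nn.Dropout(dropout)
        # Head 1: per-step scores over the linearized AMRL output vocabulary
        # (the paper instead generates the linearized parse with a beam-search decoder).
        self.amrl_head = nn.Linear(2 * hidden_dim, amrl_vocab_size)
        # Head 2: utterance-level intent classification, an auxiliary SLU-style task.
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids):
        x = self.embedding(token_ids)                    # (batch, seq, emb_dim)
        for i, lstm in enumerate(self.layers):
            out, _ = lstm(x)
            out = self.dropout(out)
            # Residual connection once input and output widths match
            # (from the second bi-LSTM layer onward).
            x = out + x if i > 0 else out
        amrl_logits = self.amrl_head(x)                  # (batch, seq, amrl_vocab)
        intent_logits = self.intent_head(x.mean(dim=1))  # (batch, num_intents)
        return amrl_logits, intent_logits
```

Under this reading, a multi-task objective would be a weighted sum of the cross-entropy losses of the two heads, letting the large SLU corpus (2.8M sentences) regularize the much smaller linearized-AMRL task (around 350k sentences).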
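
The training and decoding settings quoted in the Experiment Setup row can be gathered into a single configuration. The snippet below only restates the reported values; the key names are illustrative, and the paper (as summarized here) does not name its optimizer or framework.

```python
# Training settings reported in the paper.
TRAIN_CONFIG = {
    "learning_rate": 5e-4,      # fixed learning rate of 0.0005
    "l2_penalty": 1e-8,         # L2 regularization penalty
    "batch_size": 128,          # sentences per batch
    "dropout_retention": 0.80,  # retention rate after each bi-LSTM layer
    "max_epochs": 60,           # typical number of epochs to full convergence
}

# Decoding settings reported in the paper.
DECODE_CONFIG = {
    "beam_size": 3,             # beam width of the decoder
    "min_probability": 1e-7,    # minimum probability used by the decoder
}
```

Note that the 80% retention rate corresponds to a dropout probability of 0.2 applied to the output of each bi-LSTM layer.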