Training Naturalized Semantic Parsers with Very Little Data

Authors: Subendhu Rongali, Konstantine Arkoudas, Melanie Rubino, Wael Hamza

IJCAI 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show that this method delivers new SOTA few-shot performance on the Overnight dataset, particularly in very low-resource settings, and very compelling few-shot results on a new semantic parsing dataset. |
| Researcher Affiliation | Collaboration | Subendhu Rongali (1, 2), Konstantine Arkoudas (2), Melanie Rubino (2), Wael Hamza (2); 1: University of Massachusetts Amherst, 2: Amazon Alexa AI, New York; srongali@cs.umass.edu, {arkoudk, rubinome, waelhamz}@amazon.com |
| Pseudocode | No | No pseudocode or algorithm block is explicitly provided. Figure 1 is a diagram illustrating the joint training process, not pseudocode. |
| Open Source Code | Yes | Additional details, including the pizza canonicalization scheme, are provided in the appendix on our project page, along with our data files: https://github.com/amazon-research/resource-constrained-naturalized-semantic-parsing |
| Open Datasets | Yes | Pizza is a recently introduced dataset consisting of English utterances that represent orders of pizzas and drinks: https://github.com/amazon-research/pizza-semantic-parsing-dataset |
| Dataset Splits | No | The paper mentions using a 'dev set to choose example for low-resource training' for the Pizza dataset, but does not give specific train/validation/test splits (percentages or counts) for all of the datasets used, which limits reproducibility. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) are given for the experimental setup. |
| Software Dependencies | No | The paper mentions using BART-Large and the Adam optimizer, but does not provide version numbers for the software libraries or frameworks (e.g., PyTorch, TensorFlow, or the Python version) used to implement and run the models. |
| Experiment Setup | Yes | We train all our models with sequence cross-entropy loss using the Adam optimizer with β1 = 0.9, β2 = 0.98, ϵ = 1e-9 and the Noam LR scheduler with 500 warmup steps and a learning rate scale factor of 0.15. JT models are trained for 10 epochs, while base models are trained for 100 to 1000 epochs on the low-resource data. We fix the batch size to 512 tokens for all models. We use dropout of 0.1 and freeze the encoder token and position embeddings during training. (A configuration sketch follows the table.) |
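The Experiment Setup row is concrete enough to sketch in code. Below is a minimal, hypothetical reconstruction of that configuration, assuming PyTorch and the HuggingFace `facebook/bart-large` checkpoint; the paper does not name its framework, and the Noam schedule here is the standard formula (scale · d_model^-0.5 · min(step^-0.5, step · warmup^-1.5)) rather than code taken from the authors' release.

```python
# Hedged sketch of the reported training setup: Adam (β1=0.9, β2=0.98, ϵ=1e-9),
# Noam LR schedule (500 warmup steps, scale 0.15), frozen encoder token and
# position embeddings. Library and model choices are assumptions, not the
# authors' actual implementation.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Freeze the encoder token and position embeddings, as described in the paper.
for param in model.model.encoder.embed_tokens.parameters():
    param.requires_grad = False
for param in model.model.encoder.embed_positions.parameters():
    param.requires_grad = False

# Adam hyperparameters quoted in the Experiment Setup row.
optimizer = Adam(
    [p for p in model.parameters() if p.requires_grad],
    lr=1.0,  # the Noam schedule below supplies the effective learning rate
    betas=(0.9, 0.98),
    eps=1e-9,
)

D_MODEL = model.config.d_model  # 1024 for BART-Large
WARMUP = 500
SCALE = 0.15

def noam_lr(step: int) -> float:
    """Noam schedule: SCALE * d_model^-0.5 * min(step^-0.5, step * WARMUP^-1.5)."""
    step = max(step, 1)  # avoid division by zero on the initial call
    return SCALE * (D_MODEL ** -0.5) * min(step ** -0.5, step * WARMUP ** -1.5)

scheduler = LambdaLR(optimizer, lr_lambda=noam_lr)

# Per training step (sequence cross-entropy loss over target tokens):
#   loss = model(**batch).loss
#   loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

Batching by token count (512 tokens per batch), dropout of 0.1, and the 10-epoch (JT) versus 100 to 1000-epoch (base) training lengths would sit in the data loader and training loop, which the paper does not detail.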