Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing

Authors: Asish Ghoshal, Xilun Chen, Sonal Gupta, Luke Zettlemoyer, Yashar Mehdad

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate LORAS on three semantic parsing data sets and a semantic parsing based question-answering data set, using various pre-trained representations like RoBERTa (Liu et al., 2019) and BART (Lewis et al., 2019). On ATIS (Price, 1990) and SNIPS (Coucke et al., 2018), LORAS achieves an average absolute improvement of 0.6% and 0.9%, respectively, in exact match of logical form over vanilla label smoothing across different pre-trained representations.
Researcher Affiliation | Industry | Asish Ghoshal, Xilun Chen, Sonal Gupta, Luke Zettlemoyer & Yashar Mehdad ({aghoshal,xilun,sonalgupta,lsz,mehdad}@fb.com), Facebook AI
Pseudocode | No | The paper describes the LORAS formulation mathematically but does not include pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide any information about releasing open-source code for the described methodology.
Open Datasets | Yes | We evaluate LORAS on three semantic parsing data sets: ATIS (Price, 1990), SNIPS (Coucke et al., 2018), and TOPv2 (Chen et al., 2020), and a question-answering data set: Overnight (Wang et al., 2015).
Dataset Splits | Yes | As was done in prior work (Wang et al. (2015); Damonte et al. (2019)), we randomly select 20% of the training data from each domain as the validation set, which is used for model selection and hyperparameter tuning.
Hardware Specification | Yes | All the models were trained on Nvidia Tesla GPUs with 16GB of RAM.
Software Dependencies | No | The paper mentions using the 'Adam optimizer with default settings' but does not specify any software names with version numbers for reproducibility.
Experiment Setup | Yes | For vanilla label smoothing we experiment with α ∈ {0.1, 0.2, 0.3} and report the best accuracy. ... For LORAS, we experiment with α ∈ {0.1, 0.2, 0.3} and a few different rank and dropout parameters, and set η = 0.1 for all the experiments... For BART, a LORAS dropout parameter of 0.5 worked best, while for RoBERTa a dropout of 0.6 worked best. For all three data sets a rank parameter of 25 worked best for LORAS. ... We used the Adam optimizer with default settings, an inverse square root learning rate schedule, and a batch size of 32 for all our experiments.
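
For reference, the vanilla label smoothing baseline named in the experiment-setup row (with α in {0.1, 0.2, 0.3}) can be written as the minimal PyTorch-style sketch below. It shows only the standard label-smoothed cross-entropy, not the paper's LORAS loss; the function and argument names are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, alpha=0.1, ignore_index=-100):
    """Standard label-smoothed cross-entropy (one common formulation of
    the vanilla baseline).

    logits:  (batch, vocab) unnormalized scores over output symbols
    targets: (batch,) gold symbol indices
    alpha:   smoothing mass spread uniformly over the vocabulary
    """
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # Smoothed target distribution: (1 - alpha) on the gold label,
    # with alpha spread uniformly over all labels.
    smoothed = torch.full_like(log_probs, alpha / vocab_size)
    smoothed.scatter_(-1, targets.unsqueeze(-1), 1.0 - alpha + alpha / vocab_size)

    loss = -(smoothed * log_probs).sum(dim=-1)

    # Ignore padding positions, if any.
    mask = targets.ne(ignore_index)
    return loss[mask].mean()
```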
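Similarly, the per-domain 20% validation split described in the dataset-splits row could look like the following sketch. The data layout (a list of example dicts with a 'domain' field) is an assumption made for illustration and is not specified in the paper.

```python
import random
from collections import defaultdict

def per_domain_split(examples, valid_fraction=0.2, seed=0):
    """Randomly hold out a fraction of training examples from each domain
    as a validation set (used for model selection and hyperparameter tuning).

    `examples` is assumed to be a list of dicts with a 'domain' key.
    """
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)

    train, valid = [], []
    for domain_examples in by_domain.values():
        rng.shuffle(domain_examples)
        n_valid = int(len(domain_examples) * valid_fraction)
        valid.extend(domain_examples[:n_valid])
        train.extend(domain_examples[n_valid:])
    return train, valid
```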