Measuring and Reducing Model Update Regression in Structured Prediction for NLP

Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We carry out a series of experiments to examine the severity of model update regression under various model update scenarios. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches. |
| Researcher Affiliation | Collaboration | Deng Cai (The Chinese University of Hong Kong, thisisjcykcd@gmail.com); Elman Mansimov (Amazon AWS AI Labs, mansimov@amazon.com); Yi-An Lai (Amazon AWS AI Labs, yianl@amazon.com); Yixuan Su (University of Cambridge, ys484@cam.ac.uk); Lei Shu (Amazon AWS AI Labs, leishu@amazon.com); Yi Zhang (Amazon AWS AI Labs, yizhngn@amazon.com) |
| Pseudocode | No | The paper describes its methods in natural language and mathematical formulas, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | "We will release our code upon acceptance." |
| Open Datasets | Yes | We use the English EWT treebank from the Universal Dependencies (UD 2.2) treebanks. We use the TOP dataset (Gupta et al., 2018) for our experiments. |
| Dataset Splits | Yes | We adopt the standard training/dev/test splits and use the universal POS tags (Petrov et al., 2012) provided in the treebank. |
| Hardware Specification | Yes | With the same inference hardware (one Nvidia V100 GPU) and the same batch size of 32, the decoding and re-ranking speeds of deepbiaf are 171 and 244 sentences per second, and 64 and 221 sentences per second for stackptr. |
| Software Dependencies | No | The paper mentions using NeuroNLP2, Fairseq, and Hugging Face for implementing models, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For BCR, various decoding methods are explored for candidate generation. Specifically, we use the k-best spanning trees algorithm... and beam search... We also explore sampling-based decoding methods such as top-k sampling (k ∈ {5, 10, 50, 100}), top-p sampling (p ∈ {0.95, 0.90, 0.85, 0.80}), and dropout-p sampling (p ∈ {0.1, 0.2, 0.3, 0.4}). The number of candidates in BCR is set to 10. |
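The top-k and top-p (nucleus) sampling methods listed in the experiment setup can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the NumPy-based single-step formulation are assumptions for illustration, and dropout-p sampling is omitted because it requires running the trained model with dropout enabled at inference time.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def top_k_filter(logits, k):
    """Keep only the k highest-scoring tokens; mask the rest to -inf."""
    out = np.full_like(logits, -np.inf, dtype=float)
    idx = np.argsort(logits)[-k:]
    out[idx] = logits[idx]
    return out

def top_p_filter(logits, p):
    """Nucleus filtering: keep the smallest set of tokens whose
    cumulative probability reaches p; mask the rest to -inf."""
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]          # tokens by descending prob
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # include token crossing p
    out = np.full_like(logits, -np.inf, dtype=float)
    out[order[:cutoff]] = logits[order[:cutoff]]
    return out

def sample_candidates(logits, n, rng, k=None, p=None):
    """Draw n candidate tokens from the (optionally filtered) distribution,
    e.g. n=10 to match the candidate set size used for BCR."""
    if k is not None:
        logits = top_k_filter(logits, k)
    if p is not None:
        logits = top_p_filter(logits, p)
    return rng.choice(len(logits), size=n, p=softmax(logits))
```

In practice these filters are applied per decoding step over the model's output vocabulary; the sketch above shows a single step so the filtering logic is easy to verify in isolation.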