Measuring and Reducing Model Update Regression in Structured Prediction for NLP
Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out a series of experiments to examine the severity of model update regression under various model update scenarios. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches. (A sketch of the underlying regression metric follows this table.) |
| Researcher Affiliation | Collaboration | Deng Cai, The Chinese University of Hong Kong (thisisjcykcd@gmail.com); Elman Mansimov, Amazon AWS AI Labs (mansimov@amazon.com); Yi-An Lai, Amazon AWS AI Labs (yianl@amazon.com); Yixuan Su, University of Cambridge (ys484@cam.ac.uk); Lei Shu, Amazon AWS AI Labs (leishu@amazon.com); Yi Zhang, Amazon AWS AI Labs (yizhngn@amazon.com) |
| Pseudocode | No | The paper describes its methods and approaches in natural language and mathematical formulas, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We will release our code upon acceptance. |
| Open Datasets | Yes | We use the English EWT treebank from the Universal Dependency (UD 2.2) Treebanks. We use the TOP dataset (Gupta et al., 2018) for our experiments. |
| Dataset Splits | Yes | We adopt the standard training/dev/test splits and use the universal POS tags (Petrov et al., 2012) provided in the treebank. |
| Hardware Specification | Yes | With the same inference hardware (one Nvidia V100 GPU) and the same batch size of 32, the decoding and re-ranking speeds of deepbiaf are 171 and 244 sentences per second, and 64 and 221 sentences per second for stackptr. |
| Software Dependencies | No | The paper mentions using NeuroNLP2, Fairseq, and Hugging Face for implementing models, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For BCR, various decoding methods are explored for candidate generation. Specifically, we use k-best spanning trees algorithm... and beam search... We also explore sampling-based decoding methods such as top-k sampling (k ∈ {5, 10, 50, 100}), top-p sampling (p ∈ {0.95, 0.90, 0.85, 0.80}), and dropout-p sampling (p ∈ {0.1, 0.2, 0.3, 0.4}). The number of candidates in BCR is set to 10. (A sketch of the sampling filters follows this table.) |
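The paper's central quantity, model update regression, is standardly measured as a negative flip rate: the fraction of test examples the old model predicted correctly but the updated model gets wrong. The sketch below illustrates that metric; the function name and the exact-match notion of correctness are illustrative assumptions (for structured outputs the comparison can also be made per token or per substructure), not the paper's released code.

```python
# Hedged sketch of the negative flip rate (NFR) used to quantify model
# update regression. Names and the exact-match correctness criterion are
# assumptions for illustration, not the paper's implementation.
from typing import Sequence

def negative_flip_rate(gold: Sequence, old_preds: Sequence, new_preds: Sequence) -> float:
    """Fraction of examples the old model got right but the new model gets wrong."""
    assert len(gold) == len(old_preds) == len(new_preds)
    flips = sum(
        1 for y, o, n in zip(gold, old_preds, new_preds)
        if o == y and n != y  # correct before the update, wrong after
    )
    return flips / len(gold)

# Toy example: both models score 3/4 overall, yet the update flips example 2.
gold = ["A", "B", "C", "D"]
old_preds = ["A", "B", "C", "X"]
new_preds = ["A", "X", "C", "D"]
print(negative_flip_rate(gold, old_preds, new_preds))  # 0.25
```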
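The Experiment Setup row lists top-k and top-p (nucleus) sampling among BCR's candidate-generation methods. Below is a generic PyTorch sketch of these two logit filters as they are commonly implemented; it is not the authors' code, and the tensor names and shapes are assumptions. Dropout-p sampling, the third quoted method, works differently: per the paper it keeps dropout enabled at inference time and decodes repeatedly to obtain diverse candidates, so it is not shown as a filter here.

```python
# Generic top-k / top-p logit filters for sampling-based candidate
# generation (a common implementation pattern, not the paper's code).
import torch

def top_k_filter(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Mask all but the k highest-scoring logits to -inf."""
    kth_best = torch.topk(logits, k).values[..., -1, None]
    return logits.masked_fill(logits < kth_best, float("-inf"))

def top_p_filter(logits: torch.Tensor, p: float) -> torch.Tensor:
    """Keep the smallest set of top tokens whose cumulative probability exceeds p."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumulative = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    remove = cumulative > p
    remove[..., 1:] = remove[..., :-1].clone()  # shift right: keep the first token past p
    remove[..., 0] = False                      # always keep the single best token
    mask = remove.scatter(-1, sorted_idx, remove)
    return logits.masked_fill(mask, float("-inf"))

# Drawing one candidate token with p = 0.90, one of the quoted settings.
logits = torch.randn(50)  # toy vocabulary of 50 token scores
probs = torch.softmax(top_p_filter(logits, p=0.90), dim=-1)
token = torch.multinomial(probs, num_samples=1)
```

Running such a sampler repeatedly yields the candidate list that BCR re-ranks; per the quoted setup, the number of candidates is set to 10.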