Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Measuring and Reducing Model Update Regression in Structured Prediction for NLP
Authors: Deng Cai, Elman Mansimov, Yi-An Lai, Yixuan Su, Lei Shu, Yi Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out a series of experiments to examine the severity of model update regression under various model update scenarios. Experiments show that BCR can better mitigate model update regression than model ensemble and knowledge distillation approaches. |
| Researcher Affiliation | Collaboration | Deng Cai (The Chinese University of Hong Kong), Elman Mansimov (Amazon AWS AI Labs), Yi-An Lai (Amazon AWS AI Labs), Yixuan Su (University of Cambridge), Lei Shu (Amazon AWS AI Labs), Yi Zhang (Amazon AWS AI Labs) |
| Pseudocode | No | The paper describes its methods and approaches in natural language and mathematical formulas, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | We will release our code upon acceptance. |
| Open Datasets | Yes | We use the English EWT treebank from the Universal Dependency (UD 2.2) Treebanks. We use the TOP dataset (Gupta et al., 2018) for our experiments. |
| Dataset Splits | Yes | We adopt the standard training/dev/test splits and use the universal POS tags (Petrov et al., 2012) provided in the treebank. |
| Hardware Specification | Yes | With the same inference hardware (one Nvidia V100 GPU) and the same batch size of 32, the decoding and re-ranking speeds of deepbiaf are 171 and 244 sentences per second, and 64 and 221 sentences per second for stackptr. |
| Software Dependencies | No | The paper mentions using NeuroNLP2, Fairseq, and Hugging Face libraries for implementing models, but it does not provide version numbers for these software dependencies. |
| Experiment Setup | Yes | For BCR, various decoding methods are explored for candidate generation. Specifically, we use k-best spanning trees algorithm... and beam search... We also explore sampling-based decoding methods such as top-k sampling (k ∈ {5, 10, 50, 100}), top-p sampling (p ∈ {0.95, 0.90, 0.85, 0.80}), and dropout-p sampling (p ∈ {0.1, 0.2, 0.3, 0.4}). The number of candidates in BCR is set to 10. |