Multi-Task Self-Supervised Learning for Disfluency Detection

Authors: Shaolei Wang, Wanxiang Che, Qi Liu, Pengda Qin, Ting Liu, William Yang Wang (pp. 9193-9200)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.
Researcher Affiliation | Academia | Shaolei Wang (1), Wanxiang Che (1), Qi Liu (2), Pengda Qin (3), Ting Liu (1), William Yang Wang (4); (1) Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China; (2) University of Oxford; (3) Beijing University of Posts and Telecommunications, China; (4) University of California, Santa Barbara, CA, USA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'We directly use the code released by Wang et al. (2017).' and provides a link: 'https://github.com/hitwsl/transition_disfluency'. This refers to the code for a baseline system, not the open-source code for the methodology proposed in this paper.
Open Datasets | Yes | English Switchboard (SWBD) (Godfrey, Holliman, and McDaniel 1992) is the standard and largest (1.73 × 10^5 sentences for training) corpus used for disfluency detection. We use English Switchboard as the main data.
Dataset Splits | Yes | Following the experiment settings in Charniak and Johnson (2001), we split the Switchboard corpus into train, dev, and test sets as follows: the train data consists of all sw[23]*.dff files, the dev data consists of all sw4[5-9]*.dff files, and the test data consists of all sw4[0-1]*.dff files. A minimal sketch of this split appears after the table.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. It only notes in passing that the number of model parameters was 'limited by devices' in a comparison with BERT.
Software Dependencies | No | The paper mentions using a 'transformer architecture', 'GELU activations (Hendrycks and Gimpel 2016)', and the 'Adam optimizer'. However, it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | In all experiments, including the transformer-based baseline and our self-supervised method, we use a transformer architecture with 512 hidden units, 8 heads, 6 hidden layers, GELU activations (Hendrycks and Gimpel 2016), and a dropout of 0.1. We train our models with the Adam optimizer. For the joint tagging and sentence classification objectives, we use streams of 128 tokens and mini-batches of size 256, with a learning rate of 1e-4 and 30 training epochs. When fine-tuning on gold disfluency detection data, most model hyperparameters are the same as in pre-training, with the exception of the batch size, learning rate, and number of training epochs: we use a batch size of 32, a learning rate of 1e-5, and 20 epochs. A minimal configuration sketch of these settings appears after the table.
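
The Switchboard split described in the Dataset Splits row maps directly onto file-name patterns. Below is a minimal Python sketch, assuming the .dff transcripts sit in a single directory; the SWBD_DIR path is hypothetical, and the authors' actual preprocessing pipeline is not described in the paper.

```python
# Hypothetical sketch of the Charniak & Johnson (2001) Switchboard split.
# SWBD_DIR is a placeholder; point it at wherever the .dff transcripts live.
import glob
import os

SWBD_DIR = "swbd/dff"

train_files = sorted(glob.glob(os.path.join(SWBD_DIR, "sw[23]*.dff")))    # sw2*/sw3* -> train
dev_files   = sorted(glob.glob(os.path.join(SWBD_DIR, "sw4[5-9]*.dff")))  # sw45-sw49 -> dev
test_files  = sorted(glob.glob(os.path.join(SWBD_DIR, "sw4[0-1]*.dff")))  # sw40-sw41 -> test

print(len(train_files), len(dev_files), len(test_files))
```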
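
The hyperparameters quoted in the Experiment Setup row correspond to a standard Transformer encoder. The PyTorch sketch below mirrors the reported settings (512 hidden units, 8 heads, 6 layers, GELU, dropout 0.1, Adam with lr 1e-4 for pre-training and 1e-5 for fine-tuning); the vocabulary size, feed-forward width, output head sizes, and positional encoding are not reported in the paper and are assumptions here, so this is illustrative rather than a reconstruction of the authors' implementation.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 30000   # assumption: vocabulary size is not reported
D_MODEL = 512        # "512 hidden units"
N_HEADS = 8          # "8 heads"
N_LAYERS = 6         # "6 hidden layers"
FFN_DIM = 2048       # assumption: feed-forward width is not reported
MAX_LEN = 128        # "streams of 128 tokens"


class DisfluencyEncoder(nn.Module):
    """Transformer encoder with a token-tagging head and a sentence-level head."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=FFN_DIM,
            dropout=0.1, activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.tagger = nn.Linear(D_MODEL, 2)      # per-token disfluency tag
        self.classifier = nn.Linear(D_MODEL, 2)  # sentence-level label

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))
        # Sentence label read off the first position (assumption: a [CLS]-style token).
        return self.tagger(hidden), self.classifier(hidden[:, 0])


model = DisfluencyEncoder()

# Pre-training on the joint tagging + sentence-classification objectives:
pretrain_opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # mini-batches of 256, 30 epochs
# Fine-tuning on gold disfluency detection data:
finetune_opt = torch.optim.Adam(model.parameters(), lr=1e-5)  # batch size 32, 20 epochs

# Example forward pass over a batch of two 128-token streams.
dummy = torch.randint(0, VOCAB_SIZE, (2, MAX_LEN))
tag_logits, sent_logits = model(dummy)   # shapes: (2, 128, 2) and (2, 2)
```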