Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification

Authors: Hongyu Guo

AAAI 2020, pp. 4044-4051

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on benchmark sentence classification datasets indicate that our approach significantly improves upon Mixup. Our empirical studies also show that the out-of-manifold samples generated by our strategy encourage training samples in each class to form a tight representation cluster that is far from others.
Researcher Affiliation | Academia | Hongyu Guo, National Research Council Canada, 1200 Montreal Road, Ottawa, ON, K1A 0R6, hongyu.guo@nrc-cnrc.gc.ca
Pseudocode | No | The paper describes the methods using equations and prose but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | TREC is a question dataset with the aim of categorizing a question into six question types (Li and Roth 2002). MR is a movie review dataset aiming to detect positive/negative reviews (Pang and Lee 2005). SST-1 is the Stanford Sentiment Treebank with five categories: very positive, positive, neutral, negative, and very negative (Socher et al. 2013). SST-2 is the same as SST-1 but with neutral reviews removed and binary labels. Subj is a dataset with the aim of classifying a sentence as being subjective or objective (Pang and Lee 2004).
Dataset Splits | Yes | For datasets without a standard development set we randomly select 10% of training data as development set. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, or memory) used for running its experiments.
Software Dependencies | No | The paper mentions "Adam" as the optimizer and "GloVe" for word embeddings, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Specifically, we use filter sizes of 3, 4, and 5, each with 100 feature maps; a dropout rate of 0.5 and L2 regularization of 0.2 for the baseline CNN. For datasets without a standard development set we randomly select 10% of training data as development set. Training is done through Adam (Kingma and Ba 2014) over mini-batches of size 50. The pre-trained word embeddings are 300-dimensional GloVe (Pennington, Socher, and Manning 2014). For the nonlinear Mixup, the mixing policy α is set to the default value of one. The dimension of the label embedding in the nonlinear Mixup is 100. For each dataset, we train each model 10 times, each with 80k steps, and compute their mean test errors and standard deviations. (A hedged configuration sketch also follows the table.)
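
Below is a minimal sketch of the 10% development split mentioned in the Dataset Splits row, assuming a scikit-learn style workflow; the helper name make_dev_split, the fixed seed, and the stratification choice are illustrative assumptions, not details from the paper.

```python
# Hypothetical helper illustrating "10% of training data as development set";
# scikit-learn and stratified sampling are assumptions for this sketch.
from sklearn.model_selection import train_test_split

def make_dev_split(train_texts, train_labels, seed=0):
    """Hold out 10% of the training examples as a development set."""
    tr_x, dev_x, tr_y, dev_y = train_test_split(
        train_texts, train_labels,
        test_size=0.10,          # 10% held out, as described in the paper
        random_state=seed,       # fixed seed for repeatability (assumption)
        stratify=train_labels,   # keep class proportions (assumption)
    )
    return tr_x, tr_y, dev_x, dev_y
```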
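
The following is a hedged configuration sketch of the reported baseline setup (a Kim-style text CNN with filter sizes 3, 4, and 5, 100 feature maps each, dropout 0.5, Adam, mini-batches of 50, 300-dimensional GloVe embeddings, mixing policy α = 1, and a 100-dimensional label embedding). PyTorch is an assumption, as are the class name TextCNN, the illustrative vocabulary size, and the mapping of "L2 regularization of 0.2" onto Adam's weight_decay; the scalar Beta(α, α) draw shows the standard Mixup coefficient that the paper's nonlinear mixing policy generalizes.

```python
import numpy as np
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Kim-style text CNN matching the reported hyperparameters (sketch)."""
    def __init__(self, vocab_size, num_classes, embed_dim=300,
                 filter_sizes=(3, 4, 5), feature_maps=100, dropout=0.5):
        super().__init__()
        # In the paper, embeddings are initialized from 300-d GloVe vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, feature_maps, k) for k in filter_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(feature_maps * len(filter_sizes), num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)             # (batch, embed, seq)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))

model = TextCNN(vocab_size=20_000, num_classes=6)  # vocab size illustrative; TREC has 6 classes
# Mapping "L2 regularization of 0.2" to Adam's weight_decay is an assumption.
optimizer = torch.optim.Adam(model.parameters(), weight_decay=0.2)

batch_size = 50          # mini-batch size reported in the paper
label_embed_dim = 100    # label embedding dimension for nonlinear Mixup
alpha = 1.0              # mixing policy alpha, default value of one
# Standard Mixup draws a scalar coefficient; the paper's nonlinear policy
# replaces this with sample-specific, nonlinear mixing.
lam = np.random.beta(alpha, alpha)
```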