Global Mixup: Eliminating Ambiguity with Clustering
Authors: Xiangjin Xie, Yangning Li, Wang Chen, Kai Ouyang, Zuotong Xie, Hai-Tao Zheng
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments for CNN, LSTM, and BERT on five tasks show that Global Mixup outperforms previous baselines. Further experiments also demonstrate the advantage of Global Mixup in low-resource scenarios. |
| Researcher Affiliation | Collaboration | Shenzhen International Graduate School, Tsinghua University; Google Inc.; Pengcheng Laboratory |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | We conduct experiments on five benchmark text classification tasks, and Table 1 summarizes the statistical characteristics of the five datasets: 1. YELP (Yelp 2015): a subset of Yelp's businesses, reviews, and user data. 2. SUBJ (Pang and Lee 2004): classifying sentences as subjective or objective. 3. TREC (Li and Roth 2002): a question dataset categorizing questions into six question types. 4. SST-1 (Socher et al. 2013): the Stanford Sentiment Treebank, with five categories (very positive, positive, neutral, negative, and very negative); data comes from movie reviews with sentiment annotations. 5. SST-2 (Socher et al. 2013): the same as SST-1 but with neutral reviews removed and binary labels. |
| Dataset Splits | Yes | Data split: we randomly select a subset of the training data with N ∈ {500, 2000, 5000} to investigate performance in the few-sample scenario of Global Mixup (see the subsampling sketch after the table). Table 1 caption: c: number of target labels. N: number of samples. V: valid set size. T: test set size. W means no standard valid split was provided. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory. |
| Software Dependencies | Yes | All models are implemented with PyTorch (Paszke et al. 2019) and Python 3.7. |
| Experiment Setup | Yes | For the λ ∼ Beta(α, α) parameters, we tune α from {0.5, 1, 2, 4, 8}. To demonstrate the effectiveness of Global Mixup on a larger sampling space, we extend λ ∈ [−0.3, 1.3] with a uniform distribution. We set the number of samples generated per training-sample pair T from {2, 4, 8, 16, 20, 32, 64}, and the best performance is obtained with T = 8. The batch size is chosen from {32, 50, 64, 128, 256, 500} and the learning rate from {1e-3, 1e-4, 4e-4, 2e-5}. For the remaining hyperparameters, we set θ from {1/c, 0.5, 0.6, 0.8, 0.9, 1}, where c is the number of target labels; γ from {1, 2, 4, 6}; τ and η from {1/T, 1}; ϵ = 1e-5; δ = 1. For the reinforced selector, we use the Adam optimizer (Kingma and Ba 2015) for CNN and LSTM, and AdamW (Loshchilov and Hutter 2017) for BERT. (A λ-sampling sketch follows the table.) |
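The hyperparameter row above describes the mixup-style sampling step that Global Mixup builds on. The sketch below shows, in PyTorch (the paper's reported framework), how λ ∼ Beta(α, α) and the extended uniform λ ∈ [−0.3, 1.3] could be realized; `mixup_pairs`, its signature, and the pre-embedded input format are illustrative assumptions, not the authors' code, and the clustering-based global relabeling that defines Global Mixup is not reproduced here.

```python
import torch

def mixup_pairs(x1, y1, x2, y2, alpha=1.0, T=8, extended=False):
    """Generate T interpolated samples from one training pair (a sketch).

    x1, x2: embedded inputs of the same shape; y1, y2: one-hot label vectors.
    extended=True samples lambda uniformly from [-0.3, 1.3] (the larger
    sampling space reported above) instead of from Beta(alpha, alpha).
    """
    xs, ys = [], []
    for _ in range(T):
        if extended:
            lam = torch.empty(1).uniform_(-0.3, 1.3).item()
        else:
            lam = torch.distributions.Beta(alpha, alpha).sample().item()
        # Linear interpolation of inputs and labels, as in vanilla mixup.
        # Global Mixup would relabel the generated samples globally via
        # clustering instead of keeping these linear labels.
        xs.append(lam * x1 + (1.0 - lam) * x2)
        ys.append(lam * y1 + (1.0 - lam) * y2)
    return torch.stack(xs), torch.stack(ys)
```

With the paper's best-performing setting this would be called with T = 8, e.g. `x_aug, y_aug = mixup_pairs(x1, y1, x2, y2, alpha=1.0, T=8)`.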
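The "Dataset Splits" row reduces to plain random subsetting for the low-resource experiments. A minimal sketch, assuming a list-like training set and a fixed seed for repeatability (the function name, seed, and toy data are assumptions, not the authors' code):

```python
import random

def subsample(train_set, n, seed=0):
    """Randomly select n training examples for the few-sample setting."""
    rng = random.Random(seed)
    return rng.sample(list(train_set), n)

# Toy stand-in for a real training set; the paper uses N in {500, 2000, 5000}.
full_train = [(f"sentence {i}", i % 2) for i in range(10000)]
subsets = {n: subsample(full_train, n) for n in (500, 2000, 5000)}
```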