Learning Data Manipulation for Augmentation and Weighting

Authors: Zhiting Hu, Bowen Tan, Russ R. Salakhutdinov, Tom M. Mitchell, Eric P. Xing

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show the resulting algorithms significantly improve the image and text classification performance in low data regime and class-imbalance problems.
Researcher Affiliation | Collaboration | Zhiting Hu1,2, Bowen Tan1, Ruslan Salakhutdinov1, Tom Mitchell1, Eric P. Xing1,2. 1Carnegie Mellon University, 2Petuum Inc. {zhitingh,btan2,rsalakhu,tom.mitchell}@cs.cmu.edu, eric.xing@petuum.com
Pseudocode | Yes | Algorithm 1: Joint Learning of Model and Data Manipulation (see the sketch after the table).
Open Source Code | Yes | Code available at https://github.com/tanyuqian/learning-data-manipulation
Open Datasets | Yes | For text classification, we use the popular benchmark datasets, including SST-5 for 5-class sentence sentiment [45], IMDB for binary movie review sentiment [31], and TREC for 6-class question types [30]. For image classification, we similarly create a small subset of the CIFAR10 data...
Dataset Splits | Yes | We subsample a small training set on each task by randomly picking 40 instances for each class. We further create small validation sets, i.e., 2 instances per class for SST-5, and 5 instances per class for IMDB and TREC, respectively. For image classification, we similarly create a small subset of the CIFAR10 data, which includes 40 instances per class for training, and 2 instances per class for validation. (See the subsampling sketch after the table.)
Hardware Specification | Yes | All experiments were implemented with PyTorch (pytorch.org) and were performed on a Linux machine with 4 GTX 1080Ti GPUs and 64GB RAM.
Software Dependencies | No | The paper mentions 'PyTorch (pytorch.org)' but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For both the BERT classifier and the augmentation model (which is also based on BERT), we use Adam optimization with an initial learning rate of 4e-5. For ResNets, we use SGD optimization with a learning rate of 1e-3. For text data augmentation, we augment each minibatch by generating two or three samples for each data point (each with 1, 2 or 3 substitutions), and use both the samples and the original data to train the model. ... we restrict the training to a small number (e.g., 5 or 10) of epochs. (See the setup sketch after the table.)
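The Pseudocode row points to Algorithm 1, which alternates between updating the model parameters (theta) on manipulated training data and updating the manipulation parameters (phi) against a small validation set. Below is a minimal, hypothetical sketch of that loop for the data-weighting variant, using a toy linear classifier so the "virtual" model update can be written out explicitly; the sizes, learning rates, and random data are illustrative stand-ins, not the paper's BERT/ResNet configuration.

```python
# Hypothetical sketch of the joint update in Algorithm 1 (weighting variant).
# theta = (W, b) is a toy linear classifier; phi holds one weight logit per
# training example. All numbers here are placeholders for illustration.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_train, n_val, dim, n_class = 32, 8, 20, 4

# Toy data standing in for the real training/validation minibatches.
train_x, train_y = torch.randn(n_train, dim), torch.randint(n_class, (n_train,))
val_x, val_y = torch.randn(n_val, dim), torch.randint(n_class, (n_val,))

W = torch.zeros(dim, n_class, requires_grad=True)
b = torch.zeros(n_class, requires_grad=True)
phi = torch.zeros(n_train, requires_grad=True)
lr_theta, lr_phi = 0.1, 0.1

for step in range(100):
    # 1) Weighted training loss under the current manipulation parameters.
    weights = torch.sigmoid(phi)
    logits = train_x @ W + b
    losses = F.cross_entropy(logits, train_y, reduction="none")
    train_loss = (weights * losses).mean()

    # 2) Virtual gradient step on theta, keeping the graph so gradients
    #    can later flow back into phi.
    gW, gb = torch.autograd.grad(train_loss, (W, b), create_graph=True)
    W_virt, b_virt = W - lr_theta * gW, b - lr_theta * gb

    # 3) Update phi to reduce the validation loss of the virtually updated model.
    val_loss = F.cross_entropy(val_x @ W_virt + b_virt, val_y)
    g_phi, = torch.autograd.grad(val_loss, phi)
    with torch.no_grad():
        phi -= lr_phi * g_phi

    # 4) Real update of theta using the refreshed (detached) weights.
    weights = torch.sigmoid(phi).detach()
    logits = train_x @ W + b
    train_loss = (weights * F.cross_entropy(logits, train_y, reduction="none")).mean()
    gW, gb = torch.autograd.grad(train_loss, (W, b))
    with torch.no_grad():
        W -= lr_theta * gW
        b -= lr_theta * gb
```

In the augmentation variant described in the paper, the per-example weights are replaced by samples from a BERT-based augmentation model, and the validation gradient in step 3 flows into that model's parameters instead of a weight vector.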
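The Dataset Splits row describes stratified subsampling: a fixed number of training instances per class plus a much smaller, class-balanced validation set. A hypothetical helper for building such splits follows; the function name and arguments are ours, not taken from the released code.

```python
# Hypothetical helper for the low-data splits described in the table:
# pick a fixed number of instances per class for training and a smaller
# fixed number per class for validation.
import numpy as np

def subsample_per_class(labels, n_train_per_class, n_val_per_class, seed=0):
    """Return (train_indices, val_indices) with the requested class balance."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        cls_idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(cls_idx[:n_train_per_class])
        val_idx.extend(cls_idx[n_train_per_class:n_train_per_class + n_val_per_class])
    return np.array(train_idx), np.array(val_idx)

# Example: the CIFAR-10 setting of 40 training and 2 validation instances
# per class (random stand-in labels used here instead of the real dataset).
labels = np.random.randint(0, 10, size=50_000)
train_idx, val_idx = subsample_per_class(labels, n_train_per_class=40, n_val_per_class=2)
```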
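The Experiment Setup row quotes the optimizer choices and the minibatch augmentation scheme. The sketch below wires up those settings with placeholder models, and uses a random word-substitution function as a stand-in for the paper's BERT-based augmentation model; every model, name, and vocabulary here is an assumption for illustration only.

```python
# Hypothetical sketch of the quoted setup: Adam (lr 4e-5) for the text models,
# SGD (lr 1e-3) for the image model, and each text minibatch extended with a
# few augmented copies per data point (1-3 substitutions each).
import random
import torch
import torch.nn as nn

text_classifier = nn.Linear(768, 5)        # placeholder for the BERT classifier
augmenter = nn.Linear(768, 768)            # placeholder for the BERT-based augmenter
image_classifier = nn.Linear(3 * 32 * 32, 10)  # placeholder for a ResNet

adam = torch.optim.Adam(
    list(text_classifier.parameters()) + list(augmenter.parameters()), lr=4e-5)
sgd = torch.optim.SGD(image_classifier.parameters(), lr=1e-3)

VOCAB = ["good", "bad", "great", "boring", "fine"]  # toy substitution vocabulary

def augment(tokens, n_substitutions):
    """Randomly substitute tokens; stands in for sampling from the augmenter."""
    tokens = list(tokens)
    for pos in random.sample(range(len(tokens)), k=min(n_substitutions, len(tokens))):
        tokens[pos] = random.choice(VOCAB)
    return tokens

def augment_minibatch(batch):
    """Add two or three augmented copies (1-3 substitutions each) per example,
    keeping the original data point as well."""
    out = []
    for tokens, label in batch:
        out.append((tokens, label))
        for _ in range(random.choice([2, 3])):
            out.append((augment(tokens, random.choice([1, 2, 3])), label))
    return out

batch = [(["this", "movie", "was", "great"], 1), (["dull", "and", "slow"], 0)]
augmented = augment_minibatch(batch)
```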