Learning Data Manipulation for Augmentation and Weighting
Authors: Zhiting Hu, Bowen Tan, Russ R. Salakhutdinov, Tom M. Mitchell, Eric P. Xing
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show the resulting algorithms significantly improve the image and text classification performance in low data regime and class-imbalance problems. |
| Researcher Affiliation | Collaboration | Zhiting Hu (1,2), Bowen Tan (1), Ruslan Salakhutdinov (1), Tom Mitchell (1), Eric P. Xing (1,2); 1: Carnegie Mellon University, 2: Petuum Inc. {zhitingh,btan2,rsalakhu,tom.mitchell}@cs.cmu.edu, eric.xing@petuum.com |
| Pseudocode | Yes | Algorithm 1: Joint Learning of Model and Data Manipulation (a minimal sketch of the alternating update appears after the table) |
| Open Source Code | Yes | Code available at https://github.com/tanyuqian/learning-data-manipulation |
| Open Datasets | Yes | For text classification, we use the popular benchmark datasets, including SST-5 for 5-class sentence sentiment [45], IMDB for binary movie review sentiment [31], and TREC for 6-class question types [30]. For image classification, we similarly create a small subset of the CIFAR10 data... |
| Dataset Splits | Yes | We subsample a small training set on each task by randomly picking 40 instances for each class. We further create small validation sets, i.e., 2 instances per class for SST-5, and 5 instances per class for IMDB and TREC, respectively. For image classification, we similarly create a small subset of the CIFAR10 data, which includes 40 instances per class for training, and 2 instances per class for validation. (See the per-class subsampling sketch after the table.) |
| Hardware Specification | Yes | All experiments were implemented with PyTorch (pytorch.org) and were performed on a Linux machine with 4 GTX 1080Ti GPUs and 64GB RAM. |
| Software Dependencies | No | The paper mentions 'PyTorch (pytorch.org)' but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | For both the BERT classifier and the augmentation model (which is also based on BERT), we use Adam optimization with an initial learning rate of 4e-5. For ResNets, we use SGD optimization with a learning rate of 1e-3. For text data augmentation, we augment each minibatch by generating two or three samples for each data point (each with 1, 2, or 3 substitutions), and use both the samples and the original data to train the model. ... we restrict the training to a small number (e.g., 5 or 10) of epochs. (An optimizer-configuration sketch appears after the table.) |
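
The Pseudocode row above only names Algorithm 1 (Joint Learning of Model and Data Manipulation). Below is a minimal, hedged sketch of the kind of alternating update the paper describes, shown for the data-weighting case with a plain linear classifier standing in for the BERT/ResNet models; the function names, learning rates, and the use of per-example softmax weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    # Stand-in linear classifier; the paper trains BERT / ResNet classifiers.
    W, b = params
    return x @ W + b

def joint_step(theta, phi, train_batch, val_batch,
               inner_lr=1e-3, phi_lr=1e-3, theta_lr=1e-3):
    """One alternating step: update the manipulation parameters phi
    (here: per-example data weights) via the validation loss of a
    one-step-updated model, then update the model parameters theta
    with the refreshed weights."""
    x_tr, y_tr = train_batch
    x_va, y_va = val_batch

    # Weighted training loss under the current weights softmax(phi).
    per_example = F.cross_entropy(forward(theta, x_tr), y_tr, reduction="none")
    train_loss = (torch.softmax(phi, dim=0) * per_example).sum()

    # Differentiable one-step update of theta; create_graph keeps the path to phi.
    grads = torch.autograd.grad(train_loss, theta, create_graph=True)
    theta_tmp = [p - inner_lr * g for p, g in zip(theta, grads)]

    # phi is updated to minimize the validation loss of the updated model.
    val_loss = F.cross_entropy(forward(theta_tmp, x_va), y_va)
    (phi_grad,) = torch.autograd.grad(val_loss, phi)
    with torch.no_grad():
        phi -= phi_lr * phi_grad

    # Final update of theta using the refreshed weights.
    per_example = F.cross_entropy(forward(theta, x_tr), y_tr, reduction="none")
    new_loss = (torch.softmax(phi, dim=0) * per_example).sum()
    theta_grads = torch.autograd.grad(new_loss, theta)
    with torch.no_grad():
        for p, g in zip(theta, theta_grads):
            p -= theta_lr * g
```

Here `theta` would be a list such as `[W, b]` with `requires_grad=True`, and `phi` a vector holding one log-weight per training example in the batch.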
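
For the Dataset Splits row, the paper reports 40 training and 2 validation instances per class for CIFAR-10 (with similarly small subsets for the text datasets). The snippet below is one way to reproduce such a split with torchvision; the data root, random seed, and sampling order are assumptions, since the exact subsampling procedure is not given in the excerpt.

```python
import random
from collections import defaultdict
from torch.utils.data import Subset
from torchvision.datasets import CIFAR10

def split_per_class(labels, n_train, n_val, seed=0):
    # Randomly pick n_train + n_val disjoint indices for every class.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        chosen = rng.sample(idxs, n_train + n_val)
        train_idx += chosen[:n_train]
        val_idx += chosen[n_train:]
    return train_idx, val_idx

cifar = CIFAR10(root="./data", train=True, download=True)
train_idx, val_idx = split_per_class(cifar.targets, n_train=40, n_val=2)
small_train, small_val = Subset(cifar, train_idx), Subset(cifar, val_idx)
```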
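
The Experiment Setup row reports Adam with an initial learning rate of 4e-5 for the BERT classifier and the BERT-based augmentation model, and SGD with a learning rate of 1e-3 for the ResNets. A minimal configuration sketch follows; the concrete model constructors (Hugging Face `BertForSequenceClassification`, torchvision `resnet34`) and label counts are assumptions, as the excerpt does not name the exact variants.

```python
from torch.optim import Adam, SGD
from torchvision.models import resnet34
from transformers import BertForSequenceClassification

# BERT classifier (and, analogously, the BERT-based augmentation model): Adam, lr 4e-5.
bert_classifier = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # e.g., SST-5 has 5 classes
bert_optim = Adam(bert_classifier.parameters(), lr=4e-5)

# ResNet image classifier on the CIFAR-10 subset: SGD, lr 1e-3.
resnet = resnet34(num_classes=10)
resnet_optim = SGD(resnet.parameters(), lr=1e-3)
```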