Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
Authors: Jiahui Gao, Renjie Pi, LIN Yong, Hang Xu, Jiacheng Ye, Zhiyong Wu, WEIZHONG ZHANG, Xiaodan Liang, Zhenguo Li, Lingpeng Kong
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contributions are threefold. First, we propose a novel end-to-end framework to construct a high-quality, noise-free synthetic dataset without the aid of any human annotation (§3). Second, we offer theoretical justification (§4) and empirical verification (§5.2) for SUNGEN's ability to reliably recover a noise-free dataset from synthetic data only. Third, we conduct experiments on eight text classification datasets and show our method outperforms the current baseline by large margins (§5.2). |
| Researcher Affiliation | Collaboration | (1) The University of Hong Kong, (2) Hong Kong University of Science and Technology, (3) Huawei Noah's Ark Lab, (4) Shanghai AI Lab, (5) Fudan University, (6) Sun Yat-sen University |
| Pseudocode | Yes | Algorithm 1: Bilevel Robust Sample Reweighting (a hedged sketch of this reweighting loop is given below the table). |
| Open Source Code | Yes | Code is available at this link. |
| Open Datasets | Yes | We evaluate SUNGEN on eight text classification tasks, including IMDb (Maas et al., 2011), SST-2 (Socher et al., 2013), Rotten Tomatoes (Pang & Lee, 2005), Amazon (McAuley & Leskovec, 2013), Yelp (Zhang et al., 2015), Subj (Pang & Lee, 2004), AGNews (Zhang et al., 2015) and DBpedia (Zhang et al., 2015). |
| Dataset Splits | Yes | We draw a training dataset $S_{\text{syn}}^{t} := \{(x, y)^{(i)}\}_{i=1}^{N}$ and a validation dataset $S_{\text{syn}}^{v} := \{(x, y)^{(j)}\}_{j=1}^{M}$ from $S_{\text{syn}}$. ... in the inner loop, 1,000k synthetic data are used as the training data; in the outer loop, 50k synthetic samples are randomly sampled as the training data for fast iteration. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions the 'Hugging Face Transformers library (Wolf et al., 2019)' but does not specify a version number for it or other software dependencies. |
| Experiment Setup | Yes | During the optimization of sample weights, we use the Adam optimizer. For selecting the appropriate value of the outer learning rate, we select from {2.5e-1, 1e-1, 1e-2} by looking at the value of the RCE loss in the outer loop. ... The bilevel procedure is iterated 50 times for each task. ... For LSTM, we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 1e-3. For DistilBERT-base, we finetune each dataset using the Adam optimizer with learning rate 2e-5, and other default hyper-parameters as suggested by the Hugging Face Transformers library (Wolf et al., 2019). We run LSTM for 5 epochs, and run DistilBERT-base for 3 epochs for prediction. (A hedged fine-tuning sketch based on this configuration follows the table.) |
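The "Pseudocode" and "Dataset Splits" rows above describe Algorithm 1: per-sample weights over the synthetic training split are learned by minimizing a noise-robust reverse cross-entropy (RCE) loss on a held-out synthetic validation split, using the quoted outer learning rates and 50 bilevel iterations. Below is a minimal PyTorch sketch of that bilevel reweighting idea, simplified to a single unrolled inner gradient step and a toy linear classifier; the tensor names, data sizes, and inner learning rate are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for features of LM-generated (noisy) synthetic examples.
N_TRAIN, N_VAL, DIM, NUM_CLASSES = 1000, 200, 64, 2
x_tr, y_tr = torch.randn(N_TRAIN, DIM), torch.randint(0, NUM_CLASSES, (N_TRAIN,))
x_val, y_val = torch.randn(N_VAL, DIM), torch.randint(0, NUM_CLASSES, (N_VAL,))

def rce_loss(logits, targets, clip=-4.0):
    """Reverse cross-entropy: -sum_k p_model(k|x) * log q(k|x), with log 0 clipped.
    Serves as the noise-robust outer objective."""
    pred = F.softmax(logits, dim=1)
    onehot = F.one_hot(targets, NUM_CLASSES).float()
    log_q = torch.where(onehot > 0, torch.zeros_like(onehot), torch.full_like(onehot, clip))
    return -(pred * log_q).sum(dim=1).mean()

# Learnable per-sample weight logits (sigmoid keeps each weight in (0, 1)).
w_logits = torch.zeros(N_TRAIN, requires_grad=True)
theta = torch.zeros(DIM, NUM_CLASSES, requires_grad=True)  # toy linear classifier
inner_lr, outer_lr = 1e-2, 1e-1  # outer lr picked from {2.5e-1, 1e-1, 1e-2} in the paper

for step in range(50):  # the bilevel procedure is iterated 50 times per task
    # Inner step: weighted cross-entropy on the synthetic training split.
    w = torch.sigmoid(w_logits)
    inner_loss = (w * F.cross_entropy(x_tr @ theta, y_tr, reduction="none")).mean()
    grad_theta = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
    theta_step = theta - inner_lr * grad_theta  # one unrolled SGD step

    # Outer step: RCE loss on the synthetic validation split, differentiated w.r.t. the weights.
    outer_loss = rce_loss(x_val @ theta_step, y_val)
    grad_w = torch.autograd.grad(outer_loss, w_logits)[0]
    with torch.no_grad():
        w_logits -= outer_lr * grad_w
    theta = theta_step.detach().requires_grad_(True)

# High-weight samples form the selected "noise-free" synthetic subset.
clean_idx = torch.sigmoid(w_logits).topk(N_TRAIN // 2).indices
```

In the paper's setting the inner learner is a full text classifier (LSTM or DistilBERT) rather than this one-step linear model, and the learned weights are then used to select the highest-quality synthetic samples.

The "Experiment Setup" row quotes the DistilBERT-base fine-tuning recipe: learning rate 2e-5 for 3 epochs with otherwise default Hugging Face hyper-parameters. The sketch below approximates that recipe with the Transformers `Trainer`; the toy dataset, batch size, and output path are placeholders, and `Trainer` defaults to AdamW rather than plain Adam, so this is an illustration under stated assumptions rather than the authors' script.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny stand-in for the synthetic sentiment data; the real runs use up to
# 1,000k LM-generated examples per task.
raw = Dataset.from_dict({
    "text": ["a moving, beautifully acted film", "a dull and lifeless retread"] * 8,
    "label": [1, 0] * 8,
})

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # e.g. 2 classes for SST-2 / IMDb

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="distilbert-sungen",  # placeholder output path
    learning_rate=2e-5,              # learning rate quoted in the setup row
    num_train_epochs=3,              # DistilBERT-base is run for 3 epochs
    per_device_train_batch_size=8,   # assumption: batch size is not reported above
)

Trainer(model=model, args=args, train_dataset=dataset).train()
```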
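The LSTM baseline quoted in the same row (Adam, learning rate 1e-3, 5 epochs) would follow the analogous plain-PyTorch training loop and is omitted here.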