Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

Authors: Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main contributions are threefold. First, we propose a novel end-to-end framework to construct a high-quality, noise-free synthetic dataset without the aid of any human annotation (§3). Second, we offer theoretical justification (§4) and empirical verification (§5.2) for SUNGEN's ability to recover a noise-free dataset reliably with synthetic data only. Third, we conduct experiments on eight text classification datasets and show our method outperforms the current baseline by large margins (§5.2).
Researcher Affiliation | Collaboration | 1 The University of Hong Kong; 2 Hong Kong University of Science and Technology; 3 Huawei Noah's Ark Lab; 4 Shanghai AI Lab; 5 Fudan University; 6 Sun Yat-sen University
Pseudocode | Yes | Algorithm 1: Bilevel Robust Sample Reweighting (a hedged sketch of this procedure appears after the table)
Open Source Code | Yes | Code is available at this link.
Open Datasets | Yes | We evaluate SUNGEN on eight text classification tasks, including IMDb (Maas et al., 2011), SST-2 (Socher et al., 2013), Rotten Tomatoes (Pang & Lee, 2005), Amazon (McAuley & Leskovec, 2013), Yelp (Zhang et al., 2015), Subj (Pang & Lee, 2004), AGNews (Zhang et al., 2015) and DBpedia (Zhang et al., 2015).
Dataset Splits | Yes | We draw a training dataset $S^t_{\text{syn}} := \{(x, y)^{(i)}\}_{i=1}^{N}$ and a validation dataset $S^v_{\text{syn}} := \{(x, y)^{(j)}\}_{j=1}^{M}$ from $S_{\text{syn}}$. ... in the inner loop, 1,000k synthetic data are used as the training data; in the outer loop, 50k synthetic samples are randomly sampled as the training data for fast iteration. (A sketch of this split appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments.
Software Dependencies | No | The paper mentions the Hugging Face Transformers library (Wolf et al., 2019) but does not specify a version number for it or for other software dependencies.
Experiment Setup | Yes | During the optimization of sample weights, we use the Adam optimizer. To select an appropriate outer learning rate, we choose from {2.5e-1, 1e-1, 1e-2} by inspecting the value of the RCE loss in the outer loop. ... The bilevel procedure is iterated 50 times for each task. ... For LSTM, we use the Adam optimizer (Kingma & Ba, 2015) with learning rate 1e-3. For DistilBERT-base, we fine-tune on each dataset using the Adam optimizer with learning rate 2e-5 and the other default hyper-parameters suggested by the Hugging Face Transformers library (Wolf et al., 2019). We run LSTM for 5 epochs and DistilBERT-base for 3 epochs for prediction. (A sketch of this fine-tuning setup appears after the table.)
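Algorithm 1 in the paper learns a weight for every synthetic training example through bilevel optimization: the inner loop trains the classifier on the weighted synthetic training loss, and the outer loop updates the weights to minimize a noise-robust reverse cross-entropy (RCE) loss on a synthetic validation set. The sketch below is a minimal, illustrative reconstruction of that loop, not the authors' released code: it swaps the text classifier for a toy linear model over pre-computed features and collapses the inner training to a single differentiable gradient step; the sigmoid parameterization of the weights and all default values are assumptions.

```python
import torch
import torch.nn.functional as F

def rce_loss(logits, labels, num_classes):
    # Reverse cross-entropy: the noise-robust outer-loop criterion; zeros in the
    # one-hot target are clamped so the log stays finite.
    pred = F.softmax(logits, dim=1).clamp(min=1e-7, max=1.0)
    onehot = F.one_hot(labels, num_classes).float().clamp(min=1e-4, max=1.0)
    return -(pred * onehot.log()).sum(dim=1).mean()

def bilevel_reweight(x_tr, y_tr, x_val, y_val, num_classes,
                     outer_lr=1e-1, inner_lr=1e-3, rounds=50):
    # One learnable weight per synthetic training example; a sigmoid keeps each
    # weight in (0, 1).
    d = x_tr.shape[1]
    W, b = torch.zeros(num_classes, d), torch.zeros(num_classes)
    w = torch.zeros(len(x_tr), requires_grad=True)
    outer_opt = torch.optim.Adam([w], lr=outer_lr)
    for _ in range(rounds):
        # Inner step: a single weighted gradient step on the toy linear classifier
        # stands in for training the model on the weighted synthetic loss.
        W_, b_ = W.clone().requires_grad_(True), b.clone().requires_grad_(True)
        per_sample = F.cross_entropy(x_tr @ W_.t() + b_, y_tr, reduction="none")
        inner_loss = (torch.sigmoid(w) * per_sample).mean()
        gW, gb = torch.autograd.grad(inner_loss, (W_, b_), create_graph=True)
        W_fast, b_fast = W_ - inner_lr * gW, b_ - inner_lr * gb
        # Outer step: update the sample weights by minimizing the robust RCE loss
        # of the updated model on the synthetic validation set.
        outer_loss = rce_loss(x_val @ W_fast.t() + b_fast, y_val, num_classes)
        outer_opt.zero_grad()
        outer_loss.backward()
        outer_opt.step()
        # Commit the inner update before the next round.
        W, b = W_fast.detach(), b_fast.detach()
    return torch.sigmoid(w).detach()  # low weights flag likely-noisy samples
```

Because the inner update stays in the autograd graph (`create_graph=True`), the outer RCE loss can backpropagate through it to the per-sample weights, which is what drives noisy synthetic examples toward weight zero. Calling, e.g., `bilevel_reweight(torch.randn(1000, 64), torch.randint(0, 2, (1000,)), torch.randn(200, 64), torch.randint(0, 2, (200,)), num_classes=2)` returns one weight per training example.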
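The dataset-splits row describes drawing both the training set $S^t_{\text{syn}}$ and the validation set $S^v_{\text{syn}}$ from the synthetic pool, with 1,000k samples for the inner loop and 50k randomly sampled for the outer loop. A minimal sketch of such a split follows; the excerpt does not say whether the two subsets are disjoint, so keeping them disjoint, and the helper name, are assumptions.

```python
import random

def split_synthetic(s_syn, n_train=1_000_000, m_outer=50_000, seed=0):
    """Draw the inner-loop training subset and the outer-loop subset from the
    synthetic pool. Sizes follow the quoted setup; disjointness is assumed."""
    rng = random.Random(seed)
    pool = list(s_syn)
    rng.shuffle(pool)
    return pool[:n_train], pool[n_train:n_train + m_outer]
```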
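The experiment-setup row quotes the evaluation hyper-parameters for DistilBERT-base (Adam-style optimizer, learning rate 2e-5, 3 epochs, Hugging Face Transformers defaults otherwise). The snippet below sketches how such a run might look with the Hugging Face Trainer; it is not the authors' script. The checkpoint name, the placeholder IMDb loading, the sequence length, and the batch size are assumptions, and in SUNGEN the training split would be the reweighted synthetic data rather than the real training set.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

raw = load_dataset("imdb")  # placeholder: one of the eight evaluation tasks

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="sungen-distilbert",      # hypothetical output path
    learning_rate=2e-5,                  # quoted learning rate for DistilBERT-base
    num_train_epochs=3,                  # DistilBERT-base is run for 3 epochs
    per_device_train_batch_size=32,      # batch size not given in the excerpt
)

# Trainer's default AdamW stands in for the quoted Adam optimizer with the
# library's default hyper-parameters.
Trainer(model=model, args=args,
        train_dataset=data["train"],
        eval_dataset=data["test"]).train()
```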