Boosting Few-Shot Text Classification via Distribution Estimation
Authors: Han Liu, Feng Zhang, Xiaotong Zhang, Siyang Zhao, Fenglong Ma, Xiao-Ming Wu, Hongyang Chen, Hong Yu, Xianchao Zhang
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on eight few-shot text classification datasets show that the proposed method outperforms state-of-the-art baselines significantly. |
| Researcher Affiliation | Academia | Dalian University of Technology, Dalian, China; Peking University, Beijing, China; The Pennsylvania State University, Pennsylvania, USA; The Hong Kong Polytechnic University, Hong Kong, China; Zhejiang Lab, Hangzhou, China |
| Pseudocode | No | The paper describes algorithmic steps in prose and mathematical formulas but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We follow (Chen et al. 2022) to conduct experiments on eight text classification datasets, including four intent detection datasets: Banking77, HWU64, Clinic150, and Liu57, and four news or review classification datasets: HuffPost, Amazon, Reuters, and 20News. |
| Dataset Splits | Yes | All reported results are from 5 different runs, and in each run the training, validation and testing classes are randomly resampled. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'bert-base-uncased' model and the 'AdamW' optimizer but does not specify version numbers for these or for other critical software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We set R = 10 for the news or review classification task, while R = 4 for the intent detection task. For the loss function, we set λ = 0.1, and optimize the model parameters using AdamW (Loshchilov and Hutter 2019) with the initial learning rate 0.00001 and dropout rate 0.1. During distribution sampling, in 1-shot and 5-shot scenarios, we generate 20 and 100 samples per class respectively. |
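
To make the Dataset Splits row concrete, below is a minimal, hypothetical Python sketch of how classes could be randomly resampled into disjoint training/validation/testing sets across the 5 reported runs. The split sizes and the function name are placeholders chosen for illustration; the paper does not specify them in the quoted text.

```python
import random

def resample_class_splits(all_classes, seed, n_train, n_val):
    """Randomly partition class labels into disjoint train/val/test sets.

    The paper reports 5 runs with classes resampled in each run; the split
    sizes (n_train, n_val) here are placeholders, not reported values.
    """
    rng = random.Random(seed)
    classes = list(all_classes)
    rng.shuffle(classes)
    train = classes[:n_train]
    val = classes[n_train:n_train + n_val]
    test = classes[n_train + n_val:]
    return train, val, test

# Example: one independently resampled split per run.
# splits = [resample_class_splits(class_names, seed=run, n_train=40, n_val=17)
#           for run in range(5)]
```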
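
The Experiment Setup row can likewise be read as a training configuration. The sketch below is a hypothetical PyTorch rendering of the reported hyperparameters (AdamW with learning rate 1e-5, dropout 0.1, λ = 0.1, and 20 or 100 generated samples per class); the Gaussian sampling form and all function names are assumptions for illustration, since no open-source code is reported above.

```python
import torch
from torch.optim import AdamW

LAMBDA = 0.1  # reported loss weighting; R = 10 (news/review) or R = 4 (intent detection)

def build_optimizer(model):
    # Initial learning rate 0.00001 (1e-5), as reported in the paper.
    # Dropout rate 0.1 is assumed to be set inside the BERT-based encoder.
    return AdamW(model.parameters(), lr=1e-5)

def sample_from_estimated_distribution(mean, cov, n_shot):
    """Draw synthetic features from an estimated per-class distribution.

    The paper reports generating 20 samples per class in the 1-shot setting
    and 100 in the 5-shot setting; modeling the estimated distribution as a
    multivariate Gaussian here is an assumption, not the authors' released code.
    """
    num_samples = 20 if n_shot == 1 else 100
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
    return dist.sample((num_samples,))  # shape: (num_samples, feature_dim)
```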