Manifold-Based Verbalizer Space Re-embedding for Tuning-Free Prompt-Based Classification
Authors: Haochun Wang, Sendong Zhao, Chi Liu, Nuwa Xi, Muzhen Cai, Bing Qin, Ting Liu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate that even without tuning any parameters, our LLE-INC is on par with automated verbalizers with parameter tuning. And with the parameter updating, our approach further enhances prompt-based tuning by up to 3.2%. Furthermore, experiments with the LLaMA-7B, 13B and 65B indicate that LLE-INC is an efficient tuning-free classification approach for the hyperscale language models. |
| Researcher Affiliation | Academia | Haochun Wang, Sendong Zhao*, Chi Liu, Nuwa Xi, Muzhen Cai, Bing Qin, Ting Liu Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China {hcwang, sdzhao}@ir.hit.edu.cn |
| Pseudocode | No | The paper describes the method in steps with mathematical formulas but does not provide structured pseudocode or algorithm blocks; an illustrative sketch of the underlying LLE step follows this table. |
| Open Source Code | Yes | We release our code at https://github.com/SCIR-HI/LLE-INC. |
| Open Datasets | Yes | We conduct experiments to demonstrate the effectiveness of our approach with 10 classification datasets (including 4 multi-class datasets) in both English and Chinese from GLUE (Wang et al. 2018), CLUE (Xu et al. 2020) and CBLUE (Zhang et al. 2022a) benchmarks. The datasets include 6 datasets from GLUE: SST-2 (Socher et al. 2013), MRPC (Dolan and Brockett 2005), QQP, QNLI (Rajpurkar et al. 2016) and RTE (Dagan, Glickman, and Magnini 2006), 3 datasets from CBLUE: CHIP-CTC (Zhang et al. 2022a), cMedTC (Zhang et al. 2020) and KUAKE-QIC (Zhang et al. 2022a) and 1 dataset from CLUE: Tnews (Xu et al. 2020). |
| Dataset Splits | Yes | For the GLUE benchmark, we follow Gao et al. (Gao, Fisch, and Chen 2021) to use the original development sets as test sets and randomly select 16 instances for each class from the training set using 5 random seeds in few-shot scenarios and test with the full-size test set. |
| Hardware Specification | No | The paper discusses the computational resources generally required for LLMs and evaluates specific models (LLaMA-7B, 13B, 65B), but it does not specify the hardware (e.g., GPU models, CPU types, memory) used for its own experiments. |
| Software Dependencies | No | The paper mentions using Huggingface Transformers, PyTorch, and the OpenPrompt toolkit but does not specify their version numbers for reproducibility. |
| Experiment Setup | No | Implementation details including prompt settings and experiment settings are in Appendix. |
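Since the paper itself provides no pseudocode, the following is a minimal sketch of classic Locally Linear Embedding (LLE), the manifold re-embedding step that LLE-INC builds on. This is not the authors' method: LLE-INC additionally constrains each verbalizer token's neighborhood to same-class tokens, which this sketch omits. The function name `lle_reembed` and its parameters are illustrative assumptions, not identifiers from the released code.

```python
import numpy as np

def lle_reembed(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Re-embed row vectors X of shape (n, d) via classic LLE.

    Illustrative sketch only; not the authors' LLE-INC, which would
    additionally restrict each point's neighbors to same-class tokens.
    """
    n = X.shape[0]
    # Step 1: k nearest neighbors by squared Euclidean distance (self excluded).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    knn = np.argsort(d2, axis=1)[:, :n_neighbors]
    # Step 2: per-point linear reconstruction weights that sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[knn[i]] - X[i]                          # neighbors centered on x_i
        G = Z @ Z.T                                   # local Gram matrix
        G += reg * np.trace(G) * np.eye(n_neighbors)  # regularize for stability
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, knn[i]] = w / w.sum()                    # enforce sum-to-one constraint
    # Step 3: low-dimensional coordinates = bottom eigenvectors of
    # (I - W)^T (I - W), skipping the trivial constant eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:n_components + 1]
```

In the paper's setting, `X` would presumably hold the language model's embeddings of candidate verbalizer tokens, and the re-embedded coordinates would support tuning-free class assignment; consult the released code at https://github.com/SCIR-HI/LLE-INC for the actual LLE-INC procedure.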