$k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference
Authors: Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments to demonstrate its two-fold superiority: 1) Calibration-Free: kNN Prompting does not directly align LLM output distribution with task-specific label space, instead leverages such distribution to align test and training instances. It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenario. 2) Beyond-Context: kNN Prompting can further scale up effectively with as many training data as are available, continually bringing substantial improvements. |
| Researcher Affiliation | Collaboration | (1) University of Science and Technology of China, Hefei, China; (2) Beijing University of Posts and Telecommunications, Beijing, China; (3) Baidu Inc., Beijing, China; (4) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China |
| Pseudocode | No | The paper describes the kNN Prompting framework in Section 3 with textual explanations and mathematical equations, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at https://github.com/BenfengXu/KNNPrompting |
| Open Datasets | Yes | We use 10 established text classification datasets, respectively SST2 (Socher et al., 2013), SUBJ (Pang & Lee, 2004), MPQA (Wiebe et al., 2005), AGNews (Zhang et al., 2015), CB (De Marneffe et al., 2019), CR (Hu & Liu, 2004), DBPedia (Zhang et al., 2015), MR (Pang & Lee, 2005), RTE (Dagan et al., 2005) and TREC (Voorhees & Tice, 2000). |
| Dataset Splits | No | The paper describes a training set T that is split into a demonstration set D and an anchor set A for the kNN Prompting method, and discusses a test instance xtest. It also refers to 'Num. of Shots' (training data size). However, it does not state conventional training/validation/test splits (e.g., an 80/10/10 split) or mention a dedicated validation set for hyperparameter tuning; the term 'validation' is not used in the context of data splits. |
| Hardware Specification | No | The paper mentions the use of various LLMs (e.g., GPT2, OPT series) with different parameter scales (0.8B to 30B), but it does not provide any specific hardware specifications such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'GPT2 tokenizer' and refers to models like 'GPT2' and 'OPT series', but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, specific library versions). |
| Experiment Setup | Yes | We invariantly set the number of neighbors k to 3. There are no other hyper-parameters as the entire framework is training-free. [...] We set learning rate to 1e-5, batch size to 16, and training steps to 125, 250 or 500, respectively for m ∈ {32, 64}, {128, 256}, {512, 1024}. For CB, AGNews and RTE, batch size is adjusted to 8, for DBPedia, batch size is adjusted to 4 to avoid OOM. |
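
The inference procedure summarized in the rows above (a demonstration set D used as in-context examples, an anchor set A of labeled training instances, and k = 3 nearest neighbors) can be sketched compactly. The snippet below is a minimal, illustrative reconstruction under a HuggingFace causal LM, not the authors' released implementation: the prompt template, the toy sentiment data, the KL-divergence direction, and the majority-vote aggregation are assumptions, and helper names such as `build_prompt` and `next_token_log_probs` are hypothetical.

```python
# Illustrative sketch of kNN Prompting inference (not the official code).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # the paper evaluates GPT-2 and OPT models of various sizes
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def next_token_log_probs(prompt: str) -> torch.Tensor:
    """Log-probability distribution over the full vocabulary for the token following `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # last position, shape [vocab_size]
    return F.log_softmax(logits, dim=-1)

def build_prompt(demonstrations, x):
    """Concatenate in-context demonstrations with the query instance (template is illustrative)."""
    lines = [f"Review: {d_x}\nSentiment: {d_y}" for d_x, d_y in demonstrations]
    lines.append(f"Review: {x}\nSentiment:")
    return "\n\n".join(lines)

def knn_prompting_predict(demonstrations, anchors, x_test, k=3):
    """Retrieve the k anchors whose LLM output distributions are closest to the test
    instance's distribution (KL divergence), then aggregate their labels."""
    log_p_test = next_token_log_probs(build_prompt(demonstrations, x_test))
    scored = []
    for a_x, a_y in anchors:
        log_p_anchor = next_token_log_probs(build_prompt(demonstrations, a_x))
        # KL(p_test || p_anchor); the direction is one reasonable choice, not a claim about the paper
        kl = torch.sum(log_p_test.exp() * (log_p_test - log_p_anchor)).item()
        scored.append((kl, a_y))
    neighbors = [y for _, y in sorted(scored, key=lambda t: t[0])[:k]]
    return max(set(neighbors), key=neighbors.count)  # majority vote among k neighbors

# Toy usage (texts and labels are made up):
demos = [("a gripping, beautifully shot film", "positive"),
         ("tedious and utterly forgettable", "negative")]
anchors = [("one of the year's best surprises", "positive"),
           ("a dull, lifeless mess", "negative"),
           ("charming from start to finish", "positive")]
print(knn_prompting_predict(demos, anchors, "an absolute delight to watch"))
```

Note how this sketch reflects the paper's "Beyond-Context" claim: the demonstration context stays within the LLM's window, while the anchor set can grow with however much training data is available, at the cost of one additional forward pass per anchor.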