Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-Shot In-Context Learners

Authors: Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Throughout in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations, closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario. We validate our method with various datasets, demonstrating that it is consistently superior to baselines in both the low-data and full-data settings. From empirical experiments, we observe that exploiting templates that provide hints about the target task or concatenating demonstrations can significantly enhance the extracted representations from the PLM, improving the classifier's performance in various scenarios and reducing the gap between ICL and fine-tuning. (A minimal illustrative sketch of this pipeline appears after the table.)
Researcher Affiliation | Collaboration | Hyunsoo Cho (1), Hyuhng Joon Kim (1), Junyeob Kim (1), Sang-Woo Lee (2,3), Sang-goo Lee (1), Kang Min Yoo (1,2), Taeuk Kim (4,*); 1 Seoul National University, 2 NAVER Cloud, 3 KAIST, 4 Hanyang University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code, nor does it explicitly state that code for the methodology is available.
Open Datasets | Yes | Datasets. To investigate the performance of each method in many different scenarios, we select 15 datasets, as stipulated in Table 1. The selected datasets cover single-sentence to sentence-pair tasks, with label counts ranging from binary to 150 classes, across diverse domains. The detailed list of datasets and their references is covered in the Appendix.
Dataset Splits | No | The paper provides 'Train' and 'Test' sample counts in Table 1 for the datasets used, but does not explicitly detail validation splits (percentages, counts, or how they were derived) for reproduction.
Hardware Specification | Yes | We optimized the hyper-parameters of each classification method on SST2 dataset with 4 Tesla V100 SXM2 32GB GPUs and universally utilized them in different settings.
Software Dependencies | No | The paper does not provide specific version numbers for ancillary software components or libraries (e.g., Python, PyTorch, TensorFlow) used in the experiments.
Experiment Setup | No | The paper states, 'We optimized the hyper-parameters of each classification method on SST2 dataset... (Detailed hyper-parameters and implementations are in the Appendix.)', indicating that specific setup details are not present in the main text.
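
Since the paper releases no code (see the 'Open Source Code' row above), the snippet below is only a minimal sketch of the prompt-augmented linear probing pipeline described in the 'Research Type' row: each input is wrapped in a task template with in-context demonstrations prepended, a frozen PLM encodes the result, and a linear classifier is trained on the pooled representation. The model choice (gpt2), the sentiment template, the demonstrations, and last-token pooling are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of prompt-augmented linear probing (PALP).
# Assumptions: a HuggingFace causal LM as the black-box PLM, a hypothetical
# sentiment template with two demonstrations, and last-token pooling.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in PLM
model = AutoModel.from_pretrained("gpt2")
model.eval()

# Hypothetical task template with in-context demonstrations prepended.
DEMOS = (
    'Review: "a gorgeous film" Sentiment: positive\n'
    'Review: "a waste of time" Sentiment: negative\n'
)
TEMPLATE = DEMOS + 'Review: "{text}" Sentiment:'

@torch.no_grad()
def embed(texts):
    """Encode prompt-augmented inputs with the frozen PLM and pool
    the last-token hidden state as the input representation."""
    feats = []
    for text in texts:
        enc = tokenizer(TEMPLATE.format(text=text), return_tensors="pt")
        hidden = model(**enc).last_hidden_state  # (1, seq_len, dim)
        feats.append(hidden[0, -1])              # last-token vector
    return torch.stack(feats).numpy()

# Train a linear probe on the frozen representations (toy data).
train_texts = ["an absolute delight", "tedious and overlong"]
train_labels = [1, 0]
probe = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
print(probe.predict(embed(["a quiet triumph"])))
```

Under this framing, only the linear probe is trained while the PLM stays frozen, which is consistent with the paper's claims of little training overhead and applicability in a black-box scenario.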