IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models

Authors: Shaokun Zhang, Xiaobo Xia, Zhaoqing Wang, Ling-Hao Chen, Jiale Liu, Qingyun Wu, Tongliang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments confirm the superiority of the proposed method on various benchmarks, achieving better performance with lower time consumption during subset selection.
Researcher Affiliation | Academia | Shaokun Zhang (Pennsylvania State University), Xiaobo Xia (The University of Sydney), Zhaoqing Wang (The University of Sydney), Ling-Hao Chen (Tsinghua University), Jiale Liu (Xidian University), Qingyun Wu (Pennsylvania State University), Tongliang Liu (The University of Sydney)
Pseudocode | Yes | Algorithm 1: subset influence quantification. Algorithm 2: searching the subset with maximum influence. (A hedged code sketch of both algorithms appears below the table.)
Open Source Code | Yes | The project page is available at https://skzhang1.github.io/IDEAL/. Source code has been attached for the reproducibility of the results.
Open Datasets | Yes | Following previous work (Su et al., 2023), 9 datasets are used for the evaluations, spanning 4 task types: classification, multi-choice, dialogue, and generation. Details of the datasets are provided in Appendix D.1. For each dataset, the original train/dev/test split from the Transformers library (Wolf et al., 2019) is used. (A loading sketch appears below the table.)
Dataset Splits | Yes | For each dataset, the original train/dev/test split from the Transformers library (Wolf et al., 2019) is used.
Hardware Specification | Yes | All experiments with GPT-J 6B and GPT-Neo 2.7B are run on a single NVIDIA Tesla V100 (32 GB) GPU.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) and the Hugging Face Transformers library (Wolf et al., 2019) but does not provide version numbers for these components, which are needed for reproducibility.
Experiment Setup | Yes | The annotation budget is set to 18 and 100, respectively, following the same setting as Vote-k. The directed graph over all unlabeled data connects each vertex to its 10 nearest successors (k = 10). When quantifying the influence of a subset, Algorithm 1 is run 10 times and the influence values are averaged. (A graph-construction sketch appears below the table.)
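
To make the pseudocode row concrete, here is a minimal Python sketch of the two algorithms named above. It assumes an independent-cascade-style diffusion over the directed graph for Algorithm 1 and a greedy budget-constrained search for Algorithm 2; the exact activation rule and search strategy of the paper may differ, so treat the function names and edge-probability handling as assumptions, not the authors' implementation.

```python
import random
from collections import deque

def influence_once(graph, subset):
    # One diffusion run: vertices in `subset` start activated; each newly
    # activated vertex tries to activate each successor independently with
    # the probability attached to that edge (independent-cascade style,
    # an assumption about the paper's diffusion process).
    activated = set(subset)
    frontier = deque(subset)
    while frontier:
        v = frontier.popleft()
        for succ, p in graph.get(v, []):
            if succ not in activated and random.random() < p:
                activated.add(succ)
                frontier.append(succ)
    return len(activated)

def subset_influence(graph, subset, runs=10):
    # Algorithm 1 (sketch): Monte Carlo estimate of a subset's influence,
    # averaged over several runs (the paper averages over 10 runs).
    return sum(influence_once(graph, subset) for _ in range(runs)) / runs

def greedy_max_influence(graph, candidates, budget, runs=10):
    # Algorithm 2 (sketch; the greedy strategy is assumed): repeatedly add
    # the candidate that maximizes the estimated influence until the
    # annotation budget (18 or 100 in the paper) is exhausted.
    subset = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in subset]
        best = max(remaining,
                   key=lambda c: subset_influence(graph, subset + [c], runs))
        subset.append(best)
    return subset
```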
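The experiment-setup row describes the graph these algorithms run on: each vertex is linked to its 10 nearest successors. A minimal sketch follows, assuming cosine similarity over precomputed sentence embeddings and a fixed per-edge activation probability; both choices are illustrative assumptions, as the report does not specify the similarity metric or edge weighting.

```python
import numpy as np

def build_knn_digraph(embeddings, k=10, edge_prob=0.1):
    # Connect each vertex to its k nearest successors (k = 10 per the
    # paper's setup) by cosine similarity. `embeddings` is an (n, d)
    # array; the embedding model and `edge_prob` are assumptions here.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-loops
    graph = {}
    for v in range(len(x)):
        successors = np.argpartition(-sims[v], k)[:k]
        graph[v] = [(int(s), edge_prob) for s in successors]
    return graph
```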
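Finally, the datasets rows note that the original train/dev/test splits are used. A loading sketch is shown below with the Hugging Face `datasets` library; RTE is used purely as an illustration of one classification dataset from this line of work (Su et al., 2023), since the report does not list the actual loader calls.

```python
from datasets import load_dataset

# Hypothetical example: load one benchmark with its original splits.
splits = load_dataset("glue", "rte")
train, dev = splits["train"], splits["validation"]
print(len(train), len(dev))
```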