IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models

Authors: Shaokun Zhang, Xiaobo Xia, Zhaoqing Wang, Ling-Hao Chen, Jiale Liu, Qingyun Wu, Tongliang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments confirm the superiority of the proposed method on various benchmarks, achieving better performance with lower time consumption during subset selection.
Researcher Affiliation | Academia | Shaokun Zhang (Pennsylvania State University), Xiaobo Xia (The University of Sydney), Zhaoqing Wang (The University of Sydney), Ling-Hao Chen (Tsinghua University), Jiale Liu (Xidian University), Qingyun Wu (Pennsylvania State University), Tongliang Liu (The University of Sydney)
Pseudocode | Yes | Algorithm 1: subset influence quantification. Algorithm 2: searching the subset with maximum influence. (A hedged code sketch of both algorithms appears below the table.)
Open Source Code | Yes | The project page is available at https://skzhang1.github.io/IDEAL/. Source code has been attached for the reproducibility of the results.
Open Datasets | Yes | Following previous work (Su et al., 2023), 9 datasets are used for the evaluations, spanning 4 task types: classification, multi-choice, dialogue, and generation. Details of the datasets are provided in Appendix D.1. For each dataset, the original train/dev/test split from the Transformers library (Wolf et al., 2019) is used. (A loading sketch appears below the table.)
Dataset Splits | Yes | For each dataset, the original train/dev/test split from the Transformers library (Wolf et al., 2019) is used.
Hardware Specification | Yes | All experiments with GPT-J 6B and GPT-Neo 2.7B are run on a single NVIDIA Tesla V100 (32 GB) GPU.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) and the Hugging Face Transformers library (Wolf et al., 2019) but does not provide version numbers for these components, which are needed for reproducibility.
Experiment Setup | Yes | The annotation budget is set to 18 and 100, respectively, following the same setting as Vote-k. The directed graph over all unlabeled data connects each vertex to its 10 nearest successors (k = 10). When quantifying the influence of a subset, Algorithm 1 is run 10 times and the influence values are averaged. (A graph-construction sketch appears below the table.)
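
To make the pseudocode row concrete, here is a minimal Python sketch of the two algorithms named above. It assumes an independent-cascade-style diffusion over the directed graph for Algorithm 1 and a greedy budget-constrained search for Algorithm 2; the exact activation rule and search strategy of the paper may differ, so treat the function names and edge-probability handling as assumptions, not the authors' implementation.

```python
import random
from collections import deque

def influence_once(graph, subset):
    # One diffusion run: vertices in `subset` start activated; each newly
    # activated vertex tries to activate each successor independently with
    # the probability attached to that edge (independent-cascade style,
    # an assumption about the paper's diffusion process).
    activated = set(subset)
    frontier = deque(subset)
    while frontier:
        v = frontier.popleft()
        for succ, p in graph.get(v, []):
            if succ not in activated and random.random() < p:
                activated.add(succ)
                frontier.append(succ)
    return len(activated)

def subset_influence(graph, subset, runs=10):
    # Algorithm 1 (sketch): Monte Carlo estimate of a subset's influence,
    # averaged over several runs (the paper averages over 10 runs).
    return sum(influence_once(graph, subset) for _ in range(runs)) / runs

def greedy_max_influence(graph, candidates, budget, runs=10):
    # Algorithm 2 (sketch; the greedy strategy is assumed): repeatedly add
    # the candidate that maximizes the estimated influence until the
    # annotation budget (18 or 100 in the paper) is exhausted.
    subset = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in subset]
        best = max(remaining,
                   key=lambda c: subset_influence(graph, subset + [c], runs))
        subset.append(best)
    return subset
```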
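The experiment-setup row describes the graph these algorithms run on: each vertex is linked to its 10 nearest successors. A minimal sketch follows, assuming cosine similarity over precomputed sentence embeddings and a fixed per-edge activation probability; both choices are illustrative assumptions, as the report does not specify the similarity metric or edge weighting.

```python
import numpy as np

def build_knn_digraph(embeddings, k=10, edge_prob=0.1):
    # Connect each vertex to its k nearest successors (k = 10 per the
    # paper's setup) by cosine similarity. `embeddings` is an (n, d)
    # array; the embedding model and `edge_prob` are assumptions here.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-loops
    graph = {}
    for v in range(len(x)):
        successors = np.argpartition(-sims[v], k)[:k]
        graph[v] = [(int(s), edge_prob) for s in successors]
    return graph
```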
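Finally, the datasets rows note that the original train/dev/test splits are used. A loading sketch is shown below with the Hugging Face `datasets` library; RTE is used purely as an illustration of one classification dataset from this line of work (Su et al., 2023), since the report does not list the actual loader calls.

```python
from datasets import load_dataset

# Hypothetical example: load one benchmark with its original splits.
splits = load_dataset("glue", "rte")
train, dev = splits["train"], splits["validation"]
print(len(train), len(dev))
```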