A Bayesian Approach to Data Point Selection

Authors: Xinnuo Xu, Minyoung Kim, Royson Lee, Brais Martinez, Timothy Hospedales

NeurIPS 2024

Reproducibility assessment: each entry below lists the variable, the result, and the supporting LLM response.
Research Type: Experimental. Through controlled experiments in both the vision and language domains, we present a proof of concept. Additionally, we demonstrate that our method scales effectively to large language models and facilitates automated per-task optimization for instruction fine-tuning datasets.
Researcher Affiliation: Collaboration. Xinnuo Xu, Microsoft Research Cambridge (xinnuoxu@microsoft.com); Minyoung Kim, Samsung AI Center Cambridge, UK (mikim21@gmail.com); Royson Lee, Samsung AI Center Cambridge, UK (royson.lee@samsung.com); Brais Martinez, Samsung AI Center Cambridge, UK (brais.mart@samsung.com); Timothy Hospedales, Samsung AI Center Cambridge, UK and University of Edinburgh, UK (t.hospedales@ed.ac.uk).
Pseudocode: No. While the paper presents update equations (6) and (7) that describe an algorithmic process, these are not formatted as a distinct pseudocode or algorithm block.
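The update equations themselves are not quoted in this report. Purely as an illustration of the general shape such alternating updates take, the sketch below interleaves a one-step-lookahead meta-gradient step on per-example data weights with a weighted SGD step on the model. The toy data, the linear model, the softmax weight parameterisation, and all hyperparameters are assumptions for the sketch, not the paper's equations (6) and (7).

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the train set Dt and the clean meta set Dm (assumed shapes).
torch.manual_seed(0)
X, y = torch.randn(100, 5), (torch.rand(100) > 0.5).float()
Xm, ym = torch.randn(20, 5), (torch.rand(20) > 0.5).float()

theta = torch.zeros(5, requires_grad=True)  # model parameters (linear model)
s = torch.zeros(100, requires_grad=True)    # per-example weight logits

opt_theta = torch.optim.SGD([theta], lr=1e-3)
opt_s = torch.optim.SGD([s], lr=1e-3)

def weighted_loss(params, X, y, w=None):
    per_example = F.binary_cross_entropy_with_logits(X @ params, y, reduction="none")
    return (w * per_example).sum() if w is not None else per_example.mean()

for step in range(1000):
    # Weight step: evaluate the meta loss at a one-step lookahead of theta and
    # backpropagate to the data weights (create_graph keeps the inner gradient
    # differentiable with respect to s).
    w = torch.softmax(s, dim=0)
    g = torch.autograd.grad(weighted_loss(theta, X, y, w), theta, create_graph=True)[0]
    meta_loss = weighted_loss(theta - 1e-3 * g, Xm, ym)
    opt_s.zero_grad()
    meta_loss.backward()
    opt_s.step()

    # Model step: ordinary SGD on the reweighted training loss.
    opt_theta.zero_grad()
    weighted_loss(theta, X, y, torch.softmax(s.detach(), dim=0)).backward()
    opt_theta.step()
```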
Open Source Code: Yes. The code for this paper is available at https://github.com/XinnuoXu/BADS.
Open Datasets: Yes. Following the setup in [41], we use the standard MNIST handwritten digit classification dataset [33] to create a class-imbalanced binary classification task. ... Our experiment utilizes the standard CIFAR 10-class classification dataset [30]. ... The English benchmark introduced in WebNLG 2020 [6]... We use the same IFT data as [57, 51] as our train set Dt, which is a mix of FLAN V2 [35], CoT [54], Dolly [10], and Open Assistant 1 [31]. Following [57, 5], we focus on four downstream tasks: MMLU [24], which consists of multiple-choice questions across 57 sub-tasks, ARC-Challenge/-Easy [9], and HellaSwag [61].
Dataset Splits: Yes. A total of 5,000 images from classes 4 and 9 were selected as the train set Dt... A balanced meta set Dm is created by selecting another 25 examples from each of these two classes, ensuring no overlap between Dt and Dm. ... we first create a clean and balanced meta set Dm by randomly sampling 1,000 examples from each class in the training data. ... create a single clean and balanced meta set Dm by randomly sampling 30 examples from the WebNLG 2020 validation set in each test domain. ... 5 examples were selected from each sub-task to create the meta set Dm for MMLU, totaling 285 examples. Additionally, following [5], for the other tasks, we randomly chose 25 examples from their validation set to create the respective meta sets.
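As a quick illustration of the quoted MNIST split, the snippet below builds a 4-vs-9 train set of 5,000 images and a disjoint balanced meta set of 25 images per class with torchvision. The imbalance ratio and the random seed are assumptions; the quoted text fixes only the totals and the no-overlap constraint.

```python
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="data", train=True, download=True)
labels = mnist.targets.numpy()
idx4, idx9 = np.where(labels == 4)[0], np.where(labels == 9)[0]

rng = np.random.default_rng(0)  # assumed seed
majority_frac = 0.995           # assumed imbalance ratio; not quoted above
n4 = int(5000 * majority_frac)
train_idx = np.concatenate([rng.choice(idx4, n4, replace=False),
                            rng.choice(idx9, 5000 - n4, replace=False)])

# Balanced meta set Dm: 25 fresh examples per class, disjoint from Dt.
meta_idx = np.concatenate([rng.choice(np.setdiff1d(idx4, train_idx), 25, replace=False),
                           rng.choice(np.setdiff1d(idx9, train_idx), 25, replace=False)])
assert np.intersect1d(train_idx, meta_idx).size == 0
```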
Hardware Specification: Yes. The training is conducted on a single GPU... Training is performed on a single GPU... Training is on a single GPU... Training uses one A40 GPU... The offline scoring in Ask-LLM-O takes around four hours in our setup on a single NVIDIA A40 GPU.
Software Dependencies: No. The paper does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python).
Experiment Setup: Yes. The training is conducted on a single GPU, using SGD with a fixed learning rate of 1e-3 and a mini-batch size of 100, over a total of 15,000 steps. ... The learning rate for the weight network is 1e-3 and the target sparsity level β is 0.005. Other hyperparameters, including those in the baselines, are detailed in Table 3 (Appendix E).
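For reference, the quoted hyperparameters map onto a configuration like the following. Only the values named in the quote come from the paper; the training-loop skeleton and the toy data around them are assumed stand-ins, and the weight network itself is omitted.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Values quoted from the paper's MNIST setup; everything else is illustrative.
cfg = {
    "lr_model": 1e-3,           # fixed SGD learning rate
    "lr_weight_net": 1e-3,      # weight-network learning rate (network not shown)
    "batch_size": 100,
    "total_steps": 15_000,
    "target_sparsity_beta": 0.005,
}

X, y = torch.randn(5000, 784), (torch.rand(5000) > 0.5).float()  # toy stand-in data
model = torch.nn.Linear(784, 1)
opt = torch.optim.SGD(model.parameters(), lr=cfg["lr_model"])
loader = DataLoader(TensorDataset(X, y), batch_size=cfg["batch_size"], shuffle=True)

step = 0
while step < cfg["total_steps"]:
    for xb, yb in loader:
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(xb).squeeze(1), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        step += 1
        if step >= cfg["total_steps"]:
            break
```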