A Bayesian Approach to Data Point Selection

Authors: Xinnuo Xu, Minyoung Kim, Royson Lee, Brais Martinez, Timothy Hospedales

NeurIPS 2024

Reproducibility assessment: each entry below lists the variable, the result, and the supporting LLM response.
Research Type: Experimental. Through controlled experiments in both the vision and language domains, we present a proof of concept. Additionally, we demonstrate that our method scales effectively to large language models and facilitates automated per-task optimization for instruction fine-tuning datasets.
Researcher Affiliation: Collaboration. Xinnuo Xu, Microsoft Research Cambridge (xinnuoxu@microsoft.com); Minyoung Kim, Samsung AI Center Cambridge, UK (mikim21@gmail.com); Royson Lee, Samsung AI Center Cambridge, UK (royson.lee@samsung.com); Brais Martinez, Samsung AI Center Cambridge, UK (brais.mart@samsung.com); Timothy Hospedales, Samsung AI Center Cambridge, UK and University of Edinburgh, UK (t.hospedales@ed.ac.uk).
Pseudocode: No. While the paper presents update equations (6) and (7) that describe an algorithmic process, these are not formatted as a distinct pseudocode or algorithm block.
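The update equations themselves are not quoted in this report. Purely as an illustration of the general shape such alternating updates take, the sketch below interleaves a one-step-lookahead meta-gradient step on per-example data weights with a weighted SGD step on the model. The toy data, the linear model, the softmax weight parameterisation, and all hyperparameters are assumptions for the sketch, not the paper's equations (6) and (7).

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the train set Dt and the clean meta set Dm (assumed shapes).
torch.manual_seed(0)
X, y = torch.randn(100, 5), (torch.rand(100) > 0.5).float()
Xm, ym = torch.randn(20, 5), (torch.rand(20) > 0.5).float()

theta = torch.zeros(5, requires_grad=True)  # model parameters (linear model)
s = torch.zeros(100, requires_grad=True)    # per-example weight logits

opt_theta = torch.optim.SGD([theta], lr=1e-3)
opt_s = torch.optim.SGD([s], lr=1e-3)

def weighted_loss(params, X, y, w=None):
    per_example = F.binary_cross_entropy_with_logits(X @ params, y, reduction="none")
    return (w * per_example).sum() if w is not None else per_example.mean()

for step in range(1000):
    # Weight step: evaluate the meta loss at a one-step lookahead of theta and
    # backpropagate to the data weights (create_graph keeps the inner gradient
    # differentiable with respect to s).
    w = torch.softmax(s, dim=0)
    g = torch.autograd.grad(weighted_loss(theta, X, y, w), theta, create_graph=True)[0]
    meta_loss = weighted_loss(theta - 1e-3 * g, Xm, ym)
    opt_s.zero_grad()
    meta_loss.backward()
    opt_s.step()

    # Model step: ordinary SGD on the reweighted training loss.
    opt_theta.zero_grad()
    weighted_loss(theta, X, y, torch.softmax(s.detach(), dim=0)).backward()
    opt_theta.step()
```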
Open Source Code: Yes. The code for this paper is available at https://github.com/XinnuoXu/BADS.
Open Datasets: Yes. Following the setup in [41], we use the standard MNIST handwritten digit classification dataset [33] to create a class-imbalanced binary classification task. ... Our experiment utilizes the standard CIFAR 10-class classification dataset [30]. ... The English benchmark introduced in WebNLG 2020 [6]... We use the same IFT data as [57, 51] as our train set Dt, which is a mix of FLAN V2 [35], CoT [54], Dolly [10], and Open Assistant 1 [31]. Following [57, 5], we focus on four downstream tasks: MMLU [24], which consists of multiple-choice questions across 57 sub-tasks, ARC-Challenge/-Easy [9], and HellaSwag [61].
Dataset Splits: Yes. A total of 5,000 images from classes 4 and 9 were selected as the train set Dt... A balanced meta set Dm is created by selecting another 25 examples from each of these two classes, ensuring no overlap between Dt and Dm. ... we first create a clean and balanced meta set Dm by randomly sampling 1,000 examples from each class in the training data. ... create a single clean and balanced meta set Dm by randomly sampling 30 examples from the WebNLG 2020 validation set in each test domain. ... 5 examples were selected from each sub-task to create the meta set Dm for MMLU, totaling 285 examples. Additionally, following [5], for the other tasks, we randomly chose 25 examples from their validation set to create the respective meta sets.
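As a quick illustration of the quoted MNIST split, the snippet below builds a 4-vs-9 train set of 5,000 images and a disjoint balanced meta set of 25 images per class with torchvision. The imbalance ratio and the random seed are assumptions; the quoted text fixes only the totals and the no-overlap constraint.

```python
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="data", train=True, download=True)
labels = mnist.targets.numpy()
idx4, idx9 = np.where(labels == 4)[0], np.where(labels == 9)[0]

rng = np.random.default_rng(0)  # assumed seed
majority_frac = 0.995           # assumed imbalance ratio; not quoted above
n4 = int(5000 * majority_frac)
train_idx = np.concatenate([rng.choice(idx4, n4, replace=False),
                            rng.choice(idx9, 5000 - n4, replace=False)])

# Balanced meta set Dm: 25 fresh examples per class, disjoint from Dt.
meta_idx = np.concatenate([rng.choice(np.setdiff1d(idx4, train_idx), 25, replace=False),
                           rng.choice(np.setdiff1d(idx9, train_idx), 25, replace=False)])
assert np.intersect1d(train_idx, meta_idx).size == 0
```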
Hardware Specification: Yes. The training is conducted on a single GPU... Training is performed on a single GPU... Training is on a single GPU... Training uses one A40 GPU... The offline scoring in Ask-LLM-O takes around four hours in our setup on a single NVIDIA A40 GPU.
Software Dependencies: No. The paper does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python).
Experiment Setup: Yes. The training is conducted on a single GPU, using SGD with a fixed learning rate of 1e-3 and a mini-batch size of 100, over a total of 15,000 steps. ... The learning rate for the weight network is 1e-3 and the target sparsity level β is 0.005. Other hyperparameters, including those in the baselines, are detailed in Table 3 (Appendix E).
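For reference, the quoted hyperparameters map onto a configuration like the following. Only the values named in the quote come from the paper; the training-loop skeleton and the toy data around them are assumed stand-ins, and the weight network itself is omitted.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Values quoted from the paper's MNIST setup; everything else is illustrative.
cfg = {
    "lr_model": 1e-3,           # fixed SGD learning rate
    "lr_weight_net": 1e-3,      # weight-network learning rate (network not shown)
    "batch_size": 100,
    "total_steps": 15_000,
    "target_sparsity_beta": 0.005,
}

X, y = torch.randn(5000, 784), (torch.rand(5000) > 0.5).float()  # toy stand-in data
model = torch.nn.Linear(784, 1)
opt = torch.optim.SGD(model.parameters(), lr=cfg["lr_model"])
loader = DataLoader(TensorDataset(X, y), batch_size=cfg["batch_size"], shuffle=True)

step = 0
while step < cfg["total_steps"]:
    for xb, yb in loader:
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            model(xb).squeeze(1), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        step += 1
        if step >= cfg["total_steps"]:
            break
```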