Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning

Authors: Yijun Dong, Viet Hoang Phan, Xiang Pan, Qi Lei

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks. [...] Section 4: Experiments
Researcher Affiliation | Academia | Yijun Dong, Courant Institute, New York University (yd1319@nyu.edu); Hoang Phan, Center for Data Science, New York University (hvp2011@nyu.edu); Xiang Pan, Center for Data Science, New York University (xiangpan@nyu.edu); Qi Lei, Center for Data Science, New York University (ql518@nyu.edu)
Pseudocode | Yes | Algorithm 3.1: Sketchy Moment Matching (SkMM)
Open Source Code | Yes | Our experiment code for both the synthetic and real data is available at https://anonymous.4open.science/r/data_pruning.
Open Datasets | Yes | We further validate the effectiveness of SkMM on UTKFace [76], a real-world regression dataset for age estimation. [...] Stanford Cars [77] [...] CIFAR-10. [...] We consider a set of N = 2000 samples with high-dimensional pre-trained representations ϕ(X) ∈ R^(N×r), r = 2400, modeled by a Gaussian mixture model (GMM). (A minimal synthetic-data sketch follows the table.)
Dataset Splits | Yes | hyperparameter α tuning via grid search over 100 linearly spaced values in [10^-2, 10^2] with 2-fold cross-validation. (A cross-validation sketch follows the table.)
Hardware Specification | Yes | All the experiments could be done with A40 or even smaller GPUs. We use 4 workers and 32 GB memory.
Software Dependencies | No | The paper mentions software components like 'CLIP', 'Adam', and 'ResNet18' but does not specify their version numbers for reproducibility. For example, 'We finetune a randomly initialized classification head on top of the feature representation of CLIP [50] with Adam [75] and learning rate 10^-1.' and 'For FT, we finetune the last two layers of an ImageNet-pretrained ResNet18 [84] with a learning rate of 10^-2.'
Experiment Setup | Yes | We finetune a randomly initialized classification head on top of the feature representation of CLIP [50] with Adam [75] and learning rate 10^-1. [...] We optimize (5) via Adam [75] with constraint projection under learning rate 10^-7 for 10^4 iterations and sample S from s ∈ R^N with the lowest objective value. [...] For LP, we learn the last layer over the embeddings from a CLIP-pretrained ViT-B/32 [50] with a learning rate of 10^-1. For FT, we finetune the last two layers of an ImageNet-pretrained ResNet18 [84] with a learning rate of 10^-2. In both settings, we optimize via Adam for 50 epochs. (Hedged sketches of the selection step and the LP head training follow the table.)
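
The synthetic setup quoted in the Open Datasets row (N = 2000 samples with r = 2400-dimensional pre-trained representations drawn from a Gaussian mixture model) can be reproduced in miniature as below; the number of mixture components, the component means, and the noise scale are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

# Hedged sketch of the quoted synthetic setup: N = 2000 samples with r = 2400-dimensional
# pre-trained representations phi(X) drawn from a Gaussian mixture model (GMM).
# The number of components K, the component means, and the noise scale are assumptions.
rng = np.random.default_rng(0)
N, r, K = 2000, 2400, 2                      # K = 2 mixture components (assumed)
means = rng.normal(size=(K, r))              # assumed component means
assignment = rng.integers(K, size=N)         # mixture component of each sample
phi_X = means[assignment] + 0.5 * rng.normal(size=(N, r))   # phi(X) in R^(N x r)
print(phi_X.shape)                           # (2000, 2400)
```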
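The Dataset Splits row quotes a grid search over 100 linearly spaced values of α in [10^-2, 10^2] with 2-fold cross-validation. A minimal sketch of such a search, assuming α is a ridge-regression regularization strength and using scikit-learn (both assumptions, not stated in the excerpt):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hedged sketch of the quoted hyperparameter search: 100 linearly spaced values of
# alpha in [10^-2, 10^2] with 2-fold cross-validation. Treating alpha as a ridge
# regularization strength and using scikit-learn are assumptions for illustration.
X = np.random.randn(200, 50)            # placeholder features
y = np.random.randn(200)                # placeholder regression targets
alphas = np.linspace(1e-2, 1e2, 100)    # 100 linearly spaced values in [10^-2, 10^2]
search = GridSearchCV(Ridge(), {"alpha": alphas}, cv=2)
search.fit(X, y)
print(search.best_params_)
```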
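The Experiment Setup row quotes optimizing the selection objective (5) via Adam with constraint projection (learning rate 10^-7, 10^4 iterations). Since Eq. (5) itself is not reproduced here, the sketch below uses a generic second-moment-matching surrogate over randomly sketched features with a box-constraint projection; the objective, the sketch dimension m, the selection budget n, and the constraint set are all assumptions, not the paper's exact formulation.

```python
import torch

# Hedged sketch of the quoted selection step: soft selection scores s are optimized
# with Adam (lr 1e-7, 1e4 iterations) and projected onto a constraint set after each
# step. The loss below is a generic second-moment-matching surrogate in a sketched
# feature space, NOT the paper's exact Eq. (5); m, n, and the [0, 1] box are assumptions.
torch.manual_seed(0)
N, r, m, n = 2000, 2400, 64, 200
phi = torch.randn(N, r)                     # placeholder pre-trained representations
S_mat = torch.randn(r, m) / m ** 0.5        # random sketching matrix (assumption)
Z = phi @ S_mat                             # sketched features, N x m
target = Z.T @ Z / N                        # full-data second moment in sketched space
s = torch.full((N,), n / N, requires_grad=True)
opt = torch.optim.Adam([s], lr=1e-7)
for _ in range(10_000):
    opt.zero_grad()
    sel = (Z * s.unsqueeze(1)).T @ Z / n    # weighted second moment of the selection
    loss = (sel - target).pow(2).sum()      # moment-matching discrepancy (assumed form)
    loss.backward()
    opt.step()
    with torch.no_grad():
        s.clamp_(0.0, 1.0)                  # projection onto the box constraint (assumption)
```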
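The same row describes the linear-probing (LP) setup: a randomly initialized classification head trained on frozen CLIP ViT-B/32 embeddings with Adam at learning rate 10^-1 for 50 epochs. A minimal sketch, assuming precomputed 512-dimensional embeddings, 10 classes (e.g., CIFAR-10), and a batch size of 128 (the embedding source and batch size are assumptions):

```python
import torch
import torch.nn as nn

# Hedged sketch of the quoted LP setup: a randomly initialized classification head
# trained on frozen CLIP ViT-B/32 embeddings with Adam at lr 1e-1 for 50 epochs.
# Precomputed 512-d features, 10 classes, and batch size 128 are assumptions.
feats = torch.randn(1000, 512)              # placeholder precomputed CLIP embeddings
labels = torch.randint(0, 10, (1000,))      # placeholder class labels (e.g., CIFAR-10)
head = nn.Linear(512, 10)                   # randomly initialized classification head
opt = torch.optim.Adam(head.parameters(), lr=1e-1)
loss_fn = nn.CrossEntropyLoss()
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(feats, labels), batch_size=128, shuffle=True
)
for epoch in range(50):                     # 50 epochs, as quoted
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(head(xb), yb).backward()
        opt.step()
```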