Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
Authors: Yijun Dong, Viet Hoang Phan, Xiang Pan, Qi Lei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks. [...] 4 Experiments |
| Researcher Affiliation | Academia | Yijun Dong (Courant Institute, New York University, yd1319@nyu.edu); Hoang Phan (Center of Data Science, New York University, hvp2011@nyu.edu); Xiang Pan (Center of Data Science, New York University, xiangpan@nyu.edu); Qi Lei (Center of Data Science, New York University, ql518@nyu.edu) |
| Pseudocode | Yes | Algorithm 3.1 Sketchy Moment Matching (SkMM) |
| Open Source Code | Yes | Our experiment code for both the synthetic and real data is available at https://anonymous.4open.science/r/data_pruning. |
| Open Datasets | Yes | We further validate the effectiveness of SkMM on UTKFace [76], a real-world regression dataset for age estimation. [...] Stanford Cars [77] [...] CIFAR-10. [...] We consider a set of N = 2000 samples with high-dimensional pre-trained representations ϕ(X) ∈ ℝ^{N×r}, r = 2400, modeled by a Gaussian mixture model (GMM) |
| Dataset Splits | Yes | hyperparameter α tuning via grid search over 100 linearly spaced values in [10⁻², 10²] with 2-fold cross-validation. |
| Hardware Specification | Yes | All the experiments could be done with A40 or even smaller GPUs. We use 4 workers and 32 GB Memory. |
| Software Dependencies | No | The paper mentions software components like 'CLIP', 'Adam', and 'ResNet18' but does not specify their version numbers for reproducibility. For example, 'We finetune a randomly initialized classification head on top of the feature representation of CLIP [50] with Adam [75] and learning rate 10⁻¹.' and 'For FT, we finetune the last two layers of an ImageNet-pretrained ResNet18 [84] with a learning rate of 10⁻².'. |
| Experiment Setup | Yes | We finetune a randomly initialized classification head on top of the feature representation of CLIP [50] with Adam [75] and learning rate 10⁻¹. [...] We optimize (5) via Adam [75] with constraint projection under learning rate 10⁻⁷ for 10⁴ iterations and sample the size-s subset S with the lowest objective value. [...] For LP, we learn the last layer over the embeddings from a CLIP-pretrained ViT-B/32 [50] with a learning rate of 10⁻¹. For FT, we finetune the last two layers of an ImageNet-pretrained ResNet18 [84] with a learning rate of 10⁻². In both settings, we optimize via Adam for 50 epochs. |
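The Dataset Splits row quotes a grid search over 100 linearly spaced α values in [10⁻², 10²] with 2-fold cross-validation. A minimal NumPy sketch of that selection loop follows; it assumes the tuned model is an ℓ₂-regularized (ridge) regressor, which the excerpt does not confirm, and the synthetic data is purely illustrative:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge solution: w = (X^T X + alpha * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def grid_search_alpha(X, y, alphas, seed=0):
    """Pick alpha by 2-fold cross-validated mean squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), 2)
    best_alpha, best_err = None, np.inf
    for alpha in alphas:
        fold_errs = []
        for k in range(2):
            val = folds[k]
            tr = np.concatenate([folds[j] for j in range(2) if j != k])
            w = ridge_fit(X[tr], y[tr], alpha)
            fold_errs.append(np.mean((X[val] @ w - y[val]) ** 2))
        err = np.mean(fold_errs)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err

# Hypothetical synthetic regression data standing in for the paper's features
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

# Grid from the quoted setup: 100 linearly spaced values in [1e-2, 1e2]
alphas = np.linspace(1e-2, 1e2, 100)
alpha_star, cv_err = grid_search_alpha(X, y, alphas)
```

Note that the quoted grid is linearly spaced; a log-spaced grid (`np.logspace(-2, 2, 100)`) is the more common choice for regularization strengths, so the linear spacing here follows the paper's wording rather than convention.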