Towards Accelerated Model Training via Bayesian Data Selection
Authors: Zhijie Deng, Peng Cui, Jun Zhu
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods. |
| Researcher Affiliation | Collaboration | Zhijie Deng^1, Peng Cui^2, Jun Zhu^2. ^1 Qing Yuan Research Institute, Shanghai Jiao Tong University; ^2 Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084, China |
| Pseudocode | Yes | Algorithm 1 Bayesian data selection to accelerate the training of deterministic deep models. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We first empirically evaluate the proposed method on clean CIFAR-10/100 [24] and noisy CIFAR-10/100 with 10% symmetric label noise. |
| Dataset Splits | Yes | For a fair comparison with RHO-LOSS [31], only half of the training set is used for model training. |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | Yes | For all experiments, ResNet18 models are trained from scratch using PyTorch 2.0.0. |
| Experiment Setup | Yes | We use the same optimizer (AdamW [27]) and hyperparameters (e.g., learning rate 0.001, weight decay of 0.01, n_b = 32 and n_b/n_B = 0.1) as RHO-LOSS. Unless specified otherwise, we use ResNet18 (RN18) [14] as the deterministic model and specify the zero-shot predictor with CLIP-RN50. We select the trade-off coefficient α from {0.1, 0.2, 0.3, 0.4} and the number of effective data n_e from {100, 200, 500, 1000}. In most cases, we set the prior precision τ_0 to 1. (A hedged code sketch of this online batch selection setup follows the table.) |
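
The Pseudocode and Experiment Setup rows describe an online batch selection loop (Algorithm 1) trained with AdamW at the stated hyperparameters. Below is a minimal PyTorch sketch of such a loop, assuming a standard RHO-LOSS-style selection skeleton. It is not the authors' code: the `selection_score` function is a placeholder (per-sample loss) standing in for the paper's Bayesian criterion, which additionally uses a CLIP-RN50 zero-shot predictor, a trade-off coefficient α, the number of effective data n_e, and the prior precision τ_0; the 10-class setting and data loader are illustrative assumptions.

```python
# Minimal sketch of an online batch selection training loop (not the authors' implementation).
# Assumptions: 10-class classification, per-sample loss as a stand-in selection score.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"

# ResNet18 trained from scratch, optimized with AdamW (lr 0.001, weight decay 0.01).
model = resnet18(num_classes=10).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

n_b = 32         # size of the selected training batch
n_B = n_b * 10   # size of the candidate batch, since n_b / n_B = 0.1


def selection_score(logits, targets):
    """Placeholder per-sample score; the paper's Bayesian criterion (involving a
    CLIP-RN50 zero-shot predictor and the trade-off coefficient alpha) would go here."""
    return F.cross_entropy(logits, targets, reduction="none")


def train_epoch(model, candidate_loader):
    """One pass over candidate batches of size n_B, e.g.
    DataLoader(train_set, batch_size=n_B, shuffle=True)."""
    model.train()
    for x, y in candidate_loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            scores = selection_score(model(x), y)      # score every candidate point
        top = scores.topk(n_b).indices                 # keep the n_b highest-scoring points
        loss = F.cross_entropy(model(x[top]), y[top])  # train only on the selected subset
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The point this sketch illustrates is that scoring runs under `torch.no_grad()` on the large candidate batch, while gradients are computed only for the n_b selected points; that asymmetry is where online batch selection methods derive their training-time savings.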