Diversified Batch Selection for Training Acceleration
Authors: Feng Hong, Yueming Lyu, Jiangchao Yao, Ya Zhang, Ivor Tsang, Yanfeng Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various tasks demonstrate the significant superiority of DivBS in the performance-speedup trade-off. |
| Researcher Affiliation | Collaboration | 1Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China 2CFAR, Agency for Science, Technology and Research (A*STAR), Singapore 3IHPC, Agency for Science, Technology and Research (A*STAR), Singapore 4Shanghai AI Laboratory, Shanghai, China 5College of Computing and Data Science, NTU, Singapore. |
| Pseudocode | Yes | Algorithm 1: The greedy algorithm. (A hedged sketch of a greedy selection step is given below the table.) |
| Open Source Code | Yes | The code is publicly available. |
| Open Datasets | Yes | Datasets. We conduct experiments to evaluate our DivBS on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and Tiny ImageNet (Le & Yang, 2015) for image classification. |
| Dataset Splits | No | The paper uses standard datasets (CIFAR-10, CIFAR-100, Tiny ImageNet) but does not explicitly state training/validation/test splits, nor does it cite specific predefined validation splits for these datasets. |
| Hardware Specification | No | The paper mentions general 'hardware techniques' in the discussion but does not provide specific details such as GPU models, CPU types, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific models (e.g., ResNet, DeepLabV3, MobileNet) and optimizers (SGD, AdamW), but does not list any software dependencies with specific version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Models are trained using SGD with momentum of 0.9 and weight decay of 0.005 as the optimizer. The initial learning rate is set to 0.1. We train the model for 200 epochs with cosine learning-rate scheduling. (A runnable configuration sketch follows the table.) |
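
The Pseudocode row points to the paper's Algorithm 1, a greedy selection routine. Below is a minimal, hedged sketch of a generic greedy, orthogonalization-based subset selection over per-sample feature vectors; the function name `greedy_diverse_subset`, the use of NumPy, and the scoring rule are illustrative assumptions, not the authors' exact Algorithm 1.

```python
import numpy as np

def greedy_diverse_subset(features: np.ndarray, k: int) -> list[int]:
    """Greedily pick up to k row indices of `features` (one feature vector
    per sample) so that the chosen vectors are mutually non-redundant.

    Gram-Schmidt-style sketch: each step scores every remaining sample by
    the norm of its component orthogonal to the span of the already
    selected samples, then keeps the highest-scoring one.
    """
    n, _ = features.shape
    residual = features.astype(np.float64).copy()
    selected: list[int] = []

    for _ in range(min(k, n)):
        scores = np.linalg.norm(residual, axis=1)
        scores[selected] = -np.inf            # never re-select a sample
        idx = int(np.argmax(scores))
        if scores[idx] <= 1e-12:              # remaining samples add nothing new
            break
        selected.append(idx)

        # Project the newly chosen direction out of every residual vector,
        # so redundant samples lose their score in later rounds.
        direction = residual[idx] / np.linalg.norm(residual[idx])
        residual -= np.outer(residual @ direction, direction)

    return selected
```

For a batch of, say, 128 samples with d-dimensional gradient or embedding features, `greedy_diverse_subset(feats, k=32)` would return 32 indices of mutually non-redundant samples to keep, with the rest of the batch dropped before the update step.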
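
The Experiment Setup row quotes SGD with momentum 0.9, weight decay 0.005, an initial learning rate of 0.1, 200 epochs, and a cosine learning-rate schedule. The sketch below wires those values into a training loop, assuming PyTorch (the paper does not name its framework, per the Software Dependencies row) and using dummy tensors and a linear stand-in model as placeholders.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins so the sketch runs end to end; swap in a real ResNet and
# the CIFAR / Tiny ImageNet loaders to match the paper's setup.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 100))
criterion = nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 100, (256,)))
train_loader = DataLoader(dataset, batch_size=128, shuffle=True)

# Hyperparameters quoted in the table: SGD, momentum 0.9, weight decay 0.005,
# initial learning rate 0.1, 200 epochs, cosine learning-rate schedule.
epochs = 200
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.005)
scheduler = CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```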