Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning
Authors: Patrik Okanovic, Roger Waleffe, Vasilis Mageirakos, Konstantinos Nikolakakis, Amin Karbasi, Dionysios Kalogerias, Nezihe Merve Gürel, Theodoros Rekatsinas
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test RS2 against thirty-two state-of-the-art data pruning and distillation methods across four datasets including ImageNet. Our results demonstrate that RS2 significantly reduces time-to-accuracy, particularly in practical regimes where accuracy, but not runtime, is similar to that of training on the full dataset. |
| Researcher Affiliation | Collaboration | ETH Zürich, University of Wisconsin-Madison, Yale, Google Research, TU Delft |
| Pseudocode | Yes | Algorithm 1: RS2 General Algorithm (an illustrative sketch of this loop appears below the table). |
| Open Source Code | Yes | Source code: https://github.com/PatrikOkanovic/RS2 |
| Open Datasets | Yes | We benchmark RS2 against baseline methods using CIFAR10 (Krizhevsky et al., 2009), CIFAR100 (Krizhevsky et al., 2009), ImageNet30 (a subset of ImageNet) (Hendrycks et al., 2019), and ImageNet (Russakovsky et al., 2015) itself. |
| Dataset Splits | No | The paper uses standard public datasets like CIFAR10 and ImageNet, which have predefined splits, but it does not explicitly state the training, validation, or test split percentages or sample counts within the paper. It refers to a 'normal test set' but gives no full split breakdown. |
| Hardware Specification | Yes | We train all methods from scratch on NVIDIA 3090 GPUs and use all baselines which do not give GPU out-of-memory. For the experiments reported here, we run baselines as they were originally proposed (i.e., with static subset selection). This allows us to quantify the overhead of selecting a single subset with existing methods compared to repeatedly selecting many random subsets with RS2. We show the time-to-accuracy on CIFAR10 in Figure 3a and on ImageNet in Figure 3b using r = 10% for both datasets. We run GPT2 experiments using AWS P3 GPU instances with eight NVIDIA V100 GPUs (as GPT2 experiments require more compute power). |
| Software Dependencies | No | The paper mentions using SGD as an optimizer and implies the use of common deep learning frameworks, but it does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | For CIFAR10 and CIFAR100 experiments, we use SGD as the optimizer with batch size 128, initial learning rate 0.1, a cosine decay learning rate schedule (Loshchilov & Hutter, 2016), momentum 0.9, weight decay 0.0005, and 200 training epochs. (An illustrative sketch of this configuration appears below the table.) |
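The Algorithm 1 referenced in the Pseudocode row is RS2's core loop: in every round, a fresh uniformly random subset of size r·N is drawn and the model is trained on it for one pass, rather than selecting a single static subset up front. Below is a minimal PyTorch-style sketch of that idea; the function name `rs2_train` and its arguments are illustrative placeholders, not the authors' released implementation (see the linked repository for that).

```python
import torch
from torch.utils.data import DataLoader, Subset

def rs2_train(model, dataset, optimizer, loss_fn, rounds, selection_ratio,
              batch_size=128, device="cuda"):
    """Illustrative sketch of repeated random sampling (RS2): each round
    draws a new uniform random subset of size r * N and trains on it
    for a single pass."""
    n = len(dataset)
    subset_size = max(1, int(selection_ratio * n))
    model.to(device)
    for _ in range(rounds):
        # Re-sample a fresh random subset every round instead of reusing
        # one statically pruned subset for the whole training run.
        indices = torch.randperm(n)[:subset_size].tolist()
        loader = DataLoader(Subset(dataset, indices),
                            batch_size=batch_size, shuffle=True)
        model.train()
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```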
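The Experiment Setup row lists concrete CIFAR hyperparameters. A minimal sketch of how that configuration maps onto standard PyTorch components is shown below; the choice of model and the `train_one_epoch` helper are assumptions for illustration and are not specified in the table.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 200  # training epochs reported for the CIFAR10/CIFAR100 setup

def build_optimizer_and_scheduler(model):
    """Reported CIFAR configuration: SGD with initial learning rate 0.1,
    momentum 0.9, weight decay 0.0005, and a cosine decay schedule."""
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler

# Usage (model is any torch.nn.Module; the DataLoader uses batch_size=128;
# train_one_epoch is a hypothetical helper, not part of the paper's code):
# optimizer, scheduler = build_optimizer_and_scheduler(model)
# for epoch in range(EPOCHS):
#     train_one_epoch(model, loader, optimizer)
#     scheduler.step()
```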