KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training
Authors: Truong Thao Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega, François Trahay, Mohamed Wahib
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on various large-scale datasets and models used directly in image classification and segmentation show that while the with-replacement importance sampling algorithm performs poorly on large datasets, our method can reduce total training time by up to 22%, impacting accuracy only by 0.4% compared to the baseline. |
| Researcher Affiliation | Collaboration | 1. National Institute of Advanced Industrial Science and Technology (AIST), Japan; 2. Intel Corporation, USA; 3. Télécom SudParis, Institut Polytechnique de Paris, France; 4. RIKEN Center for Computational Science, Japan |
| Pseudocode | No | The paper describes the workflow of KAKURENBO in Section 3 and Figure 1, but it does not include a formally structured pseudocode or algorithm block labeled as such (an illustrative sketch of the described workflow appears below the table). |
| Open Source Code | Yes | Code available at https://github.com/TruongThaoNguyen/kakurenbo |
| Open Datasets | Yes | We use ResNet-50 [36] and EfficientNet [37] on ImageNet-1K [19], and DeepCAM [4], a scientific image segmentation model with its accompanying dataset. To confirm the correctness of the baseline algorithms we also use WideResNet-28-10 on the CIFAR-100 dataset. ... The CIFAR-100 dataset is available at https://www.cs.toronto.edu/~kriz/cifar.html. |
| Dataset Splits | Yes | We also test the trained model on the validation set of 50,000 samples. |
| Hardware Specification | Yes | We run our experiments on a supercomputer with thousands of compute nodes, each equipped with 2 Intel Xeon Gold 6148 CPUs, 384 GiB of RAM, 4 NVIDIA V100 GPUs, and InfiniBand EDR NICs (100 Gbps). |
| Software Dependencies | Yes | We train ResNet-50 and EfficientNet-b3 provided by torchvision v0.12.0 on the ImageNet-1K dataset. ... implemented by timm |
| Experiment Setup | Yes | Table 8: Hyper-parameters used for the different training runs in the paper and the baseline top-1 testing accuracy. ... Specifically, we follow the TorchVision guideline to train ResNet-50, using the cosine learning rate scheduler, AutoAugment, and random erasing, etc. [38]. We use a base learning rate of 0.025k, momentum 0.9, and weight decay 0.0005 (a hypothetical configuration sketch appears below the table). |
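Since the paper gives no formal pseudocode, the following is a minimal, hypothetical PyTorch-style sketch of the core idea of hiding samples during training. It is not the authors' implementation (see their repository above): the selection rule here, hiding a fixed `hide_ratio` fraction of the lowest-loss samples from the previous epoch, is a simplification of the workflow described in Section 3, and the names `train_with_hiding`, `IndexedDataset`, and `hide_ratio` are our own.

```python
# Illustrative sketch only -- NOT the authors' implementation. The
# selection rule (hide the lowest-loss fraction from the previous
# epoch) simplifies the workflow described in Section 3 of the paper.
import torch
from torch.utils.data import DataLoader, Dataset, Subset

class IndexedDataset(Dataset):
    """Wrap a dataset so each item also returns its global index."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i

def train_with_hiding(model, dataset, optimizer, epochs,
                      hide_ratio=0.3, device="cuda"):
    n = len(dataset)
    indexed = IndexedDataset(dataset)
    model.to(device)
    # Unseen samples carry infinite loss, so they are prioritized
    # until they have been trained on at least once.
    sample_loss = torch.full((n,), float("inf"))

    for epoch in range(epochs):
        # Keep the highest-loss samples; hide the rest this epoch.
        keep = sample_loss.argsort(descending=True)[: n - int(n * hide_ratio)]
        loader = DataLoader(Subset(indexed, keep.tolist()),
                            batch_size=256, shuffle=True)
        model.train()
        for x, y, idx in loader:
            x, y = x.to(device), y.to(device)
            losses = torch.nn.functional.cross_entropy(
                model(x), y, reduction="none")
            losses.mean().backward()
            optimizer.step()
            optimizer.zero_grad()
            # Refresh per-sample loss history for next epoch's selection.
            sample_loss[idx] = losses.detach().cpu()
```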
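The quoted hyper-parameters map onto a standard TorchVision-style training configuration. Below is a minimal sketch under two assumptions not stated in the quotes: `k` is taken as the scaling factor in the paper's "0.025k" base learning rate (set here to an arbitrary 4), and the epoch count is illustrative, since the exact schedule is listed in Table 8 of the paper.

```python
import torch
import torchvision
from torchvision.transforms import (AutoAugment, AutoAugmentPolicy,
                                    Compose, RandomErasing,
                                    RandomResizedCrop, ToTensor)

k = 4        # assumption: the scaling factor in the paper's "0.025k"
epochs = 90  # assumption: the exact schedule is listed in Table 8

model = torchvision.models.resnet50()  # torchvision v0.12.0 in the paper
# For the EfficientNet-b3 runs the paper uses timm, e.g.
# timm.create_model("efficientnet_b3").
transform = Compose([
    RandomResizedCrop(224),
    AutoAugment(AutoAugmentPolicy.IMAGENET),
    ToTensor(),
    RandomErasing(),
])
optimizer = torch.optim.SGD(model.parameters(), lr=0.025 * k,
                            momentum=0.9, weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
```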