Swift Sampler: Efficient Learning of Sampler by 10 Parameters
Authors: Jiawei Yao, Chuming Li, Canran Xiao
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on various tasks demonstrate that SS-powered sampling can achieve obvious improvements (e.g., 1.5% on ImageNet) and transfer among different neural networks. |
| Researcher Affiliation | Academia | 1 University of Washington 2 Shanghai Artificial Intelligence Laboratory 3 The University of Sydney 4 Central South University jwyao@uw.edu, chli3951@uni.sydney.edu.au, xiaocanran@csu.edu.cn |
| Pseudocode | Yes | Algorithm 1 SS |
| Open Source Code | Yes | Project page: https://github.com/Alexander-Yao/Swift-Sampler. |
| Open Datasets | Yes | We apply SS to training neural networks with various sizes, including ResNet-18 and SE-ResNeXt101, with training data from different datasets including ImageNet [Russakovsky et al., 2015], CIFAR10 and CIFAR100 [Krizhevsky et al., 2009]. |
| Dataset Splits | Yes | For a target task, e.g., image classification, its training set and validation set are respectively denoted by Dt and Dv, and the parameters of the target model are denoted by w. ... Specifically, the network with parameters w(τ) obtained from the inner loop is used for searching the sampler τ that has the best score P(Dv; w(τ)) on the validation set Dv. (A sketch of this bilevel search appears below the table.) |
| Hardware Specification | Yes | We set the number of segments S as 4 in all cases and utilize 8 NVIDIA A100 GPUs to ensure efficient processing. |
| Software Dependencies | No | The paper mentions software components and optimizers like 'SGD with Nesterov' and 'L2 regularization' but does not provide specific version numbers for any libraries or frameworks (e.g., TensorFlow, PyTorch, or Python). |
| Experiment Setup | Yes | In all experiments, the optimization step Eo is fixed as 40, and the fine-tune epochs Ef are set to 5. We set the number of segments S as 4 in all cases... We set the batch size as 128 and the L2 regularization as 1e-3. The training process lasts 80 epochs, and the learning rate is initialized as 0.1 and decays by a factor of 0.1 at the 40th and 80th epochs. We adopt mini-batch SGD with Nesterov and set the momentum as 0.9. (A configuration sketch appears below the table.) |
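
The Dataset Splits row describes SS's bilevel structure: an inner loop trains the target model's weights w(τ) on the training set Dt under a candidate sampler τ, and an outer loop keeps the sampler whose trained model attains the best score P(Dv; w(τ)) on the validation set Dv. The following is a minimal sketch of that outer search; the names `candidate_samplers`, `train_inner`, and `validation_score` are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of the bilevel sampler search described in the Dataset Splits row.
# All names here are hypothetical placeholders, not the paper's API.

def search_sampler(candidate_samplers, train_set, val_set, train_inner, validation_score):
    """Return the sampler tau whose inner-loop model scores best on the validation set."""
    best_tau, best_score = None, float("-inf")
    for tau in candidate_samplers:
        w_tau = train_inner(train_set, tau)       # inner loop: fit w(tau) on Dt under sampler tau
        score = validation_score(val_set, w_tau)  # outer objective: P(Dv; w(tau))
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau, best_score
```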
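
The Experiment Setup row fully specifies the inner-loop optimizer. Below is one way to express those settings as a configuration sketch; PyTorch and the torchvision ResNet-18 instantiation are assumptions (the Software Dependencies row notes that the paper names no framework), while the hyperparameter values are the ones quoted above.

```python
import torch
from torchvision.models import resnet18

# Hyperparameters quoted in the Experiment Setup row; the PyTorch framework
# and the ResNet-18 instantiation via torchvision are assumptions.
model = resnet18(num_classes=1000)
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # learning rate initialized as 0.1
    momentum=0.9,       # momentum set as 0.9
    nesterov=True,      # "mini-batch SGD with Nesterov"
    weight_decay=1e-3,  # L2 regularization as 1e-3
)
# Learning rate decays by a factor of 0.1 at the 40th and 80th epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)
batch_size = 128
num_epochs = 80
```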