Swift Sampler: Efficient Learning of Sampler by 10 Parameters

Authors: Jiawei Yao, Chuming Li, Canran Xiao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on various tasks demonstrate that SS-powered sampling can achieve obvious improvements (e.g., 1.5% on ImageNet) and transfer among different neural networks.
Researcher Affiliation | Academia | 1 University of Washington, 2 Shanghai Artificial Intelligence Laboratory, 3 The University of Sydney, 4 Central South University. jwyao@uw.edu, chli3951@uni.sydney.edu.au, xiaocanran@csu.edu.cn
Pseudocode | Yes | Algorithm 1 SS
Open Source Code | Yes | Project page: https://github.com/Alexander-Yao/Swift-Sampler.
Open Datasets | Yes | We apply SS to training neural networks with various sizes, including ResNet-18 and SE-ResNeXt101, with training data from different datasets including ImageNet [Russakovsky et al., 2015], CIFAR10 and CIFAR100 [Krizhevsky et al., 2009].
Dataset Splits | Yes | For a target task, e.g., image classification, its training set and validation set are respectively denoted by D_t and D_v, and the parameters of the target model are denoted by w. ... Specifically, the network with parameters w(τ) obtained from the inner loop is used for searching the sampler τ that has the best score P(D_v; w(τ)) on the validation set D_v. (A minimal sketch of this bi-level search appears after the table.)
Hardware Specification | Yes | We set the number of segments S as 4 in all cases and utilize 8 NVIDIA A100 GPUs to ensure efficient processing.
Software Dependencies | No | The paper mentions software components and optimizers like 'SGD with Nesterov' and 'L2 regularization' but does not provide specific version numbers for any libraries or frameworks (e.g., TensorFlow, PyTorch, or Python).
Experiment Setup | Yes | In all experiments, the optimization step E_o is fixed as 40, and the fine-tune epochs E_f are set to 5. We set the number of segments S as 4 in all cases... We set the batch size as 128 and the L2 regularization as 1e-3. The training process lasts 80 epochs, and the learning rate is initialized as 0.1 and decays by a factor of 0.1 at the 40-th and 80-th epochs. We adopt mini-batch SGD with Nesterov and set the momentum as 0.9. (An optimizer-configuration sketch appears after the table.)
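
The Dataset Splits row describes a bi-level procedure: an inner loop trains target-model weights w(τ) under a candidate sampler τ, and an outer loop keeps the τ whose trained model achieves the best validation score P(D_v; w(τ)). The sketch below illustrates only that structure; it is a minimal Python/PyTorch toy with synthetic data, a two-parameter weighted sampler, and plain random search standing in for the paper's actual sampler parameterization and search procedure (the framework choice itself is an assumption, per the Software Dependencies row).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def make_loader(dataset, tau, batch_size=128):
    # Hypothetical sampler: tau maps a per-example statistic to sampling weights.
    # Random numbers stand in for whatever per-example statistics the paper uses.
    stats = torch.rand(len(dataset))
    weights = torch.softmax(tau[0] * stats + tau[1], dim=0)
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

def train(model, loader, epochs=2):
    # Inner loop: ordinary supervised training under the candidate sampler.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Synthetic D_t / D_v split so the sketch runs anywhere.
x, y = torch.randn(1000, 16), torch.randint(0, 10, (1000,))
train_set, val_set = TensorDataset(x[:800], y[:800]), TensorDataset(x[800:], y[800:])
val_loader = DataLoader(val_set, batch_size=128)

# Outer loop: score each candidate tau by P(D_v; w(tau)) and keep the best.
best_tau, best_acc = None, -1.0
for _ in range(5):                       # random search stands in for the paper's search
    tau = torch.randn(2)                 # toy 2-parameter sampler (the paper uses ~10 parameters)
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
    train(model, make_loader(train_set, tau))
    acc = accuracy(model, val_loader)
    if acc > best_acc:
        best_tau, best_acc = tau, acc
```

Random search is used only to keep the sketch short; the paper's Algorithm 1 (SS) defines its own procedure for optimizing the roughly 10 sampler parameters.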
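
The Experiment Setup row fully specifies the target-model optimizer, so it can be written down directly. Below is a minimal sketch of that configuration in PyTorch (again an assumption, since no framework is named), using torchvision's resnet18 as a stand-in for the paper's ResNet-18.

```python
import torch
from torch import nn
from torchvision.models import resnet18

# Reported configuration: mini-batch SGD with Nesterov momentum 0.9,
# L2 regularization 1e-3, batch size 128, 80 epochs, learning rate 0.1
# decayed by a factor of 0.1 at epochs 40 and 80.
model = resnet18(num_classes=10)         # e.g. CIFAR-10; the input stem is left unmodified here
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-3,                   # L2 regularization
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)
criterion = nn.CrossEntropyLoss()
batch_size, num_epochs = 128, 80

# Typical epoch loop (data loading and the SS sampler omitted):
# for epoch in range(num_epochs):
#     for images, labels in train_loader:
#         optimizer.zero_grad()
#         criterion(model(images), labels).backward()
#         optimizer.step()
#     scheduler.step()
```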