Swift Sampler: Efficient Learning of Sampler by 10 Parameters

Authors: Jiawei Yao, Chuming Li, Canran Xiao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on various tasks demonstrate that SS-powered sampling can achieve obvious improvements (e.g., 1.5% on ImageNet) and transfer among different neural networks.
Researcher Affiliation | Academia | 1 University of Washington, 2 Shanghai Artificial Intelligence Laboratory, 3 The University of Sydney, 4 Central South University. jwyao@uw.edu, chli3951@uni.sydney.edu.au, xiaocanran@csu.edu.cn
Pseudocode | Yes | Algorithm 1 SS
Open Source Code | Yes | Project page: https://github.com/Alexander-Yao/Swift-Sampler.
Open Datasets | Yes | We apply SS to training neural networks with various sizes, including ResNet-18 and SE-ResNeXt101, with training data from different datasets including ImageNet [Russakovsky et al., 2015], CIFAR10 and CIFAR100 [Krizhevsky et al., 2009].
Dataset Splits | Yes | For a target task, e.g., image classification, its training set and validation set are respectively denoted by D_t and D_v, and the parameters of the target model are denoted by w. ... Specifically, the network with parameters w(τ) obtained from the inner loop is used for searching the sampler τ that has the best score P(D_v; w(τ)) on the validation set D_v. (A minimal sketch of this bi-level search appears after the table.)
Hardware Specification | Yes | We set the number of segments S as 4 in all cases and utilize 8 NVIDIA A100 GPUs to ensure efficient processing.
Software Dependencies | No | The paper mentions software components and optimizers like 'SGD with Nesterov' and 'L2 regularization' but does not provide specific version numbers for any libraries or frameworks (e.g., TensorFlow, PyTorch, or Python).
Experiment Setup | Yes | In all experiments, the optimization step E_o is fixed as 40, and the fine-tune epochs E_f are set to 5. We set the number of segments S as 4 in all cases... We set the batch size as 128 and the L2 regularization as 1e-3. The training process lasts 80 epochs, and the learning rate is initialized as 0.1 and decays by a factor of 0.1 at the 40-th and 80-th epochs. We adopt mini-batch SGD with Nesterov and set the momentum as 0.9. (An optimizer-configuration sketch appears after the table.)
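
The Dataset Splits row describes a bi-level procedure: an inner loop trains target-model weights w(τ) under a candidate sampler τ, and an outer loop keeps the τ whose trained model achieves the best validation score P(D_v; w(τ)). The sketch below illustrates only that structure; it is a minimal Python/PyTorch toy with synthetic data, a two-parameter weighted sampler, and plain random search standing in for the paper's actual sampler parameterization and search procedure (the framework choice itself is an assumption, per the Software Dependencies row).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def make_loader(dataset, tau, batch_size=128):
    # Hypothetical sampler: tau maps a per-example statistic to sampling weights.
    # Random numbers stand in for whatever per-example statistics the paper uses.
    stats = torch.rand(len(dataset))
    weights = torch.softmax(tau[0] * stats + tau[1], dim=0)
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

def train(model, loader, epochs=2):
    # Inner loop: ordinary supervised training under the candidate sampler.
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

@torch.no_grad()
def accuracy(model, loader):
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Synthetic D_t / D_v split so the sketch runs anywhere.
x, y = torch.randn(1000, 16), torch.randint(0, 10, (1000,))
train_set, val_set = TensorDataset(x[:800], y[:800]), TensorDataset(x[800:], y[800:])
val_loader = DataLoader(val_set, batch_size=128)

# Outer loop: score each candidate tau by P(D_v; w(tau)) and keep the best.
best_tau, best_acc = None, -1.0
for _ in range(5):                       # random search stands in for the paper's search
    tau = torch.randn(2)                 # toy 2-parameter sampler (the paper uses ~10 parameters)
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
    train(model, make_loader(train_set, tau))
    acc = accuracy(model, val_loader)
    if acc > best_acc:
        best_tau, best_acc = tau, acc
```

Random search is used only to keep the sketch short; the paper's Algorithm 1 (SS) defines its own procedure for optimizing the roughly 10 sampler parameters.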
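
The Experiment Setup row fully specifies the target-model optimizer, so it can be written down directly. Below is a minimal sketch of that configuration in PyTorch (again an assumption, since no framework is named), using torchvision's resnet18 as a stand-in for the paper's ResNet-18.

```python
import torch
from torch import nn
from torchvision.models import resnet18

# Reported configuration: mini-batch SGD with Nesterov momentum 0.9,
# L2 regularization 1e-3, batch size 128, 80 epochs, learning rate 0.1
# decayed by a factor of 0.1 at epochs 40 and 80.
model = resnet18(num_classes=10)         # e.g. CIFAR-10; the input stem is left unmodified here
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    momentum=0.9,
    nesterov=True,
    weight_decay=1e-3,                   # L2 regularization
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)
criterion = nn.CrossEntropyLoss()
batch_size, num_epochs = 128, 80

# Typical epoch loop (data loading and the SS sampler omitted):
# for epoch in range(num_epochs):
#     for images, labels in train_loader:
#         optimizer.zero_grad()
#         criterion(model(images), labels).backward()
#         optimizer.step()
#     scheduler.step()
```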