Submodular Batch Selection for Training Deep Neural Networks

Authors: K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian

Venue: IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.
Researcher Affiliation | Academia | K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian, Indian Institute of Technology, Hyderabad; {cs17m18p100001,ee15btech11023,cs15mtech11007,vineethnb}@iith.ac.in
Pseudocode | Yes | Algorithm 1: GETMINIBATCH; Algorithm 2: SUBMODULAR SGD
Open Source Code | Yes | Source code and supplementary material which includes additional results is available here: https://josephkj.in/projects/SMDL
Open Datasets | Yes | We study the performance on the standard image classification task (as used in related earlier efforts) with SVHN [Netzer et al., 2011], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton, 2009] datasets.
Dataset Splits | No | The paper describes using the SVHN, CIFAR-10, and CIFAR-100 datasets and evaluates performance via test accuracy and test loss, but it does not specify a separate validation split (e.g., X% training, Y% validation, Z% test) for hyperparameter tuning or early stopping.
Hardware Specification | No | The paper does not provide details of the hardware used for the experiments, such as GPU models, CPU types, or memory specifications; it only mentions software such as PyTorch.
Software Dependencies | No | The paper mentions PyTorch [Paszke et al., 2017] as a tool used, but it does not specify version numbers for PyTorch or any other software libraries or dependencies, which are required for reproducibility.
Experiment Setup | Yes | After a grid search and an empirical study, we use the following values for the coefficients of the terms in the objective function: λ1 = 0.2, λ2 = 0.1, λ3 = 0.5, λ4 = 0.2. All the experiments are run for 100 epochs with a batch size of 50, a momentum parameter of 0.9 and weight decay of 0.0001. We use a refresh rate of 5 for all the experiments. The partition size (m in Algorithm 1 and 2) is set to 10.
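
The Pseudocode row above names a GETMINIBATCH routine that builds each mini-batch by greedily maximizing a submodular score. The paper's actual objective terms are not reproduced in this excerpt, so the sketch below is only a minimal illustration of greedy submodular mini-batch selection: the `marginal_gain` callable, the `diversity_gain` example, and the random features are placeholders, not the authors' formulation.

```python
import numpy as np

def get_minibatch(candidates, batch_size, marginal_gain):
    """Greedily build a mini-batch by repeatedly adding the candidate with the
    largest marginal gain under a submodular scoring function.

    `marginal_gain(selected, candidate)` stands in for the paper's weighted
    combination of objective terms (the λ1..λ4-weighted sum).
    """
    selected = []
    remaining = set(candidates)
    while len(selected) < batch_size and remaining:
        # Pick the candidate with the highest marginal gain relative to the
        # samples already chosen for this mini-batch.
        best = max(remaining, key=lambda c: marginal_gain(selected, c))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage: a diversity-style gain that rewards samples far (in feature space)
# from those already selected. The features here are random stand-ins.
features = np.random.randn(100, 16)

def diversity_gain(selected, candidate):
    if not selected:
        return 0.0
    return float(min(np.linalg.norm(features[candidate] - features[s]) for s in selected))

batch = get_minibatch(range(100), batch_size=50, marginal_gain=diversity_gain)
```

In the paper's setting this selection would be run per partition (of size m = 10, per the setup row) and the candidate scores refreshed every few epochs; those details are taken from the Experiment Setup row, not from the sketch itself.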
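
The Experiment Setup row lists concrete hyperparameter values. As a rough illustration, a PyTorch training configuration matching those values might look like the sketch below; the model is a placeholder, the learning rate is not stated in this excerpt and is left as an assumption, and the constant names are our own.

```python
import torch.nn as nn
import torch.optim as optim

# Values quoted in the Experiment Setup row.
EPOCHS = 100
BATCH_SIZE = 50
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
REFRESH_RATE = 5      # epochs between re-scoring candidate samples
PARTITION_SIZE = 10   # "m" in Algorithms 1 and 2
LAMBDAS = (0.2, 0.1, 0.5, 0.2)  # coefficients of the submodular objective terms

# Placeholder model; the paper's architectures are not specified in this excerpt.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Learning rate is assumed here; the excerpt does not report it.
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
criterion = nn.CrossEntropyLoss()
```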