Submodular Batch Selection for Training Deep Neural Networks
Authors: K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics. |
| Researcher Affiliation | Academia | K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian, Indian Institute of Technology, Hyderabad; {cs17m18p100001,ee15btech11023,cs15mtech11007,vineethnb}@iith.ac.in |
| Pseudocode | Yes | Algorithm 1: GETMINIBATCH; Algorithm 2: SUBMODULAR SGD (an illustrative greedy-selection sketch follows the table) |
| Open Source Code | Yes | Source code and supplementary material which includes additional results is available here: https://josephkj.in/projects/SMDL |
| Open Datasets | Yes | We study the performance on the standard image classification task (as used in related earlier efforts) with SVHN [Netzer et al., 2011], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton, 2009] datasets. |
| Dataset Splits | No | The paper describes using SVHN, CIFAR-10, and CIFAR-100 datasets and evaluates performance based on 'test accuracy' and 'test loss'. However, it does not explicitly mention or specify a separate validation dataset split (e.g., 'X% training, Y% validation, Z% test') for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications. It only mentions software like PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2017]' as a tool used, but it does not specify version numbers for PyTorch or for any other software libraries or dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | After a grid search and an empirical study, we use the following values for the coefficients of the terms in the objective function: λ1 = 0.2, λ2 = 0.1, λ3 = 0.5, λ4 = 0.2. All the experiments are run for 100 epochs with a batch size of 50, a momentum parameter of 0.9 and weight decay of 0.0001. We use a refresh rate of 5 for all the experiments. The partition size (m in Algorithm 1 and 2) is set to 10. (See the configuration sketch after the table.) |
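
The paper's GETMINIBATCH routine greedily builds each mini-batch by maximising a weighted submodular objective over a candidate partition. The paper's exact terms (weighted by λ1 through λ4) are not reproduced here; the snippet below is only a minimal sketch of greedy submodular selection, assuming an illustrative objective that combines a facility-location coverage term with a per-sample entropy (uncertainty) term. The function names, the `sim` similarity matrix, and the `lam` trade-off are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def uncertainty(probs):
    """Entropy of softmax outputs: one illustrative per-sample score (assumed, not the paper's exact term)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def facility_location_gain(candidate, selected, sim):
    """Marginal gain of adding `candidate` under a facility-location
    coverage/diversity term: sum_i max_{j in S} sim[i, j]."""
    if not selected:
        return sim[:, candidate].sum()
    current = sim[:, selected].max(axis=1)
    return np.maximum(current, sim[:, candidate]).sum() - current.sum()

def get_minibatch(probs, sim, batch_size, lam=0.5):
    """Greedy maximisation of a monotone submodular score
    f(S) = lam * coverage(S) + (1 - lam) * sum of per-sample uncertainties.
    Greedy selection of this kind carries the usual (1 - 1/e) guarantee."""
    unc = uncertainty(probs)
    selected, remaining = [], set(range(probs.shape[0]))
    while len(selected) < batch_size and remaining:
        best, best_gain = None, -np.inf
        for c in remaining:
            gain = lam * facility_location_gain(c, selected, sim) + (1 - lam) * unc[c]
            if gain > best_gain:
                best, best_gain = c, gain
        selected.append(best)
        remaining.remove(best)
    return selected
```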
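Translating the reported hyperparameters into a runnable configuration, the sketch below sets up a CIFAR-10 run with the batch size, momentum, weight decay, epoch count, refresh rate, partition size, and λ coefficients quoted in the table. The learning rate and the linear stand-in model are assumptions (the paper reports results across different learning rates and uses deeper networks); only the constants come from the paper.

```python
import torch
from torchvision import datasets, transforms

# Constants quoted in the table; lr and the model below are placeholders.
EPOCHS, BATCH_SIZE = 100, 50
MOMENTUM, WEIGHT_DECAY = 0.9, 1e-4
REFRESH_RATE, PARTITION_SIZE = 5, 10          # m in Algorithms 1 and 2
LAMBDAS = dict(l1=0.2, l2=0.1, l3=0.5, l4=0.2)

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

model = torch.nn.Linear(3 * 32 * 32, 10)       # stand-in for the paper's CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,   # lr is an assumption
                            momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
```

Under the reported refresh rate of 5, the submodular mini-batch selection would be recomputed every five epochs; the remainder of the training loop is standard SGD.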