Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Submodular Batch Selection for Training Deep Neural Networks
Authors: K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics. |
| Researcher Affiliation | Academia | K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian Indian Institute of Technology, Hyderabad EMAIL |
| Pseudocode | Yes | Algorithm 1: GETMINIBATCH; Algorithm 2: SUBMODULAR SGD |
| Open Source Code | Yes | Source code and supplementary material which includes additional results is available here: https://josephkj.in/projects/SMDL |
| Open Datasets | Yes | We study the performance on the standard image classification task (as used in related earlier efforts) with SVHN [Netzer et al., 2011], CIFAR-10 and CIFAR-100 [Krizhevsky and Hinton, 2009] datasets. |
| Dataset Splits | No | The paper describes using SVHN, CIFAR-10, and CIFAR-100 datasets and evaluates performance based on 'test accuracy' and 'test loss'. However, it does not explicitly mention or specify a separate validation dataset split (e.g., 'X% training, Y% validation, Z% test') for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications. It only mentions software like PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch [Paszke et al., 2017]' as a tool used, but it does not specify any version numbers for PyTorch or any other software libraries or dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | After a grid search and an empirical study, we use the following values for the co-efficients of the terms in the objective function: λ1 = 0.2, λ2 = 0.1, λ3 = 0.5, λ4 = 0.2. All the experiments are run for 100 epochs with a batch size of 50, a momentum parameter of 0.9 and weight decay of 0.0001. We use a refresh rate of 5 for all the experiments. The partition size (m in Algorithm 1 and 2) is set to 10. |
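
The values quoted in the Experiment Setup row can be collected into a single configuration object, which may help when attempting to reproduce the runs. The following is a minimal sketch, not the authors' code: the class name `ExperimentSetup`, its field names, and the helper methods are hypothetical, and only the numeric values come from the quoted text. The learning rate is left out on purpose because the quoted setup does not fix one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the hyperparameters quoted in the table."""

    # Coefficients of the four objective terms (lambda_1 .. lambda_4).
    lambda_1: float = 0.2
    lambda_2: float = 0.1
    lambda_3: float = 0.5
    lambda_4: float = 0.2
    # Training schedule and optimizer settings.
    epochs: int = 100
    batch_size: int = 50
    momentum: float = 0.9
    weight_decay: float = 1e-4
    # Batch-selection settings.
    refresh_rate: int = 5
    partition_size: int = 10  # "m" in Algorithms 1 and 2

    def coefficients(self) -> tuple:
        """Return the four objective coefficients in order."""
        return (self.lambda_1, self.lambda_2, self.lambda_3, self.lambda_4)

    def optimizer_kwargs(self) -> dict:
        """Keyword arguments an SGD-style optimizer would typically accept.

        The learning rate is intentionally absent: the quoted setup
        does not specify a single value.
        """
        return {"momentum": self.momentum, "weight_decay": self.weight_decay}


setup = ExperimentSetup()
print(setup.coefficients())
print(setup.optimizer_kwargs())
```

The quoted coefficients happen to sum to 1; the row itself does not state whether that is a requirement of the method, so the sketch does not enforce it.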