Weighted Sampling without Replacement for Deep Top-$k$ Classification

Authors: Dieqiao Feng, Yuanqi Du, Carla P. Gomes, Bart Selman

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results reveal that our method consistently outperforms all other methods on the top-k metric for noisy datasets, has more robustness on extreme testing scenarios, and achieves competitive results on training with limited data.
Researcher Affiliation | Academia | Department of Computer Science, Cornell University, Ithaca, U.S. Correspondence to: Dieqiao Feng <dqfeng@cs.cornell.edu>, Yuanqi Du <yd392@cornell.edu>, Carla P. Gomes <gomes@cs.cornell.edu>, Bart Selman <selman@cs.cornell.edu>.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the described methodology or links to a code repository.
Open Datasets | Yes | We conducted an empirical evaluation of our method on the CIFAR-100 dataset with label noise, as well as subsets of the ImageNet-1K when training from scratch (Krizhevsky et al., 2009; Deng et al., 2009).
Dataset Splits | Yes | The ImageNet-1K dataset is comprised of over 1.28 million training images and 50,000 validation images, organized into 1000 distinct categories... the unmodified validation dataset of 50,000 images is used for testing.
Hardware Specification | Yes | All experiments were performed using Nvidia V100 GPUs. ... All experiments were performed using Nvidia A100 GPUs.
Software Dependencies | No | The paper mentions using ResNet-18 and stochastic gradient descent but does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We used stochastic gradient descent with momentum 0.9 and weight decay 0.001 as the optimizer. We trained the networks for 200 total epochs with batch size 128. The initial learning rate was set to 0.1 and decayed by 0.2 after epoch 60, 120, and 160. In addition, we also added 1 warm-up epoch at the beginning of training to stabilize the initial training.
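
The Experiment Setup row above maps onto a standard image-classification training configuration. The sketch below is a minimal reconstruction under stated assumptions: the paper does not name its framework, so PyTorch and torchvision are assumed; the shape of the single warm-up epoch, the CIFAR-100 augmentation, and the use of plain cross-entropy in place of the paper's top-k training objective are likewise assumptions, not the authors' released code.

    # Minimal sketch of the reported training configuration (PyTorch assumed).
    import torch
    import torchvision
    from torch.utils.data import DataLoader
    from torchvision import transforms

    model = torchvision.models.resnet18(num_classes=100)  # ResNet-18, as named in the paper

    # SGD with momentum 0.9 and weight decay 0.001; initial learning rate 0.1 (as quoted).
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-3)

    # One warm-up epoch (linear ramp assumed), then the LR is multiplied by 0.2
    # after epochs 60, 120, and 160 (milestones shift by one epoch due to the warm-up).
    warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=1)
    decay = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)
    scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[1])

    # Standard CIFAR-100 loading with batch size 128 (augmentation choices assumed).
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    train_set = torchvision.datasets.CIFAR100(root="./data", train=True, download=True,
                                              transform=train_transform)
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

    # Plain cross-entropy stands in for the paper's top-k training objective,
    # which is not reproduced here.
    criterion = torch.nn.CrossEntropyLoss()

    def top_k_accuracy(logits, labels, k=5):
        """Fraction of samples whose true label is among the k highest-scoring classes."""
        topk = logits.topk(k, dim=1).indices
        return (topk == labels.unsqueeze(1)).any(dim=1).float().mean().item()

    for epoch in range(200):  # 200 total epochs, as quoted
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # learning rate adjusted once per epoch

The top_k_accuracy helper is included only to make the top-k metric in the results concrete; the ImageNet-1K experiments and the paper's weighted sampling without replacement loss are not sketched here.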