Weighted Sampling without Replacement for Deep Top-$k$ Classification
Authors: Dieqiao Feng, Yuanqi Du, Carla P. Gomes, Bart Selman
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results reveal that our method consistently outperforms all other methods on the top-k metric for noisy datasets, has more robustness on extreme testing scenarios, and achieves competitive results on training with limited data. |
| Researcher Affiliation | Academia | Department of Computer Science, Cornell University, Ithaca, U.S. Correspondence to: Dieqiao Feng <dqfeng@cs.cornell.edu>, Yuanqi Du <yd392@cornell.edu>, Carla P. Gomes <gomes@cs.cornell.edu>, Bart Selman <selman@cs.cornell.edu>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code for the described methodology or links to a code repository. |
| Open Datasets | Yes | We conducted an empirical evaluation of our method on the CIFAR-100 dataset with label noise, as well as subsets of the ImageNet-1K when training from scratch (Krizhevsky et al., 2009; Deng et al., 2009). (A hedged label-noise loading sketch follows the table.) |
| Dataset Splits | Yes | The ImageNet-1K dataset is comprised of over 1.28 million training images and 50,000 validation images, organized into 1000 distinct categories... the unmodified validation dataset of 50,000 images is used for testing. |
| Hardware Specification | Yes | All experiments were performed using Nvidia V100 GPUs. ... All experiments were performed using Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using ResNet-18 and stochastic gradient descent but does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We used stochastic gradient descent with momentum 0.9 and weight decay 0.001 as the optimizer. We trained the networks for 200 total epochs with batch size 128. The initial learning rate was set to 0.1 and decayed by 0.2 after epoch 60, 120, and 160. In addition, we also added 1 warm-up epoch at the beginning of training to stabilize the initial training. (A hedged training-loop sketch follows the table.) |
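Regarding the Open Datasets row, the paper evaluates on "CIFAR-100 with label noise" but this report does not quote the exact corruption scheme. The snippet below is a minimal sketch of one common way to build such a setting, using symmetric (uniform) label noise on the torchvision CIFAR-100 training split; the noise model, the 20% noise rate, and the `add_symmetric_label_noise` helper are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch: symmetric label noise on CIFAR-100 (assumed noise model/rate).
import numpy as np
import torchvision

def add_symmetric_label_noise(targets, noise_rate=0.2, num_classes=100, seed=0):
    """Replace a fraction `noise_rate` of labels with uniformly random *other* classes."""
    rng = np.random.default_rng(seed)
    targets = np.array(targets)
    n_noisy = int(noise_rate * len(targets))
    noisy_idx = rng.choice(len(targets), size=n_noisy, replace=False)
    for i in noisy_idx:
        # Draw uniformly from the 99 classes other than the true one.
        new_label = rng.integers(num_classes - 1)
        targets[i] = new_label if new_label < targets[i] else new_label + 1
    return targets.tolist()

train_set = torchvision.datasets.CIFAR100(root="./data", train=True, download=True)
train_set.targets = add_symmetric_label_noise(train_set.targets, noise_rate=0.2)
# The test split is typically kept clean for top-k evaluation.
test_set = torchvision.datasets.CIFAR100(root="./data", train=False, download=True)
```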
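The Experiment Setup row fully specifies the optimizer and schedule, so a short PyTorch sketch can make it concrete: SGD with momentum 0.9 and weight decay 0.001, batch size 128, 200 epochs, initial learning rate 0.1 decayed by 0.2 after epochs 60, 120, and 160, plus one warm-up epoch. The warm-up form, the data augmentation, and the use of the stock torchvision ResNet-18 are assumptions for illustration (the paper's ResNet-18 variant and loss are not reproduced here).

```python
# Hedged sketch of the quoted training recipe; details marked "assumption" are not from the paper.
import torch
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard CIFAR augmentation (assumption; not specified in the quoted setup).
transform = T.Compose([T.RandomCrop(32, padding=4), T.RandomHorizontalFlip(), T.ToTensor()])
train_set = torchvision.datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=100).to(device)  # stock ResNet-18 (assumption)
criterion = torch.nn.CrossEntropyLoss()                          # placeholder for the paper's top-k loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-3)
# Decay the learning rate by a factor of 0.2 after epochs 60, 120, and 160.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)

base_lr, warmup_epochs, total_epochs = 0.1, 1, 200
for epoch in range(total_epochs):
    model.train()
    for step, (images, labels) in enumerate(train_loader):
        # One warm-up epoch; a linear per-step ramp is an assumed form of the warm-up.
        if epoch < warmup_epochs:
            for group in optimizer.param_groups:
                group["lr"] = base_lr * (step + 1) / len(train_loader)
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```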