Generalized Data Weighting via Class-Level Gradient Manipulation
Authors: Can Chen, Shuhao Zheng, Xi Chen, Erqun Dong, Xue (Steve) Liu, Hao Liu, Dejing Dou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in various settings verify the effectiveness of GDW. |
| Researcher Affiliation | Collaboration | 1McGill University, 2The Hong Kong University of Science and Technology, 3Baidu Research |
| Pseudocode | Yes | Algorithm 1 Generalized Data Weighting via Class-Level Gradients Manipulation |
| Open Source Code | Yes | Our code is available at https://github.com/GGchen1997/GDW-NIPS2021. |
| Open Datasets | Yes | We randomly select 100 clean images per class from CIFAR10 [47] as the meta set (1000 images in total). Similarly, we select a total of 1000 images from CIFAR100 as its meta set. ... Long-Tailed CIFAR [47] are created by reducing the number of training instances per class... |
| Dataset Splits | Yes (meta-set sketch below) | In most classification tasks, there is a training set $D_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N}$ and we assume there is also a clean unbiased meta set $D_{\text{meta}} = \{(x_i^v, y_i^v)\}_{i=1}^{M}$. We aim to alleviate label noise and class imbalance in $D_{\text{train}}$ with $D_{\text{meta}}$. ... We randomly select 100 clean images per class from CIFAR10 [47] as the meta set (1000 images in total). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions using specific models like ResNet-32 and ResNet-18, but does not provide version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes (training-setup sketch below) | Most of our training settings follow [23] and we use the cosine learning rate decay schedule [49] for a total of 80 epochs for all methods. See Appendix C for details. ... All methods are trained for 5 epochs via SGD with a 0.9 momentum, a $10^{-3}$ initial learning rate, a $10^{-3}$ weight decay, and a batch size of 128. See Appendix E for details. |
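
The Dataset Splits row reports that the meta set is built by randomly selecting 100 clean images per class from CIFAR-10 (1000 images in total). Below is a minimal sketch of how such a meta set could be assembled with PyTorch/torchvision; the helper name `build_meta_subset`, the fixed seed, and the `"data"` root directory are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets, transforms

def build_meta_subset(dataset, images_per_class=100, seed=0):
    """Randomly pick `images_per_class` indices for each class label."""
    rng = np.random.default_rng(seed)
    targets = np.asarray(dataset.targets)
    meta_indices = []
    for c in np.unique(targets):
        class_indices = np.where(targets == c)[0]
        picked = rng.choice(class_indices, images_per_class, replace=False)
        meta_indices.extend(picked.tolist())
    return Subset(dataset, meta_indices)

train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
meta_set = build_meta_subset(train_set)  # 10 classes x 100 images = 1000 images
```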
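The Experiment Setup row quotes SGD with 0.9 momentum, a $10^{-3}$ initial learning rate, $10^{-3}$ weight decay, a batch size of 128, and cosine learning-rate decay. The sketch below wires those hyperparameters into a standard PyTorch optimizer and scheduler; the ResNet-18 backbone, the random placeholder data, and the 80-epoch horizon are assumptions for illustration (the paper reports 80 epochs with cosine decay in the CIFAR experiments and 5 epochs in another setting), not the authors' training script.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Placeholder data standing in for CIFAR-10 batches (3x32x32 images, 10 classes).
data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
loader = DataLoader(data, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)
optimizer = optim.SGD(model.parameters(), lr=1e-3,
                      momentum=0.9, weight_decay=1e-3)
epochs = 80  # cosine decay stretched over the whole run, as in the CIFAR setup
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch
```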