Learning to Auto Weight: Entirely Data-Driven and Highly Efficient Weighting Framework

Authors: Zhenmao Li, Yichao Wu, Ken Chen, Yudong Wu, Shunfeng Zhou, Jiaheng Liu, Junjie Yan

AAAI 2020, pp. 4788-4795 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the superiority of weighting policy explored by LAW over standard training pipeline.
Researcher Affiliation | Collaboration | 1SenseTime, 2BUAA; {lizhenmao, wuyichiao, wuyudong, zhoushunfeng, yanjunjie}@sensetime.com; kenchen1024@gmail.com; liujiaheng@buaa.edu.cn
Pseudocode | Yes | Algorithm 1 LAW: Update the strategy model
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | Yes | We demonstrate the effectiveness of LAW on the image classification datasets CIFAR-10, CIFAR-100 (Krizhevsky and Hinton 2009), and ImageNet (Deng et al. 2009).
Dataset Splits | Yes | CIFAR: CIFAR-10 and CIFAR-100 consist of 50,000 training and 10,000 validation color images at 32×32 resolution, with 10 and 100 classes respectively. They are balanced datasets where each class holds the same number of images. To search the weighting strategy, we use part of the training dataset, e.g., 20,000 images for training and 5,000 for validation (a split sketch follows the table).
Hardware Specification | No | The paper mentions training 'in one GPU' but does not specify the model or any other hardware details.
Software Dependencies | No | The paper mentions optimizers like 'SGD' and 'Adam (Kingma and Ba 2014)' but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | All the networks are trained to convergence from scratch using the SGD optimizer with a batch size of 128. The weight decay is set to 2e-5 and the momentum to 0.9. The initial learning rate is 0.1, and it is divided by 10 at stages 10, 13, and 16. The total number of training epochs is 200; without a weighting strategy, the classification network would be harmed by the biased datasets (a training-setup sketch follows the table).
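
The split quoted in the Dataset Splits row (20,000 training and 5,000 validation images carved out of CIFAR's 50,000-image training set) is easy to misread, so here is a minimal sketch of one way to reproduce it. It assumes PyTorch and torchvision; the fixed seed, the use of Subset, and the slicing order are our assumptions, not details taken from the paper.

```python
# Minimal sketch (not from the paper): carving CIFAR-10's 50,000-image
# training set into the 20,000/5,000 search-phase split quoted above.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transform)

# Shuffle indices once, then take disjoint slices for the strategy search.
generator = torch.Generator().manual_seed(0)  # seed is our assumption
perm = torch.randperm(len(full_train), generator=generator).tolist()
search_train = Subset(full_train, perm[:20000])     # 20,000 for training
search_val = Subset(full_train, perm[20000:25000])  # 5,000 for validation

# The official 10,000-image test split stays untouched for final evaluation.
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transform)
```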
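
The hyperparameters in the Experiment Setup row map directly onto a standard PyTorch loop. The sketch below assumes 20 stages of 10 epochs each (so the stage-10/13/16 drops land at epochs 100, 130, and 160) and uses a stand-in ResNet-18; both are assumptions, as is the `weighted_step` helper, since the paper's Algorithm 1 (the strategy model update) is not reproduced here.

```python
# Minimal sketch (assumptions noted in comments), not the authors' code.
import torch
import torch.nn as nn
from torchvision.models import resnet18  # stand-in backbone: an assumption

model = resnet18(num_classes=10)

# Hyperparameters quoted in the Experiment Setup row.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=2e-5)

# Assumption: 200 epochs = 20 stages of 10 epochs each, so "divide the
# learning rate by 10 at stages 10, 13, 16" fires at epochs 100, 130, 160.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100, 130, 160], gamma=0.1)

# Per-sample weighting: reduction="none" exposes one loss value per image,
# which a weighting policy (LAW's Algorithm 1, not reproduced here) can scale.
criterion = nn.CrossEntropyLoss(reduction="none")

def weighted_step(inputs, targets, weights):
    """One optimizer step with externally supplied per-sample weights."""
    optimizer.zero_grad()
    per_sample_loss = criterion(model(inputs), targets)
    (weights * per_sample_loss).mean().backward()
    optimizer.step()

for epoch in range(200):
    # for inputs, targets in train_loader:          # loader not shown
    #     weighted_step(inputs, targets, weights)   # weights from the policy
    scheduler.step()
```

Using `reduction="none"` and scaling the per-sample losses before averaging is the standard way to apply learned sample weights; the weights themselves would come from the strategy model that the Pseudocode row only names.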