Learning to Reweight with Deep Interactions

Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

AAAI 2021, pp. 7385-7393

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
Researcher Affiliation | Collaboration | Yang Fan (1), Yingce Xia (2), Lijun Wu (2), Shufang Xie (2), Weiqing Liu (2), Jiang Bian (2), Tao Qin (2), Xiang-Yang Li (1); (1) University of Science and Technology of China, (2) Microsoft Research Asia; fyabc@mail.ustc.edu.cn, xiangyangli@ustc.edu.cn, {yingce.xia, lijuwu, shufxi, Weiqing.Liu, Jiang.Bian, taoqin}@microsoft.com
Pseudocode | Yes | Algorithm 1: The gradients of the validation metric w.r.t. the parameters of the teacher (a code sketch of this procedure follows the table).
    1: Input: teacher model backpropagation interval B; parameters and momentum of the student model θ_K and v_K; learning rates {η_t}, t = K-B, ..., K-1; momentum coefficient µ (> 0); minibatches of data {D_t}, t = K-B, ..., K-1;
    2: Initialization: dθ = ∇_{θ_K} M(D_valid; θ_K); dv = η_K dθ; dω ← 0; θ ← θ_K; v ← v_K;
    3: for t = K-1 : -1 : K-B do
    4:     θ ← θ + η_t v;  g ← ∇_θ(w_t ℓ(D_t; θ) + λ R(θ));  v ← v - g;
    5:     dω ← dω + ∇_ω(g · dv);  dθ ← dθ + ∇_θ(g · dv);  dv ← η_t dθ + µ dv;
    6: Return dω.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for their methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Experimental results on CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009) with both clean labels and noisy labels demonstrate the effectiveness of our algorithm. We also conduct a group of experiments on IWSLT German-English translation to demonstrate the effectiveness of our method on sequence generation tasks.
Dataset Splits | Yes | We split 5000 samples from the training dataset as D_valid and the remaining 45000 samples are used as D_train.
Hardware Specification | Yes | All the models are trained on a single P40 GPU.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) and the official fairseq implementation, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We use momentum SGD with learning rate 0.1 and divide the learning rate by 10 at the 80-th and 120-th epoch. The momentum coefficient µ is 0.9. The K and B in Algorithm 1 are set as 20 and 2 respectively. We train the models for 300 epochs to ensure convergence. The minibatch size is 128. (A sketch of this setup also follows the table.)
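
The reverse pass in Algorithm 1 can be written almost line for line with automatic differentiation. The following is a minimal sketch, assuming a toy linear student, a sigmoid per-example weighting teacher, and synthetic minibatches; the names (`weighted_loss`, `omega`, `batches`), the loss and weighting forms, and the sign conventions are illustrative assumptions that follow the reconstructed pseudocode above, not the paper's actual models.

```python
# Sketch of Algorithm 1: reverse-mode gradients of the validation metric M
# w.r.t. the teacher parameters omega, propagated back through B momentum-SGD
# updates of the student. Toy problem; all model/weighting choices are assumed.
import torch

torch.manual_seed(0)

d = 5                        # feature dimension of the toy problem
K, B = 20, 2                 # total student steps K and backprop interval B (as reported)
etas = [0.1] * K             # learning rates {eta_t}
mu, lam = 0.9, 1e-4          # momentum coefficient mu and weight-decay lambda

omega = torch.randn(d, requires_grad=True)   # teacher parameters
theta = torch.randn(d, requires_grad=True)   # student parameters theta_K (after training)
v = torch.zeros(d)                           # student momentum v_K

# Stored minibatches D_t = (x_t, y_t) and a held-out validation set D_valid.
batches = [(torch.randn(8, d), torch.randn(8)) for _ in range(K)]
x_val, y_val = torch.randn(32, d), torch.randn(32)

def weighted_loss(theta_t, omega, x, y):
    """w_t * l(D_t; theta) + lambda * R(theta), with teacher-produced example weights."""
    w = torch.sigmoid(x @ omega)             # per-example weights from the teacher
    per_example = (x @ theta_t - y) ** 2
    return (w * per_example).mean() + lam * theta_t.pow(2).sum()

# Initialization: d_theta = grad of the validation metric at theta_K; dv = eta_K * d_theta.
val_metric = ((x_val @ theta - y_val) ** 2).mean()
d_theta = torch.autograd.grad(val_metric, theta)[0]
d_v = etas[K - 1] * d_theta
d_omega = torch.zeros_like(omega)

theta_t, v_t = theta.detach().clone(), v.clone()
for t in range(K - 1, K - B - 1, -1):        # t = K-1, ..., K-B
    theta_t = theta_t + etas[t] * v_t        # undo the student parameter update
    theta_t.requires_grad_(True)
    g = torch.autograd.grad(
        weighted_loss(theta_t, omega, *batches[t]), theta_t, create_graph=True)[0]
    v_t = v_t - g.detach()                   # undo the momentum update
    # Accumulate d_omega += grad_omega(g . dv) and d_theta += grad_theta(g . dv).
    gdv = (g * d_v).sum()
    d_omega = d_omega + torch.autograd.grad(gdv, omega, retain_graph=True)[0]
    d_theta = d_theta + torch.autograd.grad(gdv, theta_t)[0]
    d_v = etas[t] * d_theta + mu * d_v
    theta_t = theta_t.detach()

print("hyper-gradient w.r.t. teacher parameters:", d_omega)
```

The second-order terms ∇_ω(g · dv) and ∇_θ(g · dv) are obtained as vector-Jacobian products by differentiating the scalar g · dv, so the full Hessian is never materialized.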
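For the reported experiment setup and dataset split, a minimal sketch is below, assuming a standard torchvision pipeline; the backbone (resnet18), transforms, and training loop are illustrative assumptions, and the teacher-driven reweighting of Algorithm 1 is omitted.

```python
# Sketch of the reported student setup on CIFAR-10: momentum SGD (lr 0.1, momentum 0.9),
# lr divided by 10 at epochs 80 and 120, batch size 128, 300 epochs, 45,000/5,000 split.
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())
# Split 5,000 samples off as D_valid; the remaining 45,000 form D_train.
train_set, valid_set = random_split(full_train, [45_000, 5_000])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=128)   # would feed the validation metric M

model = torchvision.models.resnet18(num_classes=10)    # assumed backbone, for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

for epoch in range(300):                               # 300 epochs to ensure convergence
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```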