Learning to Reweight with Deep Interactions
Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li
AAAI 2021, pp. 7385-7393 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods. |
| Researcher Affiliation | Collaboration | Yang Fan1, Yingce Xia2, Lijun Wu2, Shufang Xie2, Weiqing Liu2, Jiang Bian2, Tao Qin2, Xiang-Yang Li1 1University of Science and Technology of China 2Microsoft Research Asia fyabc@mail.ustc.edu.cn, xiangyangli@ustc.edu.cn {yingce.xia, lijuwu, shufxi, Weiqing.Liu, Jiang.Bian, taoqin}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: The gradients of the validation metric w.r.t. the parameters of the teacher. 1 Input: teacher-model backpropagation interval B; parameters and momentum of the student model θ_K and v_K; learning rates {η_t}_{t=K-B}^{K-1}; momentum coefficient µ (> 0); minibatches of data {D_t}_{t=K-B}^{K-1}. 2 Initialization: dθ = ∇_θ M(D_valid; θ_K); dv = -η_K dθ; dω ← 0; θ ← θ_K; v ← v_K. 3 for t = K-1 : -1 : K-B do 4 θ ← θ + η_t v; g ← ∇_θ [w_t ℓ(D_t; θ) + λ R(θ)]; v ← (v - g)/µ. 5 dω ← dω + ∇_ω(g · dv); dθ ← dθ + ∇_θ(g · dv); dv ← -η_t dθ + µ dv. 6 Return dω. (A PyTorch sketch of this reverse pass follows the table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for their methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Experimental results on CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009) with both clean labels and noisy labels demonstrate the effectiveness of our algorithm. We also conduct a group of experiments on IWSLT German English translation to demonstrate the effectiveness of our method on sequence generation tasks. |
| Dataset Splits | Yes | We split 5000 samples from the training dataset as Dvalid and the remaining 45000 samples are used as Dtrain. (A split sketch follows the table.) |
| Hardware Specification | Yes | All the models are trained on a single P40 GPU. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) and the fairseq official implementation, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use momentum SGD with learning rate 0.1 and divide the learning rate by 10 at the 80-th and 120-th epoch. The momentum coefficient µ is 0.9. The K and B in Algorithm 1 are set as 20 and 2 respectively. We train the models for 300 epochs to ensure convergence. The minibatch size is 128. (A training-setup sketch follows the table.) |
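
The Pseudocode row above quotes Algorithm 1, which obtains the teacher's gradient by running the student's last B momentum-SGD steps backwards and accumulating Hessian-vector products along the way. The sketch below is a minimal PyTorch reconstruction of that idea, not the authors' code: the least-squares student, the sigmoid teacher `omega`, the helper names `weighted_loss` and `teacher_grad`, and the sign conventions of the momentum update are all assumptions chosen so the toy example is self-consistent.

```python
import torch


def weighted_loss(theta, omega, batch, lam=1e-4):
    """Toy student objective: w_t * l(D_t; theta) + lam * R(theta).

    The least-squares student and the sigmoid "teacher" omega (one logit per
    example in a fixed-size minibatch) are illustrative assumptions only.
    """
    x, y = batch
    w = torch.sigmoid(omega)                     # example weights from the teacher
    per_example = (x @ theta - y) ** 2           # l(D_t; theta), per example
    return (w * per_example).mean() + lam * theta.pow(2).sum()


def teacher_grad(theta_K, v_K, omega, batches, etas, mu, valid_metric):
    """Reverse the last B momentum-SGD steps; return d valid_metric / d omega.

    Assumed forward convention (a common one, not stated in the table):
        g_t      = grad_theta[ w_t * l(D_t; theta_t) + lam * R(theta_t) ]
        v_{t+1}  = mu * v_t + g_t
        th_{t+1} = th_t - etas[t] * v_{t+1}
    `batches` and `etas` are ordered oldest to newest (t = K-B, ..., K-1).
    """
    omega_ = omega.detach().requires_grad_()
    theta_leaf = theta_K.detach().clone().requires_grad_()
    d_theta = torch.autograd.grad(valid_metric(theta_leaf), theta_leaf)[0]
    d_v = -etas[-1] * d_theta                    # "dv = -eta dtheta" initialisation
    d_omega = torch.zeros_like(omega)
    theta, v = theta_leaf.detach(), v_K.detach().clone()

    for t in reversed(range(len(batches))):      # t = B-1, ..., 0  (i.e. K-1, ..., K-B)
        theta = theta + etas[t] * v              # undo the parameter update
        theta_ = theta.detach().requires_grad_()
        g = torch.autograd.grad(weighted_loss(theta_, omega_, batches[t]),
                                theta_, create_graph=True)[0]
        v = (v - g.detach()) / mu                # undo the momentum update
        g_dot_dv = (g * d_v).sum()               # enables Hessian-vector products
        d_omega = d_omega + torch.autograd.grad(g_dot_dv, omega_, retain_graph=True)[0]
        d_theta = d_theta + torch.autograd.grad(g_dot_dv, theta_)[0]
        if t > 0:                                # "dv <- -eta*dtheta + mu*dv"
            d_v = -etas[t - 1] * d_theta + mu * d_v
    return d_omega


# Tiny self-check on synthetic data: run B forward steps, then call the reverse pass.
torch.manual_seed(0)
dim, n, B, mu, etas = 5, 16, 2, 0.9, [0.1, 0.1]
omega = torch.zeros(n)
batches = [(torch.randn(n, dim), torch.randn(n)) for _ in range(B)]
theta, v = torch.randn(dim), torch.zeros(dim)
for t in range(B):                               # forward momentum-SGD steps
    th = theta.detach().requires_grad_()
    g = torch.autograd.grad(weighted_loss(th, omega, batches[t]), th)[0]
    v = mu * v + g
    theta = theta - etas[t] * v
x_val, y_val = torch.randn(32, dim), torch.randn(32)
d_omega = teacher_grad(theta, v, omega, batches, etas, mu,
                       lambda th: ((x_val @ th - y_val) ** 2).mean())
print(d_omega.shape)                             # torch.Size([16])
```

Reconstructing (θ, v) in place rather than storing the whole trajectory is what the reverse loop in Algorithm 1 appears to rely on to keep memory roughly constant in B.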
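
The Dataset Splits row states that 5,000 of the 50,000 CIFAR training images are held out as Dvalid and the remaining 45,000 form Dtrain. Below is a minimal torchvision sketch of such a split; the choice of CIFAR-10, a random split, and the fixed seed are assumptions, since the quote does not say how the held-out samples were chosen.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# 50,000 CIFAR-10 training images -> 45,000 for Dtrain, 5,000 for Dvalid.
train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
generator = torch.Generator().manual_seed(0)     # assumed seed, not from the paper
d_train, d_valid = random_split(train_full, [45_000, 5_000], generator=generator)
print(len(d_train), len(d_valid))                # 45000 5000
```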
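
The Experiment Setup row reports momentum SGD with learning rate 0.1, divided by 10 at epochs 80 and 120, momentum 0.9, 300 epochs, and minibatches of 128. Below is a hedged PyTorch sketch of just that optimisation schedule: the ResNet-18 student and the plain CIFAR-10 loader are stand-ins, and the teacher loop with K = 20 and B = 2 (as well as the reweighting itself) is omitted.

```python
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# In the paper the 45,000-example Dtrain from the split above would be used here;
# the full 50,000-image training set is a simplification.
train_set = datasets.CIFAR10("./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)   # minibatch size 128

model = models.resnet18(num_classes=10)          # stand-in student; the paper's backbone may differ
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(300):                         # 300 epochs "to ensure convergence"
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()                             # lr: 0.1 -> 0.01 (epoch 80) -> 0.001 (epoch 120)
```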