Learning to Reweight with Deep Interactions

Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

AAAI 2021, pp. 7385-7393

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
Researcher Affiliation | Collaboration | Yang Fan (1), Yingce Xia (2), Lijun Wu (2), Shufang Xie (2), Weiqing Liu (2), Jiang Bian (2), Tao Qin (2), Xiang-Yang Li (1); (1) University of Science and Technology of China, (2) Microsoft Research Asia; fyabc@mail.ustc.edu.cn, xiangyangli@ustc.edu.cn, {yingce.xia, lijuwu, shufxi, Weiqing.Liu, Jiang.Bian, taoqin}@microsoft.com
Pseudocode | Yes | Algorithm 1: The gradients of the validation metric w.r.t. the parameters of the teacher (a code sketch of this procedure follows the table).
    1: Input: teacher model backpropagation interval B; parameters and momentum of the student model θ_K and v_K; learning rates {η_t}, t = K-B, ..., K-1; momentum coefficient µ (> 0); minibatches of data {D_t}, t = K-B, ..., K-1;
    2: Initialization: dθ = ∇_{θ_K} M(D_valid; θ_K); dv = η_K dθ; dω ← 0; θ ← θ_K; v ← v_K;
    3: for t = K-1 : -1 : K-B do
    4:     θ ← θ + η_t v;  g ← ∇_θ(w_t ℓ(D_t; θ) + λ R(θ));  v ← v - g;
    5:     dω ← dω + ∇_ω(g · dv);  dθ ← dθ + ∇_θ(g · dv);  dv ← η_t dθ + µ dv;
    6: Return dω.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for their methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Experimental results on CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009) with both clean labels and noisy labels demonstrate the effectiveness of our algorithm. We also conduct a group of experiments on IWSLT German-English translation to demonstrate the effectiveness of our method on sequence generation tasks.
Dataset Splits | Yes | We split 5000 samples from the training dataset as D_valid and the remaining 45000 samples are used as D_train.
Hardware Specification | Yes | All the models are trained on a single P40 GPU.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al. 2019) and the official fairseq implementation, but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We use momentum SGD with learning rate 0.1 and divide the learning rate by 10 at the 80-th and 120-th epoch. The momentum coefficient µ is 0.9. The K and B in Algorithm 1 are set as 20 and 2 respectively. We train the models for 300 epochs to ensure convergence. The minibatch size is 128. (A sketch of this setup also follows the table.)
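
The reverse pass in Algorithm 1 can be written almost line for line with automatic differentiation. The following is a minimal sketch, assuming a toy linear student, a sigmoid per-example weighting teacher, and synthetic minibatches; the names (`weighted_loss`, `omega`, `batches`), the loss and weighting forms, and the sign conventions are illustrative assumptions that follow the reconstructed pseudocode above, not the paper's actual models.

```python
# Sketch of Algorithm 1: reverse-mode gradients of the validation metric M
# w.r.t. the teacher parameters omega, propagated back through B momentum-SGD
# updates of the student. Toy problem; all model/weighting choices are assumed.
import torch

torch.manual_seed(0)

d = 5                        # feature dimension of the toy problem
K, B = 20, 2                 # total student steps K and backprop interval B (as reported)
etas = [0.1] * K             # learning rates {eta_t}
mu, lam = 0.9, 1e-4          # momentum coefficient mu and weight-decay lambda

omega = torch.randn(d, requires_grad=True)   # teacher parameters
theta = torch.randn(d, requires_grad=True)   # student parameters theta_K (after training)
v = torch.zeros(d)                           # student momentum v_K

# Stored minibatches D_t = (x_t, y_t) and a held-out validation set D_valid.
batches = [(torch.randn(8, d), torch.randn(8)) for _ in range(K)]
x_val, y_val = torch.randn(32, d), torch.randn(32)

def weighted_loss(theta_t, omega, x, y):
    """w_t * l(D_t; theta) + lambda * R(theta), with teacher-produced example weights."""
    w = torch.sigmoid(x @ omega)             # per-example weights from the teacher
    per_example = (x @ theta_t - y) ** 2
    return (w * per_example).mean() + lam * theta_t.pow(2).sum()

# Initialization: d_theta = grad of the validation metric at theta_K; dv = eta_K * d_theta.
val_metric = ((x_val @ theta - y_val) ** 2).mean()
d_theta = torch.autograd.grad(val_metric, theta)[0]
d_v = etas[K - 1] * d_theta
d_omega = torch.zeros_like(omega)

theta_t, v_t = theta.detach().clone(), v.clone()
for t in range(K - 1, K - B - 1, -1):        # t = K-1, ..., K-B
    theta_t = theta_t + etas[t] * v_t        # undo the student parameter update
    theta_t.requires_grad_(True)
    g = torch.autograd.grad(
        weighted_loss(theta_t, omega, *batches[t]), theta_t, create_graph=True)[0]
    v_t = v_t - g.detach()                   # undo the momentum update
    # Accumulate d_omega += grad_omega(g . dv) and d_theta += grad_theta(g . dv).
    gdv = (g * d_v).sum()
    d_omega = d_omega + torch.autograd.grad(gdv, omega, retain_graph=True)[0]
    d_theta = d_theta + torch.autograd.grad(gdv, theta_t)[0]
    d_v = etas[t] * d_theta + mu * d_v
    theta_t = theta_t.detach()

print("hyper-gradient w.r.t. teacher parameters:", d_omega)
```

The second-order terms ∇_ω(g · dv) and ∇_θ(g · dv) are obtained as vector-Jacobian products by differentiating the scalar g · dv, so the full Hessian is never materialized.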
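For the reported experiment setup and dataset split, a minimal sketch is below, assuming a standard torchvision pipeline; the backbone (resnet18), transforms, and training loop are illustrative assumptions, and the teacher-driven reweighting of Algorithm 1 is omitted.

```python
# Sketch of the reported student setup on CIFAR-10: momentum SGD (lr 0.1, momentum 0.9),
# lr divided by 10 at epochs 80 and 120, batch size 128, 300 epochs, 45,000/5,000 split.
import torch
import torchvision
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

full_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())
# Split 5,000 samples off as D_valid; the remaining 45,000 form D_train.
train_set, valid_set = random_split(full_train, [45_000, 5_000])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=128)   # would feed the validation metric M

model = torchvision.models.resnet18(num_classes=10)    # assumed backbone, for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

for epoch in range(300):                               # 300 epochs to ensure convergence
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```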