Evolving Attention with Residual Convolutions
Authors: Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have demonstrated consistent improvement in various natural language and computer vision tasks. |
| Researcher Affiliation | Collaboration | 1 Peking University, 2 Microsoft Research, 3 Institute of Information Engineering, Chinese Academy of Sciences, 4 ETH Zurich, 5 Tsinghua University. |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | The code is available at https://github.com/pkuyym/EvolvingAttention |
| Open Datasets | Yes | We choose GLUE benchmark (Wang et al., 2018) for an empirical study. |
| Dataset Splits | Yes | We leverage 10% training data to choose the hyper-parameters and perform evaluation on the development set. |
| Hardware Specification | Yes | All models are trained by 1.28 million training images for 100 epochs on 8 TESLA V100 GPUs. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used. |
| Experiment Setup | Yes | Major hyper-parameters are as follows: optimizer is SGD with momentum 0.9, batch size is 32 per worker, weight decay is 1e-4. For the first 5 epochs, the learning rate is scaled linearly from 0 to 0.128, and then it is divided by 10 at epoch 30, 60, 80 and 90. |
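
The ImageNet training recipe quoted in the "Experiment Setup" row (SGD with momentum 0.9, weight decay 1e-4, a 5-epoch linear warm-up to a peak learning rate of 0.128, and step decays at epochs 30, 60, 80, and 90) can be expressed as a short schedule. The sketch below assumes PyTorch, which the paper does not specify, and uses placeholder `model` and training-loop code; it is an illustration of the reported hyper-parameters, not the authors' released implementation.

```python
# Minimal sketch of the reported ImageNet schedule, assuming PyTorch
# (the paper does not name its framework). `model` and the inner
# training loop are placeholders.
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the actual network

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.128,          # peak learning rate reached after warm-up
    momentum=0.9,
    weight_decay=1e-4,
)

def lr_factor(epoch: int) -> float:
    """Linear warm-up over the first 5 epochs, then divide by 10
    at epochs 30, 60, 80, and 90."""
    if epoch < 5:
        return (epoch + 1) / 5
    drops = sum(epoch >= boundary for boundary in (30, 60, 80, 90))
    return 0.1 ** drops

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(100):          # 100 epochs, as reported in the paper
    # ... per-batch forward/backward passes would go here ...
    optimizer.step()              # placeholder for the actual update loop
    scheduler.step()              # advance the schedule once per epoch
```

With 8 workers at a batch size of 32 per worker, this corresponds to an effective global batch size of 256 per optimizer step.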