Go Wider Instead of Deeper
Authors: Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our plug-and-play framework, we design WideNet and conduct comprehensive experiments on popular computer vision and natural language processing benchmarks. On ImageNet-1K, our best model outperforms Vision Transformer (ViT) by 1.5% with 0.72× trainable parameters. |
| Researcher Affiliation | Academia | Department of Computer Science, National University of Singapore, Singapore {f.xue,ziji.shi}@u.nus.edu, weifutao2019@gmail.com, yuxuanlou@u.nus.edu, {liuyong,youy}@comp.nus.edu.sg |
| Pseudocode | No | No pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | No | The paper does not provide a statement about releasing code or a link to a source code repository. |
| Open Datasets | Yes | We use ILSVRC-2012 ImageNet (Deng et al. 2009) and Cifar10 as platforms to evaluate our framework. (See the data-loading sketch after this table.) |
| Dataset Splits | No | While the paper mentions using a "development set" for GLUE, it does not provide specific split ratios or sample counts for training, validation, or test sets needed to reproduce the data partitioning. It also notes that fine-tuning follows the baselines' hyperparameters, implying standard splits, but no explicit details are given. |
| Hardware Specification | Yes | We pretrain our models on 256 TPUv3 cores. |
| Software Dependencies | Yes | We first reimplement ViT by Tensorflow 2.x and tune it to a reasonable performance. |
| Experiment Setup | Yes | For MoE based models (i.e., ViT-MoE and WideNet), we set the weight of load balance loss λ as 0.01. Without special instructions, we use 4 experts in total and Top 2 experts selected in each transformer block. The capacity ratio C is set as 1.2 for a trade-off between accuracy and speed. ... The learning rate is 0.00176, which is the same as ALBERT claimed (You et al. 2019a). During finetuning, we still follow (Dosovitskiy et al. 2020) and use SGD optimizer with momentum. Compared with pretraining on ImageNet-1K, label smoothing and warm-up are removed. (See the routing and optimizer sketches after this table.) |
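The Open Datasets row names ILSVRC-2012 ImageNet and Cifar10. Below is a minimal sketch of loading both through TensorFlow Datasets; the preprocessing (resize to 224 and rescale to [0, 1]), batch size, and `load_split` helper are illustrative assumptions, not the authors' pipeline, and `imagenet2012` requires the ILSVRC-2012 archives to be downloaded manually.

```python
# Hedged sketch (not from the paper): loading the two evaluation datasets via
# TensorFlow Datasets. "cifar10" downloads automatically; "imagenet2012" needs
# the ILSVRC-2012 tar files placed in the TFDS manual-download directory.
import tensorflow as tf
import tensorflow_datasets as tfds


def load_split(name: str, split: str, image_size: int = 224, batch_size: int = 128):
    ds = tfds.load(name, split=split, as_supervised=True)

    def preprocess(image, label):
        # Resize and rescale to [0, 1]; assumed preprocessing, not the paper's.
        image = tf.image.resize(image, (image_size, image_size))
        image = tf.cast(image, tf.float32) / 255.0
        return image, label

    return (ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
              .batch(batch_size)
              .prefetch(tf.data.AUTOTUNE))


# Example usage (splits are TFDS defaults, not the paper's partitioning):
# train_ds = load_split("imagenet2012", "train")
# val_ds   = load_split("imagenet2012", "validation")
# cifar_ds = load_split("cifar10", "train")
```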
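The Experiment Setup row fixes the MoE hyperparameters: 4 experts, top-2 selection per transformer block, capacity ratio C = 1.2, and load-balance loss weight λ = 0.01. The routing sketch below, written in TensorFlow 2.x (the framework the paper reports using), shows how those numbers would plug into a generic top-2 router with an auxiliary balance loss. It is an assumption-laden illustration, not the authors' code (none is released); the capacity formula, the Switch-style form of the balance loss, and the names `top2_route` and `router_weights` are all assumptions.

```python
import tensorflow as tf

NUM_EXPERTS = 4             # "4 experts in total"
TOP_K = 2                   # "Top 2 experts selected in each transformer block"
CAPACITY_RATIO = 1.2        # "capacity ratio C is set as 1.2"
BALANCE_LOSS_WEIGHT = 0.01  # "weight of load balance loss λ as 0.01"


def top2_route(token_features, router_weights):
    """Return top-2 expert indices, their gate values, the per-expert
    capacity, and the weighted auxiliary load-balance loss."""
    logits = tf.matmul(token_features, router_weights)            # [tokens, experts]
    gate_probs = tf.nn.softmax(logits, axis=-1)

    top_gates, top_experts = tf.math.top_k(gate_probs, k=TOP_K)   # [tokens, 2]

    # Per-expert buffer size; the exact formula is an assumption
    # (capacity ratio * routed token slots / number of experts).
    num_tokens = tf.cast(tf.shape(token_features)[0], tf.float32)
    capacity = tf.cast(
        tf.math.ceil(CAPACITY_RATIO * num_tokens * TOP_K / NUM_EXPERTS), tf.int32)

    # Switch/Shazeer-style load-balance loss: product of the fraction of
    # tokens routed to each expert and that expert's mean gate probability.
    routed_mask = tf.reduce_sum(tf.one_hot(top_experts, NUM_EXPERTS), axis=1)
    fraction_routed = tf.reduce_mean(routed_mask, axis=0)          # [experts]
    mean_gate_prob = tf.reduce_mean(gate_probs, axis=0)            # [experts]
    balance_loss = BALANCE_LOSS_WEIGHT * NUM_EXPERTS * tf.reduce_sum(
        fraction_routed * mean_gate_prob)

    return top_experts, top_gates, capacity, balance_loss


# Example: route 196 patch tokens of width 768 through a random router.
tokens = tf.random.normal([196, 768])
router = tf.random.normal([768, NUM_EXPERTS])
experts, gates, cap, aux_loss = top2_route(tokens, router)
```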
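The same row quotes the pretraining learning rate (0.00176, citing You et al. 2019a) and fine-tuning with SGD plus momentum, with label smoothing and warm-up removed. A minimal sketch of those optimizer choices follows, assuming the LAMB implementation from TensorFlow Addons for pretraining; the momentum value (0.9) and fine-tuning learning rate (0.01) are placeholders the paper excerpt does not state.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # provides a LAMB optimizer implementation

# Pretraining: LAMB with the learning rate quoted in the paper.
pretrain_opt = tfa.optimizers.LAMB(learning_rate=0.00176)

# Fine-tuning: SGD with momentum; momentum 0.9 and lr 0.01 are assumptions.
finetune_opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# Label smoothing is removed during fine-tuning, so plain cross-entropy applies.
finetune_loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```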