Transformers from an Optimization Perspective

Authors: Yongyi Yang, Zengfeng Huang, David P. Wipf

NeurIPS 2022

Reproducibility Assessment

Research Type: Experimental
LLM Response: "To this end, we implement a Transformer model as described previously, up to known limitations like symmetric weights. We apply this model to two benchmarks, IMDB [30] and SST2 [40], which are both commonly-used sentiment classification datasets that rely on Glove-840b-300d [33] as the word embedding. Figures 5 and 6 display the energy of the output of each layer of a Transformer (as defined in (8)) averaged over 200 randomly chosen samples in the test set."

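The per-layer energy measurement quoted above reduces to a simple averaging loop. The sketch below is a hypothetical harness: `energy` is a placeholder standing in for the paper's energy function from Eq. (8), which is not restated in this report.

    def average_layer_energies(per_sample_outputs, energy):
        # per_sample_outputs: one list of per-layer output tensors for each
        # of the (here, 200) randomly chosen test samples; `energy` is a
        # stand-in for the paper's energy function defined in Eq. (8).
        num_layers = len(per_sample_outputs[0])
        totals = [0.0] * num_layers
        for outputs in per_sample_outputs:
            for i, h in enumerate(outputs):
                totals[i] += float(energy(h))
        return [t / len(per_sample_outputs) for t in totals]
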
Researcher Affiliation: Collaboration
LLM Response: Yongyi Yang, University of Michigan (yongyi@umich.edu); Zengfeng Huang, Fudan University (huangzf@fudan.edu.cn); David Wipf, Amazon Web Services (davidwipf@gmail.com). Work completed during an internship at the AWS Shanghai AI Lab.

Pseudocode: Yes
LLM Response: "Algorithm 1: For the t-th iteration, execute u(t) = y(t) - α1 f(y(t)); y(t+1) = u(t) - α2 g(u(t))."

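Read literally, Algorithm 1 is a two-block descent iteration with step sizes α1 and α2. Below is a minimal NumPy sketch under the assumption that f and g denote the update maps for the two energy terms (e.g., their gradients); the quadratic test functions at the end are hypothetical, chosen only so the script runs end to end.

    import numpy as np

    def algorithm_1(y0, f, g, alpha1, alpha2, num_steps):
        # u(t) = y(t) - alpha1 * f(y(t));  y(t+1) = u(t) - alpha2 * g(u(t))
        y = y0
        for _ in range(num_steps):
            u = y - alpha1 * f(y)
            y = u - alpha2 * g(u)
        return y

    # Hypothetical instantiation: f and g as gradients of two quadratics,
    # so the iteration descends their sum, 0.5*y'Ay + 0.5*||y - 1||^2.
    A = np.diag([2.0, 1.0])
    f = lambda y: A @ y        # gradient of 0.5 * y'Ay
    g = lambda u: u - 1.0      # gradient of 0.5 * ||u - 1||^2
    print(algorithm_1(np.zeros(2), f, g, 0.1, 0.1, 500))
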
Open Source Code: Yes
LLM Response: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"

Open Datasets: Yes
LLM Response: "We apply this model to two benchmarks, IMDB [30] and SST2 [40], which are both commonly-used sentiment classification datasets that rely on Glove-840b-300d [33] as the word embedding."

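Both benchmarks and the embeddings are publicly available. The paper does not specify its loading pipeline; a hypothetical one might use the Hugging Face datasets package and torchtext's GloVe wrapper:

    from datasets import load_dataset
    from torchtext.vocab import GloVe

    imdb = load_dataset("imdb")            # IMDB movie-review sentiment
    sst2 = load_dataset("glue", "sst2")    # SST2, distributed via GLUE
    glove = GloVe(name="840B", dim=300)    # the Glove-840b-300d vectors
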
Dataset Splits: No
LLM Response: The paper mentions using the IMDB and SST2 datasets and evaluating on "200 randomly chosen samples in the test set", but does not explicitly provide train/validation/test split percentages, sample counts, or instructions for reproducing the data partitioning.

Hardware Specification: No
LLM Response: Under its "Questions for Paper Analysis" section, the paper marks the information regarding the total amount of compute and the type of resources used as "[N/A]".

Software Dependencies: No
LLM Response: The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python, PyTorch, TensorFlow versions).

Experiment Setup: Yes
LLM Response: "Figure 5 uses randomly initialized weights while Figure 6 involves weights trained for 2000 steps with SGD and learning rate 0.01."
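
Only the optimizer (SGD), learning rate (0.01), and step count (2000) are reported, so the sketch below fills the remaining details (model size, pooling, batch construction) with hypothetical placeholders rather than the authors' actual configuration:

    import torch
    import torch.nn as nn

    # Stand-in model and data; hypothetical, since the paper only reports
    # the optimizer (SGD), learning rate (0.01), and step count (2000).
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=300, nhead=6, batch_first=True),
        num_layers=2,
    )
    head = nn.Linear(300, 2)  # binary sentiment head
    params = list(model.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(2000):
        x = torch.randn(32, 64, 300)          # placeholder GloVe embeddings
        y = torch.randint(0, 2, (32,))        # placeholder labels
        optimizer.zero_grad()
        logits = head(model(x).mean(dim=1))   # mean-pool tokens, classify
        loss = loss_fn(logits, y)
        loss.backward()
        optimizer.step()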