Understanding How Encoder-Decoder Architectures Attend
Authors: Kyle Aitken, Vinay Ramasesh, Yuan Cao, Niru Maheswaranathan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we investigate how encoder-decoder networks solve different sequence-to-sequence tasks. We introduce a way of decomposing hidden states over a sequence into temporal (independent of input) and input-driven (independent of sequence position) components. This reveals how attention matrices are formed: depending on the task requirements, networks rely more heavily on either the temporal or input-driven components. These findings hold across both recurrent and feed-forward architectures despite their differences in forming the temporal components. Overall, our results provide new insight into the inner workings of attention-based encoder-decoder networks. (An illustrative sketch of this decomposition appears after the table.) |
| Researcher Affiliation | Collaboration | Kyle Aitken, Department of Physics, University of Washington, Seattle, Washington, USA (kaitken17@gmail.com); Vinay V Ramasesh, Google Research, Blueshift Team, Mountain View, California, USA; Yuan Cao, Google, Inc., Mountain View, California, USA; Niru Maheswaranathan, Google Research, Brain Team, Mountain View, California, USA |
| Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The checklist section of the paper explicitly states: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]' |
| Open Datasets | Yes | We train the AED and AO architectures on this natural language task using a subset of the para_crawl dataset Bañón et al. (2020) consisting of over 30 million parallel sentences. |
| Dataset Splits | No | The checklist section of the paper explicitly states: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [No]'. The paper mentions using a 'test set of size M' for estimating components, but does not provide details about train/validation/test splits, percentages, or counts. |
| Hardware Specification | No | The paper does not specify any particular GPU models, CPU types, or other hardware components used for running the experiments. The checklist section explicitly states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]' |
| Software Dependencies | No | The paper mentions types of RNN cells (LSTMs, GRUs, UGRNNs) but does not provide specific software library names or version numbers (e.g., TensorFlow version, PyTorch version, Python version, specific solver versions) needed to reproduce the experiments. |
| Experiment Setup | No | The paper does not provide specific details about experimental setup, such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings. The checklist section explicitly states: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [No]' |
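
The hidden-state decomposition quoted in the Research Type row can be summarized in a few lines: averaging the hidden states over a test set of M sequences at each position yields the temporal component (independent of the particular input), and the per-sequence residual yields the input-driven component. The sketch below is an illustration based only on the abstract's description and the paper's mention of a "test set of size M", not the authors' implementation (no code was released); the function name, array shapes, and use of NumPy are our assumptions.

```python
import numpy as np

def decompose_hidden_states(h):
    """Illustrative decomposition of hidden states (assumed shapes).

    h : array of shape (M, T, D) -- hidden states for M test sequences,
        T sequence positions, and D hidden units.

    Returns
    -------
    temporal     : (T, D) array, the per-position mean over the M inputs
                   (independent of the particular input sequence).
    input_driven : (M, T, D) array, the residual h - temporal
                   (what remains after removing the shared temporal part).
    """
    temporal = h.mean(axis=0)                 # average over inputs at each position
    input_driven = h - temporal[np.newaxis]   # per-sequence deviation from that average
    return temporal, input_driven

# Hypothetical usage on random stand-in data:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(128, 20, 64))         # M=128 sequences, T=20 steps, D=64 units
    temporal, input_driven = decompose_hidden_states(h)
    print(temporal.shape, input_driven.shape)  # (20, 64) (128, 20, 64)
```
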