Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens
Authors: Sangho Lee, Hayun Lee, Dongkun Shin
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: To compare the performance of Proxyformer with that of other efficient transformer models, we carried out experiments using the Long Range Arena (LRA) benchmark (Tay et al. 2021). |
| Researcher Affiliation | Academia | Sangho Lee, Hayun Lee, Dongkun Shin* Sungkyunkwan University ilena7440@skku.edu, lhy920806@skku.edu, dongkun@skku.edu |
| Pseudocode | Yes | Algorithm 1 Nyströmformer attention (a hedged sketch of this attention scheme appears after the table) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | To compare the performance of Proxyformer with that of other efficient transformer models, we carried out experiments using the Long Range Arena (LRA) benchmark (Tay et al. 2021). |
| Dataset Splits | No | The paper uses the Long Range Arena (LRA) benchmark but does not explicitly provide the specific training, validation, or test dataset splits or a citation that clearly defines them. |
| Hardware Specification | Yes | We recorded the memory usage per sequence and throughput on a single NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions "Nyströmformer's LRA PyTorch implementation" but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We used Nyströmformer's LRA PyTorch implementation, which employs a two-layer transformer model with 64 embedding dimensions, 128 feed-forward dimensions, and two attention heads. To ensure similar computational complexity across all variants, we set the projection dimension (e.g., # of proxy tokens and # of landmarks) to 128 for all projection-based variants. For Reformer and Bigbird, we used 2 hashing functions and a block size of 64, respectively. The temperature parameter for contrastive loss and dropout probability were set to 0.07 and 0.1, respectively. (These values are collected in the configuration sketch after the table.) |
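The "Pseudocode" row points to Algorithm 1 (Nyströmformer attention), which Proxyformer builds on by replacing fixed landmarks with trainable proxy tokens. For orientation only, below is a minimal PyTorch sketch of the Nyström approximation. It is not the authors' code: the function name `nystrom_attention` and the segment-mean landmark choice are our assumptions, and the original algorithm uses an iterative Moore-Penrose approximation where this sketch calls `torch.linalg.pinv`.

```python
import torch
import torch.nn.functional as F

def nystrom_attention(q, k, v, num_landmarks=128):
    """Nystrom-approximated softmax attention (illustrative sketch).

    q, k, v: (batch, heads, seq_len, head_dim); seq_len is assumed to be
    divisible by num_landmarks for the segment-mean landmark selection.
    """
    b, h, n, d = q.shape
    scale = d ** -0.5

    # Landmarks as segment means over the sequence (the Nystromformer
    # choice; Proxyformer instead learns these as proxy tokens).
    q_land = q.reshape(b, h, num_landmarks, n // num_landmarks, d).mean(dim=-2)
    k_land = k.reshape(b, h, num_landmarks, n // num_landmarks, d).mean(dim=-2)

    # Three small softmax kernels replace the full n-by-n attention map.
    k1 = F.softmax(q @ k_land.transpose(-1, -2) * scale, dim=-1)       # (n, m)
    k2 = F.softmax(q_land @ k_land.transpose(-1, -2) * scale, dim=-1)  # (m, m)
    k3 = F.softmax(q_land @ k.transpose(-1, -2) * scale, dim=-1)       # (m, n)

    # softmax(QK^T / sqrt(d)) V  ~=  k1 @ pinv(k2) @ (k3 @ V); linear in n
    # because no n-by-n attention matrix is ever materialized.
    return k1 @ torch.linalg.pinv(k2) @ (k3 @ v)
```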
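For the "Experiment Setup" row, the quoted hyperparameters can be collected into a single configuration. This is a hypothetical container whose field names are ours, not the authors'; only the values come from the paper.

```python
# Hyperparameters quoted from the paper's LRA setup; field names are assumptions.
lra_config = dict(
    num_layers=2,                 # two-layer transformer
    embed_dim=64,                 # embedding dimensions
    ffn_dim=128,                  # feed-forward dimensions
    num_heads=2,                  # attention heads
    projection_dim=128,           # proxy tokens / landmarks (projection-based variants)
    reformer_num_hashes=2,        # Reformer baseline
    bigbird_block_size=64,        # BigBird baseline
    contrastive_temperature=0.07, # temperature for the contrastive loss
    dropout=0.1,
)
```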