Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens
Authors: Sangho Lee, Hayun Lee, Dongkun Shin
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: To compare the performance of Proxyformer with those of other efficient transformer models, we carried out experiments using the Long Range Arena (LRA) benchmark (Tay et al. 2021). |
| Researcher Affiliation | Academia | Sangho Lee, Hayun Lee, Dongkun Shin*; Sungkyunkwan University; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Nyströmformer attention |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | To compare the performance of Proxyformer with those of other efficient transformer models, we carried out experiments using the Long Range Arena (LRA) benchmark (Tay et al. 2021). |
| Dataset Splits | No | The paper uses the Long Range Arena (LRA) benchmark but does not explicitly provide the specific training, validation, or test dataset splits or a citation that clearly defines them. |
| Hardware Specification | Yes | We recorded the memory usage per sequence and throughput on a single NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions "Nyströmformer's LRA PyTorch implementation" but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We used Nyströmformer's LRA PyTorch implementation, which employs a two-layer transformer model with 64 embedding dimensions, 128 feed-forward dimensions, and two attention heads. To ensure similar computational complexity across all variants, we set the projection dimension (e.g., the number of proxy tokens and the number of landmarks) to 128 for all projection-based variants. For Reformer and BigBird, we used 2 hashing functions and a block size of 64, respectively. The temperature parameter for contrastive loss and dropout probability were set to 0.07 and 0.1, respectively. |
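The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which makes it easy to see which settings are reported and which (per the Software Dependencies and Dataset Splits rows) are not. This is a minimal sketch: the class and field names are hypothetical and are not taken from the Nyströmformer or Proxyformer code, and `head_dim` assumes the 64-dimensional embedding is split evenly across the two heads, which the table does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LRAExperimentConfig:
    # Values quoted in the "Experiment Setup" row; field names are hypothetical.
    num_layers: int = 2
    embed_dim: int = 64
    ffn_dim: int = 128
    num_heads: int = 2
    projection_dim: int = 128  # number of proxy tokens / number of landmarks
    reformer_num_hashes: int = 2
    bigbird_block_size: int = 64
    contrastive_temperature: float = 0.07
    dropout: float = 0.1
    # Marked "No" in the table above, so left unset here.
    pytorch_version: Optional[str] = None
    dataset_splits: Optional[str] = None

    @property
    def head_dim(self) -> int:
        # Per-head width, ASSUMING an even split of the embedding across heads.
        if self.embed_dim % self.num_heads:
            raise ValueError("embed_dim must be divisible by num_heads")
        return self.embed_dim // self.num_heads

    def missing_fields(self) -> List[str]:
        # Names of reproducibility-relevant settings that were not reported.
        return [name for name in ("pytorch_version", "dataset_splits")
                if getattr(self, name) is None]


cfg = LRAExperimentConfig()
print(cfg.head_dim)          # 32
print(cfg.missing_fields())  # ['pytorch_version', 'dataset_splits']
```

Leaving the unreported fields as `None` rather than guessing values keeps the gap visible to anyone attempting a rerun.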
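Algorithm 1 in the paper is the Nyströmformer attention that Proxyformer builds on. To illustrate what that approximation computes, here is a minimal single-head NumPy sketch, assuming the standard Nyströmformer formulation: landmark queries and keys Q̃, K̃ are segment means of Q and K, and attention is approximated as softmax(Q K̃ᵀ/√d) · pinv(softmax(Q̃ K̃ᵀ/√d)) · softmax(Q̃ Kᵀ/√d) V. The function names are hypothetical. The sketch uses the exact `numpy.linalg.pinv` rather than an iterative pseudoinverse and omits masking and multi-head splitting. It does not implement Proxyformer's own method, which, per its title, uses trainable proxy tokens instead.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row-wise max for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def segment_means(x: np.ndarray, num_landmarks: int) -> np.ndarray:
    # Average contiguous, equally sized segments of rows: (n, d) -> (m, d).
    n, d = x.shape
    if n % num_landmarks:
        raise ValueError("sequence length must be divisible by num_landmarks")
    return x.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)


def nystrom_attention(q, k, v, num_landmarks):
    # Nyström approximation of softmax attention for one head.
    scale = 1.0 / np.sqrt(q.shape[-1])
    q_l = segment_means(q, num_landmarks)   # (m, d) landmark queries
    k_l = segment_means(k, num_landmarks)   # (m, d) landmark keys
    f = softmax(q @ k_l.T * scale)          # (n, m)
    a = softmax(q_l @ k_l.T * scale)        # (m, m)
    b = softmax(q_l @ k.T * scale)          # (m, n)
    return f @ np.linalg.pinv(a) @ (b @ v)  # (n, d_v), never forms an n x n matrix


def exact_attention(q, k, v):
    # Reference O(n^2) softmax attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 32)) for _ in range(3))
print(nystrom_attention(q, k, v, num_landmarks=128).shape)  # (256, 32)
```

When the number of landmarks equals the sequence length, every segment is a single token, so the three factors coincide and A·pinv(A)·A = A makes the sketch reproduce exact attention. This is a convenient sanity check. With fewer landmarks (such as the 128 used in the experiments), the cost grows linearly in sequence length.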
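The setup reports a contrastive-loss temperature of 0.07 but, as quoted here, does not say which representations the loss contrasts. The sketch below is therefore a generic temperature-scaled InfoNCE loss, not Proxyformer's loss. It shows only what the temperature does: cosine similarities are divided by it before the softmax, so a small value such as 0.07 sharpens the distribution over candidates. The function name and the anchor/positive pairing are assumptions made for illustration.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.07) -> float:
    # Generic InfoNCE (NOT the paper's loss): row i of `positives` is the
    # positive for row i of `anchors`; all other rows act as negatives.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # cross-entropy on the diagonal


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
print(info_nce(x, x) < info_nce(x, np.roll(x, 1, axis=0)))  # True
```

Matched pairs yield a lower loss than mismatched ones, and lowering the temperature further reduces the loss on already well-separated pairs while penalizing near-misses more sharply.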