Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Catformer: Designing Stable Transformers via Sensitivity Analysis
Authors: Jared Q Davis, Albert Gu, Krzysztof Choromanski, Tri Dao, Christopher Re, Chelsea Finn, Percy Liang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that Catformers are less sensitive than other Transformer variants and demonstrate that this leads to more stable training. On DMLab30, a suite of high-dimension reinforcement tasks, Catformer outperforms other transformers, including Gated Transformer-XL, the state-of-the-art architecture designed to address stability, by 13%. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²DeepMind, ³Google Brain Robotics, ⁴Columbia University. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | Finally, we apply the Catformer on visual-navigation tasks from DeepMind Lab (Beattie et al., 2016), a suite of challenging RL tasks with high-dimensional observations, complex action spaces, and partial observability. |
| Dataset Splits | No | Table 2 shows final validation losses for several models from Table 1. However, the paper does not specify the exact percentages or sample counts for train/validation/test splits, nor does it reference predefined splits with explicit citations for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify version numbers for any software components, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | We use N = 2, 4, and 6 layer Transformer architectures. All models use pre-LayerNorm unless otherwise specified. For all baselines with constant dimension size in their layers... we use the standard Transformer architecture with d = 512 and inner dimension 4d = 2048... The Catformer model is parameter controlled with the technique described in Appendix D with e_a = 2, e_f = 4. Models are trained with the Adam optimizer. Table 3 and Table 4 provide specific learning rates (0.1, 0.2, 0.4) and noise std. (0.0005, 0.001, 0.002) respectively. |
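
The baseline settings quoted in the Experiment Setup row can be collected into a small configuration object, which may help when attempting a reproduction. The sketch below is illustrative only: the class name `BaselineConfig`, its field names, and the helper `baseline_configs` are hypothetical and do not come from the paper, and the Catformer-specific parameter control (Appendix D) is not modeled. How the listed learning rates and noise levels are paired with each model is not stated here, so they are recorded as plain tuples rather than expanded into a grid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    """Hypothetical container for the baseline Transformer settings quoted above."""

    num_layers: int              # N in the quoted setup (2, 4, or 6)
    d_model: int = 512           # d = 512
    pre_layer_norm: bool = True  # pre-LayerNorm unless otherwise specified
    optimizer: str = "adam"      # the only optimizer named in the summary

    @property
    def d_inner(self) -> int:
        # Inner (feed-forward) dimension is reported as 4d.
        return 4 * self.d_model


# Values listed in the summary; their exact role is defined in the paper's tables.
LAYER_COUNTS = (2, 4, 6)
LEARNING_RATES = (0.1, 0.2, 0.4)      # reported for Table 3
NOISE_STDS = (0.0005, 0.001, 0.002)   # reported for Table 4


def baseline_configs() -> list[BaselineConfig]:
    """Return one baseline config per reported layer count."""
    return [BaselineConfig(num_layers=n) for n in LAYER_COUNTS]


if __name__ == "__main__":
    for cfg in baseline_configs():
        print(cfg.num_layers, cfg.d_model, cfg.d_inner)
```

Because hardware, software versions, and dataset splits are marked "No" in the table, a configuration like this covers only part of what a faithful reproduction would need.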