Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Flowformer: Linearizing Transformers with Conservation Flows
Authors: Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To testify the effectiveness and generality of Flowformer, we extensively experiment on five well-established benchmarks, covering long sequence modeling, language processing, computer vision, time series and reinforcement learning. |
| Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University. Haixu Wu <EMAIL>. Correspondence to: Mingsheng Long <EMAIL>. |
| Pseudocode | Yes | We present the pseudo-code of normal Flow-Attention in Algorithm 1 and the causal version in Algorithm 2. |
| Open Source Code | Yes | The code and settings are available at this repository: https://github.com/thuml/Flowformer. |
| Open Datasets | Yes | Long-Range Arena (LRA, Tay et al. 2020c), WikiText-103 (Merity et al., 2017), ImageNet-1K (Deng et al., 2009), UEA Time Series Classification Archive (Bagnall et al., 2018), D4RL benchmark (Fu et al., 2020) |
| Dataset Splits | No | The paper references various datasets and benchmarks, but does not explicitly provide the train/validation/test dataset splits (e.g., percentages or sample counts) within its text. |
| Hardware Specification | Yes | All the experiments are conducted on 2 NVIDIA 2080 Ti GPUs. (LRA); All the models are trained from scratch without pre-training on 4 NVIDIA TITAN RTX 24GB GPUs for 150K updates after a 6K-steps warm-up. (WikiText-103); All the experiments are conducted on 8 NVIDIA TITAN RTX 24GB GPUs for 300 epochs. (ImageNet-1K); All the experiments are conducted on one single NVIDIA TITAN RTX 24GB GPU for 100 epochs. (UEA); We repeat each experiment three times with different seeds on one single NVIDIA 2080 Ti GPU for 10 epochs. (D4RL) |
| Software Dependencies | No | The paper mentions frameworks like JAX and Fairseq, but does not list specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, or detailed library versions) used in the experiments. |
| Experiment Setup | Yes | The model architecture consists of 6 decoder layers with 8 heads and 512 hidden channels for attention mechanism (Ott et al., 2019). (Language Modeling); We present Flowformer with 19 layers in a four-stage hierarchical structure, where the channels are in {96, 192, 384, 768} and the input sequence length for each stage is in {3136, 784, 196, 49} correspondingly. (Image Recognition); All the models are trained from scratch without pre-training on 4 NVIDIA TITAN RTX 24GB GPUs for 150K updates after a 6K-steps warm-up. (Language Modeling); We use 2 layers for Transformer-based models with 512 hidden channels and 8 heads for the attention mechanism. (Time Series); We adopt 3 layers with 256 hidden channels and 4 heads in all experiments for Flowformer and other Transformers. (RL) |
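
The settings quoted in the Experiment Setup row can be collected into a small, checkable structure. The sketch below is illustrative only and is not taken from the paper or its repository: the names `SETUPS`, `head_dim`, and `stage_grid` are hypothetical, the per-head width assumes the hidden channels are split evenly across heads, and the grid sizes assume each stage's tokens form a square grid. Only the numbers themselves come from the table above.

```python
import math

# Hyperparameters copied from the "Experiment Setup" row above.
# The dictionary layout and key names are hypothetical, chosen for illustration.
SETUPS = {
    "language_modeling": {"layers": 6, "heads": 8, "hidden": 512},
    "time_series": {"layers": 2, "heads": 8, "hidden": 512},
    "reinforcement_learning": {"layers": 3, "heads": 4, "hidden": 256},
}

# Image recognition: four-stage hierarchy, as quoted in the table.
IMAGE_CHANNELS = [96, 192, 384, 768]
IMAGE_SEQ_LENS = [3136, 784, 196, 49]


def head_dim(setup):
    """Per-head width, ASSUMING hidden channels are split evenly across heads."""
    if setup["hidden"] % setup["heads"] != 0:
        raise ValueError("hidden channels are not divisible by the head count")
    return setup["hidden"] // setup["heads"]


def stage_grid(seq_len):
    """Side length of the token grid, ASSUMING the tokens form a square grid."""
    side = math.isqrt(seq_len)
    if side * side != seq_len:
        raise ValueError(f"{seq_len} is not a perfect square")
    return side


# Derived quantities: per-head widths, grid sides, and stage-to-stage ratios.
head_dims = {task: head_dim(cfg) for task, cfg in SETUPS.items()}
grid_sides = [stage_grid(n) for n in IMAGE_SEQ_LENS]
token_ratios = [a // b for a, b in zip(IMAGE_SEQ_LENS, IMAGE_SEQ_LENS[1:])]
channel_ratios = [b // a for a, b in zip(IMAGE_CHANNELS, IMAGE_CHANNELS[1:])]

if __name__ == "__main__":
    print(head_dims)
    print(grid_sides, token_ratios, channel_ratios)
```

Under these assumptions, the reported image-recognition stages are internally consistent: each stage has a quarter of the tokens and twice the channels of the previous one.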